GPascal Source organization
---------------------------

Author: Nick Gammon
Date:   21 June 2011

I'll just explain the source organisation a bit. 

This was written before I had even heard of "linkers". My fundamental problem was that the Merlin assembler had to hold, in the memory of an Apple 2, both the source and the object, at one time, as this was an in-memory compiler.

However the source was much too large to all fit in memory.

So I broke the source down into logical, and fairly self-contained parts:

Part 1 - tokenization and general utilities

Part 2 - compiler (produces P-codes)

Part 3 - menu, file handling

Part 4 - interpreter (interprets P-codes)

Part 5 - text editor

Part 6 - more of the compiler (processing a 'block')


Now the issue was, how to "link" the parts together. So first I allocated "global" variables by simply assigning them addresses in memory, and having an EQU instruction at the start of each file, giving each variable the same address.

Then to make it easy for something in part 3 to call something in part 2 (say), I made a "jump table" at the start of each part (which is probably exactly what compilers/linkers to these days). For example, in part 1 on page 4 there is a list of important subroutines I might want to use from other files:

                              289   * VECTORS
                                        ************************************************
8013: 4C 97 80   291                     JMP   INIT             
8016: 4C 07 81   292                     JMP   GETNEXT       
8019: 4C 5A 81   293                     JMP   COMSTL         
801C: 4C 71 81   294                     JMP   ISITHX         
801F: 4C 95 81   295                     JMP   ISITAL         
8022: 4C A9 81   296                     JMP   ISITNM         
8025: 4C 3A 81   297                     JMP   CHAR             
8028: 4C 78 89   298                     JMP   GEN2:B         
802B: 4C FE 86   299                     JMP   DISHX           


By trial-and-error I worked out how much memory each file used, and assigned them all starting addresses like this:

                              10     P1             EQU   $8013           
                              11     P2             EQU   $8DD4           
                              12     P3             EQU   $992E           
                              13     P4             EQU   $A380           
                              14     P5             EQU   $B384           
                              15     P6             EQU   $BCB8           

So now each file knows that part 1 starts at address $8013 (which you can see from the above would be the "JMP INIT" line. So now the other files can just reference the jump table without needing to know exactly where it jumps to. For example in part 2:

                              287   INIT         EQU   V1                 
                              288   GETNEXT   EQU   V1+3             
                              289   COMSTL     EQU   V1+6             
                              290   ISITHX     EQU   V1+9             
                              291   ISITAL     EQU   V1+12           
                              292   ISITNM     EQU   V1+15           
                              293   CHAR         EQU   V1+18           
                              294   GEN2:B     EQU   V1+21           
                              295   DISHX       EQU   V1+24   

So if part 2 needs to call ISITHX (is it hex) then it does a JSR to ISITHX (which will be $8013 + 9) which takes it to the "JMP ISITHX" in part 1, which then jumps to the actual subroutine, which then does a RTS in the normal way.

The jump tables are different sizes depending on what functions needed exporting (effectively, depending on whether they were internal to the same file, or needed to be exported to other files). 

The whole thing worked out fairly smoothly.

The other things about the source is that I found myself very tight for memory, so in quite a few places I replaced something like this:

... blah blah ...
    JSR GTOKEN       ; get token for one-token lookahead
    RTS                     ; done with this function

by this:

... blah blah ...
    JMP GTOKEN       ; get token for lookahead, and then done

This had the same effect (I believe it is called "tail recursion" these days). Rather than calling GTOKEN, returning and then returning again, by jumping to GTOKEN the return from GTOKEN actually returns from the caller. This saved one byte each place I did it (and a few machine cycles too).


The *.XREF.txt files are "cross reference" files produced by the assembler. Basically you can use those to quickly find where a particular symbol is.