#*

Compile.pss

  This is a parse-script which compiles parse-scripts (!). What is more,
  it can compile itself, so we can do

  >> pp -f compile.pss compile.pss

  This is useful because the resulting 'assembler' program (written to
  sav.pp and printed to stdout) can be used as a replacement for 'asm.pp',
  which is the default parse-script language compiler. The advantage is
  that it is easier to maintain and add new syntax to compile.pss than it
  is to 'asm.pp'.

  This script uses the virtual machine and engine implemented at
  http://bumble.sf.net/books/pars/object/ It implements a script language
  with a syntax reminiscent of sed and awk (much simpler than awk, but
  more complex than sed).

  This code was created in a straightforward manner by adapting the
  "assembled" code in 'asm.pp'. Some extra error checks were added. Also,
  the EOF test was placed at the end of the script to remove the 'last
  character' bug. It was evident that using the script language is much
  more comfortable than hand-coding parse machine assembler programs.

REPLACING ASMPP

  We can use this script as a replacement for "asm.pp" or
  "asm.handcode.pp", which is a script assembler written by hand in the
  parse machine assembly format (1 command per line, labels, jumps,
  tests, etc).

  * replace asm.pp with compile.pss
  -----
    # generate the new script assembler
    cp asm.pp asm.old; pp -f compile.pss compile.pss > asm.new.pp
    cp asm.new.pp asm.pp
    # test the new assembler (the script "r;t;t;t;d;" will be compiled
    # by the new asm.pp which we have just created).
    pp -e "r;t;t;t;d;" -i "abcd"
    # output: aaabbbcccddd
  ,,,

  This appears to be working (23 august 2019). The advantage of all this
  is that it is much easier to maintain and add new syntax to
  "compile.pss" than it is to asm.handcode.pp. For example,
  asm.handcode.pp still uses "rabbit hops" to compile "quoteset" tokens,
  which is very inefficient, but compile.pss uses the new look-ahead
  technique.
  Also, we now have a negated == test implemented in compile.pss which is
  not implemented in asm.handcode.pp. I will no longer maintain
  asm.handcode.pp because its real purpose was to bootstrap the current
  script. I will keep working copies of asm.pp as generated by this
  script in case of future errors.

NOTES

  The accumulator register is being used to generate true-jump targets
  for quotesets, so it can't be used for anything else in this script.

  This script can be used as the basis for many others which transform
  scripts in some way. For example, to 'pretty-print' scripts, or to
  generate compilable c code for a script using the functions in
  machine.methods.c. So, instead of compiling to the "assembler" format
  for the machine (which is then interpreted by the code in gh.c) we can
  compile to a series of c function calls. This is c source code which
  can be compiled with gcc, producing an executable version of the target
  script.

  This is an interesting idea, because we can transform a script into
  compilable or executable code in a different language with a different
  'Machine' object. So, for example, we could write a Machine object in
  Ruby or Java or Python or x86 assembler and then generate compilable or
  executable code for that target environment. The compilable code would
  consist of a series of method calls on the given object, plus tests and
  jumps. It will also be interesting to see if there is a significant
  performance advantage in running compiled, rather than interpreted,
  scripts.

  See compilable.c.pss for creating executable parse programs from
  scripts.

GRAMMAR NOTES

  The machine cannot directly implement the ebnf structures of repetition
  "{...}", optionality "[...]" or grouping "(...)", so we need to express
  all grammar rules only in terms of alternation |.
  Quotesets are a handy way to express this in a script, eg

  * bnf rule: alpha ::= a | b | c ;
  >> 'a','b','c' { clear; add "alpha*"; push; }

  It is usually straightforward to factor out the above ebnf structures.

SEE ALSO

  At http://bumble.sf.net/books/pars/

  object/gh.c
    the current implementation of the machine interpreter and debugger.
  object/*.c
    the virtual machine and components.
  compilable.c.pss
    compiles a script to c code.
  asm.handcode.pp
    a handcoded "assembly" compiler of the parse script language.
  object/machine.methods.c
    a set of functions to perform instructions on the parse machine.

USAGE

  This script can be used to replace the hand-coded assembler file
  "asm.handcode.pp" since it is much easier to maintain and add new
  syntax for the parse-script language. I would like to preserve comments
  in the output.

  We can also do the strange operation

  >> pp -f compile.pss compile.pss

  which actually creates an 'assembler' version of itself in 'sav.pp',
  which should then be suitable for use as an 'asm.pp' substitute. This
  is quite tricky to think about since it is so self-referential. But
  this is analogous to the equally strange operation

  >> pp -f compilable.c.pss compilable.c.pss

  which generates a compilable c language program of the compilable
  script.

  It may be possible to compile this script into a stand-alone executable
  with: (untested)
  ----
    pp -f compilable.c.pss compile.pss > compile.c
    gcc -o compile.exec compile.c -Iobject -Lobject -lmachine
  ,,,,

  Also, it may be more useful for this script to print out its output
  rather than writing it to file.
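
  As a rough illustration of the factoring mentioned under GRAMMAR
  NOTES: the ebnf repetition  list ::= item { ',' item } ;  can be
  rewritten with alternation only as  list ::= list ',' item | item ;
  and then reduced token by token on a stack. The Python sketch below
  is not part of the pp tools; the function name reduce_list and the
  token names item* and list* are made up for this example.

```python
def reduce_list(tokens):
    """Shift tokens one at a time and reduce them with alternation-only rules.

    The ebnf rule   list ::= item { ',' item } ;
    is factored to  list ::= list ',' item | item ;
    """
    stack = []
    for token in tokens:
        stack.append(token)  # shift the next token onto the stack
        while True:  # keep reducing until no rule matches any more
            if stack[-3:] == ["list*", ",*", "item*"]:
                stack[-3:] = ["list*"]  # list ::= list ',' item
            elif stack[-1] == "item*":
                stack[-1] = "list*"  # list ::= item
            else:
                break
    return stack


print(reduce_list(["item*", ",*", "item*", ",*", "item*"]))
```

  The longer alternative is tried first, so that an item which follows
  a comma is attached to the existing list rather than starting a new
  one.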

TESTING

  The script can be tested with

  >> pp -f compile.pss -i "r; [aeiou] {a '(vowel)'; } t;d;"

  and then inspect the file "sav.pp" or run

  >> pp -a sav.pp -i "abcde"

  output: a(vowel)bcde(vowel)

  * test "test chaining" compilation
  >> pp -f compile.pss -i "r;'a','b','c'{t;}t;d;"
  >> pp -a sav.pp -i "axbxcx"
  output should be: aaxbbxccx

  * view/debug how compile.pss compiles test chains (or something else)
  >> pp -If compile.pss -i "r;'a','b','c'{t;}t;d;"

FIXED BUGS

  I was getting segmentation faults because of off-by-one errors etc.
  when running

  >> pp -f compile.pss compile.pss

  Mainly fixed with "valgrind", but there is still a bug in "until" (in
  object/machine.interp.c execute()): need to implement an endsWith()
  function. And one other bug.

BUGS

  We don't need 2 jumps after "tests", just 1 jumpfalse or jumptrue!!

  compile.pss should not write the compiled script to stdout because then
  asm.pp will do the same thing. Easy enough to fix in asm.pp as well
  (comment out the final 2 "print" commands).

  Comments are not parsing correctly. Comments and multiline comments
  should not jump back to read after deleting the comment, because there
  could be no more input, and read will throw an error. They should jump
  to the EOF end-of-file check. Or they could just call ".reparse", which
  is safe but not very efficient.

HISTORY

  29 august 2019

    Changed the way testeof and testtape are parsed to include them with
    other tests. This also allows us to negate them with !(==) and !(eof)
    and also to concatenate them with other tests, eg: (eof),B"abc" {}
    Added extra syntax <eof> and <==> for these tests.

  25 august 2019

    Realised that I don't need 2 jumps for OR test concatenation (with
    ','). That will greatly improve script interpretation efficiency.
    Added AND concatenation logic to tests so now we can do

    * test if workspace begins with 'a' AND ends with 'z'
    >> B"a".E"z" {}

    Changed the way .reparse and .restart are parsed and compiled. These
    are now parsed as 2 tokens ".*word*". This allows me to use '.' for
    AND logic concatenation in tests.
    It also allows me to provide special semantic meaning to commands
    beginning with a dot, which seems like a good thing.

    Added the "delim" command here and in machine.c and machine.interp.c,
    to change the stack delimiter.

  24 august 2019

    The "state*" token should be separated into "testeof*" and
    "testtape*" and then the 2 tests can be elided. The conversion to a
    "test*{*" rule and elision of multiple tests will make this script
    much more compact and hopefully just as readable. Also, as a side
    effect, negation of all tests will be available soon. Also, it is
    possible to chain together different types of tests.

    Converted quoteset to "ortestset*" and "andtestset*". I will
    introduce a new notation, namely:

    * check if workspace begins with "abc" AND ends with "xyz"
    >> B"abc" . E"xyz" { commands }

    so the dot will become an "AND" (&&) concatenator of tests and ","
    will remain as the "OR" (||) concatenator of tests. In these || and
    && test lists any type of test can be included, for example

    * check if workspace starts with "a", only contains chars a|b|c|z
    * and ends with the letter "z" (using "." AND concatenator)
    >> B"a" . [abcz] . E"z" { ... }

    Experimenting with the new technique to create negated token
    classes.

    * test negated tokens for "testis"
    >> /usr/local/bin/pp -f compile.pss -i 'r;!"b",!"a"{nop;}'

  23 august 2019

    Adding begintests and endtests to the quoteset logic. But need to
    juggle the combinations. Also could add classes and notclasses. More
    or less working. But should actually change the parsing to make
    quotesets more flexible, see the section of the script for details.

    The new quoteset compilation seems to be working. Needs more testing.
    We can now use compile.pss as a replacement for asm.pp.

    Converting to a new quoteset (eg: 'n','m' {...} ) lookahead compiling
    technique. Also we can compile comments with rules for
    "comment*command*" and "command*comment*" and "comment*comment*" ->
    "comment*", instead of the current shenanigans.

  14 august 2019

    Trying to preserve comments here but can't reduce comments with
    tokens like {* }* !* etc because we never retrieve the attributes for
    those tokens. More thought required.

    Added a !"text" {...} syntax. Very simple to add here. Did the same
    in compilable.c.pss.

    Added a "begin" block to this (for start configurations of scripts).
    Also need to improve the compilation of quoteset tokens, which
    produces nifty but very poor code. Need a 'tapereplace' command for
    this.

  6 august 2019

    It would be handy to have multiline quotes. Should be easy to
    include. In fact they probably should already work, don't know why
    not...

  30 july 2019

    Fixed the last character bug by putting the EOF test at the very end
    of the file. The translation is complete and the script appears to be
    working but no doubt will contain bugs. Initially translated from
    asm.pp.

*#

read;

#--------------
[:space:] { clear; .reparse }

#---------------
# We can elide all these single character tests, because
# the stack token is just the character itself with a *
# Braces {} are used for blocks, ',' and '.' for concatenating
# tests with OR or AND logic. 'B' and 'E' for begin and end
# tests.
"{", "}", ";", ",", ".", "!", "B", "E" {
  put; add "*"; push; .reparse
}

#---------------
# format: "text"
"\"" {
  until "\""; put; clear;
  add "quote*"; push; .reparse
}

#---------------
# format: 'text', single quotes are converted to double quotes
# but we must escape embedded double quotes.
"'" {
  clear; until "'"; clip; escape '"'; put; clear;
  add "\""; get; add "\""; put; clear;
  add "quote*"; push; .reparse
}

#---------------
# formats: [:space:] [a-z] [abcd] [:alpha:] etc
"[" {
  until "]"; put; clear;
  add "class*"; push; .reparse
}

#---------------
# formats: (eof) (==) etc.
# I may change this syntax to just
# EOF and ==
"(" {
  clear; until ")"; clip; put;
  "eof","EOF" { clear; add "eof*"; push; .reparse }
  "==" { clear; add "tapetest*"; push; .reparse }
  add " << unknown test near line "; ll;
  add " of script.\n";
  add " bracket () tests are \n";
  add " (eof) test if end of stream reached. \n";
  add " (==) test if workspace is same as current tape cell \n";
  print; clear; quit;
}

#---------------
# multiline and single line comments, eg #... and #* ... *#
"#" {
  clear; read;
  # calling .restart here is a bug, because the (eof) clause
  # will never be called and the script never written or
  # printed.
  "\n" { clear; .restart }
  # checking for multiline comments of the form "#* \n\n\n *#"
  # these are just ignored at the moment (deleted)
  "*" {
    until "*#";
    E"*#" {
      # it appears that calling .restart is usually a bad
      # idea. safer to call .reparse
      clear; .restart
    }
    # make an unterminated multiline comment an error
    # to ease debugging of scripts.
    clear;
    add "unterminated multiline comment #* ... *# \n";
    print; clear; quit;
  }
  # single line comments. some will get lost.
  put; clear; add "#"; get; until "\n"; clip;
  put; clear; add "comment*"; push;
  #clear; .restart
  clear; .reparse
}

#----------------------------------
# parse command words (and abbreviations)
# legal characters for keywords (commands)
![abcdefghijklmnopqrstuvwxyzBEKGPRWS+-<>0^] {
  # error message about a misplaced character
  put; clear;
  add "!! Misplaced character '";
  get;
  add "' in script near line "; ll;
  add " (character "; cc; add ") \n";
  print; clear; bail;
}

# my testclass implementation cannot handle complex lists
# eg [a-z+-] this is why I have to write out the whole alphabet
while [abcdefghijklmnopqrstuvwxyzBEKGPRWS+-<>0^];

#----------------------------------
# KEYWORDS
# here we can test for all the keywords (command words) and their
# abbreviated one letter versions (eg: clip k, clop K etc).
# Then we can print an error message and abort if the word is not a
# legal keyword for the parse-edit language
"a" { clear; add "add"; }
"k" { clear; add "clip"; }
"K" { clear; add "clop"; }
"D" { clear; add "replace"; }
"d" { clear; add "clear"; }
"t" { clear; add "print"; }
"p" { clear; add "pop"; }
"P" { clear; add "push"; }
"G" { clear; add "put"; }
"g" { clear; add "get"; }
"x" { clear; add "swap"; }
">" { clear; add "++"; }
"<" { clear; add "--"; }
"r" { clear; add "read"; }
"R" { clear; add "until"; }
"w" { clear; add "while"; }
"W" { clear; add "whilenot"; }

# we can probably omit tests and jumps since they are not
# designed to be used in scripts (only assembled parse programs).
#*
"b" { clear; add "jump"; }
"j" { clear; add "jumptrue"; }
"J" { clear; add "jumpfalse"; }
"=" { clear; add "testis"; }
"?" { clear; add "testclass"; }
"b" { clear; add "testbegins"; }
"B" { clear; add "testends"; }
"E" { clear; add "testeof"; }
"*" { clear; add "testtape"; }
*#

"n" { clear; add "count"; }
"+" { clear; add "a+"; }
"-" { clear; add "a-"; }
"0" { clear; add "zero"; }
"c" { clear; add "cc"; }
"l" { clear; add "ll"; }
"^" { clear; add "escape"; }
"v" { clear; add "unescape"; }
"z" { clear; add "delim"; }
"S" { clear; add "state"; }
"q" { clear; add "quit"; }
"Q" { clear; add "bail"; }
"s" { clear; add "write"; }
"o" { clear; add "nop"; }
"rs" { clear; add "restart"; }
"rp" { clear; add "reparse"; }

# some extra syntax for testeof and testtape
"<eof>" { put; clear; add "eof*"; push; .reparse }
"<==>" { put; clear; add "tapetest*"; push; .reparse }

"add","clip","clop","replace","clear","print",
"pop","push","put","get","swap","++","--","read",
"until","while","whilenot",
"jump","jumptrue","jumpfalse",
"testis","testclass","testbegins","testends",
"testeof","testtape",
"count","a+","a-","zero","cc","ll",
"escape","unescape","delim","state","quit","bail",
"write","nop","reparse","restart" {
  put; clear;
  add "word*";
  push; .reparse
}

#*
#------------
# the .restart command
# just jumps to the start: label
# (which is usually followed by a "read" command)
".restart" {
  clear; add "jump start"; put; clear;
  add "command*";
  push; .reparse
}
*#

#------------
# the .reparse command and "parse label" is a simple way to
# make sure that all shift-reductions occur. It should be used inside
# a block test, so as not to create an infinite loop.
#*
".reparse" {
  clear; add "jump parse"; put; clear;
  add "command*";
  push; .reparse
}
*#

"parse>" {
  clear; add "parse:"; put; clear;
  add "command*";
  push; .reparse
}

# --------------------
# try to implement begin-blocks, which are only executed
# once, at the beginning of the script (similar to awk's BEGIN {} rules)
"begin" { put; add "*"; push; .reparse }

add " << unknown command on line "; ll;
add " (char "; cc; add ")";
add " of source file. \n";
print; clear; quit;

# ----------------------------------
# PARSING PHASE:
# the lexing phase finishes here, and below is the
# parse/compile phase of the script. Here we pop tokens
# off the stack and check for sequences of tokens eg word*semicolon*
# If we find a valid series of tokens, we "shift-reduce" or "resolve"
# the token series eg word*semicolon* --> command*
#
# At the same time, we manipulate (transform) the attributes on the
# tape, as required. So Tape=|pop|;| becomes |\npop| where the
# bars | indicate tape cells. (2 tape cells are merged into 1).
#
# Each time the stack is reduced, the tape must also be reduced
#

parse>

#-------------------------------------
# 2 tokens
#-------------------------------------
pop; pop;

# All of the below are currently errors, but may not
# be in the future if we expand the syntax of the parse
# language. Also consider:
# begintext* endtext* quoteset* notclass*, !* ,* ;* B* E*
# It is nice to trap the errors here because we can emit some
# hopefully not-very-cryptic error messages with a line number.
# Otherwise the script writer has to debug with
#   pp -a asm.pp scriptfile -I
#
"word*word*", "word*}*", "word*begintext*", "word*endtext*",
"word*!*", "word*,*", "quote*word*", "quote*class*",
"quote*state*", "quote*}*", "quote*begintext*", "quote*endtext*",
"class*word*", "class*quote*", "class*class*", "class*state*",
"class*}*", "class*begintext*", "class*endtext*", "class*!*",
"notclass*word*", "notclass*quote*", "notclass*class*",
"notclass*state*", "notclass*}*" {
  push; push;
  add "error near line "; ll;
  add " (char "; cc; add ")";
  add " of script (missing semicolon?) \n";
  print; clear; quit;
}

"{*;*", ";*;*", "}*;*" {
  push; push;
  add "error near line "; ll;
  add " (char "; cc; add ")";
  add " of script: misplaced semi-colon? ; \n";
  print; clear; quit;
}

",*{*" {
  push; push;
  add "error near line "; ll;
  add " (char "; cc; add ")";
  add " of script: extra comma in list? \n";
  print; clear; quit;
}

"!*!*" {
  push; push;
  add "error near line "; ll;
  add " (char "; cc; add ")";
  add " of script: double negation '!!' is not implemented \n";
  add " and probably won't be, because what would be the point? \n";
  print; clear; quit;
}

"!*{*" {
  push; push;
  add "error near line "; ll;
  add " (char "; cc; add ")";
  add " of script: misplaced negation operator (!)? \n";
  print; clear; quit;
}

"!*command*" {
  push; push;
  add "error near line "; ll;
  add " (at char "; cc; add ") \n";
  add " The negation operator (!) cannot precede a command \n";
  print; clear; quit;
}

";*{*", "command*{*", "commandset*{*" {
  push; push;
  add "error near line "; ll;
  add " (char "; cc; add ")";
  add " of script: no test for brace block? \n";
  print; clear; quit;
}

"{*}*" {
  push; push;
  add "error near line "; ll;
  add " of script: empty braces {}. \n";
  print; clear; quit;
}

"}*command*" {
  push; push;
  add "error near line "; ll;
  add " of script: extra closing brace '}' ?. \n";
  print; clear; quit;
}

#------------
# the .restart command just jumps to the start: label
# (which is usually followed by a "read" command)
# but '.' is also the AND concatenator, which seems ambiguous,
# but seems to work.
".*word*" {
  clear; ++; get; --;
  "restart" {
    clear; add "jump start"; put; clear;
    add "command*";
    push; .reparse
  }
  "reparse" {
    clear; add "jump parse"; put; clear;
    add "command*";
    push; .reparse
  }
  push; push;
  add "error near line "; ll;
  add " (char "; cc; add ")";
  add " of script: \n";
  add " misplaced dot '.' (use for AND logic or in .reparse/.restart) \n";
  print; clear; quit;
}

#-----------------------------------------
# compiling comments so as to transfer them to the compiled
# file. Improve this by just forming rules for
# command*comment* and comment*command* and comment*comment*

# implement these rules to conserve comments
"comment*command*" { nop; }
"command*comment*" { nop; }
"comment*comment*" { nop; }

E"comment*" {
  # just leave the other token on the stack, whatever it is
  push;
  # avoid an infinite loop if only one token on the stack
  "" { .restart }
  clear;
  --; get; ++; add "\n"; get; --; put; ++; clear;
  .reparse
}

#-----------------------------------------
# There is a problem, that attaching comments to { or } or
# other trivial tokens makes them disappear because we
# don't retrieve the attribute for those tokens.
B"comment*" {
  # just leave the other token on the stack, whatever it is
  push;
  # avoid an infinite loop if only one token on the stack
  "" { .restart }
  # some tricky juggling of the unknown token
  # get rid of comment* token but conserve other one.
  ++; put; pop; clear; ++; get; push; clear;
  --; --; --; get; ++; add "\n"; get; --; put; ++; clear;
  .reparse
}

# -----------------------
# negated tokens.
#
# This is a new more elegant way to negate a whole set of
# tests (tokens) where the negation logic is stored on the
# stack, not in the current tape cell. We just add "not" to
# the stack token.
# eg: ![:alpha:] ![a-z] ![abcd] !"abc" !B"abc" !E"xyz"
# This format is used to indicate a negative class test for
# a brace block. eg: ![aeiou] { add "< not a vowel"; print; clear; }
"!*quote*","!*class*","!*begintext*", "!*endtext*",
"!*eof*","!*tapetest*" {
  # save the second token (quote/class/begintext) on the
  # tape (but not the current tape cell which has other stuff in
  # it)
  push; ++; put; --; clear; pop; clear;
  add "not";
  # extract the saved token name from the tape
  ++; ++; get; --; --;
  # now we have a token "notquote*" / "notclass*" / "notbegintext*"
  push;
  get; --; put; ++; clear;
  .reparse
}

#-----------------------------------------
# format: E"text" or E'text'
# This format is used to indicate a "workspace-ends-with" text before
# a brace block.
"E*quote*" {
  clear; add "endtext*"; push;
  get; --; put; ++; clear;
  .reparse
}

#-----------------------------------------
# format: B"sometext" or B'sometext'
# A 'B' preceding some quoted text is used to indicate a
# 'workspace-begins-with' test, before a brace block.
"B*quote*" {
  clear; add "begintext*"; push;
  get; --; put; ++; clear;
  .reparse
}

#--------------------------------------------
# ebnf: command := word, ';' ;
# formats: "pop; push; clear; print; " etc
# all commands need to end with a semi-colon except for
# .reparse and .restart
#
"word*;*" {
  clear;
  # check if command requires parameter
  get;
  "add", "until", "while", "whilenot",
  "escape", "unescape", "delim", "replace" {
    put; clear; add "'"; get; add "'";
    add " << command needs an argument, on line ";
    ll; add " of script.\n";
    print; clear; quit;
  }
  clear; add "command*";
  # no need to format tape cells because current cell contains word
  push; .reparse
}

#-----------------------------------------
# ebnf: commandset := command , command ;
"command*command*", "commandset*command*" {
  clear;
  add "commandset*"; push;
  # format the tape attributes.
  # Add the next command on a newline
  --; get; add "\n"; ++; get; --; put; ++; clear;
  .reparse
}

#-------------------
# here we begin to parse "test*" and "ortestset*" and "andtestset*"
#

#-------------------
# eg: B"abc" {} or E"xyz" {}
"begintext*{*","endtext*{*","quote*{*","class*{*",
"eof*{*","tapetest*{*" {
  # set accumulator == 0
  zero;
  B"begin" { clear; add "testbegins "; }
  B"end" { clear; add "testends "; }
  B"quote" { clear; add "testis "; }
  B"class" { clear; add "testclass "; }
  B"eof" { clear; add "testeof "; }
  B"tapetest" { clear; add "testtape "; }
  get; add "\n";
  add "jumptrue 2 \n";
  # this extra jump has utility when we parse ortestsets and
  # andtestsets.
  add "jump block.end.";
  # the final jumpfalse + target will be added when
  # "test*{*commandset*}*" is parsed, or when
  # "ortestset*{*commandset*}*"
  # "andtestset*{*commandset*}*"
  put;
  a+; a+; a+; a+;
  clear; add "test*{*";
  push; push; .reparse
}

#-------------------
# negated tests
# eg: !B"xyz" {}
#     !E"xyz" {}
#     !"abc" {}
#     ![a-z] {}
"notbegintext*{*","notendtext*{*","notquote*{*","notclass*{*",
"noteof*{*","nottapetest*{*" {
  # set accumulator == 0
  zero;
  B"notbegin" { clear; add "testbegins "; }
  B"notend" { clear; add "testends "; }
  B"notquote" { clear; add "testis "; }
  B"notclass" { clear; add "testclass "; }
  B"noteof" { clear; add "testeof "; }
  B"nottapetest" { clear; add "testtape "; }
  get; add "\n";
  add "jumpfalse 2 \n";
  # this extra jump has utility when we parse ortestsets and
  # andtestsets.
  add "jump block.end.";
  # the final jumpfalse + target will be added later
  put;
  a+; a+; a+; a+;
  clear; add "test*{*";
  push; push; .reparse
}

#-------------------
# 3 tokens
#-------------------
pop;

#-----------------------------
# some errors!!!
# there are many other of these errors but I am not going
# to write them all.
"{*quote*;*", "{*begintext*;*", "{*endtext*;*", "{*class*;*" {
  push; push; push;
  add "error near line "; ll;
  add " (char "; cc; add ")";
  add " of script (misplaced semicolon?) \n";
  print; clear; quit;
}

# to simplify subsequent tests, transmogrify a single command
# to a commandset (multiple commands).
"{*command*}*" {
  clear; add "{*commandset*}*";
  push; push; push;
  .reparse
}

# rules
#   ',' ortestset ::= ',' test '{'
#   '.' andtestset ::= '.' test '{'
# trigger a transmogrification from test to ortestset token
",*test*{*" {
  clear; add ",*ortestset*{*";
  push; push; push;
  .reparse
}

# trigger a transmogrification from "test" to "andtest" by
# looking backwards in the stack
".*test*{*" {
  # the jump counter is 1 too high for AND tests
  a-;
  clear; add ".*andtestset*{*";
  push; push; push;
  .reparse
}

# errors! mixing AND and OR concatenation
",*andtestset*{*", ".*ortestset*{*" {
  # push the tokens back to make debugging easier
  push; push; push;
  add " error: mixing AND (.) and OR (,) concatenation \n";
  add " in script near line "; ll;
  add " (character "; cc; add ") \n";
  print; clear; quit;
}

#--------------------------------------------
# ebnf: command := keyword , quoted-text , ";" ;
# format: add "text";
"word*quote*;*" {
  clear; get;
  "replace" {
    # error
    add "< command requires 2 parameters, not 1 \n";
    add "near line "; ll;
    add " of script. \n";
    print; clear; quit;
  }
  "add", "until", "while", "whilenot", "escape", "unescape", "delim" {
    clear; add "command*"; push;
    # a command plus argument, eg add "this"
    --; get; add " "; ++; get; --; put; ++;
    clear; .reparse
  }
  # error, superfluous argument
  add ": command does not take an argument \n";
  add "near line "; ll;
  add " of script. \n";
  print; clear;
  #state
  quit;
}

#----------------------------------
# format: "while [:alpha:] ;" or whilenot [a-z]
"word*class*;*" {
  clear; get;
  "while", "whilenot" {
    clear; add "command*"; push;
    # a command plus argument, eg while [a-z]
    --; get; add " "; ++; get; --; put; ++;
    clear; .reparse
  }
  # error
  add " < command cannot have a class argument \n";
  add "line "; ll; add ": error in script \n";
  print; clear; quit;
}

# -------------------------------
# 4 tokens
# -------------------------------
pop;

#-------------------------------------
# ebnf: command := replace , quote , quote , ";" ;
# example: replace "and" "AND" ;
"word*quote*quote*;*" {
  clear; get;
  "replace" {
    clear; add "command*"; push;
    #---------------------------
    # a command plus 2 arguments, eg replace "this" "that"
    --; get; add " ";
    ++; get; add " ";
    ++; get; --; --; put; ++;
    clear; .reparse
  }
  add " << command does not take 2 quoted arguments. \n";
  add " on line "; ll; add " of script.\n";
  # print the message before exiting, as the other error blocks do
  print; clear; quit;
}

#-------------------------------------
# format: begin { #* commands *# }
# "begin" blocks are only executed once (they
# are assembled before the "start:" label). They must come before
# all other commands.
#
"begin*{*command*}*", "begin*{*commandset*}*" {
  clear;
  ++; ++; get; --; --; put; clear;
  add "beginblock*";
  push; .reparse
}

# -------------
# parses and compiles concatenated tests
# eg: 'a',B'b',E'c',[def],[:space:],[g-k] { ...
"begintext*,*ortestset*{*", "endtext*,*ortestset*{*",
"quote*,*ortestset*{*", "class*,*ortestset*{*",
"eof*,*ortestset*{*", "tapetest*,*ortestset*{*" {
  B"begin" { clear; add "testbegins "; }
  B"end" { clear; add "testends "; }
  B"quote" { clear; add "testis "; }
  B"class" { clear; add "testclass "; }
  B"eof" { clear; add "testeof "; }
  B"tapetest" { clear; add "testtape "; }
  get; add "\n";
  add "jumptrue "; count; add "\n";
  ++; ++; get; --; --; put; clear;
  # this works as long as we don't mix AND and OR concatenations
  # add "test*{*";
  # need to change to this
  add "ortestset*{*";
  push; push;
  a+; a+;
  .reparse
}

# A collection of negated tests.
"notbegintext*,*ortestset*{*", "notendtext*,*ortestset*{*",
"notquote*,*ortestset*{*", "notclass*,*ortestset*{*",
"noteof*,*ortestset*{*", "nottapetest*,*ortestset*{*" {
  B"notbegin" { clear; add "testbegins "; }
  B"notend" { clear; add "testends "; }
  B"notquote" { clear; add "testis "; }
  B"notclass" { clear; add "testclass "; }
  B"noteof" { clear; add "testeof "; }
  B"nottapetest" { clear; add "testtape "; }
  get; add "\n";
  add "jumpfalse "; count; add "\n";
  ++; ++; get; --; --; put; clear;
  # this works as long as we don't mix AND and OR concatenations
  add "ortestset*{*";
  push; push;
  a+; a+;
  .reparse
}

# -------------
# AND logic
# parses and compiles concatenated AND tests
# eg: B'a'.E'c'.[abc] { ...
# it is possible to elide this block with the negated block
# for compactness but maybe readability is not as good.
"begintext*.*andtestset*{*", "endtext*.*andtestset*{*",
"quote*.*andtestset*{*", "class*.*andtestset*{*",
"eof*.*andtestset*{*", "tapetest*.*andtestset*{*" {
  B"begin" { clear; add "testbegins "; }
  B"end" { clear; add "testends "; }
  B"quote" { clear; add "testis "; }
  B"class" { clear; add "testclass "; }
  B"eof" { clear; add "testeof "; }
  B"tapetest" { clear; add "testtape "; }
  get; add "\n";
  add "jumpfalse "; count; add "\n";
  ++; ++; get; --; --; put; clear;
  add "andtestset*{*";
  push; push;
  a+; a+;
  .reparse
}

# negated tests concatenated with AND logic (.). The
# negated tests can be chained with non negated tests.
# eg: B'http' . !E'.txt' { ... }
"notbegintext*.*andtestset*{*", "notendtext*.*andtestset*{*",
"notquote*.*andtestset*{*", "notclass*.*andtestset*{*",
"noteof*.*andtestset*{*", "nottapetest*.*andtestset*{*" {
  B"notbegin" { clear; add "testbegins "; }
  B"notend" { clear; add "testends "; }
  B"notquote" { clear; add "testis "; }
  B"notclass" { clear; add "testclass "; }
  B"noteof" { clear; add "testeof "; }
  B"nottapetest" { clear; add "testtape "; }
  get; add "\n";
  add "jumptrue "; count; add "\n";
  ++; ++; get; --; --; put; clear;
  add "andtestset*{*";
  push; push;
  a+; a+;
  .reparse
}

#-------------------------------------
# we should not have to check for the {*command*}* pattern
# because that has already been transformed to {*commandset*}*
"test*{*commandset*}*", "andtestset*{*commandset*}*",
"ortestset*{*commandset*}*" {
  B"test*{*" {
    clear;
    # get rid of unnecessary jump but only in "test" cases
    get;
    # for positive tests (eg [a-z] {...})
    replace "jumptrue 2 \njump" "jumpfalse"; put;
    # for negative tests (eg ![a-z] {...})
    replace "jumpfalse 2 \njump" "jumptrue"; put;
  }
  # indent the assembled code for readability
  clear;
  ++; ++; add " "; get; replace "\n" "\n "; put; --; --;
  clear; get;
  # the final jump (to the closing brace) has already been
  # coded in the "test*{*" rule or the other rules.
  # we just need to add the label number with "cc"
  cc; add "\n";
  ++; ++; get; add "\nblock.end."; cc; add ":";
  --; --; put;
  clear; add "command*"; push;
  # always reparse/compile
  .reparse
}

#-------------------------------------
# format: (eof) { add ":"; print; clear; }
# format: (==) { print; }
# rewrite to separate into 2 separate test tokens
# "testeof*" and "testtape*"
#*
# now parsing this with all the other tests.
"state*{*commandset*}*", "state*{*command*}*" {
  clear;
  # indent for readability
  ++; ++; add " "; get; replace "\n" "\n "; put; --; --;
  clear; get;
  "eof" {
    clear;
    add "testeof ";
    add "\njumpfalse not.eof."; cc; add "\n";
    ++; ++; get;
    add "\nnot.eof."; cc;
    --; --; add ":"; put;
    clear; add "command*";
    push; .reparse
  }
  # ----------
  # compile to tape= test
  "==" {
    clear;
    add "testtape ";
    add "\njumpfalse not.testtape."; cc; add "\n";
    ++; ++; get;
    add "\nnot.testtape."; cc;
    --; --; add ":"; put;
    clear; add "command*";
    push; .reparse
  }
  add ": unknown state test near line "; ll;
  add " of script.\n";
  add " State tests are \n";
  add " (eof) test if end of stream reached. \n";
  add " (==) test if workspace is same as current tape cell \n";
  print; clear; quit;
}
*#

# put the 4 (or less) tokens back on the stack
push; push; push; push;

(eof) {
  #add "end of script!! \n"
  print; clear;
  #---------------------
  # check if the script correctly parsed (there should only
  # be one token on the stack, namely "commandset*" or "command*"
  pop; pop;
  "commandset*", "command*" {
    push; --;
    add "# Assembled with the script 'compile.pss' \n";
    add "start:\n";
    get;
    # an extra space because of a bug in compile()
    add "\njump start \n";
    # put a copy of the final compilation into the tapecell
    # so it can be inspected interactively.
    put; print;
    # save the compiled script to 'sav.pp'
    write;
    clear; quit;
  }
  "beginblock*commandset*", "beginblock*command*" {
    clear;
    add "# Assembled with the script 'compile.pss' \n";
    get; add "\n"; ++;
    add "start:\n";
    get;
    # an extra space because of a bug in compile()
    add "\njump start \n";
    # put a copy of the final compilation into the tapecell
    # so it can be inspected interactively.
    put; print;
    # also save the compiled script to 'sav.pp'
    write;
    clear; quit;
  }
  push; push;
  # state
  clear;
  add "After compiling with 'compile.pss' (at EOF): \n ";
  add " parse error in input script, check syntax or try \n ";
  add " 'pp -Ia asm.pp script' to debug compilation \n ";
  print; clear;
  # clear sav.pp because script could not be compiled
  write;
  # bail means exit with error
  bail;
} # not eof

# there is an implicit .restart command here (jump start)
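
#*
NOTE ON THE SHAPE OF THE COMPILED OUTPUT

  As a rough aid to reading the rules above, the text which this script
  builds for a single positive brace block, and the wrapper which the
  (eof) block adds around the whole program, can be sketched in Python.
  This is only an illustration and is not part of the pp tools: the
  function names compile_block and wrap_program are invented here, and
  the one-space indent is an assumption taken from the replace command
  in the block rule.

```python
def compile_block(test_line, commands, char_count, indent=" "):
    """Mimic the brace-block rule for a single positive test.

    test_line:  assembler test, eg 'testis "x"'
    commands:   list of assembler commands found inside the braces
    char_count: value of the "cc" counter, used to make the label unique
    """
    label = "block.end." + str(char_count)
    # every command of the block is indented for readability
    body = "\n".join(indent + command for command in commands)
    return test_line + "\njumpfalse " + label + "\n" + body + "\n" + label + ":"


def wrap_program(code):
    """Mimic the (eof) block: header comment, start: label and final jump."""
    return ("# Assembled with the script 'compile.pss' \n"
            "start:\n" + code + "\njump start \n")


print(wrap_program(compile_block('testis "x"', ["print", "clear"], 12)))
```

  The label number only has to be unique within one compiled program,
  which is why the character counter is a convenient source for it.
*#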