# Assembled with the script 'compile.pss'
start:
#
# translate.java.pss
#
# This is a parse-script which translates parse-scripts into java
# code, using the 'pep' tool. The script creates a standalone
# compilable java program.
#
# The virtual machine and engine are implemented in plain c at
# http://bumble.sf.net/books/pars/gh.c. This implements a script language
# with a syntax reminiscent of sed and awk (much simpler than awk, but
# more complex than sed).
#
# This code was originally created in a straightforward manner by adapting
# the code in 'compile.js.pss' which compiles scripts to javascript
#
#NOTES
#
# We use labelled loops and break/continue to implement the
# parse> label and .reparse .restart commands. Breaks are also
# used to implement the quit and bail commands.
#
#TODO
#
# Convert the parsing code to a method which takes an input
# stream as a parameter. This way the same parser/compiler
# can be used with a string/file/stdin etc and can also be
# used by other classes/objects.
#
#SEE ALSO
#
# At http://bumble.sf.net/books/pars/
#
# compile.tcl.pss
#   A very similar script for compiling scripts into tcl
#
# compile.py.pss
#   A script translator for python.
#
# compile.pss
#   compiles a script into an "assembly" format that can be loaded
#   and run on the parse-machine with the -a switch. This performs
#   the same function as "asm.pp"
#
#TESTING
#
# * testing the multiple escaped until bug
# >> pep.jas 'r;until"c";add".";t;d;' 'ab\\cab\cabc'
#
# Complex scripts can be translated into java and work,
# including this script itself.
#
# eg/natural.language.pss seems to translate well to java.
#
# * the following is working!
# ----
# pep -f compile.java.pss eg/mark.html.pss > Machine.java
# javac Machine.java
# cat pars-book.txt | java Machine
# ,,,,
#
# The following is working!:
# -----
# pep -f compile.java.pss compile.java.pss > Machine.java
# javac Machine.java
# cat eg/exp.tolisp.pss | java Machine > Machine.java
# javac Machine.java
# echo "(a+2)*3+4" | java Machine
# ,,,
#
# This is fairly complex. The script translates itself into
# java, and then that translator is used to translate
# another script into java, which is then executed....
#
# But even more complex stuff is also working, such as
# self-referentiality cubed!!!
# -----
# pep -f compile.java.pss compile.java.pss > Machine.java
# javac Machine.java
# cat compile.java.pss | java Machine > Machine.java
# javac Machine.java
# cat eg/exp.tolisp.pss | java Machine > Machine.java
# javac Machine.java
# echo "(a+2)*3+4" | java Machine
# ,,,
#
# The script can be tested with something like
# ----
# pep -f compile.java.pss -i "r;[aeiou]{a '=vowel\n';t;}d;" > Machine.java
# javac Machine.java;
# echo "abcdefhijklmnop" | java Machine
# ,,,
#
# The output will be java code which is equivalent to the
# script provided to the -i switch.
#
# * a very comprehensive test is to run it on itself
# >> pep -f compile.java.pss compile.java.pss > Machine.java
#
# This is the "Shangri-La" of pep scripts.
#
# And then we could do!!
# >> cat compile.java.pss | java Machine
# which is self-referentiality squared, but I am not sure what
# its use is.
#
#GOTCHAS
#
# I was trying to run
# >> pep -e "r;a'\\';print;d;" -i "abc"
# and I kept getting an unterminated quote message, which I thought I
# had fixed in machine.interp.c (until code). But the problem was actually
# the bash shell which resolves \\ to \ in double quotes, but not single quotes!
#
#BUGS
#
# Xdigit not valid class.
#
# It's a bit strange to talk about a multicharacter string being "escaped"
# (eg when calling 'until') but this is allowed in the pep engine.
#
# add "\{"; will generate an "illegal escape character" error
# when trying to compile the generated java code. I need to
# consider what to do in this situation (eg escape \ to \\ ?)
#
# check "go/mark" code. what happens if the mark is not found??
# throw error and exit I think.
#
#SOLVED BUGS
#
# found a bug in "replace" code, which was returning from inline code.
#
# Found and fixed a bug in the (==) code ie in java (stringa == stringb)
# doesn't work, because it compares references rather than contents.
#
# found and fixed a bug in java whilenot/while. The code exits if the
# character is not found, which is not correct.
#
# delimiter was hardcoded in push
# solved an "until" bug where the java code did not read
# at least one character.
#
#TASKS
#
#HISTORY
#
# 17 june 2022
#   converted the tape and marks arrays to ArrayList so that
#   they can grow dynamically.
# 15 july 2021
#   probably fixed the multiple escape char "until" bug with
#   the countEscaped() function.
#
# 10 july 2021
#   Trying to fix the 'until' code so that we can write 'add "x \\\\";'
#   or add "x\\\"x"; fixed in compile.pss, now fix in translate.java.pss
#
# 30 july 2020
#   Found "bug", that a begin block with no other code
#   is not allowed as a script.
#
# 29 july 2020
#   found a bug in "go" (not getting text from tape).
#   Also, delimiter was hard-coded in "push"
#   Found a bug in "clop" (a return statement)
#
# 25 july 2020
#
# The translation of eg/mark.html.pss is now working. That
# means that many complex scripts now work with this script.
#
# Found another bug in the matches code. Classes must match
# the whole string, not just one character, so they need to
# be eg: "^[a-z]+$" not just "[a-z]"
#
# Found a bug in the code for "tapetest*" and "nottapetest*"
# ie (==) and !(==). I was using the wrong equals operator for
# java. I found this bug by using a new vim command on
# code in the pars-book.txt. This command translates to java and
# then compiles and runs pep fragments. This is a useful debugging
# technique.
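#
# A minimal java sketch of the two bugs described in the entry above.
# This is an illustration only: the class name EqualsSketch and the
# variable names are hypothetical and are not taken from the generated
# Machine class. StringBuilder does not override equals(), so the
# contents have to be compared through toString(), and String.matches()
# always tests the whole string.
# -----
# public class EqualsSketch {
#   public static void main(String[] args) {
#     // hypothetical stand-ins for the workspace and a tape cell
#     StringBuilder workspace = new StringBuilder("abc");
#     StringBuilder cell = new StringBuilder("abc");
#     // == compares object references, so this prints false
#     System.out.println(workspace == cell);
#     // comparing the contents prints true
#     System.out.println(workspace.toString().equals(cell.toString()));
#     // matches() tests the whole string: prints false, then true
#     System.out.println("abc".matches("[a-z]"));
#     System.out.println("abc".matches("^[a-z]+$"));
#   }
# }
# ,,,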
#
# 24 july 2020
#
# Very great advances today. See the testing heading for
# a strange but true self compilation example.
#
# The script successfully translates itself into java!!
# So the following works
# -----
# pep -f compile.java.pss compile.java.pss > Machine.java
# javac Machine.java
# echo "nop;r;t;t;d;" | java Machine
# ,,,,
#
# This script translates eg/json.parse.pss into a seemingly
# correct java program. It translates eg/mark.html.pss into
# compilable java code, but does not transform the text to
# html correctly.
#
# Completely changed the way andtestset* and ortestset* tokens
# are parsed. This has greatly simplified the logic.
# First tests show that the script is working, although there will
# be bugs.
#
# 23 july 2020
#
# Extensive revision of this script. Rewriting methods as "inline".
# But revision is incomplete. This script should become
# a good template for writing similar scripts in other languages.
#
# 22 july 2020
#
# Changed the stack code to use the java.util.Stack class. In the process of
# rethinking this script and reforming it. I will include the Machine class
# within the output of the script, so that there are no dependencies on
# external code. Also, I will remove trivial methods from the class.
#
# Oct 2019
#   Made functions ppjjs, ppjjss, ppjjf in helpers.pars.sh so that java
#   scripts can be easily run.
#
# 30 sept 2019
#   basic scripts working. whilenotPeep and whilePeep need to
#   be written properly. Also, translate unicode categories in
#   [:text:] format to java regex.
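#
# The 30 sept 2019 note about character classes can be illustrated with a
# small java sketch. This is an example under stated assumptions: the
# class name ClassSketch is hypothetical, and the patterns only mirror
# the "^...+$" form used for class tests. The java POSIX classes such as
# \p{Alpha} cover US-ASCII only unless the pattern is compiled with the
# Pattern.UNICODE_CHARACTER_CLASS flag.
# -----
# import java.util.regex.Pattern;
#
# public class ClassSketch {
#   public static void main(String[] args) {
#     // [:alpha:] written as a java POSIX class: prints true, then false
#     System.out.println("abc".matches("^\\p{Alpha}+$"));
#     System.out.println("ab1".matches("^\\p{Alpha}+$"));
#     // the same class compiled with unicode-aware behaviour
#     Pattern p = Pattern.compile("^\\p{Alpha}+$",
#                                 Pattern.UNICODE_CHARACTER_CLASS);
#     // e-acute is a letter, but it is outside the ASCII-only class
#     String accented = "\u00e9";
#     System.out.println(accented.matches("^\\p{Alpha}+$"));
#     System.out.println(p.matcher(accented).matches());
#   }
# }
# ,,,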
#
# 27 sept 2019
#   Began to adapt this script from compile.javascript.pss
#
#
read
#--------------
testclass [:space:]
jumpfalse block.end.7500
testclass [\n]
jumpfalse block.end.7456
nochars
block.end.7456:
clear
testeof
jumptrue block.end.7487
jump start
block.end.7487:
jump parse
block.end.7500:
#---------------
# We can elide all these single character tests, because
# the stack token is just the character itself with a *
# Braces {} are used for blocks of commands, ',' and '.' for concatenating
# tests with OR or AND logic. 'B' and 'E' for begin and end
# tests, '!' is used for negation, ';' is used to terminate a
# command.
testis "{"
jumptrue 16
testis "}"
jumptrue 14
testis ";"
jumptrue 12
testis ","
jumptrue 10
testis "."
jumptrue 8
testis "!"
jumptrue 6
testis "B"
jumptrue 4
testis "E"
jumptrue 2
jump block.end.7936
put
add "*"
push
jump parse
block.end.7936:
#---------------
# format: "text"
testis "\""
jumpfalse block.end.8389
# save the start line number (for error messages) in case
# there is no terminating quote character.
clear
add "line "
ll
add " (character "
cc
add ") "
put
clear
add "\""
until "\""
testends "\""
jumptrue block.end.8331
clear
add "Unterminated quote character (\") starting at "
get
add " !\n"
print
quit
block.end.8331:
put
clear
add "quote*"
push
jump parse
block.end.8389:
#---------------
# format: 'text', single quotes are converted to double quotes
# but we must escape embedded double quotes.
testis "'"
jumpfalse block.end.8970
# save the start line number (for error messages) in case
# there is no terminating quote character.
clear add "line " ll add " (character " cc add ") " put clear until "'" testends "'" jumptrue block.end.8853 clear add "Unterminated quote (') starting at " get add "!\n" print quit block.end.8853: clip escape "\"" put clear add "\"" get add "\"" put clear add "quote*" push jump parse block.end.8970: #--------------- # formats: [:space:] [a-z] [abcd] [:alpha:] etc # should class tests really be multiline??! testis "[" jumpfalse block.end.12731 # save the start line number (for error messages) in case # there is no terminating bracket character. clear add "line " ll add " (character " cc add ") " put clear add "[" until "]" testis "[]" jumpfalse block.end.9494 clear add "pep script error at line " ll add " (character " cc add "): \n" add " empty character class [] \n" print quit block.end.9494: testends "]" jumptrue block.end.9781 clear add "Unterminated class text ([...]) starting at " get add "\n" add " class text can be used in tests or with the 'while' and \n" add " 'whilenot' commands. For example: \n" add " [:alpha:] { while [:alpha:]; print; clear; }\n" add " " print quit block.end.9781: # need to escape quotes so they dont interfere with the # quotes java needs for .matches("...") escape "\"" # the caret is not a negation operator in pep scripts replace "^" "\\\\^" # save the class on the tape put clop clop testbegins "-" jumptrue block.end.10208 # not a range class, eg [a-z] so need to escape '-' chars # java requires a double escape clear get replace "-" "\\\\-" put block.end.10208: testbegins "-" jumpfalse block.end.10596 # a range class, eg [a-z], check if it is correct clip clip testis "-" jumptrue block.end.10590 clear add "Error in pep script at line " ll add " (character " cc add "): \n" add " Incorrect character range class " get add "\n" add " For example:\n" add " [a-g] # correct\n" add " [f-gh] # error! 
\n" print clear quit block.end.10590: block.end.10596: clear get # restore class text testbegins "[:" jumpfalse 3 testends ":]" jumpfalse 2 jump block.end.10761 clear add "malformed character class starting at " get add "!\n" print quit block.end.10761: testbegins "[:" jumpfalse 3 testis "[:]" jumpfalse 2 jump block.end.11871 clip clip clop clop # unicode posix character classes in java # Also, abbreviations (not implemented in gh.c yet.) testis "alnum" jumptrue 4 testis "N" jumptrue 2 jump block.end.10967 clear add "\\\\p{Alnum}" block.end.10967: testis "alpha" jumptrue 4 testis "A" jumptrue 2 jump block.end.11016 clear add "\\\\p{Alpha}" block.end.11016: testis "ascii" jumptrue 4 testis "I" jumptrue 2 jump block.end.11065 clear add "\\\\p{ASCII}" block.end.11065: testis "blank" jumptrue 4 testis "B" jumptrue 2 jump block.end.11114 clear add "\\\\p{Blank}" block.end.11114: testis "cntrl" jumptrue 4 testis "C" jumptrue 2 jump block.end.11163 clear add "\\\\p{Cntrl}" block.end.11163: testis "digit" jumptrue 4 testis "D" jumptrue 2 jump block.end.11212 clear add "\\\\p{Digit}" block.end.11212: testis "graph" jumptrue 4 testis "G" jumptrue 2 jump block.end.11261 clear add "\\\\p{Graph}" block.end.11261: # or equiv to graph [^\p{Z}\p{C}] as suggested on stack overflow testis "lower" jumptrue 4 testis "L" jumptrue 2 jump block.end.11381 clear add "\\\\p{Lower}" block.end.11381: testis "print" jumptrue 4 testis "P" jumptrue 2 jump block.end.11430 clear add "\\\\p{Print}" block.end.11430: testis "punct" jumptrue 4 testis "T" jumptrue 2 jump block.end.11479 clear add "\\\\p{Punct}" block.end.11479: testis "space" jumptrue 4 testis "S" jumptrue 2 jump block.end.11528 clear add "\\\\p{Space}" block.end.11528: testis "upper" jumptrue 4 testis "U" jumptrue 2 jump block.end.11577 clear add "\\\\p{Upper}" block.end.11577: testis "xdigit" jumptrue 4 testis "X" jumptrue 2 jump block.end.11628 clear add "\\\\p{XDigit}" block.end.11628: testbegins "\\\\p{" jumptrue block.end.11865 
put clear add "Pep script syntax error near line " ll add " (character " cc add "): \n" add "Unknown character class '" get add "'\n" print clear quit block.end.11865: block.end.11871: # alnum - alphanumeric like [0-9a-zA-Z] # alpha - alphabetic like [a-zA-Z] # blank - blank chars, space and tab # cntrl - control chars, ascii 000 to 037 and 177 (del) # digit - digits 0-9 # graph - graphical chars same as :alnum: and :punct: # lower - lower case letters [a-z] # print - printable chars ie :graph: + space # punct - punctuation ie !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~. # space - all whitespace, eg \n\r\t vert tab, space, \f # upper - upper case letters [A-Z] # xdigit - hexadecimal digit ie [0-9a-fA-F] # put clear # add quotes around the class and limits around the # class so it can be used with the string.matches() method # (must match the whole string, not just one character) add "\"^" get add "+$\"" put clear add "class*" push jump parse block.end.12731: #--------------- # formats: (eof) (EOF) (==) etc. testis "(" jumpfalse block.end.13202 clear until ")" clip put testis "eof" jumptrue 4 testis "EOF" jumptrue 2 jump block.end.12885 clear add "eof*" push jump parse block.end.12885: testis "==" jumpfalse block.end.12938 clear add "tapetest*" push jump parse block.end.12938: add " << unknown test near line " ll add " of script.\n" add " bracket () tests are \n" add " (eof) test if end of stream reached. \n" add " (==) test if workspace is same as current tape cell \n" print clear quit block.end.13202: #--------------- # multiline and single line comments, eg #... and #* ... 
*#
testis "#"
jumpfalse block.end.14343
clear
read
testis "\n"
jumpfalse block.end.13338
clear
jump parse
block.end.13338:
# checking for multiline comments of the form "#* \n\n\n *#"
# these are just ignored at the moment (deleted)
testis "*"
jumpfalse block.end.14188
# save the line number for possible error message later
clear
ll
put
clear
until "*#"
testends "*#"
jumpfalse block.end.13933
# convert to /* ... */ java multiline comment
clip
clip
put
clear
add "/*"
get
add "*/"
# create a "comment" parse token
put
clear
# comment-out this line to remove multiline comments from the
# compiled java.
# add "comment*"; push;
jump parse
block.end.13933:
# make an unterminated multiline comment an error
# to ease debugging of scripts.
clear
add "unterminated multiline comment #* ... *# \n"
add "starting at line number "
get
add "\n"
print
clear
quit
block.end.14188:
# single line comments. some will get lost.
put
clear
add "//"
get
until "\n"
clip
put
clear
add "comment*"
push
jump parse
block.end.14343:
#----------------------------------
# parse command words (and abbreviations)
# legal characters for keywords (commands)
testclass [abcdefghijklmnopqrstuvwxyzBEKGPRUWS+-<>0^]
jumptrue block.end.14730
# error message about a misplaced character
put
clear
add "!! Misplaced character '"
get
add "' in script near line "
ll
add " (character "
cc
add ") \n"
print
clear
quit
block.end.14730:
# my testclass implementation cannot handle complex lists
# eg [a-z+-] this is why I have to write out the whole alphabet
while [abcdefghijklmnopqrstuvwxyzBEOFKGPRUWS+-<>0^]
#----------------------------------
# KEYWORDS
# here we can test for all the keywords (command words) and their
# abbreviated one letter versions (eg: clip k, clop K etc).
Then # we can print an error message and abort if the word is not a # legal keyword for the parse-edit language # make ll an alias for "lines" and cc an alias for chars testis "ll" jumpfalse block.end.15314 clear add "lines" block.end.15314: testis "cc" jumpfalse block.end.15346 clear add "chars" block.end.15346: # one letter command abbreviations testis "a" jumpfalse block.end.15413 clear add "add" block.end.15413: testis "k" jumpfalse block.end.15443 clear add "clip" block.end.15443: testis "K" jumpfalse block.end.15473 clear add "clop" block.end.15473: testis "D" jumpfalse block.end.15506 clear add "replace" block.end.15506: testis "d" jumpfalse block.end.15537 clear add "clear" block.end.15537: testis "t" jumpfalse block.end.15568 clear add "print" block.end.15568: testis "p" jumpfalse block.end.15597 clear add "pop" block.end.15597: testis "P" jumpfalse block.end.15627 clear add "push" block.end.15627: testis "u" jumpfalse block.end.15660 clear add "unstack" block.end.15660: testis "U" jumpfalse block.end.15691 clear add "stack" block.end.15691: testis "G" jumpfalse block.end.15720 clear add "put" block.end.15720: testis "g" jumpfalse block.end.15749 clear add "get" block.end.15749: testis "x" jumpfalse block.end.15779 clear add "swap" block.end.15779: testis ">" jumpfalse block.end.15807 clear add "++" block.end.15807: testis "<" jumpfalse block.end.15835 clear add "--" block.end.15835: testis "m" jumpfalse block.end.15865 clear add "mark" block.end.15865: testis "M" jumpfalse block.end.15893 clear add "go" block.end.15893: testis "r" jumpfalse block.end.15923 clear add "read" block.end.15923: testis "R" jumpfalse block.end.15954 clear add "until" block.end.15954: testis "w" jumpfalse block.end.15985 clear add "while" block.end.15985: testis "W" jumpfalse block.end.16019 clear add "whilenot" block.end.16019: testis "n" jumpfalse block.end.16050 clear add "count" block.end.16050: testis "+" jumpfalse block.end.16078 clear add "a+" block.end.16078: testis "-" 
jumpfalse block.end.16106
clear
add "a-"
block.end.16106:
testis "0"
jumpfalse block.end.16136
clear
add "zero"
block.end.16136:
testis "c"
jumpfalse block.end.16167
clear
add "chars"
block.end.16167:
testis "l"
jumpfalse block.end.16198
clear
add "lines"
block.end.16198:
testis "^"
jumpfalse block.end.16230
clear
add "escape"
block.end.16230:
testis "v"
jumpfalse block.end.16264
clear
add "unescape"
block.end.16264:
testis "z"
jumpfalse block.end.16295
clear
add "delim"
block.end.16295:
testis "S"
jumpfalse block.end.16326
clear
add "state"
block.end.16326:
testis "q"
jumpfalse block.end.16356
clear
add "quit"
block.end.16356:
testis "s"
jumpfalse block.end.16387
clear
add "write"
block.end.16387:
testis "o"
jumpfalse block.end.16416
clear
add "nop"
block.end.16416:
testis "rs"
jumpfalse block.end.16450
clear
add "restart"
block.end.16450:
testis "rp"
jumpfalse block.end.16484
clear
add "reparse"
block.end.16484:
# some extra syntax for testeof and testtape
testis "<eof>"
jumptrue 4
testis "<EOF>"
jumptrue 2
jump block.end.16595
put
clear
add "eof*"
push
jump parse
block.end.16595:
testis "<==>"
jumpfalse block.end.16653
put
clear
add "tapetest*"
push
jump parse
block.end.16653:
testis "jump"
jumptrue 18
testis "jumptrue"
jumptrue 16
testis "jumpfalse"
jumptrue 14
testis "testis"
jumptrue 12
testis "testclass"
jumptrue 10
testis "testbegins"
jumptrue 8
testis "testends"
jumptrue 6
testis "testeof"
jumptrue 4
testis "testtape"
jumptrue 2
jump block.end.16981
put
clear
add "The instruction '"
get
add "' near line "
ll
add " (character "
cc
add ")\n"
add "can be used in pep assembly code but not scripts. \n"
print
clear
quit
block.end.16981:
# show information if these "deprecated" commands are used
testis "Q"
jumptrue 6
testis "bail"
jumptrue 4
testis "state"
jumptrue 2
jump block.end.17396
put
clear
add "The instruction '"
get
add "' near line "
ll
add " (character "
cc
add ")\n"
add "is no longer part of the pep language (july 2020). 
\n" add "use 'quit' instead of 'bail', and use 'unstack; print;' \n" add "instead of 'state'. \n" print clear quit block.end.17396: testis "add" jumptrue 80 testis "clip" jumptrue 78 testis "clop" jumptrue 76 testis "replace" jumptrue 74 testis "upper" jumptrue 72 testis "lower" jumptrue 70 testis "cap" jumptrue 68 testis "clear" jumptrue 66 testis "print" jumptrue 64 testis "pop" jumptrue 62 testis "push" jumptrue 60 testis "unstack" jumptrue 58 testis "stack" jumptrue 56 testis "put" jumptrue 54 testis "get" jumptrue 52 testis "swap" jumptrue 50 testis "++" jumptrue 48 testis "--" jumptrue 46 testis "mark" jumptrue 44 testis "go" jumptrue 42 testis "read" jumptrue 40 testis "until" jumptrue 38 testis "while" jumptrue 36 testis "whilenot" jumptrue 34 testis "count" jumptrue 32 testis "a+" jumptrue 30 testis "a-" jumptrue 28 testis "zero" jumptrue 26 testis "chars" jumptrue 24 testis "lines" jumptrue 22 testis "nochars" jumptrue 20 testis "nolines" jumptrue 18 testis "escape" jumptrue 16 testis "unescape" jumptrue 14 testis "delim" jumptrue 12 testis "quit" jumptrue 10 testis "write" jumptrue 8 testis "nop" jumptrue 6 testis "reparse" jumptrue 4 testis "restart" jumptrue 2 jump block.end.17789 put clear add "word*" push jump parse block.end.17789: #------------ # the .reparse command and "parse label" is a simple way to # make sure that all shift-reductions occur. It should be used inside # a block test, so as not to create an infinite loop. 
There is # no "goto" in java so we need to use labelled loops to # implement .reparse/parse> testis "parse>" jumpfalse block.end.18437 clear count testis "0" jumptrue block.end.18292 clear add "script error:\n" add " extra parse> label at line " ll add ".\n" print quit block.end.18292: clear add "// parse>" put clear add "parse>*" push # use accumulator to indicate after parse> label a+ jump parse block.end.18437: # -------------------- # implement "begin-blocks", which are only executed # once, at the beginning of the script (similar to awk's BEGIN {} rules) testis "begin" jumpfalse block.end.18648 put add "*" push jump parse block.end.18648: add " << unknown command on line " ll add " (char " cc add ")" add " of source file. \n" print clear quit # ---------------------------------- # PARSING PHASE: # Below is the parse/compile phase of the script. Here we pop tokens off the # stack and check for sequences of tokens eg "word*semicolon*". If we find a # valid series of tokens, we "shift-reduce" or "resolve" the token series eg # word*semicolon* --> command* # At the same time, we manipulate (transform) the attributes on the tape, as # required. parse: #------------------------------------- # 2 tokens #------------------------------------- pop pop # All of the patterns below are currently errors, but may not # be in the future if we expand the syntax of the parse # language. Also consider: # begintext* endtext* quoteset* notclass*, !* ,* ;* B* E* # It is nice to trap the errors here because we can emit some # (hopefully not very cryptic) error messages with a line number. 
# Otherwise the script writer has to debug with # pep -a asm.pp -I scriptfile testis "word*word*" jumptrue 50 testis "word*}*" jumptrue 48 testis "word*begintext*" jumptrue 46 testis "word*endtext*" jumptrue 44 testis "word*!*" jumptrue 42 testis "word*,*" jumptrue 40 testis "quote*word*" jumptrue 38 testis "quote*class*" jumptrue 36 testis "quote*state*" jumptrue 34 testis "quote*}*" jumptrue 32 testis "quote*begintext*" jumptrue 30 testis "quote*endtext*" jumptrue 28 testis "class*word*" jumptrue 26 testis "class*quote*" jumptrue 24 testis "class*class*" jumptrue 22 testis "class*state*" jumptrue 20 testis "class*}*" jumptrue 18 testis "class*begintext*" jumptrue 16 testis "class*endtext*" jumptrue 14 testis "class*!*" jumptrue 12 testis "notclass*word*" jumptrue 10 testis "notclass*quote*" jumptrue 8 testis "notclass*class*" jumptrue 6 testis "notclass*state*" jumptrue 4 testis "notclass*}*" jumptrue 2 jump block.end.20388 add " (Token stack) \nValue: \n" get add "\nValue: \n" ++ get -- add "\n" add "Error near line " ll add " (char " cc add ")" add " of pep script (missing semicolon?) \n" print clear quit block.end.20388: testis "{*;*" jumptrue 6 testis ";*;*" jumptrue 4 testis "}*;*" jumptrue 2 jump block.end.20583 push push add "Error near line " ll add " (char " cc add ")" add " of pep script: misplaced semi-colon? ; \n" print clear quit block.end.20583: testis ",*{*" jumpfalse block.end.20753 push push add "Error near line " ll add " (char " cc add ")" add " of script: extra comma in list? \n" print clear quit block.end.20753: testis "command*;*" jumptrue 4 testis "commandset*;*" jumptrue 2 jump block.end.20942 push push add "Error near line " ll add " (char " cc add ")" add " of script: extra semi-colon? \n" print clear quit block.end.20942: testis "!*!*" jumpfalse block.end.21205 push push add "error near line " ll add " (char " cc add ")" add " of script: \n double negation '!!' 
is not implemented \n" add " and probably won't be, because what would be the point? \n" print clear quit block.end.21205: testis "!*{*" jumptrue 4 testis "!*;*" jumptrue 2 jump block.end.21520 push push add "error near line " ll add " (char " cc add ")" add " of script: misplaced negation operator (!)? \n" add " The negation operator precedes tests, for example: \n" add " !B'abc'{ ... } or !(eof),!'abc'{ ... } \n" print clear quit block.end.21520: testis ",*command*" jumpfalse block.end.21696 push push add "error near line " ll add " (char " cc add ")" add " of script: misplaced comma? \n" print clear quit block.end.21696: testis "!*command*" jumpfalse block.end.21901 push push add "error near line " ll add " (at char " cc add ") \n" add " The negation operator (!) cannot precede a command \n" print clear quit block.end.21901: testis ";*{*" jumptrue 6 testis "command*{*" jumptrue 4 testis "commandset*{*" jumptrue 2 jump block.end.22110 push push add "error near line " ll add " (char " cc add ")" add " of script: no test for brace block? \n" print clear quit block.end.22110: testis "{*}*" jumpfalse block.end.22244 push push add "error near line " ll add " of script: empty braces {}. \n" print clear quit block.end.22244: testis "B*class*" jumptrue 4 testis "E*class*" jumptrue 2 jump block.end.22475 push push add "error near line " ll add " of script:\n classes ([a-z], [:space:] etc). \n" add " cannot use the 'begin' or 'end' modifiers (B/E) \n" print clear quit block.end.22475: testis "comment*{*" jumpfalse block.end.22667 push push add "error near line " ll add " of script: comments cannot occur between \n" add " a test and a brace ({). \n" print clear quit block.end.22667: testis "}*command*" jumpfalse block.end.22817 push push add "error near line " ll add " of script: extra closing brace '}' ?. 
\n"
print
clear
quit
block.end.22817:
# E"begin*".!"begin*" {
#   push; push;
#   add "error near line "; lines;
#   add " of script: Begin blocks must precede code \n";
#   print; clear; quit;
# }
#
#------------
# The .restart command jumps to the first instruction after the
# begin block (if there is a begin block), or the first instruction
# of the script.
testis ".*word*"
jumpfalse block.end.24080
clear
++
get
--
testis "restart"
jumpfalse block.end.23531
clear
add "continue script;"
# not required because we have labelled loops,
# continue script works both before and after the parse> label
# "0" { clear; add "continue script;"; }
# "1" { clear; add "break lex;"; }
put
clear
add "command*"
push
jump parse
block.end.23531:
testis "reparse"
jumpfalse block.end.23867
clear
count
# check accumulator to see if we are in the "lex" block
# or the "parse" block and adjust the .reparse compilation
# accordingly.
testis "0"
jumpfalse block.end.23755
clear
add "break lex;"
block.end.23755:
testis "1"
jumpfalse block.end.23799
clear
add "continue parse;"
block.end.23799:
put
clear
add "command*"
push
jump parse
block.end.23867:
push
push
add "error near line "
ll
add " (char "
cc
add ")"
add " of script: \n"
add " misplaced dot '.' (use for AND logic or in .reparse/.restart) \n"
print
clear
quit
block.end.24080:
#---------------------------------
# Compiling comments so as to transfer them to the java
testis "comment*command*"
jumptrue 6
testis "command*comment*"
jumptrue 4
testis "commandset*comment*"
jumptrue 2
jump block.end.24331
clear
get
add "\n"
++
get
--
put
clear
add "command*"
push
jump parse
block.end.24331:
testis "comment*comment*"
jumpfalse block.end.24445
clear
get
add "\n"
++
get
--
put
clear
add "comment*"
push
jump parse
block.end.24445:
# -----------------------
# negated tokens.
# This is a new more elegant way to negate a whole set of
# tests (tokens) where the negation logic is stored on the
# stack, not in the current tape cell.
We just add "not" to # the stack token. # eg: ![:alpha:] ![a-z] ![abcd] !"abc" !B"abc" !E"xyz" # This format is used to indicate a negative test for # a brace block. eg: ![aeiou] { add "< not a vowel"; print; clear; } testis "!*quote*" jumptrue 12 testis "!*class*" jumptrue 10 testis "!*begintext*" jumptrue 8 testis "!*endtext*" jumptrue 6 testis "!*eof*" jumptrue 4 testis "!*tapetest*" jumptrue 2 jump block.end.25243 # a simplification: store the token name "quote*/class*/..." # in the tape cell corresponding to the "!*" token. replace "!*" "not" push # this was a bug?? a missing ++; ?? # now get the token-value get -- put ++ clear jump parse block.end.25243: #----------------------------------------- # format: E"text" or E'text' # This format is used to indicate a "workspace-ends-with" text before # a brace block. testis "E*quote*" jumpfalse block.end.25749 clear add "endtext*" push get testis "\"\"" jumpfalse block.end.25706 # empty argument is an error clear add "pep script error near line " ll add " (character " cc add "): \n" add " empty argument for end-test (E\"\") \n" print quit block.end.25706: -- put ++ clear jump parse block.end.25749: #----------------------------------------- # format: B"sometext" or B'sometext' # A 'B' preceding some quoted text is used to indicate a # 'workspace-begins-with' test, before a brace block. 
testis "B*quote*"
jumpfalse block.end.26296
clear
add "begintext*"
push
get
testis "\"\""
jumpfalse block.end.26253
# empty argument is an error
clear
add "pep script error near line "
ll
add " (character "
cc
add "): \n"
add " empty argument for begin-test (B\"\") \n"
print
quit
block.end.26253:
--
put
++
clear
jump parse
block.end.26296:
#--------------------------------------------
# ebnf: command := word, ';' ;
# formats: "pop; push; clear; print; " etc
# all commands need to end with a semi-colon except for
# .reparse and .restart
testis "word*;*"
jumpfalse block.end.30385
clear
# check if command requires parameter
get
testis "add"
jumptrue 18
testis "while"
jumptrue 16
testis "whilenot"
jumptrue 14
testis "mark"
jumptrue 12
testis "go"
jumptrue 10
testis "escape"
jumptrue 8
testis "unescape"
jumptrue 6
testis "delim"
jumptrue 4
testis "replace"
jumptrue 2
jump block.end.26844
put
clear
add "'"
get
add "'"
add " command needs an argument, on line "
ll
add " of script.\n"
print
clear
quit
block.end.26844:
# the new until; command with no argument
# (the tape cell is a StringBuffer, but until() takes a String)
testis "until"
jumpfalse block.end.27014
clear
add "mm.until(mm.tape.get(mm.tapePointer).toString()); /* until (tape) */"
put
block.end.27014:
testis "clip"
jumpfalse block.end.27281
clear
# are these length tests really necessary
add "if (mm.workspace.length() > 0) { /* clip */\n"
add " mm.workspace.delete(mm.workspace.length() - 1, \n"
add " mm.workspace.length()); }"
put
block.end.27281:
testis "clop"
jumpfalse block.end.27449
clear
add "if (mm.workspace.length() > 0) { /* clop */\n"
add " mm.workspace.delete(0, 1); } /* clop */"
put
block.end.27449:
testis "clear"
jumpfalse block.end.27564
clear
add "mm.workspace.setLength(0);"
add " /* clear */"
put
block.end.27564:
testis "upper"
jumpfalse block.end.27836
clear
add "/* upper */ \n"
add "for (int i = 0; i < mm.workspace.length(); i++) { \n"
add " char c = mm.workspace.charAt(i); \n"
add " mm.workspace.setCharAt(i, Character.toUpperCase(c)); } "
put
block.end.27836:
testis "lower"
jumpfalse block.end.28107
clear
add "/* lower */ \n"
add "for (int i = 0; i < mm.workspace.length(); i++) { \n"
add " char c = mm.workspace.charAt(i); \n"
add " mm.workspace.setCharAt(i, Character.toLowerCase(c)); } "
put
block.end.28107:
testis "cap"
jumpfalse block.end.28484
clear
add "/* cap */ \n"
add "for (int i = 0; i < mm.workspace.length(); i++) { \n"
add " char c = mm.workspace.charAt(i); \n"
add " if (i==0){ mm.workspace.setCharAt(i, Character.toUpperCase(c)); } \n"
add " else { mm.workspace.setCharAt(i, Character.toLowerCase(c)); } \n"
add "}"
put
block.end.28484:
testis "print"
jumpfalse block.end.28578
clear
add "System.out.print(mm.workspace); /* print */"
put
block.end.28578:
testis "pop"
jumpfalse block.end.28622
clear
add "mm.pop();"
put
block.end.28622:
testis "push"
jumpfalse block.end.28668
clear
add "mm.push();"
put
block.end.28668:
testis "unstack"
jumpfalse block.end.28756
clear
add "while (mm.pop()); /* unstack */"
put
block.end.28756:
testis "stack"
jumpfalse block.end.28840
clear
add "while(mm.push()); /* stack */"
put
block.end.28840:
testis "put"
jumpfalse block.end.29016
clear
add "mm.tape.get(mm.tapePointer).setLength(0); /* put */\n"
add "mm.tape.get(mm.tapePointer).append(mm.workspace); "
put
block.end.29016:
testis "get"
jumpfalse block.end.29138
clear
add "mm.workspace.append(mm.tape.get(mm.tapePointer)); /* get */"
put
block.end.29138:
testis "swap"
jumpfalse block.end.29184
clear
add "mm.swap();"
put
block.end.29184:
testis "++"
jumpfalse block.end.29282
clear
add "mm.increment();"
add " /* ++ */"
put
block.end.29282:
testis "--"
jumpfalse block.end.29387
clear
add "if (mm.tapePointer > 0) mm.tapePointer--; /* -- */"
put
block.end.29387:
testis "read"
jumpfalse block.end.29444
clear
add "mm.read(); /* read */"
put
block.end.29444:
testis "count"
jumpfalse block.end.29551
clear
add "mm.workspace.append(mm.accumulator); /* count */"
put
block.end.29551:
testis "a+"
jumpfalse block.end.29611
clear
add "mm.accumulator++; /* a+ */"
put
block.end.29611:
testis "a-"
jumpfalse block.end.29671
clear
add "mm.accumulator--; /* a- */"
put
block.end.29671:
testis "zero"
jumpfalse block.end.29737
clear
add "mm.accumulator = 0; /* zero */"
put
block.end.29737:
testis "chars"
jumpfalse block.end.29834
clear
add "mm.workspace.append(mm.charsRead); /* chars */"
put
block.end.29834:
testis "lines"
jumpfalse block.end.29931
clear
add "mm.workspace.append(mm.linesRead); /* lines */"
put
block.end.29931:
testis "nochars"
jumpfalse block.end.30001
clear
add "mm.charsRead = 0; /* nochars */"
put
block.end.30001:
testis "nolines"
jumpfalse block.end.30071
clear
add "mm.linesRead = 0; /* nolines */"
put
block.end.30071:
# use a labelled loop to quit script.
testis "quit"
jumpfalse block.end.30163
clear
add "break script;"
put
block.end.30163:
testis "write"
jumpfalse block.end.30217
clear
add "mm.writeToFile();"
put
block.end.30217:
# just eliminate since it does nothing.
testis "nop"
jumpfalse block.end.30331
clear
add "/* nop: no-operation eliminated */"
put
block.end.30331:
clear
add "command*"
push
jump parse
block.end.30385:
#-----------------------------------------
# ebnf: commandset := command , command ;
testis "command*command*"
jumptrue 4
testis "commandset*command*"
jumptrue 2
jump block.end.30709
clear
add "commandset*"
push
# format the tape attributes. Add the next command on a newline
--
get
add "\n"
++
get
--
put
++
clear
jump parse
block.end.30709:
#-------------------
# here we begin to parse "test*" and "ortestset*" and "andtestset*"
#
#-------------------
# eg: B"abc" {} or E"xyz" {}
# transform and markup the different test types
testis "begintext*,*"
jumptrue 36
testis "endtext*,*"
jumptrue 34
testis "quote*,*"
jumptrue 32
testis "class*,*"
jumptrue 30
testis "eof*,*"
jumptrue 28
testis "tapetest*,*"
jumptrue 26
testis "begintext*.*"
jumptrue 24
testis "endtext*.*"
jumptrue 22
testis "quote*.*"
jumptrue 20
testis "class*.*"
jumptrue 18
testis "eof*.*"
jumptrue 16
testis "tapetest*.*"
jumptrue 14
testis "begintext*{*"
jumptrue 12
testis "endtext*{*"
jumptrue 10
testis "quote*{*"
jumptrue 8
testis "class*{*"
jumptrue 6
testis "eof*{*"
jumptrue 4
testis "tapetest*{*"
jumptrue 2
jump block.end.31965
testbegins "begin"
jumpfalse block.end.31220
clear
add "mm.workspace.toString().startsWith("
block.end.31220:
testbegins "end"
jumpfalse block.end.31283
clear
add "mm.workspace.toString().endsWith("
block.end.31283:
testbegins "quote"
jumpfalse block.end.31346
clear
add "mm.workspace.toString().equals("
block.end.31346:
testbegins "class"
jumpfalse block.end.31410
clear
add "mm.workspace.toString().matches("
block.end.31410:
# clear the tapecell for testeof and testtape because
# they take no arguments.
testbegins "eof"
jumpfalse block.end.31540
clear
put
add "mm.eof"
block.end.31540:
testbegins "tapetest"
jumpfalse block.end.31679
clear
put
add "(mm.workspace.toString().equals(mm.tape.get(mm.tapePointer).toString())"
block.end.31679:
get
testbegins "mm.eof"
jumptrue block.end.31712
add ")"
block.end.31712:
put
#
# maybe we could elide the not tests by doing here
# B"not" { clear; add "!"; get; put; }
#
clear
add "test*"
push
# the trick below pushes the right token back on the stack.
get
add "*"
push
jump parse
block.end.31965:
#-------------------
# negated tests
# eg: !B"xyz" {} !(eof) {} !(==) {}
# !E"xyz" {}
# !"abc" {}
# ![a-z] {}
testis "notbegintext*,*"
jumptrue 36
testis "notendtext*,*"
jumptrue 34
testis "notquote*,*"
jumptrue 32
testis "notclass*,*"
jumptrue 30
testis "noteof*,*"
jumptrue 28
testis "nottapetest*,*"
jumptrue 26
testis "notbegintext*.*"
jumptrue 24
testis "notendtext*.*"
jumptrue 22
testis "notquote*.*"
jumptrue 20
testis "notclass*.*"
jumptrue 18
testis "noteof*.*"
jumptrue 16
testis "nottapetest*.*"
jumptrue 14
testis "notbegintext*{*"
jumptrue 12
testis "notendtext*{*"
jumptrue 10
testis "notquote*{*"
jumptrue 8
testis "notclass*{*"
jumptrue 6
testis "noteof*{*"
jumptrue 4
testis "nottapetest*{*"
jumptrue 2
jump block.end.33116
testbegins "notbegin"
jumpfalse block.end.32466
clear
add "!mm.workspace.toString().startsWith("
block.end.32466:
testbegins "notend"
jumpfalse block.end.32533
clear
add "!mm.workspace.toString().endsWith("
block.end.32533:
testbegins "notquote"
jumpfalse block.end.32600
clear
add "!mm.workspace.toString().equals("
block.end.32600:
testbegins "notclass"
jumpfalse block.end.32668
clear
add "!mm.workspace.toString().matches("
block.end.32668:
# clear the tapecell for testeof and testtape because
# they take no arguments.
testbegins "noteof"
jumpfalse block.end.32802
clear
put
add "!mm.eof"
block.end.32802:
testbegins "nottapetest"
jumpfalse block.end.32945
clear
put
add "(!mm.workspace.toString().equals(mm.tape.get(mm.tapePointer).toString())"
block.end.32945:
get
testbegins "!mm.eof"
jumptrue block.end.32979
add ")"
block.end.32979:
put
clear
add "test*"
push
# the trick below pushes the right token back on the stack.
get
add "*"
push
jump parse
block.end.33116:
#-------------------
# 3 tokens
#-------------------
pop
#-----------------------------
# some 3 token errors!!!
# not a comprehensive list of 3 token errors
testis "{*quote*;*"
jumptrue 12
testis "{*begintext*;*"
jumptrue 10
testis "{*endtext*;*"
jumptrue 8
testis "{*class*;*"
jumptrue 6
testis "commandset*quote*;*"
jumptrue 4
testis "command*quote*;*"
jumptrue 2
jump block.end.33593
push
push
push
add "[pep error]\n invalid syntax near line "
ll
add " (char "
cc
add ")"
add " of script (misplaced semicolon?) \n"
print
clear
quit
block.end.33593:
# to simplify subsequent tests, transmogrify a single command
# to a commandset (multiple commands).
testis "{*command*}*"
jumpfalse block.end.33789
clear
add "{*commandset*}*"
push
push
push
jump parse
block.end.33789:
# errors! mixing AND and OR concatenation
testis ",*andtestset*{*"
jumptrue 4
testis ".*ortestset*{*"
jumptrue 2
jump block.end.34256
# push the tokens back to make debugging easier
push
push
push
add " error: mixing AND (.) and OR (,) concatenation in \n"
add " pep script near line "
ll
add " (character "
cc
add ") \n"
add " \n"
add " For example:\n"
add " B\".\".!E\"/\".[abcd./] { print; } # Correct!\n"
add " B\".\".!E\"/\",[abcd./] { print; } # Error! \n"
print
clear
quit
block.end.34256:
#--------------------------------------------
# ebnf: command := keyword , quoted-text , ";" ;
# format: add "text";
testis "word*quote*;*"
jumpfalse block.end.38138
clear
get
testis "replace"
jumpfalse block.end.34599
# error
add "< command requires 2 parameters, not 1 \n"
add "near line "
ll
add " of script. \n"
print
clear
quit
block.end.34599:
# check whether argument is single character, otherwise
# throw an error
testis "escape"
jumptrue 8
testis "unescape"
jumptrue 6
testis "while"
jumptrue 4
testis "whilenot"
jumptrue 2
jump block.end.35597
# This is trickier than I thought it would be.
clear
++
get
--
# check that arg not empty, (but an empty quote is ok
# for the second arg of 'replace')
testis "\"\""
jumpfalse block.end.35149
clear
add "[pep error] near line "
ll
add " (or char "
cc
add "): \n"
add " command '"
get
add "\' cannot have an empty argument (\"\") \n"
print
quit
block.end.35149:
# quoted text has the quotes still around it.
# also handle escape characters like \n \r etc
clip
clop
clop
clop
# B "\\" { clip; }
clip
testis ""
jumptrue block.end.35573
clear
add "Pep script error near line "
ll
add " (character "
cc
add "): \n"
add " command '"
get
add "' takes only a single character argument. \n"
print
quit
block.end.35573:
clear
get
block.end.35597:
testis "mark"
jumpfalse block.end.35862
clear
add "/* mark */ \n"
add "mm.marks.get(mm.tapePointer).setLength(0); // mark \n"
add "mm.marks.get(mm.tapePointer).append("
++
get
--
add "); // mark"
put
clear
add "command*"
push
jump parse
block.end.35862:
testis "go"
jumpfalse block.end.36002
clear
add "mm.goToMark("
++
get
--
add "); /* go */"
put
clear
add "command*"
push
jump parse
block.end.36002:
testis "delim"
jumpfalse block.end.36327
clear
# this.delimiter.setCharAt(0, text.charAt(0));
# only the first character of the delimiter argument is used.
add "mm.delimiter.setLength(0); /* delim */\n"
add "mm.delimiter.append("
++
get
--
add "); "
put
clear
add "command*"
push
jump parse
block.end.36327:
testis "add"
jumpfalse block.end.36565
clear
add "mm.workspace.append("
++
get
--
# handle multiline text
replace "\n" "\"); \nmm.workspace.append(\"\\n"
add "); /* add */"
put
clear
add "command*"
push
jump parse
block.end.36565:
testis "while"
jumpfalse block.end.36796
clear
add "while ((char) mm.peep == "
++
get
--
add ".charAt(0)) /* while */\n "
add " { if (mm.eof) {break;} mm.read(); }"
put
clear
add "command*"
push
jump parse
block.end.36796:
testis "whilenot"
jumpfalse block.end.37028
clear
add "while ((char) mm.peep != "
++
get
--
add ".charAt(0)) /* whilenot */\n "
add " { if (mm.eof) {break;} mm.read(); }"
put
clear
add "command*"
push
jump parse
block.end.37028:
testis "until"
jumpfalse block.end.37629
clear
add "mm.until("
++
get
--
# error until cannot have empty argument
testis "mm.until(\"\""
jumpfalse block.end.37493
clear
add "Pep script error near line "
ll
add " (character "
cc
add "): \n"
add " empty argument for 'until' \n"
add " \n"
add " For example:\n"
add " until '.txt'; until \">\"; # correct \n"
add " until ''; until \"\"; # errors! \n"
print
quit
block.end.37493:
# handle multiline argument
replace "\n" "\\n"
add ");"
put
clear
add "command*"
push
jump parse
block.end.37629:
# But really, can't the "replace" command just be used
# instead of escape/unescape?? This seems a flaw in the
# machine design.
testis "escape"
jumptrue 4
testis "unescape"
jumptrue 2
jump block.end.37947
clear
add "mm."
get
add "Char"
add "("
++
get
--
add ".charAt(0));"
put
clear
add "command*"
push
jump parse
block.end.37947:
# error, superfluous argument
add ": command does not take an argument \n"
add "near line "
ll
add " of script. \n"
print
clear
#state
quit
block.end.38138:
#----------------------------------
# format: "while [:alpha:] ;" or whilenot [a-z] ;
testis "word*class*;*"
jumpfalse block.end.38928
clear
get
testis "while"
jumpfalse block.end.38519
clear
add "/* while */ \n"
add "while (Character.toString((char)mm.peep).matches("
++
get
--
add ")) { if (mm.eof) { break; } mm.read(); }"
put
clear
add "command*"
push
jump parse
block.end.38519:
testis "whilenot"
jumpfalse block.end.38775
clear
add "/* whilenot */ \n"
add "while (!Character.toString((char)mm.peep).matches("
++
get
--
add ")) { if (mm.eof) { break; } mm.read(); }"
put
clear
add "command*"
push
jump parse
block.end.38775:
# error
add " < command cannot have a class argument \n"
add "line "
ll
add ": error in script \n"
print
clear
quit
block.end.38928:
# arrange the parse> label loops
testeof
jumpfalse block.end.39646
testis "commandset*parse>*commandset*"
jumptrue 8
testis "command*parse>*commandset*"
jumptrue 6
testis "commandset*parse>*command*"
jumptrue 4
testis "command*parse>*command*"
jumptrue 2
jump block.end.39642
clear
# indent both code blocks
add " "
get
replace "\n" "\n "
put
clear
++
++
add " "
get
replace "\n" "\n "
put
clear
--
--
# add a block so that .reparse works before the parse> label.
add "lex: { \n"
get
add "\n}\n"
++
++
# indent code block
# add " "; get; replace "\n" "\n "; put; clear;
add "parse: \n"
add "while (true) { \n"
get
add "\n break parse;\n}"
--
--
put
clear
add "commandset*"
push
jump parse
block.end.39642:
block.end.39646:
# -------------------------------
# 4 tokens
# -------------------------------
pop
#-------------------------------------
# bnf: command := replace , quote , quote , ";" ;
# example: replace "and" "AND" ;
testis "word*quote*quote*;*"
jumpfalse block.end.40560
clear
get
testis "replace"
jumpfalse block.end.40391
#---------------------------
# a command plus 2 arguments, eg replace "this" "that"
clear
add "/* replace */ \n"
add "if (mm.workspace.length() > 0) { \n"
add " temp = mm.workspace.toString().replace("
++
get
add ", "
++
get
add ");\n"
add " mm.workspace.setLength(0); \n"
add " mm.workspace.append(temp);\n} "
--
--
put
clear
add "command*"
push
jump parse
block.end.40391:
add "pep script error on line "
ll
add " (character "
cc
add "): \n"
add " command does not take 2 quoted arguments. \n"
print
quit
block.end.40560:
#-------------------------------------
# format: begin { #* commands *# }
# "begin" blocks are only executed once (they
# are assembled before the "start:" label). They must come before
# all other commands.
# "begin*{*command*}*",
testis "begin*{*commandset*}*"
jumpfalse block.end.40944
clear
++
++
get
--
--
put
clear
add "beginblock*"
push
jump parse
block.end.40944:
# -------------
# parses and compiles concatenated tests
# eg: 'a',B'b',E'c',[def],[:space:],[g-k] { ...
# these 2 tests should be all that is necessary
testis "test*,*ortestset*{*"
jumptrue 4
testis "test*,*test*{*"
jumptrue 2
jump block.end.41288
clear
get
add " || "
++
++
get
--
--
put
clear
add "ortestset*{*"
push
push
jump parse
block.end.41288:
# don't mix AND and OR concatenations
# -------------
# AND logic
# parses and compiles concatenated AND tests
# eg: 'a'.B'b'.E'c'.[def].[:space:].[g-k] { ...
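# A hedged illustration of the AND logic, traced by hand from the
# 'add' strings in this file (it is not captured translator output,
# and the real whitespace and indentation will differ).
# A script fragment such as
#   B"http" . !E".txt" { clear; }
# should be translated into java of roughly this shape:
#   if (mm.workspace.toString().startsWith("http") && !mm.workspace.toString().endsWith(".txt")) {
#     mm.workspace.setLength(0); /* clear */
#   }
# An OR chain (tests joined with ',') is built the same way, but with
# " || " between the translated tests.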
# it is possible to elide this block with the negated block
# for compactness but maybe readability is not as good.
# negated tests can be chained with non negated tests.
# eg: B'http' . !E'.txt' { ... }
testis "test*.*andtestset*{*"
jumptrue 4
testis "test*.*test*{*"
jumptrue 2
jump block.end.41857
clear
get
add " && "
++
++
get
--
--
put
clear
add "andtestset*{*"
push
push
jump parse
block.end.41857:
#-------------------------------------
# we should not have to check for the {*command*}* pattern
# because that has already been transformed to {*commandset*}*
testis "test*{*commandset*}*"
jumptrue 6
testis "andtestset*{*commandset*}*"
jumptrue 4
testis "ortestset*{*commandset*}*"
jumptrue 2
jump block.end.42420
clear
# indent the java code for readability
++
++
add " "
get
replace "\n" "\n "
put
--
--
clear
add "if ("
get
add ") {\n"
++
++
get
add "\n}"
--
--
put
clear
add "command*"
push
# always reparse/compile
jump parse
block.end.42420:
# -------------
# multi-token end-of-stream errors
# not a comprehensive list of errors...
testeof
jumpfalse block.end.43215
testends "begintext*"
jumptrue 10
testends "endtext*"
jumptrue 8
testends "test*"
jumptrue 6
testends "ortestset*"
jumptrue 4
testends "andtestset*"
jumptrue 2
jump block.end.42730
add " Error near end of script at line "
ll
add ". Test with no brace block? \n"
print
clear
quit
block.end.42730:
testends "quote*"
jumptrue 6
testends "class*"
jumptrue 4
testends "word*"
jumptrue 2
jump block.end.42955
put
clear
add "Error at end of pep script near line "
ll
add ": missing semi-colon? \n"
add "Parse stack: "
get
add "\n"
print
clear
quit
block.end.42955:
testends "{*"
jumptrue 16
testends "}*"
jumptrue 14
testends ";*"
jumptrue 12
testends ",*"
jumptrue 10
testends ".*"
jumptrue 8
testends "!*"
jumptrue 6
testends "B*"
jumptrue 4
testends "E*"
jumptrue 2
jump block.end.43211
put
clear
add "Error: misplaced terminal character at end of script! (line "
ll
add "). \n"
add "Parse stack: "
get
add "\n"
print
clear
quit
block.end.43211:
block.end.43215:
# put the 4 (or fewer) tokens back on the stack
push
push
push
push
testeof
jumpfalse block.end.53194
print
clear
# create the virtual machine object code and save it
# somewhere on the tape.
add "\n"
add "\n"
add " /* Java code generated by \"translate.java.pss\" */\n"
add " import java.io.*;\n"
add " import java.util.regex.*;\n"
add " import java.util.*; // contains stack\n"
add "\n"
add " public class Machine {\n"
add " // using int instead of char so that all unicode code points are\n"
add " // available instead of just utf16. (emojis cant fit into utf16)\n"
add " private int accumulator; // counter for anything\n"
add " private int peep; // next char in input stream\n"
add " private int charsRead; // No. of chars read so far\n"
add " private int linesRead; // No. of lines read so far\n"
add " public StringBuffer workspace; // text accumulator\n"
add " private Stack<String> stack; // parse token stack\n"
add " private int LENGTH; // tape initial length\n"
add "\n"
add " // use ArrayLists instead with .add() .get(n) and .set(n, E)\n"
add " // ArrayList al=new ArrayList();\n"
add " private List<StringBuffer> tape; // array of token attributes \n"
add " private List<StringBuffer> marks; // tape marks\n"
add " private int tapePointer; // pointer to current cell\n"
add " private Reader input; // text input stream\n"
add " private boolean eof; // end of stream reached?\n"
add " private boolean flag; // not used here\n"
add " private StringBuffer escape; // char used to \"escape\" others \"\\\"\n"
add " private StringBuffer delimiter; // push/pop delimiter (default is \"*\")\n"
add " private boolean markFound; // if the mark was found in tape\n"
add " \n"
add " /** make a new machine with a character stream reader */\n"
add " public Machine(Reader reader) {\n"
add " this.markFound = false; \n"
add " this.LENGTH = 100;\n"
add " this.input = reader;\n"
add " this.eof = false;\n"
add " this.flag = false;\n"
add " this.charsRead = 0; \n"
add " this.linesRead = 1; \n"
add " this.escape = new StringBuffer(\"\\\\\");\n"
add " this.delimiter = new StringBuffer(\"*\");\n"
add " this.accumulator = 0;\n"
add " this.workspace = new StringBuffer(\"\");\n"
add " this.stack = new Stack<String>();\n"
add " this.tapePointer = 0;\n"
add " this.tape = new ArrayList<StringBuffer>();\n"
add " this.marks = new ArrayList<StringBuffer>();\n"
add " for (int ii = 0; ii < this.LENGTH; ii++) {\n"
add " this.tape.add(new StringBuffer(\"\"));\n"
add " this.marks.add(new StringBuffer(\"\"));\n"
add " }\n"
add "\n"
add " try\n"
add " { this.peep = this.input.read(); } \n"
add " catch (java.io.IOException ex) {\n"
add " System.out.println(\"read error\");\n"
add " System.exit(-1);\n"
add " }\n"
add " }\n"
add "\n"
add " /** read one character from the input stream and \n"
add " update the machine. */\n"
add " public void read() {\n"
add " int iChar;\n"
add " try {\n"
add " if (this.eof) { System.exit(0); }\n"
add " this.charsRead++;\n"
add " // increment lines\n"
add " if ((char)this.peep == \'\\n\') { this.linesRead++; }\n"
add " this.workspace.append(Character.toChars(this.peep));\n"
add " this.peep = this.input.read(); \n"
add " if (this.peep == -1) { this.eof = true; }\n"
add " }\n"
add " catch (IOException ex) {\n"
add " System.out.println(\"Error reading input stream\" + ex);\n"
add " System.exit(-1);\n"
add " }\n"
add " }\n"
add "\n"
add " /** increment tape pointer by one */\n"
add " public void increment() {\n"
add " this.tapePointer++;\n"
add " if (this.tapePointer >= this.LENGTH) {\n"
add " this.tape.add(new StringBuffer(\"\"));\n"
add " this.marks.add(new StringBuffer(\"\"));\n"
add " this.LENGTH++;\n"
add " }\n"
add " }\n"
add " \n"
add " /** remove escape character */\n"
add " public void unescapeChar(char c) {\n"
add " if (workspace.length() > 0) {\n"
add " String s = this.workspace.toString().replace(\"\\\\\"+c, c+\"\");\n"
add " this.workspace.setLength(0); workspace.append(s);\n"
add " }\n"
add " }\n"
add "\n"
add " /** add escape character */\n"
add " public void escapeChar(char c) {\n"
add " if (workspace.length() > 0) {\n"
add " String s = this.workspace.toString().replace(c+\"\", \"\\\\\"+c);\n"
add " workspace.setLength(0); workspace.append(s);\n"
add " }\n"
add " }\n"
add "\n"
add " /** whether trailing escapes \\\\ are even or odd */\n"
add " // untested code. check! eg try: add \"x \\\\\"; print; etc\n"
add " public boolean isEscaped(String ss, String sSuffix) {\n"
add " int count = 0; \n"
add " if (ss.length() < 2) return false;\n"
add " if (ss.length() <= sSuffix.length()) return false;\n"
add " if (ss.indexOf(this.escape.toString().charAt(0)) == -1) \n"
add " { return false; }\n"
add "\n"
add " int pos = ss.length()-sSuffix.length();\n"
add " while ((pos > -1) && (ss.charAt(pos) == this.escape.toString().charAt(0))) {\n"
add " count++; pos--;\n"
add " }\n"
add " if (count % 2 == 0) return false;\n"
add " return true;\n"
add " }\n"
add "\n"
add " /* a helper to see how many trailing \\\\ escape chars */\n"
add " private int countEscaped(String sSuffix) {\n"
add " String s = \"\";\n"
add " int count = 0;\n"
add " int index = this.workspace.toString().lastIndexOf(sSuffix);\n"
add " // remove suffix if it exists\n"
add " if (index > 0) {\n"
add " s = this.workspace.toString().substring(0, index);\n"
add " }\n"
add " while (s.endsWith(this.escape.toString())) {\n"
add " count++;\n"
add " s = s.substring(0, s.lastIndexOf(this.escape.toString()));\n"
add " }\n"
add " return count;\n"
add " }\n"
add "\n"
add " /** reads the input stream until the workspace ends with the given text */\n"
add " // can test this with\n"
add " public void until(String sSuffix) {\n"
add " // read at least one character\n"
add " if (this.eof) return; \n"
add " this.read();\n"
add " while (true) {\n"
add " if (this.eof) return;\n"
add " if (this.workspace.toString().endsWith(sSuffix)) {\n"
add " if (this.countEscaped(sSuffix) % 2 == 0) { return; }\n"
add " }\n"
add " this.read();\n"
add " }\n"
add " }\n"
add "\n"
add " /** pop the first token from the stack into the workspace */\n"
add " public Boolean pop() {\n"
add " if (this.stack.isEmpty()) return false;\n"
add " this.workspace.insert(0, this.stack.pop()); \n"
add " if (this.tapePointer > 0) this.tapePointer--;\n"
add " return true;\n"
add " }\n"
add "\n"
add " /** push the first token from the workspace to the stack */\n"
add " public Boolean push() {\n"
add " String sItem;\n"
add " // dont increment the tape pointer on an empty push\n"
add " if (this.workspace.length() == 0) return false;\n"
add " // need to get this from this.delim not \"*\"\n"
add " int iFirstStar = \n"
add " this.workspace.indexOf(this.delimiter.toString());\n"
add " if (iFirstStar != -1) {\n"
add " sItem = this.workspace.toString().substring(0, iFirstStar + 1);\n"
add " this.workspace.delete(0, iFirstStar + 1);\n"
add " }\n"
add " else {\n"
add " sItem = this.workspace.toString();\n"
add " this.workspace.setLength(0);\n"
add " }\n"
add " this.stack.push(sItem); \n"
add " this.increment(); \n"
add " return true;\n"
add " }\n"
add "\n"
add " /** swap current tape cell with the workspace */\n"
add " public void swap() {\n"
add " String s = new String(this.workspace);\n"
add " this.workspace.setLength(0);\n"
add " this.workspace.append(this.tape.get(this.tapePointer).toString());\n"
add " this.tape.get(this.tapePointer).setLength(0);\n"
add " this.tape.get(this.tapePointer).append(s);\n"
add " }\n"
add "\n"
add " /** save the workspace to file \"sav.pp\" */\n"
add " public void writeToFile() {\n"
add " try {\n"
add " File file = new File(\"sav.pp\");\n"
add " Writer out = new BufferedWriter(new OutputStreamWriter(\n"
add " new FileOutputStream(file), \"UTF8\"));\n"
add " out.append(this.workspace.toString());\n"
add " out.flush(); out.close();\n"
add " } catch (Exception e) { \n"
add " System.out.println(e.getMessage());\n"
add " }\n"
add " }\n"
add "\n"
add " public void goToMark(String mark) {\n"
add " this.markFound = false; \n"
add " for (var ii = 0; ii < this.marks.size(); ii++) {\n"
add " if (this.marks.get(ii).toString().equals(mark)) { \n"
add " this.tapePointer = ii; this.markFound = true; \n"
add " }\n"
add " }\n"
add " if (this.markFound == false) { \n"
add " System.out.print(\"badmark \'\" + mark + \"\'!\"); \n"
add " System.exit(1);\n"
add " }\n"
add " }\n"
add "\n"
add " /** parse/check/compile the input */\n"
add " public void parse(InputStreamReader input) {\n"
add " //this is where the actual parsing/compiling code should go \n"
add " //but this means that all generated code must use\n"
add " //\"this.\" not \"mm.\"\n"
add " }\n"
add "\n"
add " public static void main(String[] args) throws Exception { \n"
add " String temp = \"\"; \n"
add " Machine mm = new Machine(new InputStreamReader(System.in)); \n"
# save the code in the current tape cell
put
clear
#---------------------
# check if the script correctly parsed (there should only
# be one token on the stack, namely "commandset*" or "command*").
pop
pop
testis "commandset*"
jumptrue 4
testis "command*"
jumptrue 2
jump block.end.51803
clear
# indent generated code (6 spaces) for readability.
add "      "
get
replace "\n" "\n      "
put
clear
# restore the java preamble from the tape
++
get
--
add "\n"
add " script: \n"
add " while (!mm.eof) {\n"
get
add "\n }"
add "\n }"
add "\n}\n"
# put a copy of the final compilation into the tapecell
# so it can be inspected interactively.
put
print
clear
quit
block.end.51803:
testis "beginblock*commandset*"
jumptrue 4
testis "beginblock*command*"
jumptrue 2
jump block.end.52505
clear
# indent begin block code
add " "
get
replace "\n" "\n "
put
clear
# indent main code for readability.
++
add " "
get
replace "\n" "\n "
put
clear
--
# get java preamble from tape
++
++
get
--
--
get
add "\n"
++
# a labelled loop for "quit" (but quit can just exit?)
add " script: \n"
add " while (!mm.eof) {\n"
get
add "\n }"
add "\n }"
add "\n}\n"
# put a copy of the final compilation into the tapecell
# for interactive debugging.
put
print
clear
quit
block.end.52505:
push
push
# try to explain some more errors
unstack
testbegins "parse>"
jumpfalse block.end.52771
put
clear
add "[error] pep syntax error:\n"
add " The parse> label cannot be the 1st item \n"
add " of a script \n"
print
quit
block.end.52771:
put
clear
clear
add "After compiling with 'compile.java.pss' (at EOF): \n "
add " parse error in input script. \n "
print
clear
unstack
put
clear
add "Parse stack: "
get
add "\n"
add " * debug script "
add " >> pep -If script -i 'some input' \n "
add " * debug compilation. \n "
add " >> pep -Ia asm.pp script \n "
print
clear
quit
block.end.53194:
# not eof
# there is an implicit .restart command here (jump start)
jump start
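# For orientation, a hedged sketch of the overall shape of the java
# program that the final blocks above assemble. It is pieced together
# from the 'add' strings in this file and is not captured output.
# "..." stands for translated script commands, and the lex:/parse:
# part only appears when the script contains a parse> label.
#   public class Machine {
#     ...                       // fields and machine methods
#     public static void main(String[] args) throws Exception {
#       String temp = "";
#       Machine mm = new Machine(new InputStreamReader(System.in));
#       ...                     // begin { } block commands, if any
#       script:
#       while (!mm.eof) {
#         lex: {
#           ...                 // commands before the parse> label
#         }
#         parse:
#         while (true) {
#           ...                 // commands after the parse> label
#           break parse;
#         }
#       }
#     }
#   }
# The "quit" command is translated to "break script;", which leaves
# the outer labelled loop and so ends main().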