#* See eg/plzero.pss which is a more complete version of this script. This script is an attempt to create a pl/0 parser/compiler in the parse script language. I will develope this in stages, first adding conditions, then assigments, then begin-end blocks etc. Eventually I should have a complete "statement" rule with a grammar very similar to Wirth's. This has been an interesting learning exercise but the grammar rules have become very verbose. Also, I have to use look-ahead everywhere to parse expressions. I think I need to look for a more elegant way to do this. However, the optionality of the statement rule [...] will be encoded in the block rule. ----- # plus more combinations .... block = constdec vardec proclist statement block = constdec vardec proclist block = constdec vardec block = constdec block = vardec proclist statement block = vardec proclist block = vardec block = proclist statement block = proclist block = statement ,,, * the pl/0 grammar in wsn/ebnf form ------- program = block "." . block = [ "const" ident "=" number {"," ident "=" number} ";"] [ "var" ident {"," ident} ";"] { "procedure" ident ";" block ";" } statement . statement = [ ident ":=" expression | "call" ident | "?" ident | "!" expression | "begin" statement {";" statement } "end" | "if" condition "then" statement | "while" condition "do" statement ]. condition = "odd" expression | expression ("="|"#"|"<"|"<="|">"|">=") expression . expression = [ "+"|"-"] term { ("+"|"-") term}. term = factor {("*"|"/") factor}. factor = ident | number | "(" expression ")". ,,, TESTING * check if variable declarations are parsing >> pp -f eg/plzero.statement.pss -i "var a, b, c ;" * constant declarations >> pp -f eg/plzero.statement.pss -i "const a=1, b=2, c=3 ;" GOALS To add "conditions" and "assignments" and then some parts of the pl/0 statement grammar (if-then blocks, while-do blocks, begin-end blocks). Once parsing and recognising is working, to convert this script to a "compiler" so that it compiles pl/0 to ruby or python or java. CHALLENGES The "block" rule contains a series of optional elements that must appear in order eg: block := [constant.declarations] [var.declarations] [procedure.declarations] [statement] . all of these are optional. I we factor out the optionality we end up with a large number of separate rules which is tedious to code. But we can encode some state in a parse token. statements have no end marker, so how do we know when the expression is finished in "statement = ident ':=' expression ." ? So, the problem is, if we have an input stream: >> a := b+c*d That will parse to "ident ':=' expression" where expression == "b+c" which will reduce to "statement", which ignores the "*d" suffix and produces a parse error! One solution is to look at Wirth's grammar and see that statements must be terminated with either ";" or "end" or "." (in the case of a program block). "a = 4" is a legal 'condition' in pl/0 but is also part of a constant declaration. So I need to stop this parsing as a condition if it is preceded by "const" When parsing integer arithmetic expressions, we need to construct the grammar so that it contains a "lookahead" token, in order to deal with the operator precedence dilemma. LIMITATIONS The grammar was written for a recursive descent parser and the syntax of the language was also designed to be easily parsable for that type of compiler. It is becoming evident that it is somewhat painful attempting to write a LR shift-reduce parser/compiler using the machine for this syntax... The expression grammar does not currently (aug 2019) include +/- expression. This is because I find this syntax confusing eg: >> a++b*-c+-d HISTORY 30 august 2019 adapting from exp.recognise.pss and plzero.dec.pss This is actually nearing completion and appears to work quite well despite being a messy way to parse this syntax. This is because we constantly have to use look-ahead tokens to decide when a particular element has terminated, especially "expressions". The main thing missing now are procedure definitions. Statements are nearly complete *# read; [:space:] { clear; } "+","-" { put; clear; add "opadd*"; push; } "*","/" { put; clear; add "opmul*"; push; } # literal tokens # '!' is the print command in pl/0 # '?' is the read command in pl/0 # '#' means != # '.' marks the end of a program. ".",",","=",";","(",")","?","!" { put; add "*"; push; } # also need to parse <= >= comparison operators # and how to handle '=' which is comparison and also # constant assignment. "<",">" { while [=]; put; clear; add "compare*"; push; } "#" { put; clear; add "compare*"; push; } ":" { read; ":=" { put; add "*"; push; .reparse } add " << characters not allowed here in pl/0 syntax. \n"; add " at line number "; ll; add ".\n"; print; clear; quit; } [0-9] { while [0-9]; put; clear; add "number*"; push; .reparse } [:alpha:] { while [:alpha:]; # keywords in pl/0 "const","var","if","then","while","do","begin","end", "procedure","call","odd" { # or add ".key*" to remind us that these are keywords, # not parse tokens put; add "*"; push; .reparse } put; clear; add "ident*"; push; .reparse } !"" { # error unrecognised character add " << character not allowed here in pl/0 syntax. \n"; add " at line number "; ll; add ".\n"; print; clear; quit; } parse> # The parse/compile/translate/transform phase involves # recognising series of tokens on the stack and "reducing" them # according to the required bnf grammar rules. #----------------- # 1 token pop; #----------------- # 2 tokens pop; "block*.*" { clear; add "program*"; push; .reparse } #* "constdec*statement*","vardec*statement*" { clear; add "block*"; push; .reparse } *# "call*ident*" { clear; add "statement*"; push; .reparse } "?*ident*" { clear; add "statement*"; push; .reparse } # trigger a changer of identity for number or ident with # a lookahead token. "number*)*","ident*)*" { clear; add "exp*)*"; push; push; .reparse } "number*=*" { clear; add "exp*=*"; push; push; .reparse } "number*then*" { clear; add "exp*then*"; push; push; .reparse } "number*do*" { clear; add "exp*do*"; push; push; .reparse } "number*opmul*","ident*opmul*" { clear; add "exp*opmul*"; push; push; .reparse } "number*opadd*","ident*opadd*" { clear; add "exp*opadd*"; push; push; .reparse } "number*compare*","ident*compare*" { clear; add "exp*compare*"; push; push; .reparse } "compare*number*","compare*ident*" { clear; add "compare*exp*"; push; push; .reparse } # variable declarations "var*ident*" { clear; add "varset*"; push; .reparse } "varset*;*" { clear; add "vardec*"; push; .reparse } "conset*;*" { clear; add "constdec*"; push; .reparse } #----------------- # 3 tokens pop; "constdec*vardec*statement*" { clear; add "block*"; push; .reparse } "vardec*statement*.*" { clear; add "program*"; push; .reparse } "constdec*vardec*.*" { clear; add "program*"; push; .reparse } "constdec*statement*.*" { clear; add "block*.*"; push; push; .reparse } "exp*=*number*","exp*=*ident*","ident*=*ident*" { clear; add "exp*=*exp*"; push; push; push; .reparse } "!*exp*;*" { clear; add "statement*;*"; push; push; .reparse } "begin*statement*end*","begin*statementset*end*" { clear; add "statement*"; push; .reparse } "statement*;*statement*","statementset*;*statement*" { clear; add "statementset*"; push; .reparse } "odd*exp*then*" { clear; add "condition*then*"; push; push; .reparse } "odd*exp*do*" { clear; add "condition*do*"; push; push; .reparse } "varset*,*ident*" { clear; add "varset*"; push; .reparse } # trigger an identity change for numbers and variables. "exp*opadd*number*","exp*opadd*ident*", "exp*opmul*number*","exp*opmul*ident*" { push; push; clear; add "exp*"; push; .reparse } # we dont need any look ahead here because * and / have # precedence. "exp*opmul*exp*" { clear; add "exp*"; push; .reparse } "(*exp*)*" { clear; add "exp*"; push; .reparse } (eof) { "exp*opadd*exp*" { clear; add "exp*"; push; .reparse } } #----------------- # 4 tokens pop; "constdec*vardec*statement*.*" { clear; add "program*"; push; .reparse } "if*condition*then*statement*" { clear; add "statement*"; push; .reparse } "while*condition*do*statement*" { clear; add "statement*"; push; .reparse } # problem "statement*;*statement*;*","statementset*;*statement*;*" { clear; add "statementset*;*"; push; push; .reparse } "statement*;*statement*.*","statementset*;*statement*.*" { clear; add "statementset*.*"; push; push; .reparse } "statement*;*statement*end*","statementset*;*statement*end*" { clear; add "statementset*end*"; push; push; .reparse } # lookahead parsing of conditions "exp*compare*exp*then*" { clear; add "condition*then*"; push; push; .reparse } "ident*=*exp*then*" { clear; add "condition*then*"; push; push; .reparse } "exp*=*exp*then*" { clear; add "condition*then*"; push; push; .reparse } "exp*compare*exp*do*" { clear; add "condition*do*"; push; push; .reparse } "exp*=*exp*do*" { clear; add "condition*do*"; push; push; .reparse } "const*ident*=*number*" { clear; add "conset*"; push; .reparse } # use a lookahead to terminate assignments # otherwise we dont know where the expression ends "ident*:=*exp*;*", "ident*:=*ident*;*", "ident*:=*number*;*" { clear; add "statement*;*"; push; push; .reparse } "ident*:=*exp*end*", "ident*:=*ident*end*", "ident*:=*number*end*" { clear; add "statement*end*"; push; push; .reparse } "ident*:=*exp*.*", "ident*:=*ident*.*", "ident*:=*number*.*" { clear; add "statement*.*"; push; push; .reparse } # the following rules use a look-ahead parse token # to implement operator precedence (* and / have precedence over # + and - ) "exp*opadd*exp*opadd*" { clear; add "exp*opadd*"; push; push; .reparse } "exp*opadd*exp*)*" { clear; add "exp*)*"; push; push; .reparse } # this seems quite a painful way to parse, but # how do we know when it is time to reduce a+b, except by # looking ahead at the next token?? "exp*opadd*exp*.*" { clear; add "exp*.*"; push; push; .reparse } "exp*opadd*exp*;*" { clear; add "exp*;*"; push; push; .reparse } "exp*opadd*exp*end*" { clear; add "exp*end*"; push; push; .reparse } "exp*opadd*exp*then*" { clear; add "exp*then*"; push; push; .reparse } "exp*opadd*exp*do*" { clear; add "exp*do*"; push; push; .reparse } # try replace for a more succinct parse/compile. "exp*opadd*exp*compare*", "exp*opadd*exp*=*" { #clear; add "exp*compare*"; replace "exp*opadd*exp*" "exp*"; push; push; .reparse } #* "exp*opadd*exp*=*" { clear; add "exp*=*"; push; push; .reparse } *# # ---------------------- # 5 tokens pop; "if*condition*then*statement*;*" { clear; add "statement*;*"; push; push; .reparse } "if*condition*then*statement*.*" { clear; add "statement*.*"; push; push; .reparse } "while*condition*do*statement*;*" { clear; add "statement*;*"; push; push; .reparse } "while*condition*do*statement*.*" { clear; add "statement*.*"; push; push; .reparse } "conset*,*ident*=*number*" { clear; add "conset*"; push; .reparse } (eof) { add " << top 5 stack items.\n"; print; clear; quit; } push; push; push; push; push;