#* 

   "asm.handcode.pp" script compiler.

   This assembler script compiler was handcoded (not generated) but
   very soon I will try to use "compile.pss" as the script compiler
   because it uses a new, and better way to compile quoteset tokens.
   So I will run:

    >> cp asm.pp asm.old.pp; pp -f compile.pss compile.pss > asm.new.pp
    >> cp asm.new.pp asm.pp

   I will preserve this original "bootstrap" program as "asm.handcode.pp"
   just in case something goes wrong with the compile.pss script.

   This "assembler" script works with the pattern parsing machine
   at http://bumble.sf.net/books/gh/gh.c. It implements a script language
   with a syntax reminiscent of sed and awk (much simpler than awk, but
   more complex than sed).

   The proof is in the pudding!!
   As with many somewhat unnecessary software projects, this uses
   itself to parse and compile itself. This asm script demonstrates
   how the pp machine can parse and transform a context-free language
   (in this case a pattern-parsing script)

MOTIVATION

   By implementing a script language which can be run on the pattern-parsing
   virtual machine, we avoid having to write "assembler" programs like the
   present one. For example the script language allows us to use brace
   notation instead of having to hand-code tests and jumps to labels. 
   This is generally considered more readable.
 
ERROR TRAPPING

   It is prohibitive to try to trap all possible erroneous
   combinations of tokens (and give helpful error messages for them).
   But a better way is, use the accumulator to flag if a valid
   token combination has been found, and if not quit with a message.
   For example, with 2 stack tokens (this code is written in the parse
   script language, but the same idea applies in assembled instructions)

     pop; pop; 
     # set accumulator to 0
     zero;
     "tokenA*tokenB*",
     "tokenX*tokenY*",
     "tokenN*tokenM*" { a+; }
     count; 
     "0" { 
       clear; add "error in script on line "; ll;
       print; clear;
       quit; 
     }
     # now go on to actually parse/compile the tokens

   One disadvantage, is that when new syntax is added to asm.pp
   or compile.pss, then the error handler needs to be updated 
   as well, so as not to reject the new syntax.

BUGS
   
  comments and multiline comments should not jump back to read
  after deleting the comment, because there could be no more 
  input, and read will throw an error. They should jump to 
  "check.eof:". But check.eof should be at the end of the file.

ASSEMBLER

  This script is written in an assembly language. 

  As with all assembly languages, only one instruction per line.
  blank lines are ok. Labels must end with : 
  
SCRIPT EXAMPLES

  here are few examples of the format that this pp script can parse

  * append a dot to all lower case ascii letters 
  >>  [a-z] { add "."; print; clear; }

  * the same as above, using abbreviated commands
  >>  [a-z] { a ".";t;d; }

  * append a dot to the letters a, d or f 
  >>  [adf] { add "."; print; clear; }

  * append a dot to all alpha-numeric letters 
  >>  [:alpha:] { add "."; print; clear; }

  * print a message if work space is a letter and end reached (nested tests) 
  >>  [a-z] { (eof) { add "last letter"; print;  }}

  * if the end of file/stream has been reached, print "no more!"
  >> (eof) { clear; add "no more!"; print; clear;  }

TRANSFORMATIONS

   This script implements a set of transformations (which we could call
   a "compilation") from a sed-like (the unix stream editor) syntax
   to a kind of "assembly language". It is an assembly language in the 
   sense that each line of the assembly file corresponds to one instruction
   on the machine.

   This script needs to transform  "pop ;" to "\npop \n" for example
   * below is a more complex transformation, with a multiline
     bracketed statement. The script is transformed into an "assembler"
     format which can be loaded with the function loadAssembledProgram()
     in the gh.c code.

   :"abc" { add "gg"; print; clear; } 
   --> becomes
     testis "abc"
     jumpfalse 4
     add "gg"
     print
     clear


HISTORY

  14 august 2019

    added begin blocks to the script syntax. eg:
      begin { add "starting... \n"; print; clear; }
      read; print; clear;
    Also, added them to compile.pss
    There is no corresponding "end-block" because that functionality
    can be accessed through the (eof) test.

  30 july 2019

    Thought of a better way to do error handling. See the section
    in this document. Converting this assembler to script syntax
    in compile.pss. That is much more compact and readable and
    easier to maintain.

    adding "notclasses" to the script language syntax eg:
    ![a-z] { nop; nop; }

  28 july 2019

    Changed reparse and restart syntax to ".reparse" and
    ".restart" and parse label to "parse>". Also begintext syntax
    to B"text" and E"text". Added "quotesets*" which allow multiple
    tests for blocks, which is very convenient. eg "a","b" {nop;}
    
  26 july 2019

    Adding .reparse and <pp> parselabel. Also finished coding
    the testbegins/ends blocks.

  25 july 2019

    Added the replace command. We can create an alias replace
    which only has one parameter: eg "replace 'gg';" which 
    will delete gg everywhere in the workspace. The machine 
    instruction "replace" requires two parameters

  20 july 2019
    
    This compiler is reasonably complete, Maybe more error checking. Also,
    reparse and restart commands.

  16 july 2019

    fixed the bug the unconditional jump bug. Took a while to get round to it.
    just stopped subtracting linenumber in instructionFromText() jump
    calculations.

    revisiting. There is still a bug in the gh.c code about how
    absolute jump addresses are calculated. But it is not a difficult
    bug, and I think I could fix it in an hour or so. But I 
    havent done it.

  3 sept 2018
    Started to code this, since the pp machine (gh.c) is sufficiently
    complete to make it possible. Need to actually start doing 
    stack parsing, and think about what syntax to use for tests
    eg startwith ."abc" endswith ,"abc" equals :"abc"
    As you can see by this file, multiline comments are ok. 

 The code should assemble a script
 this assembler implements a type of bnf grammar
 eg: quote { commandset } --> command
     command command --> commandset
     commandset command --> commandset
     keyword arg colon --> command
 or: class { s1 } --> s1
     class { slist } --> s1
     qarg  { slist } --> s1


*# 

start:
  read

  testeof
  jumpfalse not.eof
    #add "end of script!! \n"
    print
    clear

    #---------------------
    # check if the script correctly parsed. There should either only
    # be one token on the stack, "commandset*" or "command*" or else
    # 2 tokens "beginblock*commandset*" or "beginblock*command*"
    pop
    pop
    testis "commandset*"
    jumptrue ok.onetoken
    testis "command*"
    jumptrue ok.onetoken
    testis "beginblock*commandset*"
    jumptrue ok.twotokens
    testis "beginblock*command*"
    jumptrue ok.twotokens
      push
      push
      # state
      clear
      add "While compiling with 'asm.pp' (reached EOF): \n "
      add "  parse error in script, check syntax or try \n "
      add "  'pp -Ia asm.pp script' to debug compilation \n "
      print
      clear
      # clear sav.pp because script could not be compiled
      write
      # bail means exit with error
      bail 

ok.twotokens:
    # preserve the last token for debugging
    clear
    add "# Compiled by 'asm.pp' \n"
    get
    add "\n"
    ++
    add "start:\n"
    get
    add "\njump start \n"
    # put a copy of final compilation into the tapecell
    # (which should be tapecell(0) ) so it can be inspected
    # interactively.
    put
    # save the compiled script to 'sav.pp'
    write
    clear 
    quit

ok.onetoken:
    # preserve the last token for debugging
    push
    --
    add "# compiled by 'asm.pp' \n"
    add "start:\n"
    get
    # an extra space because of a bug in compile()
    add "\njump start \n"
    # print just for debuging
    # print
    # put a copy of final compilation into the tapecell
    # (which should be tapecell(0) ) so it can be inspected
    # interactively.
    put
    # save the compiled script to 'sav.pp'
    write
    clear 
    quit

not.eof:

newline:
  #-----------------------------
  # was using the accumulator to make unique labels. but
  testis "\n"
  jumpfalse whitespace
  a+ 

 #--------------
 whitespace:
  testclass [:space:] 
  jumpfalse openbrace
    clear
    jump 0 
 #---------------
 # We can ellide all these single character tests, because
 # the stack token is just the character itself with a *

 openbrace:
  testis "{" 
  jumpfalse closebrace 
    put
    add "*"
    push
    jump parse 

 #---------------
 closebrace:
  testis "}" 
  jumpfalse exclamation 
    put
    add "*"
    push
    jump parse 

 #---------------
 # format: !
 # the exclamation mark will be used to negate "class" tests
 # (and possibly other tests as well). eg ![a-z] { nop; }
 exclamation:
  testis "!" 
  jumpfalse semicolon 
    put
    add "*"
    push
    jump parse 
 #---------------
 # format: ; 
 # we just put the character itself on the stack as a token
 # in the form ;*  (char delimiter)
 semicolon:
  testis ";" 
  jumpfalse comma 
    put 
    add "*"
    push
    jump parse 

 #---------------
 # commas will be used to combined testis tests for blocks 
 # 
 comma:
  testis "," 
  jumpfalse quotetext 
    put 
    add "*"
    push
    jump parse 
 #---------------
 # format: "text"
 quotetext:
  testis "\"" 
  jumpfalse singlequotetext 
    until "\""
    put
    clear
    add "quote*"
    push 
    jump parse 
 #---------------
 # format: 'text', this will be changed to "text"
 # there was a bug here if the text contained double quotes!!
 # need to use "escape" command
 singlequotetext:
  testis "'" 
  jumpfalse class
    clear 
    # need to juggle embedded double quotes here
    # and escape them.
    until "'"
    clip
    escape "\"" 
    put
    clear
    add "\""
    get
    add "\""
    put
    clear
    add "quote*"
    push 
    jump parse 
 #---------------
 # formats: [:space:] [a-z] [abcd] [:alpha:] etc 
 class:
  testis "[" 
  jumpfalse statetest 
    until "]"
    put
    clear
    add "class*"
    push 
    jump parse 
 #---------------
 # formats: (eof) etc 
 statetest:
  testis "(" 
  jumpfalse comment
    clear
    until ")"
    clip
    put
    clear
    add "state*"
    push 
    jump parse 

 #---------------
 # multiline and single line comments. 
 comment:
  testis "#" 
  jumpfalse nocomment
    clear
    read
    testis "\n" 
    jumpfalse multiline
    clear
    jump 0 
 #----------------------------
 # checking for multiline comments of the form "#* \n\n\n *#"
 # these are just ignored at the moment (deleted) 
 multiline:
    testis "*" 
    jumpfalse oneline 
    clear
    ll 
    put
    clear
    until "*#"
    # make unterminated multiline comments an error, to make
    # debugging easier
    testends "*#"
    jumptrue multiline.terminated 
      clear
      add "unterminated multiline comment (#* ... *#) \n"
      add "starting at line "
      get 
      add " \n"
      add "[also see until bug in gh.c] \n"
      print
      clear
      quit

multiline.terminated:
    clear
    jump 0 
 oneline:
    until "\n" 
 #  print
    clear 
    jump 0
 nocomment:
 #----------------------------------
 # parse command words (and abbreviations)

 # legal characters for keywords (commands)
 testclass [abcdefghijklmnopqrstuvwxyzBEKGPRWS+-<>0^.]
 jumptrue alpha 
   # error message about a misplaced character
   put
   clear
   add "!! Misplaced character '"
   get
   add "' in script near line "
   ll
   add " (character "
   cc
   add ") \n"
   print
   clear
   bail
   # jump parse 
alpha:
   #while [:alpha:]
   # my testclass implementation cannot handle complex lists
   # eg [a-z+-] this is why I have to write out the whole alphabet
   while [abcdefghijklmnopqrstuvwxyzBEKGPRWS+-<>0^.]
   #----------------------------------
   # KEYWORDS 
   # here we can test for all the keywords (command words) and their
   # abbreviated one letter versions (eg: clip k, clop K etc). Then
   # we can print an error message and abort if the word is not a 
   # legal keyword for the parse-edit language

   #------------ 
   testis "B" 
   jumptrue prefix
   #------------ 
   testis "E" 
   jumptrue prefix 

   #------------ 
   testis "add" 
   jumptrue keyword 
   testis "a" 
   jumpfalse 4
   clear
   add "add"
   jumptrue keyword 

   #------------ 
   testis "clip" 
   jumptrue keyword 
   testis "k" 
   jumpfalse 4
   clear
   add "clip"
   jumptrue keyword 
   #------------ 
   testis "clop" 
   jumptrue keyword 
   testis "K" 
   jumpfalse 4
   clear
   add "clop"
   jumptrue keyword 
   #------------ 
   testis "replace" 
   jumptrue keyword 
   testis "D" 
   jumpfalse 4
   clear
   add "replace"
   jumptrue keyword 
   #------------ 
   testis "clear" 
   jumptrue keyword 
   testis "d" 
   jumpfalse 4
   clear
   add "clear"
   jumptrue keyword 
   #------------ 
   testis "print" 
   jumptrue keyword 
   testis "t" 
   jumpfalse 4
   clear
   add "print"
   jumptrue keyword 
   #------------ 
   testis "pop" 
   jumptrue keyword 
   testis "p" 
   jumpfalse 4
   clear
   add "pop"
   jumptrue keyword 
   #------------ 
   testis "push" 
   jumptrue keyword 
   testis "P" 
   jumpfalse 4
   clear
   add "push"
   jumptrue keyword 
   #------------ 
   testis "put" 
   jumptrue keyword 
   testis "G" 
   jumpfalse 4
   clear
   add "put"
   jumptrue keyword 
   #------------ 
   testis "get" 
   jumptrue keyword 
   testis "g" 
   jumpfalse 4
   clear
   add "get"
   jumptrue keyword 
   #------------ 
   testis "swap" 
   jumptrue keyword 
   testis "x" 
   jumpfalse 4
   clear
   add "swap"
   jumptrue keyword 
   #------------ 
   testis "++" 
   jumptrue keyword 
   testis ">" 
   jumpfalse 4
   clear
   add "++"
   jumptrue keyword 
   #------------ 
   testis "--" 
   jumptrue keyword 
   testis "<" 
   jumpfalse 4
   clear
   add "--"
   jumptrue keyword 
   #------------ 
   testis "swap" 
   jumptrue keyword 
   testis "x" 
   jumpfalse 4
   clear
   add "swap"
   jumptrue keyword 
   #------------ 
   testis "read" 
   jumptrue keyword 
   testis "r" 
   jumpfalse 4
   clear
   add "read"
   jumptrue keyword 
   #------------ 
   testis "until" 
   jumptrue keyword 
   testis "R" 
   jumpfalse 4
   clear
   add "until"
   jumptrue keyword 
   #------------ 
   testis "while" 
   jumptrue keyword 
   testis "w" 
   jumpfalse 4
   clear
   add "while"
   jumptrue keyword 
   #------------ 
   testis "whilenot" 
   jumptrue keyword 
   testis "W" 
   jumpfalse 4
   clear
   add "whilenot"
   jumptrue keyword 
   #------------ 
   testis "jump" 
   jumptrue keyword 
   testis "," 
   jumpfalse 4
   clear
   add "jump"
   jumptrue keyword 
   #------------ 
   testis "jumptrue" 
   jumptrue keyword 
   testis "j" 
   jumpfalse 4
   clear
   add "jumptrue"
   jumptrue keyword 
   #------------ 
   testis "jumpfalse" 
   jumptrue keyword 
   testis "J" 
   jumpfalse 4
   clear
   add "jumpfalse"
   jumptrue keyword 
   #------------ 
   testis "testis" 
   jumptrue keyword 
   testis "=" 
   jumpfalse 4
   clear
   add "testis"
   jumptrue keyword 
   #------------ 
   testis "testclass" 
   jumptrue keyword 
   testis "?" 
   jumpfalse 4
   clear
   add "testclass"
   jumptrue keyword 
   #------------ 
   testis "testbegins" 
   jumptrue keyword 
   testis "b" 
   jumpfalse 4
   clear
   add "testbegins"
   jumptrue keyword 
   #------------ 
   testis "testends" 
   jumptrue keyword 
   testis "B" 
   jumpfalse 4
   clear
   add "testends"
   jumptrue keyword 
   #------------ 
   testis "testeof" 
   jumptrue keyword 
   testis "E" 
   jumpfalse 4
   clear
   add "testeof"
   jumptrue keyword 
   #------------ 
   testis "testtape" 
   jumptrue keyword 
   testis "*" 
   jumpfalse 4
   clear
   add "testtape"
   jumptrue keyword 
   #------------ 
   testis "count" 
   jumptrue keyword 
   testis "n" 
   jumpfalse 4
   clear
   add "count"
   jumptrue keyword 
   #------------ 
   testis "a+" 
   jumptrue keyword 
   testis "+" 
   jumpfalse 4
   clear
   add "a+"
   jumptrue keyword 
   #------------ 
   testis "a-" 
   jumptrue keyword 
   testis "-" 
   jumpfalse 4
   clear
   add "a-"
   jumptrue keyword 
   #------------ 
   testis "zero" 
   jumptrue keyword 
   testis "0" 
   jumpfalse 4
   clear
   add "zero"
   jumptrue keyword 
   #------------ 
   testis "cc" 
   jumptrue keyword 
   testis "c" 
   jumpfalse 4
   clear
   add "cc"
   jumptrue keyword 
   #------------ 
   testis "ll" 
   jumptrue keyword 
   testis "l" 
   jumpfalse 4
   clear
   add "ll"
   jumptrue keyword 
   #------------ 
   testis "escape" 
   jumptrue keyword 
   testis "^" 
   jumpfalse 4
   clear
   add "escape"
   jumptrue keyword 
   #------------ 
   testis "unescape" 
   jumptrue keyword 
   testis "v" 
   jumpfalse 4
   clear
   add "unescape"
   jumptrue keyword 
   #------------ 
   testis "state" 
   jumptrue keyword 
   testis "S" 
   jumpfalse 4
   clear
   add "state"
   jumptrue keyword 
   #------------ 
   testis "quit" 
   jumptrue keyword 
   testis "q" 
   jumpfalse 4
   clear
   add "quit"
   jumptrue keyword 
   #------------ 
   testis "bail" 
   jumptrue keyword 
   testis "Q" 
   jumpfalse 4
   clear
   add "bail"
   jumptrue keyword 
   #------------ 
   testis "write" 
   jumptrue keyword 
   testis "s" 
   jumpfalse 4
   clear
   add "write"
   jumptrue keyword 
   #------------ 
   testis "nop" 
   jumptrue keyword 
   testis "o" 
   jumpfalse 4
   clear
   add "nop"
   jumptrue keyword 

   #------------ 
   # the ".restart" command just jumps to instruction 0, which
   # is usually "read". 
check.restart:
   testis ".restart" 
   jumpfalse check.reparse 
     clear
     add "jump 0"
     put
     clear
     add "command*"
     push
   jump parse 

   #------------ 
   # the .reparse command and "parse label" is a simple way to 
   # make sure that all shift-reductions occur. It should be used inside
   # a block test, so as not to create an infinite loop.
check.reparse:

   testis ".reparse" 
   jumpfalse check.parselabel 
     clear
     add "jump parse"
     put
     clear
     add "command*"
     push
   jump parse 

check.parselabel:
   testis "parse>" 
   jumpfalse check.beginlabel 
     clear
     add "parse:"
     put
     clear
     add "command*"
     push
   jump parse 

check.beginlabel:
   testis "begin" 
   jumpfalse error.unknown.command 
     put
     add "*"
     push
   jump parse 

#------------ 

error.unknown.command:
     add " <unknown command!> on line "
     ll 
     add " of source file. \n"
     print
     clear
     quit

keyword:
   put
   clear
   add "word*"
   push
   jump parse
   
# the b and e prefixes are used to indicate a beginswith test
# or an endswith test
prefix:
   add "*"
   push
   jump parse

# ----------------------------------
# PARSING PHASE:
# the lexing phase finishes here, and below is the 
# parse/compile phase of the script. Here we pop tokens 
# off the stack and check for sequences of tokens eg word*semicolon*
# If we find a valid series of tokens, we "shift-reduce" or "resolve"
# The token series eg word*semicolon* --> command*
#
# At the same time, we manipulate (transform) the attributes on the 
# tape, as required. So Tape=|pop|;| becomes |\npop| where the 
# bars | indicate tape cells. (2 tapes cells are merged into 1).
#
# In this section we also have to manipulate the tape (to actually "compile"
# the script). Each time the stack is reduced, the tape must also be reduced
# 
parse:

#-------------------------------------
# 2 tokens
#-------------------------------------
   pop
   pop

check.error.two.tokens:
   testis "word*word*"
   jumptrue error.two.tokens 
   testis "quote*word*"
   jumptrue error.two.tokens 
   testis "quote*class*"
   jumptrue error.two.tokens 
   testis "quote*state*"
   jumptrue error.two.tokens 
   testis "quote*}*"
   jumptrue error.two.tokens 
   testis "quote*begintext*"
   jumptrue error.two.tokens 
   testis "quote*endtext*"
   jumptrue error.two.tokens 
   testis "class*word*"
   jumptrue error.two.tokens 
   testis "class*quote*"
   jumptrue error.two.tokens 
   testis "class*class*"
   jumptrue error.two.tokens 
   testis "class*state*"
   jumptrue error.two.tokens 
   testis "class*}*"
   jumptrue error.two.tokens 
   testis "class*begintext*"
   jumptrue error.two.tokens 
   testis "class*endtext*"
   jumptrue error.two.tokens 
   testis "word*}*"
   jumptrue error.two.tokens 
   testis "word*begintext*"
   jumptrue error.two.tokens 
   testis "word*endtext*"
   jumptrue error.two.tokens 
   jump check.error.badsemicolon

error.two.tokens:
   push 
   push
   add "error near line " 
   ll
   add " of script (missing semicolon?) \n"
   print
   clear
   quit
    
check.error.badsemicolon:
   testis "{*;*"
   jumptrue error.badsemicolon 
   testis ";*;*"
   jumptrue error.badsemicolon 
   testis "}*;*"
   jumptrue error.badsemicolon 
   jump check.error.emptybrackets 

error.badsemicolon:
   push
   push
   add "error near line " 
   ll
   add " of script: misplaced semi-colon ; \n"
   print
   clear
   quit

check.error.emptybrackets:
   testis "{*}*"
   jumpfalse two.tokens
     push
     push
     add "error near line " 
     ll
     add " of script: empty braces {} \n"
     print
     clear
     quit

two.tokens:

#-----------------------------------------
# format: !"text"  
#  This format is used to indicate a negative class before a 
#  a brace block.
check.exclamation.class:
   testis "!*class*"
   jumpfalse check.e.quote

exclamation.class:
   clear
   add "notclass*"
   push
   get
   -- 
   put
   ++
   clear
   jump parse

#-----------------------------------------
# format: E"text" or E'text'
#  This format is used to indicate a "workspace-ends-with" text before
#  a brace block.
check.e.quote:
   testis "E*quote*"
   jumpfalse check.b.quote

e.quote:
   clear
   add "endtext*"
   push
   get
   -- 
   put
   ++
   clear
   jump parse
   
#-----------------------------------------
# format: B"sometext" or B'sometext' 
#   A 'B' preceding some quoted text is used to indicate a 
#   'workspace-begins-with' test, before a brace block.
check.b.quote:
   testis "B*quote*"
   jumpfalse check.word.colon 

b.quote:
   clear
   add "begintext*"
   push
   get
   -- 
   put
   ++
   clear
   jump parse
   
#--------------------------------------------
# bnf: command <- word ; 
# formats: "pop; push; clear; print; " etc
check.word.colon:

   testis "word*;*"
   jumpfalse commandset.command

   clear
   #-----------------------
   # check if "add", "until" etc 
   get 
   testis "add"
   jumptrue missing.parameter
   testis "until"
   jumptrue missing.parameter
   testis "while"
   jumptrue missing.parameter
   testis "whilenot"
   jumptrue missing.parameter
   testis "escape"
   jumptrue missing.parameter
   testis "unescape"
   jumptrue missing.parameter
   testis "replace"
   jumptrue missing.parameter
   jump ok.command

missing.parameter:
   add ": command needs an argument, on line: " 
   ll
   print
   clear
   quit
ok.command:
   clear
   add "command*"
   #-------------
   # no need to format tape cells because current cell contains word
   push
   jump parse

#-----------------------------------------
# bnf: commandset <= command command
# 
commandset.command:
   testis "command*command*"
   jumptrue yes.commandset.command
   testis "commandset*command*"
   jumptrue yes.commandset.command
   jump three.tokens

yes.commandset.command:
   clear
   add "commandset*"
   push
   #---------------------------
   # format the tape attributes. Add the next command on a newline 
   --
   get
   add "\n"
   ++
   get
   -- 
   put
   ++
   clear
   jump parse
   
#-------------------
# 3 tokens
#-------------------
three.tokens:

   pop

  
check.error.three.tokens:
   # aid debugging by trying to trap parse errors as they occur 
   # tokens to consider when trapping errors:
   #    word* quote* quoteset* class* notclass* begintext*
   #    endtext* state* !* ;* ,*
   # But there are a lot of combinations of all these token types
   # which makes the error test block long. Perhaps there should
   # be a better way. yes there is: see the notes above
   
   testis "word*quote*{*"
   jumptrue error.three.tokens 
   testis "class*quote*{*"
   jumptrue error.three.tokens 
   testis "notclass*quote*{*"
   jumptrue error.three.tokens 
   testis "word*class*{*"
   jumptrue error.three.tokens 
   # etc etc etc
   jump check.quote.comma.quote

error.three.tokens:
   # preserve the erroneous tokens for the sake of 
   # debugging.
   push 
   push
   push
   add "error near line " 
   ll
   add " of script (missing semicolon?) \n"
   print
   clear
   quit
    

#-----------------------------------------
# bnf: quoteset <- quote , quote  or quoteset , quoteset
# This allows multiple "testis" tests for one block which is 
# very useful. But this is very tricky to compile because we
# dont know the jump target yet. Maybe we need a new machine instruction
# trepace <text>" which replaces the text with the context of the 
# tape cell. Instead I will just make a simple parse with 
#   quote , quote block to atleast allow 2 rules per block.

# this is very incomplete, need to do quoteset,quote and 
# also quoteset{ commandset } 
# also the character label numbers arent going to work here.
check.quote.comma.quote:
   testis "quote*,*quote*"
   jumpfalse check.quoteset.comma.quote 

quote.comma.quote:
   clear
   add "testis "
   get
   # just jump over the next test
   add "\njumptrue 3 \n"
   ++
   ++
   add "testis "
   get
   add "\n"
   # add the next jumptrue when the next quote is found
   --
   --
   put
   clear
   add "quoteset*"
   push
   # always reparse/compile
   jump parse
   

# quoteset <- quoteset , quote
check.quoteset.comma.quote:
   testis "quoteset*,*quote*"
   jumpfalse check.word.quote 

quoteset.comma.quote:
   clear
   get
   ++
   ++
   add "jumptrue 4 \n "
   add "jumptrue 3 \n "
   add "testis "
   get
   add "\n"

   # add the next jumptrue when/if the next quote is found
   --
   --
   put
   clear
   add "quoteset*"
   push
   # always reparse/compile
   jump parse
   
#--------------------------------------------
# bnf: command <= keyword quoted-text semi-colon
# format: add "text";
check.word.quote:

   testis "word*quote*;*"
   jumpfalse word.class.colon

   clear
   get 
   testis "add"
   jumptrue good.quote.command
   testis "until"
   jumptrue good.quote.command
   testis "while"
   jumptrue good.quote.command
   testis "whilenot"
   jumptrue good.quote.command
   testis "escape"
   jumptrue good.quote.command
   testis "unescape"
   jumptrue good.quote.command
   testis "replace"
   jumptrue error.missing.second.parameter
   #------
   # error 
   add ": command does not take an argument \n" 
   add "near line "
   ll
   add " of script. \n"
   print
   clear
   #state
   quit

error.missing.second.parameter:
   #------
   # error 
   add ": command requires 2 parameters, not 1 \n" 
   add "near line "
   ll
   add " of script. \n"
   print
   clear
   quit

good.quote.command:
   clear
   add "command*"
   push
   #---------------------------
   # a command plus argument, eg add "this" 
   --
   get
   add " "
   ++
   get
   -- 
   put
   ++
   clear
   jump parse

#----------------------------------
# parsing format: "while [:alpha:] ;" 
word.class.colon:
   testis "word*class*;*"
   jumpfalse four.tokens

   clear
   get 
   testis "while"
   jumptrue good.class.command
   testis "whilenot"
   jumptrue good.class.command
   #------
   # error 
   add ": command cannot have a class argument \n" 
   add "line "
   ll  
   add ": error in script \n"
   print
   clear
   quit

good.class.command:
   clear
   add "command*"
   push
   #---------------------------
   # a command plus argument, eg add "this" 
   --
   get
   add " "
   ++
   get
   -- 
   put
   ++
   clear
   jump parse


   jump 4.tokens

# -------------------------------
# 4 tokens
# -------------------------------
four.tokens:

   pop

#-------------------------------------
# bnf: command <- replace quote quote ";"
# format:  replace "and" "AND" ; 

check.replace.command:
   testis "word*quote*quote*;*"
   jumpfalse check.begin.block

   clear
   #-----------------------
   # check is replace 
   get 
   testis "replace"
   jumpfalse error.word.two.tokens

   clear
   add "command*"
   push
   #---------------------------
   # a command plus 2 arguments, eg replace "this" "that"
   --
   get
   add " "
   ++
   get
   add " "
   ++
   get
   -- 
   --
   put
   ++
   clear
   jump parse

error.word.two.tokens:
   add " << command does not take 2 quoted arguments. \n"
   add " on line "
   ll
   add " of script.\n"
   quit


#-------------------------------------
# format:  begin { #* commands *# }
# begin blocks which are similar to awks begin blocks.

check.begin.block:
   testis "begin*{*commandset*}*"
   jumptrue is.begin.block
   testis "begin*{*command*}*"
   jumptrue is.begin.block
   jump check.quote.block

is.begin.block:
   clear
   ++
   ++
   get 
   --
   --
   put
   clear
   add "beginblock*"
   push
   jump parse

#-------------------------------------
# bnf: command <- quote "{" commandset "}" 
# bnf: command <- quote "{" command "}" 
# format:  "text" { add ":"; print; clear; }
# format:  "text" { print; }

check.quote.block:
   testis "quote*{*commandset*}*"
   jumptrue is.quote.block
   testis "quote*{*command*}*"
   jumptrue is.quote.block
   jump check.quoteset.block

is.quote.block:
   clear
   # ----------
   # compile to assembly
   add "testis "
   get
   add "\njumpfalse not.text."
   cc
   add "\n"
   ++
   ++
   get 
   add "\nnot.text."
   cc
   --
   --
   add ":"
   put
   clear

   add "command*"
   push
   # always reparse/compile
   jump parse

#-------------------------------------
# bnf: command <- quoteset "{" commandset "}" 
# bnf: command <- quoteset "{" command "}" 
# format:  "text","more","and" { add ":"; print; clear; }
# format:  "text","more","and" { print; }

check.quoteset.block:
   testis "quoteset*{*commandset*}*"
   jumptrue quoteset.block
   testis "quoteset*{*command*}*"
   jumptrue quoteset.block
   jump check.testbegins.block

quoteset.block:
   clear
   # quoteset compile differently to quote blocks
   get
   add "\njumpfalse not.quoteset.text."
   cc
   add "\n"
   ++
   ++
   get 
   add "\nnot.quoteset.text."
   cc
   --
   --
   add ":"
   put
   clear

   add "command*"
   push
   # always reparse/compile
   jump parse

#-------------------------------------
# bnf: command <- begintext { commandset } 
# bnf: command <- begintext { command } 
# format:  ."text" { add ":"; print; clear; }
# format:  ."text" { print; }
check.testbegins.block:
   testis "begintext*{*commandset*}*"
   jumptrue testbegins.block
   testis "begintext*{*command*}*"
   jumptrue testbegins.block
   jump check.testends.block

testbegins.block:
   clear
   # ----------
   # compile to assembly
   add "testbegins "
   get
   add "\njumpfalse not.begins."
   cc
   add "\n"
   ++
   ++
   get 
   add "\nnot.begins."
   cc
   --
   --
   add ":"
   put
   clear
   add "command*"
   push
   # always reparse/compile
   jump parse


#-------------------------------------
# bnf: command <- endtext { commandset } 
# bnf: command <- endtext { command } 
# format:  ,"text" { add ":"; print; clear; }
# format:  ,"text" { print; }
check.testends.block:
   testis "endtext*{*commandset*}*"
   jumptrue testends.block
   testis "endtext*{*command*}*"
   jumptrue testends.block
   jump check.class.block

testends.block:
   clear
   # ----------
   # compile to assembly
   add "testends "
   get
   add "\njumpfalse not.ends."
   cc
   add "\n"
   ++
   ++
   get 
   add "\nnot.ends."
   cc
   --
   --
   add ":"
   put
   clear
   add "command*"
   push
   # always reparse/compile
   jump parse

#-------------------------------------
# formats: [:alpha:] { add ":"; print; clear; }
#          [a-z] { print; }
#   this syntax pattern executes a brace-block of commands if
#   the single letter in the workspace is in a particular
#   character class, list or range.
check.class.block:
   testis "class*{*commandset*}*"
   jumptrue class.block
   testis "class*{*command*}*"
   jumptrue class.block
   jump check.notclass.block 

class.block:
   clear
   # ----------
   # compile to assembly
   add "testclass "
   get
   add "\njumpfalse not.in.class."
   cc
   add "\n"
   ++
   ++
   get 
   add "\nnot.in.class."
   cc
   #add "XX"
   --
   --
   add ":"
   put
   clear
   add "command*"
   push
   jump parse

#-------------------------------------
# formats:  ![:alpha:] { add ":"; print; clear; }
#           ![a-z] { print; }
#   this syntax pattern executes a brace-block of commands if
#   the single letter in the workspace is not in a particular
#   character class, list or range.
check.notclass.block:
   testis "notclass*{*commandset*}*"
   jumptrue notclass.block
   testis "notclass*{*command*}*"
   jumptrue notclass.block
   jump eof.block 

notclass.block:
   clear
   # ----------
   # compile to assembly
   add "testclass "
   get
   add "\njumptrue is.in.class."
   cc
   add "\n"
   ++
   ++
   get 
   add "\nis.in.class."
   cc
   --
   --
   add ":"
   put
   clear
   add "command*"
   push
   jump parse

#-------------------------------------
# format:  [:alpha:] { add ":"; print; clear; }
# format:  [a-z] { print; }

eof.block:
   testis "state*{*commandset*}*"
   jumptrue is.state.block
   testis "state*{*command*}*"
   jumptrue is.state.block
   jump again

is.state.block:
   clear
   get
   testis "eof"
   jumpfalse check.tapetest
   clear
   # ----------
   # compile to eof and tape= test 
   add "testeof "
   add "\njumpfalse not.eof."
   cc
   add "\n"
   ++
   ++
   get 
   add "\nnot.eof."
   cc
   --
   --
   add ":"
   put
   clear
   add "command*"
   push
   jump parse

check.tapetest:
   testis "=="
   jumpfalse error.state.block 
   clear
   # ----------
   # compile to tape= test 
   add "testtape "
   add "\njumpfalse not.testtape."
   cc
   add "\n"
   ++
   ++
   get 
   add "\nnot.testtape."
   cc
   --
   --
   add ":"
   put
   clear
   add "command*"
   push
   jump parse

error.state.block:
  
   add ": unknown state test near line "
   ll
   add " of script.\n"
   add " State tests are \n"
   add "   (eof) test if end of stream reached. \n"
   add "   (==)  test if workspace is same as current tape cell \n"
   print
   clear
   quit

again:
   push 
   push
   push
   push

   jump 0

#--- end of parse/compile assembler script