#* 
   translate.perl.pss 

   This is a parse-script which translates parse-scripts into perl
   code, using the 'pep' tool. The script creates a standalone 
   perl program which behaves exactly like the pep/nom script. 
   
   A number of other translators exist for pep/nom, including 
   java, go, python, ruby, c, tcl, javascript. 

   The virtual machine and engine is implemented in plain c at
   http://bumble.sf.net/books/pars/object/pep.c. This implements a script
   language with a syntax reminiscent of sed and awk (much simpler than
   awk, but more complex than sed). The syntax for the pep/nom language
   is implemented in the file "compile.pss". The system is self hosting
   and all translation scripts are self compilable. Is this a first 
   in the world of software?
   
   This version simplifies the test grammar and implements "parse()" methods

STATUS

  Aug 2022: very simple scripts working. 

NOTES
   
   Allows multiple chars for escape/unescape but not implemented in
   pep.c

   Double negs can be caught at the !*!* token stage. Dont need a 
   nottest* token

   This script can also act as an interpreter if the script is 
   run on itself to produce a perl prog. Then use the interpret() method
   read from a file (with an -f switch for example) because stdin is 
   required for the input stream. This is magic and requires deep
   thought to understand.

   It could construct a new parse method eg "runParse()" that opens the file
   specified in the -f switch and then does "eval('$mm->runParse()');" run
   parse will read from <stdin> and write to <stdout> by default.

   I have "reformed" this grammar, which was originally written for 
   calculating absolute jumps in the "assembler" output code in
   compile.pss. This is not necessary here, and complicates matters.

   I thought that and exp* token would be necessary for expressions
   like (B"a",B"b").E"c" but I can just reduce this to test* which
   simplifies matters.

   * some BNF rules for a simplified test grammar
   --------
     !*class* -> !*test*
     !*class* -> !*test*
     !*quote* -> !*test*

     !*test* -> test*

     E*quote* -> test*
     B*quote* -> test*

     "(eof)" -> test*
     "(==)" -> test*

     quote*,* -> test*,*
     quote*.* -> test*.*
     quote*{* -> test*{*

     class*,* -> test*,*
     class*.* -> test*.*
     class*{* -> test*{*

     test*,*test* -> orlist*
     orlist*,*test* -> orlist*
     test*.*test* -> andlist*
     andlist*.*test* -> andlist*

     (*andlist*)* -> test*
     (*orlist*)* -> test*
     (*test*)* -> test*

   ,,,,

   some strange bug with quote parsing in the compiler compile.pss
   seen in the parse(""); 

   man perlfunc is a handy reference for perl functions
   perl has a "goto"

   In other translation scripts, we use labelled loops and
   break/continue to implement the parse> label and .reparse .restart
   commands. Breaks could be used to implement the quit command but aren't.

   Does perl support labelled loops? No, one option is to set a flag
   when .restart .reparse is called to break the outer loop

   We can use "run once" loops eg " while true do ... break; end " an 
   example is in the translate.tcl.pss script. 

TODO

SEE ALSO
   
   At http://bumble.sf.net/books/pars/

   tr/translate.java.pss, tr/translate.py.pss
     similar scripts for compiling scripts into java and python
     (also go, ruby, c, tcl, javascript).

   compile.pss
     compiles a script into an "assembly" format that can be loaded
     and run on the parse-machine with the -a  switch. This performs
     the same function as "asm.pp" 

PERL SYNTAX

 * cli options
 -----
   use Getopt::Std;
   our $opt_r; # will be set to its value if used.
   getopts('r:'); # -r has an option.
 ,,,


 * open and read from file
 -----
  open F, "<file.txt" or die $!;
  while (<F>) { print $_; }
  close F;
 ,,,

 * read from a string as if it were a file
 -----
  my $foo = "abc\ndef\n";
  open my $fh, "<", \$foo;
  while (<$fh>) {
    print "line $.: $_";
  }
 ,,,

TEST SCRIPTS

  * test escaping
  >> pep.rb 'r;escape "c";t;d;' "abcxcabc"
  >> pep.rb 'r;unescape "c";t;d;' 'ab\cx\\cabc'

  so unescaping should be more intelligent.
  
TESTING

   * test simple scripts (using helper function), 1st and 2nd generation
   >> pep.tt perl

   * use helper to test 2nd gen script translation
   >> pep.plss "r;a':';t;d;" "abc"

   * use helper function in peprc to see palindromes
   >> pep.plff eg/palindrome.pss /usr/share/dict/words

   * use the bash helper functions to test (from helpers.pars.sh)
   >> pep.plf eg/json.check.pss '{"here":2}'

   The line above compiles the script to perl in the folder
   pars/eg/perl/json.check.pss and runs it with the input.

   * one comprehensive test is to run the script on itself
   >> pep -f translate.rb.pss translate.rb.pss > eg/perl/translate.rb.rb
   >> chmod a+x eg/perl/translate.perl.pl
   >> echo "r;t;d;" | eg/perl/translate.perl.pl
   
   I call this "2nd generation" script compilation.

GOTCHAS

  treatment of regexes is different (for while whilenot etc). Eg in
  ruby [[:space:]] is unicode aware but \s is not

  until needs to actually count how many trailing '\' or
  escape chars there are !...

  Make sure escaping and multiline arguments work.

  "until" may not read at least one character.

BUGS
   
  Many.

  2nd gen is not working  yet

SOLVED BUGS TO WATCH FOR 

  * the line below was throwing an error (compile.pss)
  >> add '", "\\'; get; add '")'; --; put; clear;

  This needed to be fixed in compile.pss and also in eg translation
  script.

  Multiline add should not add extra spaces,

  escape needs to use the machine escape char. 
  found and fixed a bug in java whilenot/while. The code exits if the 
  character is not found, which is not correct.

  Found and fixed a bug in the (==) code ie in java (stringa == stringb)
  doesnt work. 

  "until" bug where the code did not read 
  at least one character.

  Read must exit if at end of stream, but while/whilenot/until, no.

TASKS 

  Add an error code to quit.
  Add an "esc" command that changes the escape char (needed for 
  csv files for example).

HISTORY

  19 feb 2025
    Seems that there is no "until;" syntax in this script. Nor 
    "go;" which we may need for type checking.
  18 aug 2022
    Reformed the test* grammar, much better and simpler and also includes
    bracket expressions. Should copy this grammar to the other 
    translation scripts. Remove lots of tokens eg begintext* endtext*
    notbegintext* etc.

  17 aug 2022
    Wrote an interpret() method which should execute scripts
  16 aug 2022
    Wrote escapeChar and unescapeChar properly. These are "expensive"
    operations when the workspace is big. Wrote parse methods.
    Can read and write to strings now.
    Will try to change the grammar to be more logical. Eg, will 
    use tokens test* nottest* exp* quote*. So begintext* and endtext*
    should get ellided. Changed to parse method. Also
    will write "struct.syntax.txt" which will have all syntax structures.
    This will be useful for seeing how translators work.
    Saved an original version in tr/translate.perl.orig.pss
  15 aug 2022
    Debugging and testing. A few tests in tr/tr.test.txt now work 
  13 july 2022
    Converting from a normal perl hash, to an object oriented 
    perl idiom. (bless etc). Perl oo seems quite odd. Simplest script
    'r;t;d' is now working.
  25 june 2022
    A lot of conversion work.
  24 June 2022
    Began to adapt from the ruby code.

*#

  # the shebang line for perl must be very first in script so
  # I will print here so that it comes before the stack trace 
  # if one is printed.
  begin { add "#!/usr/bin/perl\n"; print; clear; }
  read;
  #--------------
  [:space:] {
    clear; .reparse
  }

  #---------------
  # We can ellide all these single character tests, because
  # the stack token is just the character itself with a *
  # Braces {} are used for blocks of commands, ',' and '.' for concatenating
  # tests with OR or AND logic. 'B' and 'E' for begin and end
  # tests, '!' is used for negation, ';' is used to terminate a 
  # command.
  # ( and ) may be used for a test expression grammar
  "(",")","{", "}", ";", ",", ".", "!", "B", "E" {
    put; add "*"; push; .reparse 
  }

  #---------------
  # format: "text"
  "\"" {
    # save the start line number (for error messages) in case 
    # there is no terminating quote character.
    clear; add "line "; lines; add ", char "; chars; 
    put; clear; 
    until '"'; 
    !E'"' { 
      clear; add 'Unterminated quote character (") starting at ';
      get; add ' !\n'; 
      print; quit;
    }
    # check for empty quotes as arguments for escape etc 
    # in perl all tests and replace use regexes, so we dont need
    # quotes around, also need \\Q to stop metacharacters making 
    # problems for the regex (eg ^$.{} etc), but not here because
    # "add" doesn't need it.
    clip; put; clear;
    add "quote*"; push;
    .reparse 
  }

 #---------------
 # format: 'text', single quotes are converted to double quotes
 # but we must escape embedded double quotes.
  "'" {
    # save the start line number (for error messages) in case 
    # there is no terminating quote character.
    clear; add "line "; lines; add " (character "; chars; add ") ";
    put; clear;
    until "'"; 
    !E"'" { 
      clear; add "Unterminated quote (') starting at ";
      get; add '!\n'; 
      print; quit;
    }
    # all tests and replacements are regexes, but "add" is not
    clip; unescape "'";
    put; clear; add "quote*";
    push; .reparse 
  }

  #---------------
  # formats: [:space:] [a-z] [abcd] [:alpha:] etc 
  # should class tests really be multiline??!
  "[" {
    # save the start line number (for error messages) in case 
    # there is no terminating bracket character.
    clear; add "line "; lines; add " (character "; chars; add ") ";
    put; clear; add "[";
    until "]"; 
    "[]" {
      clear; add "pep script error at line "; lines;
      add " (character "; chars; add "): \n";
      add "  empty character class [] \n";
      print; quit;
    }
    !E"]" { 
      clear; add "Unterminated class text ([...]) starting at "; get; 
      add "
      class text can be used in tests or with the 'while' and 
      'whilenot' commands. For example: 
        [:alpha:] { while [:alpha:]; print; clear; }
      ";
      print; quit;
    }

    # need to escape quotes? ruby uses /.../ to match
    escape '"';
    # the caret is not a negation operator in pep scripts
    replace "^" "\\^";

    # save the class on the tape
    put;
    clop; clop;
    !B"-" {
      # not a range class, eg [a-z] so need to escape '-' chars
      clear; get; replace '-' '\\-'; put;
    }
    B"-" {
      # a range class, eg [a-z], check if it is correct
      clip; clip; 
      !"-" {
        clear;
        add "[pep script error] near line "; lines;
        add ", char "; chars; add ": \n";
        add " Incorrect character range class "; get;
        add "
   For example:
     [a-g]  # correct
     [f-gh] # error! \n";
        print; clear; quit;

      }
    }
    clear; get;  # restore class text
    B"[:".!E":]" { 
      clear; add "malformed character class starting at ";
      get; add '!\n'; 
      print; quit;
    }
    # class in the form [:digit:]
    B"[:".!"[:]" {
      clip; clip; clop; clop;
      # unicode posix character classes 
      # Also, abbreviations (not implemented in gh.c yet.)
      # this should not be tricky in ruby because posix is supported
      "alnum","N" { clear; add "[[:alnum:]]"; }
      "alpha","A" { clear; add "[[:alpha:]]"; }
      # ? can use s.ascii_only?()
      "ascii","I" { clear; add "[[:ascii:]]"; }
      # non-standard ruby posix class 'word'
      "word","W" { clear; add "[[:word:]]"; }
      "blank","B" { clear; add "[[:blank:]]"; }
      "cntrl","C" { clear; add '[[:cntrl:]]'; }
      "digit","D" { clear; add "[[:digit:]]"; }
      "graph","G" { clear; add '[[:graph:]]'; }
      "lower","L" { clear; add '[[:lower:]]'; }
      "print","P" { clear; add "[[:print:]]"; }
      "punct","T" { clear; add '[[:punct:]]'; }
      "space","S" { clear; add "[[:space:]]"; }
      "upper","U" { clear; add '[[:upper:]]'; }
      "xdigit","X" { clear; add "[[:xdigit:]]"; }

      !B"[[" {
        put; clear;
        add "pep script error at line "; lines;
        add " (character "; chars; add "): \n";
        add "Unknown character class '"; get; add "'\n";
        print; clear; quit;
      }

    }

    put; 
    clear;
    add '/^'; get; add '+$/'; put;
    clear;
    # add quotes around the class and limits around the 
    # class so it can be used with the string.matches() method
    # (must match the whole string, not just one character)
    add "class*"; push;
    .reparse 
  }

  #---------------
  # multiline and single line comments, eg #... and #* ... *#
  "#" {
    clear; read;
    "\n" { clear; .reparse }

    # checking for multiline comments of the form "#* \n\n\n *#"
    # these are just ignored at the moment (deleted) 
    "*" { 
      # save the line number for possible error message later
      clear; lines; put; clear;
      until "*#"; 
      E"*#" {
        # convert to python comments (#), python doesnt have multiline
        # comments, as far as I know
        clip; clip; replace "\n" "\n#"; 
        put; clear; 
        # create a "comment" parse token
        # comment-out this line to remove multiline comments from the 
        # compiled python
        # add "comment*"; push; 
        .reparse  
      }
      # make an unterminated multiline comment an error
      # to ease debugging of scripts.
      clear; 
      add "unterminated multiline comment #* ... *# \n";
      add "stating at line number "; get; add "\n";
      print; clear;
      quit;
    }

    # single line comments. some will get lost.
    put; clear; add "#"; get; until "\n"; clip;
    put; clear; 
    # comment out this below to remove single line comments
    # from the output
    add "comment*"; push; 
    .reparse 
  }

 #----------------------------------
 # parse command words (and abbreviations)

 # legal characters for keywords (commands)
 ![abcdefghijklmnopqrstuvwxyzBEOFKGPRUWS+-<>0^=] {
   # error message about a misplaced character
   put; clear;
   add "[Pep syntax error] misplaced char '";
   get; add "' near line "; lines;
   add " char "; chars; add "\n";
   print; quit;
 }

   # my testclass implementation cannot handle complex lists
   # eg [a-z+-] this is why I have to write out the whole alphabet

   while [abcdefghijklmnopqrstuvwxyzBEOFKGPRUWS+-<>0^=];
   #----------------------------------
   # KEYWORDS 
   # here we can test for all the keywords (command words) and their
   # abbreviated one letter versions (eg: clip k, clop K etc). Then
   # we can print an error message and abort if the word is not a 
   # legal keyword for the parse-edit language

   # make ll an alias for "lines" and cc an alias for chars
   "ll" { clear; add "lines"; }
   "cc" { clear; add "chars"; }
   # one letter command abbreviations
   "a" { clear; add "add"; }
   "k" { clear; add "clip"; }
   "K" { clear; add "clop"; }
   "D" { clear; add "replace"; }
   "d" { clear; add "clear"; }
   "t" { clear; add "print"; }
   "p" { clear; add "pop"; }
   "P" { clear; add "push"; }
   "u" { clear; add "unstack"; }
   "U" { clear; add "stack"; }
   "G" { clear; add "put"; }
   "g" { clear; add "get"; }
   "x" { clear; add "swap"; }
   ">" { clear; add "++"; }
   "<" { clear; add "--"; }
   "m" { clear; add "mark"; }
   "M" { clear; add "go"; }
   "r" { clear; add "read"; }
   "R" { clear; add "until"; }
   "w" { clear; add "while"; }
   "W" { clear; add "whilenot"; }
   "n" { clear; add "count"; }
   "+" { clear; add "a+"; }
   "-" { clear; add "a-"; }
   "0" { clear; add "zero"; }
   "c" { clear; add "chars"; }
   "l" { clear; add "lines"; }
   "^" { clear; add "escape"; }
   "v" { clear; add "unescape"; }
   "z" { clear; add "delim"; }
   "S" { clear; add "state"; }
   "q" { clear; add "quit"; }
   "s" { clear; add "write"; }
   "o" { clear; add "nop"; }
   "rs" { clear; add "restart"; }
   "rp" { clear; add "reparse"; }
   # what about upper, lower and cap?

   # some extra syntax for testeof and testtape
   "eof","EOF","<eof>","<EOF>" { 
     clear; add "$self->{'eof'}"; 
     put; clear; add "test*"; push; .reparse 
   }

   "==","<==>" { 
     clear;
     add "$self->{'work'} eq ${$self->{'tape'}}[$self->{'cell'}]"; 
     put; clear; add "test*"; push; .reparse 
   }

   "jump","jumptrue","jumpfalse",
   "testis","testclass","testbegins","testends",
   "testeof","testtape" {
     put; clear;
     add "The instruction '"; get; add "' near line "; lines; 
     add " (character "; chars; add ")\n";
     add "can be used in pep assembly code but not scripts. \n";
     print; clear; quit;
   }
   
   # show information if these "deprecated" commands are used
   "Q","bail" {
     put; clear;
     add "The instruction '"; get; add "' near line "; lines; 
     add " (character "; chars; add ")\n";
     add "is no longer part of the pep language (july 2020). \n";
     add "use 'quit' instead of 'bail', and use 'unstack; print;' \n";
     add "instead of 'state'. \n";
     print; clear; quit;
   }
   
   "add","clip","clop","replace","upper","lower","cap","clear","print","state",
   "pop","push","unstack","stack","put","get","swap",
   "++","--","mark","go","read","until","while","whilenot",
   "count","a+","a-","zero","chars","lines","nochars","nolines",
   "escape","unescape","delim","quit",
   "write","nop","reparse","restart" {
     put; clear;
     add "word*";
     push; .reparse
   }
   
   #------------ 
   # the .reparse command and "parse label" is a simple way to 
   # make sure that all shift-reductions occur. It should be used inside
   # a block test, so as not to create an infinite loop. There is
   # no "goto" in java so we need to use labelled loops to 
   # implement .reparse/parse>

   "parse>" {
     clear; count;
     !"0" {
       clear; 
       add "script error:\n";
       add "  extra parse> label at line "; lines; add ".\n";
       print;
       quit;
     }
     clear; add "# parse> parse label"; put;
     clear; add "parse>*"; push;
     # use accumulator to indicate after parse> label
     a+; .reparse 
   }

   # --------------------
   # implement "begin-blocks", which are only executed
   # once, at the beginning of the script (similar to awk's BEGIN {} rules)
   "begin" {
     put; add "*"; push; .reparse 
   }

   add " << unknown command on line "; lines; 
   add " (char "; chars; add ")"; 
   add " of source file. \n"; 
   print; clear; quit;

# ----------------------------------
# PARSING PHASE:

# Below is the parse/compile phase of the script. Here we pop tokens off the
# stack and check for sequences of tokens eg "word*semicolon*". If we find a
# valid series of tokens, we "shift-reduce" or "resolve" the token series eg
# word*semicolon* --> command*
#
# At the same time, we manipulate (transform) the attributes on the tape, as
# required. 
#

parse>

 # for debugging, 
  add "# line "; lines; add " char "; chars; add ": "; print; clear; 
  unstack; print; stack; add "\n"; print; clear;

#-------------------------------------
# 2 tokens
#-------------------------------------
  pop; pop;

  # All of the patterns below are currently errors, but may not
  # be in the future if we expand the syntax of the parse
  # language. Also consider:
  #    andlist*, orlist*, test*, !* ,* ;* B* E*
  # It is nice to trap the errors here because we can emit some
  # (hopefully not very cryptic) error messages with a line number.
  # Otherwise the script writer has to debug with
  #   pep -a asm.pp -I scriptfile 
  #

  "word*word*","word*}*","word*!*","word*,*",
  "quote*word*","quote*class*","quote*}*",
  "class*word*","class*quote*","class*class*",
  "test*test*","test*quote*","quote*test*","class*}*" {
    add " (Token stack) \nValue: \n"; get; 
    add "\nValue: \n"; ++; get; --; add "\n";
    add "Error near line "; lines; add " (char "; chars; add ")"; 
    add " of pep script (missing semicolon?) \n";
    print; clear; 
    quit;
  }  

  # todo: use negative test for test* errors
  # eg what can follow a test* i.e ,* .* )* {*

  "test*!*","class*!*","quote*!*","andlist*!*","orlist*!*" {
    stack; 
    add "[Pep syntax error]\n near line "; lines;
    add ", char "; chars; add "\n";
    add '
    Reversed negation operator (!)? In Pep/nom syntax the negation 
    operator precedes the test it applies to, for example:
      !"a" { ... }  # correct
      "a"! { ... }  # error \n';
    print; quit;
  }  

  "{*;*", ";*;*", "}*;*" {
    push; push;
    add "[Pep syntax error] near line "; lines; add ", char "; chars;
    add " of pep script: misplaced semi-colon? ; \n";
    print; quit;
  }

  ",*{*",".*{*" {
    push; push;
    add "Error near line "; lines; add " (char "; chars; add ")"; 
    add " of script: extra comma or dot in test? \n";
    print; clear; quit;
  }

  "command*;*","commandset*;*" {
    push; push;
    add "Error near line "; lines; add " (char "; chars; add ")"; 
    add " of script: extra semi-colon? \n";
    print; clear; quit;
  }

  "!*!*" {
    push; push;
    add "[Pep syntax error] near line "; lines; add ", char "; chars; 
    add ": \n double negation '!!' achieves nothing \n";
    add " so please dont use it.\n";
    print; clear; quit;
  }

  "!*{*","!*;*" {
    push; push;
    add "error near line "; lines;
    add " (char "; chars; add ")"; 
    add " of script: misplaced negation operator (!)? \n";
    add " The negation operator precedes tests, for example: \n";
    add "   !B'abc'{ ... } or !(eof),!'abc'{ ... } \n";
    print; clear; quit;
  }

  ",*command*" {
    push; push;
    add "error near line "; lines;
    add " (char "; chars; add ")"; 
    add " of script: misplaced comma? \n";
    print; clear; quit;
  }

  "!*command*" {
    push; push;
    add "error near line "; lines;
    add " (at char "; chars; add ") \n"; 
    add " The negation operator (!) cannot precede a command \n";
    print; clear; quit;
  }

  ";*{*", "command*{*", "commandset*{*" {
    push; push;
    add "error near line "; lines;
    add " (char "; chars; add ")"; 
    add " of script: no test for brace block? \n";
    print; clear; quit;
  }

  "{*}*" {
    push; push;
    add "error near line "; lines;
    add " of script: empty braces {}. \n";
    print; clear; quit;
  }

  "B*class*","E*class*" {
    push; push;
    add "error near line "; lines;
    add " of script:\n  classes ([a-z], [:space:] etc). \n";
    add "  cannot use the 'begin' or 'end' modifiers (B/E) \n";
    print; clear; quit;
  }

  # or swap the comment before the test
  "comment*{*" {
    push; push;
    add "error near line "; lines;
    add " of script: comments cannot occur between \n";
    add " a test and a brace ({). \n";
    print; clear; quit;
  }

  "}*command*" {
    push; push;
    add "error near line "; lines;
    add " of script: extra closing brace '}' ?. \n";
    print; clear; quit;
  }

  #*
  E"begin*".!"begin*" {
    push; push;
    add "error near line "; lines;
    add " of script: Begin blocks must precede code \n";
    print; clear; quit;
  }
  *#

  #------------ 
  # The .restart command jumps to the first instruction after the
  # begin block (if there is a begin block), or the first instruction
  # of the script.
  ".*word*" {
    clear; ++; get; --;
    "restart" {
      clear; count;
      # this is the opposite of .reparse, using run-once loops
      # cant do next before label, infinite loop.
      # need to set flag variable. There is a subtlety: .restart can
      # exist without parse> (although it would be unusual). 

      # before (or without) the parse> label
      "0" { 
        clear; 
        # use the comment '# restart' so we can replace
        # this with 'break' if the parse> label appears later
        add "restart = $true; next; # restart"; 
      }   
      "1" { clear; add "break;"; }   # after the parse> label
      put; clear;
      add "command*";
      push; .reparse 
    }
    "reparse" {
      clear; count; 
      # check accumulator to see if we are in the "lex" block
      # or the "parse" block and adjust the .reparse compilation
      # accordingly.
      "0" { clear; add "break;"; }
      "1" { clear; add "next;"; }
      put; clear;
      add "command*";
      push; .reparse 
    }
    push; push;
    add "error near line "; lines;
    add " (char "; chars; add ")"; add " of script:  \n";
    add " misplaced dot '.' (use for AND logic or in .reparse/.restart \n";
    print; clear; quit;
  }

  #---------------------------------
  # Compiling comments so as to transfer them to the perl code
  "comment*command*","command*comment*","commandset*comment*" {
    clear; get; add "\n"; ++; get; --; put; clear;
    add "command*"; push; .reparse
  }

  "comment*comment*" {
    clear; get; add "\n"; ++; get; --; put; clear;
    add "comment*"; push; .reparse
  }

  # -----------------------
  # negated tokens.
  #
  # This is a new more elegant way to negate a whole set of 
  # tests (tokens) where the negation logic is stored on the 
  # stack, not in the current tape cell. We just add "not" to 
  # the stack token.

  # eg: ![:alpha:] ![a-z] ![abcd] !"abc" !B"abc" !E"xyz"
  #  This format is used to indicate a negative test for 
  #  a brace block. eg: ![aeiou] { add "< not a vowel"; print; clear; }

  "!*test*" {
    clear; add "!("; ++; get; 
    # check for double negative, which is an error ?
    # B"!(!(" { }
    --; add ")"; put; clear;
    add "test*"; push; .reparse
  }

  # transform quotes and classses to tests, this greatly reduces the number
  # of rules required for other reductions
  ",*quote*",".*quote*","!*quote*" { 
    # but need to stop string interp
    push; clear; add "$self->{'work'} eq '"; get; add "'"; put;
    clear; add "test*"; push; .reparse
  }

  # transform quotes to tests
  "quote*,*","quote*.*","quote*{*" { 
    replace "quote*" "test*"; push; push; 
    --; --; add "$self->{'work'} eq '"; get; add "'"; put;
    ++; ++; clear; .reparse
  }

  # transform classes to tests
  ",*class*",".*class*" { 
    push; clear; add "$self->{'work'} =~ /"; get; add "/"; put;
    clear; add "test*"; push; .reparse
  }

  # transform classes to tests
  "class*,*","class*.*","class*{*" { 
    replace "class*" "test*"; push; push; 
    --; --; add "$self->{'work'} =~ /"; get; add "/"; put;
    ++; ++; clear; .reparse
  }


  #-----------------------------------------
  # format: E"text" or E'text'
  #  This format is used to indicate a "workspace-ends-with" text before
  #  a brace block.
  "E*quote*" {
     clear; add "test*"; push; get; 
     # quotes don't have ".." in this perl translator
     "" {
       # empty argument is an error
       clear;
       add "pep script error near line "; lines;
       add " (character "; chars; add "): \n";
       add '  empty argument for end-test (E"") \n';
       print; quit;
     }
     clear; add "$self->{'work'} =~ /"; get; add "$/"; 
     --; put; ++; clear; .reparse
  } 

  #-----------------------------------------
  # format: B"sometext" or B'sometext' 
  #   A 'B' preceding some quoted text is used to indicate a 
  #   'workspace-begins-with' test, before a brace block.
  "B*quote*" {
     clear; add "test*"; push; get; 
     "" {
       # empty argument is an error, since it means nothing to say
       # that a string begins with nothing.
       clear;
       add "[pep syntax error] near line "; lines;
       add ", char "; chars; add ": \n";
       add '  empty argument for begin-test (B"") \n';
       print; quit;
     }
     # how to quote regex meta chars?
     clear; get; escape '$'; put;
     clear; add "$self->{'work'} =~ /^"; get; add "/"; 
     --; put; ++; clear; .reparse
  } 

  #--------------------------------------------
  # ebnf: command := word, ';' ;
  # formats: "pop; push; clear; print; " etc
  # all commands need to end with a semi-colon except for 
  # .reparse and .restart
  #
  "word*;*" {
     clear;
     # check if command requires parameter
     get;
     "add","while","whilenot","mark","go",
     "escape","unescape","delim","replace" {
       put; clear; add "'"; get; add "'";
       add " << command needs an argument, on line "; lines; 
       add " of script.\n";
       print; clear; quit;
     }

     # the new until tape command
     "until" { 
       clear; 
       add "$self->until(${$self->{'tape')}[$self->{'cell'}]);  # until (tape)";
       put; 
     }
     # the new go; (to mark named in tapecell) command
     # not implemented in pars/compile.pss (feb 2025)
     "go" { 
       clear; 
       add "goToMark(${$self->{'tape')}[$self->{'cell'}]);  # go (tape)";
       put; 
     }
     "clip" { 
       clear; 
       add "chop($self->{'work'});       # clip";
       put; 
     }
     # check indices
     "clop" { 
       clear; 
       add "$self->{'work'} =~ s/^.//s;       # clop \n";
       put; 
     }
     "clear" { clear; add "$self->{'work'} = '';        # clear"; put; }
     "upper" { clear; add "$self->{'work'} = uc($self->{'work'});  # upper"; put; }
     "lower" { clear; add "$self->{'work'} = lc($self->{'work'});  # lower"; put; }
     "cap" { clear;   add "$self->{'work'} .capitalize! # capital"; put; }
     "print" { 
       clear; 
       add '$self->{"output"}->print($self->{"work"});   # print'; 
       put;
     }
     "state" { clear; add '$self->printState();         # state'; put; }
     "pop" { clear; add "$self->popToken();"; put; }
     "push" { clear; add "$self->pushToken();"; put; }
     "unstack" { 
       clear; add "while ($self->popToken()) { next; }  # unstack "; put; }
     "stack" { 
       clear; add "while ($self->pushToken()) { next; } # stack "; put; }
     "put" { 
       clear; 
       add "${$self->{'tape'}}[$self->{'cell'}] = $self->{'work'};   # put "; 
       put; 
     }
     "get" { 
       clear; 
       add "$self->{'work'} .= ${$self->{'tape'}}[$self->{'cell'}];  # get";
       put;
     }
     "swap" { 
       clear; 
       add "$self->{'work'}, @{$self->{'tape'}}[$self->{'cell'}] = \n";
       add "  @{$self->{'tape'}}[$self->{'cell'}], $self->{'work'};    # swap ";
       put;
     }
     "++" { clear; add "$self->increment();      # ++"; put; }
     "--" {
       clear; 
       add "if ($self->{'cell'} > 0) { $self->{'cell'} -= 1; } # --"; put;
     }
     "read" { clear;  add "$self->readChar();         # read"; put; }
     "count" { 
       clear; 
       add "$self->{'work'} .= $self->{'counter'};     # count "; 
       put; 
     }
     "a+" { clear;    add "$self->{'counter'} += 1;  # a+ "; put; }
     "a-" { clear;    add "$self->{'counter'} -= 1;  # a- "; put; }
     "zero" { clear;  add "$self->{'counter'} = 0;   # zero "; put; }
     "chars" { 
       clear; 
       add "$self->{'work'} .= $self->{'charsRead'}; # chars "; 
       put;
     }
     "lines" { 
       clear; 
       add "$self->{'work'} .= $self->{'linesRead'}; # lines "; 
       put;
     }
     "nochars" { 
       clear; 
       add "$self->{'charsRead'} = 0;    # nochars "; 
       put;
     }
     "nolines" { 
       clear; 
       add "$self->{'linesRead'} = 0;     # nolines "; 
       put;
     }
     # use a labelled loop to quit script, also an error code
     "quit" { clear; add "exit();"; put; }
     # write to file name not "sav.pp"
     "write" { clear; add "File->write('sav.pp', $self->{'work'} )"; put; }
     # convert to "pass" which does nothing. 
     "nop" { clear; add "# nop: no-operation"; put; }

     clear; add "command*";
     push; .reparse
   }

  #-----------------------------------------
  # ebnf: commandset := command , command ;
  "command*command*", "commandset*command*" {
    clear;
    add "commandset*"; push;
    # format the tape attributes. Add the next command on a newline 
    --; get; add "\n"; ++; get; --; put; ++; 
    clear; .reparse
  } 

  #-------------------
  # 3 tokens
  #-------------------

  pop;

  #-----------------------------
  # some 3 token errors!!!
 
  # not a comprehensive list of 3 token errors
  "{*quote*;*","{*test*;*","{*class*;*",
  "{*andlist*;*","{*orlist*;*",
  "commandset*quote*;*", "command*quote*;*",
  "quote*quote*,*","quote*quote*.*",
  "class*class*,*","class*class*.*"
  {
    push; push; push;
    add "[Pep syntax error]\n near line "; lines;
    add ", char "; chars;
    add " (misplaced semicolon?) \n";
    print; clear; quit;
  }  

  # extra comma in tests, already caught in 2 token errors
  "andlist*.*{*","orlist*,*{*","test*,*{*","test*.*{*" {
    push; push; push;
    add "[Pep syntax error]\n near line "; lines;
    add ", char "; chars; add "\n";
    add " (trailing extra comma in test?)\n";
    print; clear; quit;
  }  

  # missing comma in tests ?
  B"quote*quote*".!"quote*quote*".!E";*" {
    put; clear;
    add "[Pep syntax error]\n near line "; lines;
    add ", char "; chars; add "\n";
    add " (trailing extra comma in test?)\n";
    add " (parse token stack was: "; get; add ")\n";
    print; clear; quit;
  }  

  # errors! mixing AND and OR concatenation without brackets
  "andlist*,*","orlist*.*" {
    # push the tokens back to make debugging easier
    push; push; 
    add "[Pep syntax error] mixing AND (.) and OR (,) concatenation in \n";
    add " in pep script near line "; lines;
    add " char "; chars; add "\n";
    add " Use brackets () to combine AND and OR logic.";
    add ' 
  For example:
     B".".!E"/".[abcd./] { print; }  # Correct!
     (B".".E"/"),E".txt" { print; }  # Correct!
     B".".!E"/",[abcd./] { print; }  # Error! \n';
    print; clear; quit;
  }

  # to simplify subsequent tests, transmogrify a single command
  # to a commandset (multiple commands).
  "{*command*}*" {
    clear; add "{*commandset*}*"; push; push; push;
    .reparse
  }

  # brackets around tests will be ignored. This also allows
  # (eof) to work etc.
  "(*test*)*" {
    clear; ++; get; --; put; clear;
    add "test*"; push; .reparse
  }

  # brackets will allow mixing AND and OR logic 
  "(*orlist*)*","(*andlist*)*" {
    clear; ++; get; --; put; clear;
    add "test*"; push; .reparse
  }

  # -------------
  # parses and compiles concatenated tests
  # eg: 'a',B'b',E'c',[def],[:space:],[g-k] { ...

  "test*,*test*","orlist*,*test*" {
    # OR logic concatenation 
    # put brackets around tests even though operator 
    # precedence should take care of it
    "test*,*test*" {
      clear; add "("; get; add ")";
    }
    "orlist*,*test*" { clear; get; } 
    add " || ("; ++; ++; get; --; --; add ")"; 
    put; clear; add "orlist*"; push; .reparse
  }

  # -------------
  # AND logic 
  # parses and compiles concatenated AND tests
  # eg: 'a',B'b',E'c',[def],[:space:],[g-k] { ...
  # negated tests can be chained with non negated tests.
  # eg: B'http' . !E'.txt' { ... }

  "test*.*test*","andlist*.*test*" {
    # AND logic concatenation 
    # add brackets 
    "test*.*test*" {
      clear; add "("; get; add ")";
    }
    "andlist*.*test*" { clear; get; } 
    add " && ("; ++; ++; get; --; --; add ")"; 
    put; clear; add "andlist*"; push; .reparse
  }

  #--------------------------------------------
  # ebnf: command := keyword , quoted-text , ";" ;
  # format: add "text";

  "word*quote*;*" {
    clear; get;
    "replace" {
       # error 
       add "< command requires 2 parameters, not 1 \n";
       add "near line "; lines;
       add " of script. \n";
       print; clear; quit;
    }

    # check for empty arguments 
    # eg "". Its probably silly to allow while/while not to
    # have a quote argument, since the same can be achieved with
    # >> while [a]; etc

    clear; ++; get; --; 
    # check that arg not empty, (but an empty quote is ok 
    # for the second arg of 'replace'
    "" {
      clear; 
      add "[Pep syntax error] near line "; lines;
      add ", char "; chars; add "\n"; 
      add "  command '"; get; 
      add '\' cannot have an empty argument ("") \n';
      print; quit;
    }
    clear; get;

    "mark" {
      clear;
      add "@{$self->{'marks'}}[$self->{'cell] = '"; ++; get; --; add "'; # mark";
      put; clear; add "command*"; push; .reparse
    }

    "go" {
      clear;
      add "goToMark('"; ++; get; --; add "');  # go";
      put; clear; add "command*"; push; .reparse
    }

    "delim" {
      clear;
      # the delimiter should be a single character, no?
      add "$self->{'delimiter'} = '"; ++; get; --; add "'; # delim ";
      put; clear; add "command*"; push; .reparse
    }

    "add" {
      clear;
      add "$self->{'work'} .= '"; ++; get; --; add "';";
      # handle multiline text
      # check this! \\n or \n
      replace "\n" '\'\n$self->{"work"} .= \'\\n';
      put; clear; add "command*"; push; .reparse
    }

    "while" {
      clear;
      add "while ($self->{'peep'} eq '"; ++; get; --;
      add "');   # while \n"; 
      add "  if ($self->{'eof'}) { break; }\n  $self->readChar();\n"; 
      add "}"; 
      put; clear; add "command*"; push; .reparse
    }

    "whilenot" {
      clear;
      add "while $self->{'peep'} != '"; ++; get; --;
      add "';   # whilenot \n"; 
      add "  if ($self->{'eof'}) { break; }\n  $self->readChar();\n}"; 
      put; clear; add "command*"; push; .reparse
    }

    "until" {
       clear; add "$self->until('"; 
       ++; get; --; 
       # handle multiline argument, have never used multiline until
       # check this?
       replace "\n" "\\n";
       add ');'; put; clear;
       add "command*"; push; .reparse
     }

    "escape" {
       clear;
       # it can be multiple chars (each will be escaped) 
       add "$self->escapeChar('"; ++; get; --; add "');";
       put; clear; add "command*"; push; .reparse
     }

    # replace \n with n for example (only 1 character)
    # or should unescape also replace \n with the newline char?
    "unescape" {
       clear; 
       # it can be multiple chars (each will be escaped) 
       add "$self->escapeChar('"; ++; get; --; add "');";
       put; clear; add "command*"; push; .reparse
     }

    # write to the given file name, not implemented in pep.c
    "write" {
       clear;
       add "File->write('"; ++; get; --; add "', $self->{'work'}); # write";
       put; clear; add "command*"; push; .reparse
     }

     # error, superfluous argument
     add ": command does not take an argument \n";
     add "near line "; lines;
     add " of script. \n";
     print; clear;
     #state
     quit;
   }

   #----------------------------------
   # format: "while [:alpha:] ;" or whilenot [a-z] ;

   "word*class*;*" {
     clear; get;

     "while" {
       clear;
       add "# while  \n";
       add "while ($self->{'peep'} =~ /^"; ++; get; --; add "$/) {\n";
       add "  if ($self->{'eof'}) { break; }\n  mm->readChar();\n}"; 
       put; clear; add "command*"; push; .reparse
     }

     "whilenot" {
       clear;
       add "# whilenot  \n";
       add "while ($self->{'peep'} =! /^"; ++; get; --; add "$/) {\n";
       add "  if ($self->{'eof'}) { break; }\n  mm->readChar();\n}"; 
       put; clear; add "command*"; push; .reparse
     }

     # error 
     clear;
     add "[Pep syntax error] near "; add "line "; lines; 
     add " Command '"; get; add "' does not take a class arg\n";
     print; clear; quit;
   }


  # arrange the parse> label loops
  # also, change .restart code before the parse label
  (eof) {
    "commandset*parse>*commandset*","command*parse>*commandset*",
    "commandset*parse>*command*","command*parse>*command*" {
      clear; 
      # indent both code blocks
      add "  "; get; replace "\n" "\n  "; 
      # change .restart code before parse> label
      replace "next # restart" "break; # restart"; 
      put; clear; 
      ++; ++; add "  "; get; replace "\n" "\n  "; put; clear; --; --;
      # add a block so that .reparse works before the parse> label.
      add "\n# lex block \n";
      add "while 1 {\n";
      get; add "\n  break;\n}\n"; ++; ++;
      add "if (restart) { restart = 0; next; }\n";
      # indent code block
      # add "  "; get; replace "\n" "\n  "; put; clear;
      # ruby doesnt support labelled loops (but swift does, and go?)
      # add "parse: \n";
      add "\n# parse block \n";
      add "while 1 {\n"; get;
      add "\n  break \n} # parse\n"; 
      --; --; put; clear;
      add "commandset*"; push; .reparse
    }
  }

  # -------------------------------
  # 4 tokens
  # -------------------------------

  pop;

  #-------------------------------------
  # bnf:     command := replace , quote , quote , ";" ;
  # example:  replace "and" "AND" ; 

  "word*quote*quote*;*" {
    clear; get;
    "replace" {
      #---------------------------
      # a command plus 2 arguments, eg replace "this" "that"
      clear; 
      # remove quotes around replace args
      # everything is done with regex in perl, no replace string
      ++; get; clip; clop; 
      escape "/"; replace '\\"' '"'; put; clear;
      ++; get; clip; clop; 
      escape "/"; replace '\\"' '"'; put; clear;
      --; --;
      # the \\Q stops special chars being interpreted.
      add "$self->{'work'} =~ s/\\Q";
      ++; get; add "/\\Q";
      ++; get; add "/;  # replace"; 
      --; --; put;
      clear; add "command*"; push; .reparse
    }

    add "Pep script error on line "; lines; 
    add " (character "; chars; add "): \n";
    add "  command does not take 2 quoted arguments. \n";
    print; quit;
  }

  #-------------------------------------
  # format: begin { #* commands *# }
  # "begin" blocks which are only executed once (they
  # will are assembled before the "start:" label. They must come before
  # all other commands.

  # "begin*{*command*}*",
  "begin*{*commandset*}*" {
     clear; 
     ++; ++; get; --; --; put; clear;
     add "beginblock*";
     push; .reparse
   }

  #-------------------------------------
  # we should not have to check for the {*command*}* pattern
  # because that has already been transformed to {*commandset*}*

  "test*{*commandset*}*",
  "andlist*{*commandset*}*",
  "orlist*{*commandset*}*" { 
     clear; 
     # indent the code for readability
     ++; ++; add "  "; get; replace "\n" "\n  "; put; --; --; 
     clear; add "if ("; get; add ") {\n";
     ++; ++; get;
     # block end required
     add "\n}"; 
     --; --; put; clear;
     add "command*";
     push;
     # always reparse/compile
     .reparse
   }

  # -------------
  # multi-token end-of-stream errors
  # not a comprehensive list of errors...
  (eof) {
    E"test*",E"orlist*",E"andlist*" {
      add "  Error near end of script at line "; lines;
      add ". Test with no brace block? \n";
      print; clear; quit;
    }

    E"quote*",E"class*",E"word*"{
      put; clear; 
      add "Error at end of pep script near line "; lines; 
      add ": missing semi-colon? \n";
      add "Parse stack: "; get; add "\n";
      print; clear; quit;
    }

    E"{*", E"}*", E";*", E",*", E".*", E"!*", E"B*", E"E*" {
      put; clear; 
      add "Error: misplaced terminal character at end of script! (line "; 
      lines; add "). \n";
      add "Parse stack: "; get; add "\n";
      print; clear; quit;
    }
  }

  # put the 4 (or less) tokens back on the stack
  push; push; push; push;

  (eof) {
    print; clear;

    add '
use strict;
use autodie;
#use warnings;

  # create the virtual machine object code and save it
  # somewhere on the tape.

  # code generated by "translate.perl.pss" a pep script
  # see http://bumble.sf.net/books/pars/tr/

package Machine;
  my $true = 1;
  my $false = 0;

  sub new {
    my $class = shift;  # 

    my $self = {
      # output =>    # A reference to the output and input streams
      # input =>     # if <stdin> and <stdout> are not used.
      input => *STDIN,
      output => *STDOUT,
      size => 300,      # how many initial elements in tape/marks array
      eof => 0,         # end of stream reached?
      charsRead => 0,   # how many chars already read
      linesRead => 1,   # how many lines already read
      escape => "\\\\",
      delimiter => "*", # push/pop delimiter (default "*")
      counter => 0,     # a counter for anything
      work => "",       # the workspace
      stack => (),      # stack for parse tokens 
      cell => 0,                # current tape cell
      tape => (),       # a list of attribute for tokens 
      marks => (),      # marked tape cells 
      peep => "" 
    };


    #$self->{"tape"} = 100;
    #$self->{"marks"} = 100;
    #@{$self->{"tape"}} = ("") x 100;
    #@{$self->{"marks"}} = ("") x 100;
    #$self->{"size"} = 100;
    # or use this, which may not duplicate the references
    # $my @arr = map { [] } 1..100;

    bless $self, $class;
    return $self;
  }

  # if (@tape < 5) check if length of tape is 4 or less
  sub setInput {
    my $self = shift;   # pointer to the machine
    my $newInput = shift; 
    print "to be implemented";
  }

  # read one character from the input stream and 
  #    update the machine.
  sub readChar {
    #$self->{"peep"} = getc(STDIN);

    my $self = shift;   # the pep/nom virtual-machine object
    if ($self->{"eof"}) { exit; }
    $self->{"charsRead"} += 1;
    # increment auto line counter
    if ($self->{"peep"} eq "\\n") { $self->{"linesRead"} += 1; }
    $self->{"work"} .= $self->{"peep"};
    $self->{"peep"} = getc($self->{"input"});
    #check
    if (eof($self->{"input"})) { $self->{"eof"} = 1; }
  }

  # test if all chars in workspace are in unicode category
  sub isInCategory {
    my $self = shift;   # pointer to the machine
    my $cat = shift; 
    #for ch in $self->{"work"}
    #  if not category(ch).start_with?(cat) then return false end
    #return True
  }

  # this walks the string and determines if the given char 
  # is already escaped or not # eg "ab\cab\\cab\c"
  # allow multiple chars for escape/unescape
  sub unescapeChar {
    my $self = shift;   # the machine
    my $chars = shift;  # list of chars to escape
    my $cc = "";
    my $result = "";
    my $isEscaped = $false;

    foreach $cc (split(//,$self->{"work"})) {
      if (($isEscaped == $false) && ($cc eq $self->{"escape"})) {
        $isEscaped = $true;
      } else { $isEscaped = $false; }
      # remove the last escape character (usually backslash)
      # this allows multiple chars for escaping
      if (($isEscaped == $true) && (index($chars, $cc) != -1)) {
        $result =~ s/.$//s; 
      }
      $result .= $cc; 
    }
    $self->{"work"} = $result;
  }

  # this needs to walk the string and determine if the given char 
  # is already escaped or not # eg "ab\cab\\cab\c"
  # allow multiple chars for escape/unescape
  sub escapeChar {
    my $self = shift;   # the machine
    my $chars = shift;
    my $cc = "";
    my $result = "";
    my $isEscaped = $false;

    foreach $cc (split(//,$self->{"work"})) {
      if (($isEscaped == $false) && ($cc eq $self->{"escape"})) {
        $isEscaped = $true;
      } else { $isEscaped = $false; }
      if (($isEscaped == $true) && (index($chars, $cc) != -1)) {
        $result .= $self->{"escape"};
      }
      $result .= $cc; 
    }
    $self->{"work"} = $result;
  }

  # a helper for the multiescape until bug
  sub countEscaped {
    my $self = shift;   # the machine
    my $suffix = shift; 
    my $count = 0;
    # no check
    my $s = $self->{"work"};
    $s =~ s/$suffix$//;
    while ($s =~ /$self->{"escape"}$/) {
      $count += 1;
      $s =~ s/$self->{"escape"}$//;
    }
    return $count;
  }

  # reads the input stream until the workspace end with text 
  sub until {
    my $self = shift;   # the machine
    my $suffix = shift; 
    # read at least one character
    if ($self->{"eof"}) { return; }
    # pass a reference to the machine hash with \% not %
    $self->readChar();
    while (1) { 
      if ($self->{"eof"}) { return; }
      # need to count the @escape chars preceding suffix
      # if odd, keep reading, if even, stop
      if ($self->{"work"} =~ /\Q$suffix$/) { 
        if ($self->countEscaped($suffix) % 2 == 0) { return; }
      }
      $self->readChar()
    }
  }  

  # this implements the ++ command incrementing the tape pointer
  # and growing the tape and marks arrays if required
  sub increment {
    my $self = shift;   # the machine
    $self->{"cell"} += 1;
    if ($self->{"cell"} >= $self->{"tape"}) { 
      # lengthen the tape and marks arrays by assigning to
      # length var
      $self->{"tape"} = $self->{"tape"} + 40;
      $self->{"marks"} = $self->{"marks"} + 40;
      $self->{"size"} = $self->{"tape"};
    }
  }

  # pop the first token from the stack into the workspace */
  sub popToken {
    my $self = shift;   # the machine, not local
    if (!$self->{"stack"}) { return 0; }
    $self->{"work"} = pop(@{$self->{"stack"}}) + $self->{"work"};
    if ($self->{"cell"} > 0) { $self->{"cell"} -= 1; }
    return 1;
  }

  # push the first token from the workspace to the stack 
  sub pushToken {
    my $self = shift;   # a pointer to the machine
    # dont increment the tape pointer on an empty push
    if ($self->{"work"} eq "") { return 0; }
    # need to get this from the delimiter.
    my $iFirst = index($self->{"work"}, $self->{"delimiter"});
    if ($iFirst == -1 ) {
      push(@{$self->{"stack"}}, $self->{"work"}); 
      $self->{"work"} = ""; return 1;
    }
    # s[i..j] means all chars from i to j
    # s[i,n] means n chars from i
    push(@{$self->{"stack"}}, substr($self->{"work"}, 0, $iFirst));
    $self->{"work"} = substr($self->{"work"}, $iFirst+1, -1);
    $self->increment();
    return 1;
  }

  sub printState {
    # print "Stack[${@self->{\"stack\"}.join(, )}] Work[#{$self->{\"work\"}}] Peep[#{$self->{\"peep\"}}]"
   # print "Acc:#{$self->{"counter"}} Esc:#{$self->{"escape"}} Delim:#{$self->{"delimiter"}} Chars:#{$self->{"charsRead"}}" +
   #      " Lines:#{$self->{"linesRead"}} Cell:#{$self->{"cell"}}"
  }

  sub goToMark {
    my $self = shift;   # a pointer to the machine
    my $mark = shift; 
    # search the marks 
    for my $ii (0..$self->{"marks"}) {
      if (@{$self->{"marks"}}[$ii] eq $mark) {
        $self->{"cell"} = $ii; 
        return;
      }
    }
    # mark was not found- fatal error!
    print("bad mark \'$mark!\'");
    exit();
  }

  # parses from a string and returns the result as a string.
  # useful for programatic use. Chopping last char??
  # does nothing if no input given.
  sub parseString {
    my $out = "";
    my $self = shift;       # the parse machine 
    my $in = shift;         # input string
    if (!defined $in) { return ""; }
    open my $input, "<", \\$in or return "[no input!]";
    open my $output, ">", \\$out or return "[no output!]";
    $self->parse($input, $output);
    return $out;
  }

  # parses from a file and writes to a string
  # useful for creating a pep/nom interpreter.
  sub parseFile {
    my $out = "";
    my $self = shift;          # the parse machine 
    my $infile = shift;         # input filename
    if (!defined $infile) { return ""; }
    open my $input, "<", $infile or return "[no input file!]";
    open my $output, ">", \\$out or return "[no output!]";
    $self->parse($input, $output);
    return $out;
  }

  # This function performs all sorts of magic and shenanigans. 
  # Creates a new method runScript() and evaluates it, 
  # thus acting as an interpreter of a scriptfile given to an
  # -f switch. This method only works when the perl translator has 
  # been run on itself with:
  #   >> pep -f translate.perl.pss translate.perl.pss > interp.pl.pl
  #   >> echo "read; print; print; clear; " > test.pss
  #   >> chmod a+x interp.pl.pl
  #   >> echo buzz | ./interp.pl.pl -f test.pss
  #   >> (output should be "bbuuzz")
  # Only those who have achieved
  # true enlightenment will understand this method.

  sub interpret {
    my $self = shift;          # the parse machine 
    my $filename = "";
    if ($ARGV[0] eq "-f") { $filename = $ARGV[1]; }
    if ($ARGV[0] =~ /^-f/) { 
      $filename = $ARGV[1]; $filename =~ s/^..//;
    }
    if ((!defined $filename) || ($filename eq "")) { return; }
    my $method = $self->parseFile($filename);
    # remove everything except the parse method and rename
    $method =~ s/^.*sub parse \\{/sub runScript \\{/s;
    # to debug
    # print $method; return;
    # add this new method to the current class via evaluation
    eval($method);
    # execute the new method, thus interpreting the script-file 
    # that was provide
    $self->runScript(*STDIN, *STDOUT);
  }

';  

    # save preamble (class) code in the current tape cell
    put; clear;

    #---------------------
    # check if the script correctly parsed (there should only
    # be one token on the stack, namely "commandset*" or "command*").
    pop; pop;

    "commandset*","command*",
    "beginblock*commandset*", "beginblock*command*" {
      "commandset*","command*" {
        clear;
        # indenting generated code  for readability.
        add "\n"; get; replace "\n" "\n  "; put; clear;
        add '\nwhile (!$self->{"eof"}) {\n'; get;
        add "\n}\n";
        replace "\n" "\n    "; put; clear;
        # restore the perl preamble (machine definition) from the tape
        ++; get; --;
        #add 'script: \n';
      }
      "beginblock*commandset*", "beginblock*command*" {
        clear; 
        # indent begin-block and main code for readability.
        add "\n"; ++; get; replace "\n" "\n  "; put; --; clear;
        add "\n"; get; ++; 
        add '\nwhile (!$self->{"eof"}) {\n'; get; add "\n}\n";
        replace "\n" "\n    "; --; put; clear;
        # get perl preamble from tape
        ++; ++; get; --; --;
        add "\n";
      }

      add '

  # This could have an input/output stream argument, or make the 
  # streams an attribute of the object.
  sub parse {
    my $self = shift;  # a machine reference
    my $in = shift;   # the input stream
    my $out = shift;  # output stream
    if (!defined $in) { $in = *STDIN; }
    if (!defined $out) { $out = *STDOUT; }
    #$self->{"output"} = *STDOUT;
    $self->{"output"} = $out;
    $self->{"input"} = $in;
    $self->{"peep"} = getc($self->{"input"});
    # the restart flag allows .restart to work before the 
    # parse label, in languages that dont have labelled loops
    my $restart = $false; \n';
    # the while loop has already been added around the 
    # script commands.
    get;

    add '
  }
  # end of Machine methods definition

  my $mm = Machine->new();

  # reads from <stdin> and writes to <stdout>
  $mm->parse();

  # --------
  # Other ways to call parse(), parseString() and parseFile()
  # Use an input file and print to stdout.
  # open IN, "<pars-book.txt" or die $!;
  # $mm->parse(*IN);

  # write results to an output file, reading from stdin
  # open OUT, ">test.txt" or die $!;
  # $mm->parse(*STDIN, *OUT);

  # parse/translate a string and return the result as string
  # my $result = $mm->parseString("new.new");
  # print $result;

  # parse/translate a file and return the result as a string
  # my $result = $mm->parseFile("test.txt");
  # print $result;

  # end of generated perl code\n';

      # put a copy of the final compilation into the tapecell
      # so it can be inspected interactively.
      put; print; clear; quit;
    }

    push; push;
    # try to explain some more errors
    unstack;
    B"parse>" {
      put; 
      clear; 
      add "[error] pep syntax error:\n";
      add "  The parse> label cannot be the 1st item \n"; 
      add "  of a script \n"; 
      print; quit;
    }
    put; clear;

    clear;
    add "After compiling with 'translate.perl.pss' (at EOF): \n ";
    add "  parse error in input script. \n ";
    print; clear; 
    unstack; put; clear;
    add "Parse stack: "; get; add "\n";
    add "   * debug script ";
    add "   >> pep -If script -i 'some input' \n ";
    add "   *  debug compilation. \n ";
    add "   >> pep -Ia asm.pp script' \n ";
    print; clear; 
    quit;

  } # not eof

  # there is an implicit .restart command here (jump start)