#* ABOUT

### toybnf

Creating a [bnf] style language with [nom] as the compile target.
It would be nice to have a more natural language that *targets*
[nom]. Let's expand the script above to compile to [nom]. This
compiles very simple [ebnf] to [nom]. This is the first example of
using nom as the target of a nom script. Another strange
oed://corollary arises: we can use this new language to implement a
recogniser for itself (but not a compiler, because so far our new
language has no compiling syntax, just *ebnf* rule reductions).

The script below parses the same syntax as above but instead of just
recognising the syntax, it actually creates executable [nom] code.

* testing the toybnf language

 >> pep -f toybnf.pss -i 'com = word param; block = word newword;'

* sample output of toyBNF when compiling with the nom script above
------
 # sample input BNF rules (white-space doesn't matter):
 #   com = word param ;
 #   block = word newword ;
 # output:

 pop;pop;
 "word*param*" {
   clear; add "com*"; push; .reparse
 }
 push;push;

 pop;pop;
 "word*newword*" {
   clear; add "block*"; push; push; push; .reparse
 }
 push;push
,,,,

This is pretty cool, because we now have a toybnf-to-nom compiler
that produces executable and translatable (to go/java/tcl/python/ruby
etc) [nom] code. But we still need a *lexing* syntax for our toyBNF
language.

The "redundant push/pop" problem has a pretty simple solution, but we
need to make sure there is no whitespace between the commands.

* getting rid of redundant push/pops
-----
 replace "push;push;push;pop;pop;pop;" "";
 replace "push;push;pop;pop;" "";
 replace "push;pop;" "";
,,,

This toyBNF language may not be as efficient as hand-coded [nom]
because it does redundant "pushes" nom://push and "pops" nom://pop
between code blocks, but it is easier to write and probably less
prone to errors. But to make it more than a "recogniser" we have to
add compiling syntax like this....
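The cleanup above can be simulated in Python to show the idea: balanced
runs of "push;" immediately followed by the matching "pop;" run cancel
out. This is only an illustrative sketch (the function name is
invented, and unlike the nom version it loops to a fixpoint rather
than applying each replacement once):

```python
# Simulation of the redundant push/pop cleanup: a run of "push;"
# followed immediately by the matching run of "pop;" does nothing,
# so delete it. As with the nom 'replace' statements, the patterns
# only match when there is no whitespace between the commands.
def strip_redundant_push_pop(code: str) -> str:
    pairs = [
        "push;push;push;pop;pop;pop;",
        "push;push;pop;pop;",
        "push;pop;",
    ]
    changed = True
    while changed:           # repeat until nothing more cancels
        changed = False
        for p in pairs:
            if p in code:
                code = code.replace(p, "")
                changed = True
    return code

print(strip_redundant_push_pop("push;push;pop;pop;X"))   # prints: X
```

Note that looping with only the shortest pair "push;pop;" would also
reach the same result, since deleting the innermost pair exposes the
next one; the nom version lists the longer runs so that a single pass
of each replace suffices.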
* proposed compiling syntax for toyBNF
----
 a = b c {
   #0 = "<a href=".$1.">".$2."</a>" ;
 }
,,,,

In the syntax above '.' is the string concatenator and $1 refers to
the attribute of the first token on the RHS (right-hand side) of the
bnf grammar rule. The compiling block takes the place of the ';' in
the syntax above.

We don't have any sensible way to actually create the 'tokens' yet
(ie the lexing phase of the recogniser), but we can soon invent a
syntax like this

* proposed syntax for creating tokens from literal values
-----
 # syntax to ignore something
 ignore [:space:]+ ;
 # compiled to nom as
 #   [:space:] { while [:space:]; clear; }

 # literals get compiled to a nom token but '*' must be
 # given a name.
 literals: '+' '-' '*' '/'
 word: [:alnum:]+ ;
 newline: '\n' ;

 # parse between two "" or print error message if last not found
 # or do something else, like just instantiate the token anyway
 quote: between '"' and '"' , { error "No end quote" } ;

 # compiled to nom as
 comment: between '//' and '\n' ;
 comment: between '/*' and '*/' ;
 # have to use nom://until for multi-character endings

 # I am not really sure how to compile variable-length
 # keywords to nom unless the keywords are space-delimited
 # or the keywords are alphanumeric
 keyword := 'and' | 'is' | 'go' | 'stop' ;

 # eg syntax means read an alphanumeric sequence and
 # check for keywords, but we need an 'else' block
 [:alnum:]+ {
   keyword: 'and' | 'is' | 'go' | 'stop';
 } then
 # if not a keyword, then if it starts with a|b|c it is a
 # 'command' token
 { command := [^abc] ; } then
 # a text token matches anything else
 { text := *** ; }
 # if nothing matched then it's an error; print a line and
 # character number and quit.
 then { error 'bad text'; }
 # or maybe

 # this is very elaborate syntax, more like a fantasy.
 # It has
 # to conform to the capabilities of nom because it is going to
 # get compiled into pep/nom
,,,,

So the lexing assignment operator is different, because otherwise
we would need a rule like

 >> LHS = token '='

Lex rules can only have one token on the LHS, but reduction rules
can have multiple.

* very basic lexrules in nom (lexing assignment is ':=' not '=')
-------
 pop;pop;pop;pop;
 "token*:=*char*;*" {
   clear; add "lexrule*"; push; .reparse
 }
 pop;
 "token*:=*class*+*;*" {
   clear; add "lexrule*"; push; .reparse
 }
 push;push;push;push;push;
,,,

Here is how this will be compiled by toyBNF.pss in [nom]

* lexing in toyBNF
-----
 # toyBNF syntax: word = [:alnum:]+ ;
 # the final reparse may not be necessary
 read;
 [:alnum:] {
   while [:alnum:]; put; clear; add "word*"; push; .reparse
 }

 # toyBNF syntax: newline = '\n' ;
 '\n' {
   put; clear; add "newline*"; push; .reparse
 }
,,,,

tokens:
  LHS       left-hand-side of the bnf rule
  RHS       right-hand-side
  sequence  a sequence/list of tokens
  token     one grammar token

literal tokens:
  '='  for grammar reduction
  ':'  for tokenisation assignment
  ';'  for statement end

* a basic (toy) ebnf parser, compiling to nom.

*#

read;

# line-relative char numbers
[\n] { nochars; }

# ignore white-space
[:space:] { while [:space:]; clear; }

# literal tokens ; and =
";","=" { add "*"; push; }

[:alpha:] {
  # add the default nom parse token delimiter '*'
  while [:alpha:]; add "*"; put; clear; add "token*"; push;
}

!"" {
  put; clear;
  add "! [toyBNF]\n";
  add "  bad character '"; get; add "'";
  add " at line:"; lines; add " char:"; chars; add "\n";
  add "  I just can't go on... sorry, goodbye";
  print; quit;
}

parse>

# An important grammar debugging technique for showing
# the parse-stack reductions.
# lines; add " char "; chars; add ": "; print; clear;
# unstack; print; stack; add "\n"; print; clear;

pop; pop;

"token*token*","sequence*token*" {
  # count tokens to calculate "push;" later
  a+;
  clear; get; ++; get; --; put;
  clear; add "sequence*"; push; .reparse
}

"token*=*","sequence*=*" {
  # later have to transform this count number into
  # push; or push;push; etc
  clear; get; a+; count; put; clear;
  # reset the token counter for the RHS
  zero;
  add "LHS*"; push; .reparse
}

"token*;*","sequence*;*" {
  clear; get; a+; count; put; clear;
  add "RHS*"; push; .reparse
}

"LHS*RHS*" {
  clear;
  # first build the new token string,
  # eg 'add "tok*tok*2"; push; push; '
  # that is, we need as many pushes as there are tokens and need to
  # get rid of the trailing number
  get;
  # not very elegant, but.... if you've got more than 6 tokens in a
  # row maybe you should reconsider your grammar.
  # could avoid all this with a 'stack' command that updates the
  # tape pointer properly
  E"1" { clip; add '"; push;'; }
  E"2" { clip; add '"; push; push;'; }
  E"3" { clip; add '"; push; push; push;'; }
  E"4" { clip; add '"; push; push; push; push;'; }
  E"5" { clip; add '"; push; push; push; push; push;'; }
  E"6" { clip; add '"; push; push; push; push; push; push;'; }
  put; clear;
  add 'add "'; get; put; clear;

  #* now need to build the rhs, which becomes the nom test in the
     format below. This is a bit more tricky than the LHS. If we
     had "stack" it would be much easier.
       pop;pop;
       "c*d*" { }
       push;push;
  *#

  ++; get;
  # build the "pushes" separately and store in tapecell+1
  E"1" { clear; add "push;"; }
  E"2" { clear; add "push;push;"; }
  E"3" { clear; add "push;push;push;"; }
  E"4" { clear; add "push;push;push;push;"; }
  E"5" { clear; add "push;push;push;push;push;"; }
  E"6" { clear; add "push;push;push;push;push;push;"; }
  !E"push;" {
    clear; add "! sorry 6 token sequence limit\n"; print; quit;
  }
  ++; put; --;
  # easier to just replace push; with pop; and start building
  # the start of the nom block
  replace "push;" "pop;";
  add '\n"'; get; clip; add '"'; put; clear; --;
  # now assemble the nom block; the lhs and rhs
  # have already been built.
  ++; get; --;
  add ' {\n';
  add ' clear; '; get; add ' .reparse \n';
  add '}\n';
  # now get the prebuilt "pushes" which were saved up on the tape.
  ++; ++; get; --; --;
  #print;
  put; clear; add "rule*"; push; .reparse
}

"rule*rule*","grammar*rule*" {
  clear; get; add "\n"; ++; get; --; put;
  clear; add "grammar*"; push; .reparse
}

push; push;

(eof) {
  pop;
  "rule*","grammar*" {
    clear; get; add "\n\n"; print; quit;
  }
}
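#*

The overall shape of the script above (lex the toyBNF rules into
tokens, then compile each 'lhs = tok tok ;' rule into a nom block
with matching pop;/push; runs) can be sketched as a small Python
simulation. This is only illustrative: the function names are
invented, it is not part of pep/nom, and it pushes the reduced LHS
token exactly once per rule.

```python
# Illustrative simulation of toybnf.pss: lex a toyBNF rule set,
# then compile each rule into a nom reduction block.
def lex(src):
    """Split toyBNF source into words and the literals '=' and ';'."""
    out, i = [], 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
        elif c in "=;":
            out.append(c); i += 1
        elif c.isalpha():
            j = i
            while j < len(src) and src[j].isalpha():
                j += 1
            out.append(src[i:j]); i = j
        else:
            raise ValueError("bad character %r" % c)
    return out

def compile_rules(src):
    """Compile 'lhs = tok tok ;' rules to nom reduction code."""
    rules, rule = [], []
    for t in lex(src):          # split the token stream at each ';'
        if t == ";":
            rules.append(rule); rule = []
        else:
            rule.append(t)
    chunks = []
    for r in rules:
        lhs, rhs = r[0], r[2:]  # r[1] is the '=' literal
        if len(rhs) > 6:
            raise ValueError("sorry 6 token sequence limit")
        # one pop; per RHS token, the nom test string, the reduction
        # block, and one push; per RHS token to restore the stack
        test = '"' + "".join(t + "*" for t in rhs) + '"'
        chunks.append("pop;" * len(rhs) + "\n"
                      + test + " {\n"
                      + '  clear; add "%s*"; push; .reparse\n' % lhs
                      + "}\n"
                      + "push;" * len(rhs))
    return "\n".join(chunks)

print(compile_rules("com = word param ; block = word newword ;"))
```

Running this prints nom blocks in the same format as the sample
output near the top of this file, eg 'pop;pop;' then the test
'"word*param*"' and a block that adds "com*".

*#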