#*
# print in blue with enscript
~color{0 0 1}

## maths.parse.pss

ABOUT 

 Parsing arithmetic and logical expressions and equations with numbers
 (decimals) and variables and the operators "+,-,/,*,^" and comparison
 operators <= >= == != <> 

 This parser handles assignement with := , logical comparison and 
 concatenation with && || AND and OR as well as numerical comparison
 with < <= etc It also contains function names like sqrt(...). 
 The script uses negative token lookahead, so that this script can 
 more-or-less be cut and paste into a language parser/compiler to 
 handle the parsing of expressions (although some modifications may
 be necessary, depending on the syntax of the language).

 This is a significant development and rewrite of the script eg/exp.tolisp.pss
 where I parse the arithmetic and logical infix expression. I will try to add
 functions like sqrt() for square-root and I will try to make the 'lookahead'
 code better so that it can be used in any language expression parser.

 The expression parser also includes its own error handling and help system
 where each topic has a help text. These help texts can be combined 
 with categories. (see the maths.help* token)

NOTES

 This script is now a template for how to write scripts. It has 
 error token throwing with a built in help system.

 I add brackets around each expression to show in what order
 things are being parsed.

 numbers and variable names are resolving nicely to expressions
 even before new tokens because I used a negative lookahead test.

DONE

  Converted this to the "error.token" method where errors put an 
  error token on the stack and .reparse 
  see the example nomsf://eg/drawbasic.pss for an example of this.

  also throwing a "help" token, where the type 
  of help is put into the attribute. This allows the error token
  to print some relevant help about the situation.

  Made the error checking better. Think about tokens that cant 
  start a sequence like ")" because they should have already 
  reduced. 

  Put a line char number in the (* token attribute so that if there is 
  an extra opening brace, we know where it is.

  Made the lookahead code better. Ie use negative testing for 
  operator precedence. eg if x+y not followed by * or / or ^ then 
  reduce. See file:///doc/howto/nom.lookahead.html for an example
  of this. This also simplified the code

  Add "functions" like sqrt tan cos etc.  
  eg: sqrt(x + y)

TODO

  
STATUS 

  14 mar 2025: 
    working, better error checking. better lookahead.
    added functions.

TOKENS 

  literal tokens: ( ) . ; =
  expr* an arithmetic expression
  equation* an expression terminated with ';' 
  natnum* a natural number [0-9]
  decnum* a positive decimal number
  op.and* for logic '&&' or 'AND'
  op.or* for logic '||' or 'OR'
  op.compare < <= > >= == != <> comparison operators
  op.power* '^'
  op.mul* either * or / but what should be the precedence here?
  op.add* either +/-
  variable* any variable name
  function* a function name like 'sqrt' or 'squareroot'
            functions are written sqrt(...)
  sign* either +/- but only preceeding an expression

  #lits: ().;
  # expr* natnum* decnum* op.compare* op.power* op.mul* op.add* sign* 
  # variable* function*

NOTES
  
  When implementing lookahead with pep/nom it is better to use 
  a "negative" approach, i.e. say "if this sequence is NOT followed
  by token x (which has higher precedence) then reduce, otherwise
  dont.

  
TESTING 

  >> pep -f eg/maths.parse.pss -i "-.21*(0.33+e)"

  * demonstrate operator precedence. 
  >> pep -f eg/maths.parse.pss -i "a+b*c*(d+e)"
  output:  (+ a (* (* b c) (+ d e))) 

  * test reformating of an arithmetic expression
  >> pep -f eg/maths.parse.pss -i "6 + (6+7*8)*var"

  * make a stand-alone executable of this expression parser/translator
  --------
    pep -f ../tr/translate.c.pss maths.parse.pss > maths.parse.c
    gcc -o maths.parse.ex -Lbooks/pars/object -lmachine -Ibooks/gh/object
    # not unicode aware
  ,,,

  * run the stand-alone executable with input from "stdin"
  >> echo "a+b*c*(d+e)" | ./maths.parse.ex

NOTES

  Negative variables I think are not implemented but this would
  be trivial. 4x 10size (implicit multiplication of variable)
  is also not implemented. Space is ignored so all expressions and 
  equations can span multiple lines.

EXAMPLES 

 assignments, logic and/or, comparisons, functions, variables.

 eg: (-4+3)*6.1/-0.77777
 eg: (4+x^2 == y^3) AND (x != y);
 eg: x+1 > sqrt(1*2*3);
 eg: (9*count^4) + var * 123.456
 eg: (var + 88) + -44.33 * 2 
 eg: x := (x^2 && (y^2 != 100));

HISTORY
 
 14 mar 2025
   added logic operators and parse tokens op.and* op.or* and 
   comparison operators with op.compare* token. also ';' equation*
   (which is expression terminated with ';')

 13 mar 2025
   converted from eg/exp.tolisp.pss

*#

  # The script lexing phase

  read;

  # spaces are actually significant because they cannot occur in
  # decimal numbers, eg: '4. 03' is probably an error
  # make character counter relative to each line for more helpful
  # error messages
  [\n] { nochars; } 
  [:space:] { 
    clear;
    !(eof) { .restart }
    (eof) { .reparse }
  }

  # ; is end of an expression/equation this is needed to reduce
  #   comparison operators.
  [;] { put; add "*"; push; .reparse }
  
  [<>:!=|&] { 
    while [<>:!=|&]; put; 
    # this is use for 'assignment'
    ":=" { clear; add "=*"; push; .reparse }
    "&&" { clear; add "AND"; put; clear; add "op.and*"; push; .reparse }
    "||" { clear; add "OR"; put; clear; add "op.or*"; push; .reparse }

    "<","<=",">",">=","==","!=","<>" {
      # synonyms
      "<>" { clear; add "!="; }
      put; clear; add "op.compare*"; 
      push; .reparse
    }
    clear; add "operators"; swap; ++; put; --;
    clear; add "maths.help*"; push;
    add "  unknown operator '"; get; add "'\n"; 
    put; clear; add "maths.error*"; push; .reparse
  }

  # plus and minus
  "+","-" { put; clear; add "op.add*"; push;  }
  # multiply and divide
  "*","/" { put; clear; add "op.mul*"; push;  }
  # to the power of, but what is the lisp power operator ???
  "^" { clear; add "^"; put; clear; add "op.power*"; push;  }
  # brackets for grouping
  "(" { 
    # it useful to have a line and char for extra
    # opening brackets.
    clear; add " near line:"; lines; add " char:"; chars; put; 
    clear; add "(*"; push; 
  }
  ")" { put; add "*"; push; }
  # for decimal number
  "." { put; clear; add "dot*"; push; }
  [0-9] {
    while [0-9]; put; 
    clear; add "natnum*"; push;
  }
  # What characters are variable names ?
  [:alpha:] {
    while [:alpha:]; put; 
    "AND" { clear; add "op.and*"; push; .reparse }
    "OR" { clear; add "op.or*"; push; .reparse }
    "sqrt","squareroot","cuberoot","4throot","5throot",
    "tan","cos","sin" {
      clear; add "function*"; push; .reparse
    }
    clear; add "variable*"; push;
  }

  # a trick to catch bad characters. 
  # better would be a !"text" test
  !"" { 
    put; 
    clear; add "characters"; swap; ++; put; --;
    clear; add "maths.help*"; push;
    add "  strange character '"; get; add "' encountered."; put; 
    clear; add "maths.error*"; push; .reparse
  }

parse>
  # The parse phase 

  # watch the stack at is parses: very helpful for debugging.
  # Comment out when the script works.fk
  add "* line "; lines; add " char "; chars; add ": "; print; clear; 
  unstack; print; stack; add "\n"; print; clear;


  #---------------
  # error analysis
  # we can group all error analysis here to make the script more 
  # organised. This section helps to provide good error messages to
  # the user.

  pop;
  
  # the error token which traps all errors pushed onto the stack 
  "maths.error*" {
    # get the parse stack here as well
    clear; 
    add "! arithmetic expression syntax:";
    add " near line:"; lines; add " char:"; chars; add "\n";
    get; add "\n"; 
    # print;
    # add "  The parse stack:";
    # print; --; print; clear;
    print;
    # provide help from the maths.help* token if one was 
    # put on the stack. 
    clear; pop; "maths.help*" { push; .reparse } 
    quit;
  }

  # using a parse help token to display help
  "maths.help*" {
    # the topic or category to display help for is in the
    # attribute
    clear; swap; 
    # 'brackets' is topic, 'all' is a categories

    "functions","all" {
      swap; add "
      functions are predefined words applied to an expression. 
      like 'tan' 'cos' 'sin' 'sqrt' 'cuberoot' '4throot'
      eg:
         sqrt (3^4+1)    correct
         sqrt 5          incorrect ";
    }

    "variables","all" {
      swap; add "
      variables are any alphabetic name that is not a function
      such as x abc longVar etc
      eg:
         var        correct
         var.var    incorrect ";
    }

    "characters","all" {
      swap; add "
      legal characters for expressions 
        arithmetic: ^+-*/
        comparison: < <= > >= == != <>
        logic: && || AND OR
        grouping: ()
        numbers: [-+0-9.]
        equations: := ;  (assignment and termination)
        names: any alphabetic character (for functions and variables)
        whitespace: is ignored.
      eg:
         3*size+2^0.5;    correct
         time := (sin(x) > sin(y));    correct
         4#+3            incorrect (# is not a recognised operator) ";
    }

    "operators.summary","operators","all" {
      swap; add "
      Recognised operators are:
        comparison: <= < >= > == != <>
        arithmetic: + - * / ^
        logic: && || AND OR
        grouping: ( ) ";
    }

    "assignment.operator","operators","all" {
      swap; add "
      The assignment operator is ':=' and the left-hand-side
      must be a simple variable name. Assignments must end in a 
      semicolon.
      eg:
         x := y^2+z^2;         correct";
    }

    "logic.operators","operators","all" {
      swap; add "
      The logic operators are && || 'AND' 'OR'
      AND and OR (uppercase) are synonyms for && and ||
      eg:
         && 2*x          incorrect";
    }

    "compare.operators","operators","all" {
      swap; add "
      The comparison operators are <= < >= > == != <>
      != means not equal to and '<>' is a synonym for that
      eg:
         
         (x==4) AND (x^2 != y)   correct (expression)
         sqrt(x+y^2) == x+y*3;   correct (equation)
         == 2*x                  incorrect";
    }

    "operators","all" {
      swap; add "
      Operators must be between numbers or expressions. 
      The exception to this are +/- signs which can precede
      expressions
      eg:
         4^3*(3+1)       correct
         -(4^2) + +3.1   correct
         /4+3            incorrect ";
    }

    "multiply.and.divide","operators","all" {
      swap; add "
      Multiplication is indicated by the asterisk '*' and division
      by the forward-slash '/' character. These characters must be 
      placed between 2 'expressions' (any combination of operators,
      variables and functions). The mathematical syntax 
      '4x+3y' is not implemented so far - i.e implicit multiplication 
      (but why not?)
      eg:
         4*3/(1.234 ^ 6)   correct
         -(3+2/)         incorrect (divide with no right expression) ";
    }

    "power.operator","all" {
      swap; add "
      The power operator is ^ and must occur between 2 numbers 
      or expressions
      eg:
         12.345^(1/2)    correct (square root)
         ^4              incorrect ";
    }

    "brackets","all" {
      swap; add "
      brackets () are used to group expressions and must be 
      balanced
      eg: 
        ((5.1+3)*10.0)*6^2      correct 
        (5.1+3)*10.0)*6^2       incorrect (extra close bracket) ";
    }

    "dots","all" {
      swap; add "
        The dot (.) is only used in positive and negative decimal
        numbers. I believe that decimals with no leading 0 are not permitted
        I may change this.
        egs: 5.13 -0.1234 +456.78 0.05";
    }
    "power","all" {
      swap; add "
        The power operator is '^' and has greater precedence
        than all other operators.
        eg: 5/6^2+2 will parse as ((5/(6^2))+2) ";
    }
    add "\n\n"; print; quit; 
  }
  # ----------------
  # 2 token errors
  pop;

  # look for tokens that can't start a sequence
  B")*".!")*" {
    clear; add "brackets"; put;
    clear; add "maths.help*"; push;
    add "  extra close bracket ) in expression?"; put; 
    clear; add "maths.error*"; push; .reparse
  }

  # literal token errors

  B"dot*".!E"natnum*" {
    clear; add "dots"; put;
    clear; add "maths.help*"; push;
    add "  Misplaced dot or incorrect decimal number ? ";
    put; clear; add "maths.error*"; push; .reparse
  }

  E"dot*".!B"natnum*" {
    clear; add "dots"; put;
    clear; add "maths.help*"; push;
    add "  Misplaced dot (eg .123 is not implemented) ? ";
    put; clear; add "maths.error*"; push; .reparse

  }

  # the not equals test is superfluous but here for clarity
  B"(*".!"(*" {
    E"op.mul*",E"op.power*",E")*" {
      clear; add "brackets"; put;
      clear; add "maths.help*"; push;
      add "  misplaced bracket or operator? ";
      put; clear; add "maths.error*"; push; .reparse
    }
  }

  #lits: ().
  # expr* natnum* decnum* op.compare* op.power* op.mul* op.add* sign* 
  # variable* function*

  # op.compare <= >= == != <> 
  # needs more thought
  B"op.compare*".!"op.compare" {
    E"op.mul*",E"op.power",E")*",E".*" {
      clear; add "operators"; put;
      clear; add "maths.help*"; push;
      add "  Misplaced operator ? ";
      put; clear; add "maths.error*"; push; .reparse
    }
  }


  # look for sequences
  B"op.add*".!"op.add" {
    E"op.mul*",E"op.power",E")*" {
      clear; add "operators"; put;
      clear; add "maths.help*"; push;
      add "  Misplaced operator ? ";
      put; clear; add "maths.error*"; push; .reparse
    }
  }

  # allow ** as power?
  B"op.mul*".!"op.mul*" {
    E"op.add",E"op.power",E")*" {
      clear; add "operators"; put;
      clear; add "maths.help*"; push;
      add "  Misplaced operator ? ";
      put; clear; add "maths.error*"; push; .reparse
    }
  }

  B"op.power*".!"op.power*" {
    E"op.add",E"op.power*",E"op.power",E")*" {
      clear; add "power.operator"; put;
      clear; add "maths.help*"; push;
      add "  Misplaced operator or bracket? ";
      put; clear; add "maths.error*"; push; .reparse
    }
  }

  #lits: ().
  # expr* natnum* decnum* op.compare* op.power* op.mul* op.add* sign* 
  # variable* function*

  B"natnum*".!"natnum*" {
    E"expr*",E"natnum*",E"decnum*",E"(*" {
      clear; add "  missing operator?"; put;
      put; clear; add "maths.error*"; push; .reparse
    }
  }

  B"decnum*".!"decnum*" {
    E"expr*",E"natnum*",E"decnum*",E"(*" {
      clear; add "  missing operator?"; put;
      put; clear; add "maths.error*"; push; .reparse
    }
  }

  B"expr*".!"expr*" {
    E"expr*",E"natnum*",E"decnum*",E"(*" {
      clear; add "  missing operator?"; put;
      put; clear; add "maths.error*"; push; .reparse
    }
  }

  #lits: ().
  # expr* natnum* decnum* op.power* op.mul* op.add* sign* 
  # variable function

  B"variable*".!"variable*" {
    E"variable*",E"function*",E"expr*",E"natnum*",E"decnum*",
    E"(*",E".*" {
      clear; add "variables"; put;
      clear; add "maths.help*"; push;
      clear; add "  misplaced variable name?"; put;
      put; clear; add "maths.error*"; push; .reparse
    }
  }

  B"function*".!"function*".!E"(*" {
    clear; add "functions"; put;
    clear; add "maths.help*"; push;
    clear; add "  incorrect function syntax?"; put;
    put; clear; add "maths.error*"; push; .reparse
  }

  (eof) {
    # try to diagnose missing close bracket errors at end of script
    # eg (*expr*
    # we have a line/char number in the open bracket tape cell
    B"(*" {
      clear;
      add "* missing close bracket ) ?\n";
      add "  At "; get; add " there is an opening bracket which does \n";
      add "  not seem to be matched with a closing bracket ";
      put; clear; add "maths.error*"; push; .reparse
    }

    E"dot*" {
      # provide dot help with a help parse token
      clear; add "dot"; put; 
      clear; add "maths.help*"; push;
      clear; add "  Misplaced dot '.' at end of expression. \n"; put;
      clear; add "maths.error*"; push; .reparse
    }
    E"op.add*",E"op.mul*",E"op.power*" {
      clear; add "operators";
      swap; ++; put; --; # preserve operator in tape.cell+1
      clear; add "maths.help*"; push;
      add "  Misplaced operator at '"; get; add "' end of expression.";
      put; clear; add "maths.error*"; push; .reparse
    }
  }

  push; push;

  # end of error analysis
  # ---------------

  pop; 
  
  # resolve numbers to expressions to simplify grammar rules
  "decnum*","variable*" {
     clear; get; put; clear;
     add "expr*"; push; .reparse
  }

  # reduce natnum to expression at end of file, no!!!
  (eof) {
    "natnum*" { 
       # check not preceded by dot (could be decimal)
       pop;
       !"dot*natnum*" { 
         replace "natnum*" "expr*"; push; push; .reparse 
       }
       push;
     }
  }

  #-----------------
  # 2 tokens
  pop;

  # reduce natnums to expressions if not followed by a dot
  # this creatly reduces complexity of parsing rules later
  # already ensured not "natnum*natnum*" in error checks above.
  B"natnum*".!"natnum*".!E"dot*" {
    replace "natnum*" "expr*"; push; push; .reparse
  }

  "sign*expr*" {
    clear; add "("; get; ++; get; --; add ")"; put; clear;
    add "expr*"; push; .reparse
  }
  
  # There may be an issue with reducing sign*natnum in other
  # cases
  (eof) {
    "sign*natnum*" {
      clear; add "("; get; ++; get; --; add ")"; put; clear;
      add "expr*"; push; .reparse
    }
  }
  #-----------------
  # 3 tokens
  pop;

  # These are signs at the beginning of the expression
  # only 2 tokens
  "op.add*natnum*","op.add*expr*" {
    replace "op.add*" "sign*"; push; push; .reparse
  }

  # +/- at beginning, 3 tokens
  B"sign*expr*".!"sign*expr*" {
    replace "sign*expr*" "expr*"; push; push;
    # need to compose attributes and transfer other attributes
    # compose sign/exp attribute in Lisp syntax
    --; --; add "("; get; ++; get; add ")"; --; put; clear; ++; ++;
    # transfer invisible token attribute
    get; --; put; clear; ++;  
    clear; .reparse

  }

  # convert +/- oppadd* token to a sign* token where appropriate
  # If only 2 tokens (not 3) then start of document
  E"op.add*expr*".!B"expr*" {
    replace "op.add*expr*" "sign*expr*";
    push; push; push; .reparse
  }
  E"op.add*natnum*".!B"expr*" {
    replace "op.add*natnum*" "sign*natnum*";
    push; push; push; .reparse
  }


  # we dont need any look ahead here because '^' the power operator
  # has precedence over multiply/divide/add/subtract.
  # precedence. But natnum* only becomes and expr* when it sees 
  # the following token. So need another 4 token rule
  "expr*op.power*expr*" {
    clear; 
    add "("; get; ++; get; ++; get; add ")"; 
    --; --; put; clear; 
    add "expr*"; push;
    .reparse
  }
  
  # parse positive decimal numbers like 02.345
  "natnum*dot*natnum*" {
    clear; get; ++; get; ++; get; --; --; 
    # or parse to decnum* here to make negation better
    put; clear; add "expr*"; push;
    .reparse
  }

  # an interesting trick to parse a 4 token pattern before
  # a 3 token pattern. Be aware about popping nothing (empty stack) 
  "(*expr*)*" {
     pop;
     # eg: sqrt(4+9)
     "function*(*expr*)*" {
       clear; get; add "{"; ++; ++; get; add "}"; --; --; put;
       clear; add "expr*"; push; .reparse
     }
     # only push if something was popped
     !"(*expr*)*" { push; }
     clear; add "("; ++; get; add ")"; --; put; clear;
     add "expr*"; push; .reparse
  }


  (eof) {
    # natnums have already been "reduced" to expr* at end of stream 
    "expr*op.add*expr*","expr*op.mul*expr*" {
       clear; 
       add "("; get; ++; get; ++; get; add ")"; 
       --; --; put; clear; 
       add "expr*"; push;
       .reparse
    }
  } 

  #-----------------
  # 4 tokens parse token reductions

  pop;
 
  # a better lookahead, this will work within another language
  # it only looks for operators with more precedence.

  # op.mul (*/) and op.power (^) have more precedence that +/- 
  B"expr*op.add*expr*".!"expr*op.add*expr*" {
    !E"op.mul*".!E"op.power*" {
      replace "expr*op.add*expr*" "expr*"; push;push;
      # assemble attribs for new exp token,
      --; --; add "("; get; ++; get; ++; get; add ")"; --; --; put;
      # transfer unknown token attrib 
      clear; ++; ++; ++; get; --; --; put; clear; 
      # realign tape pointer 
      ++; .reparse
    }
  }

  # op.power (^) has more precedence that */
  B"expr*op.mul*expr*".!"expr*op.mul*expr*" {
    !E"op.power*" {
      replace "expr*op.mul*expr*" "expr*"; push;push;
      --; --; add "("; get; ++; get; ++; get; add ")"; --; --; put;
      clear; ++; ++; ++; get; --; --; put; clear; 
      # realign tape pointer 
      ++; .reparse
    }
  }

  # natnum/var/decnum only become expressions when the following
  # token is seen, this is why we need a 'lookahead' rule here. 
  B"expr*op.power*expr*".!"expr*op.power*expr*" {
    #!E"op.power*" {
      replace "expr*op.power*expr*" "expr*"; push;push;
      --; --; add "("; get; ++; get; ++; get; add ")"; --; --; put;
      clear; ++; ++; ++; get; --; --; put; clear; 
      # realign tape pointer 
      ++; .reparse
    #}
  }


  # eg: x^2 != y^3 ; 
  "expr*op.compare*expr*;*" {
    clear; get; ++; get; ++; get; add ";"; --; --; put;
    clear; add "equation*"; push; .reparse
  }

  # reduce this to an equation only at EOF to allow assignments.
  # eg: x^2 OR y^3 ; 
  "expr*op.and*expr*;*","expr*op.or*expr*;*" {
    clear; get; add " "; ++; get; add " "; ++; get; add ";"; --; --; put;
    clear; add "expr*;*"; push; push; .reparse
  }

  # this is the only assignment rule. assignments are not allowed
  # inside of other expressions (unlike c)
  # eg:  ; 
  "expr*=*expr*;*" {
    # check that it is a simple variable name. Slightly tricky because
    # variables are instantly reduced to expressions.
    clear; get; 
    ![:alpha:] {
      clear; add "assignment.operator"; put;
      clear; add "maths.help*"; push;
      add "  Left-hand-side of equation not a variable? ";
      put; clear; add "maths.error*"; push; .reparse
    }

    clear; get; add " "; ++; get; add " "; ++; get; add ";"; --; --; put;
    clear; add "equation*"; push; .reparse
  }

  #-----------------
  # 5 tokens parse token reductions
  pop;

  # example to parse comparisons, so we only reduce comparisons 
  # when surrounded by brackets or terminated with ; 
  # eg: (x^2 != y) 
  "(*expr*op.compare*expr*)*" {
    clear; add "("; ++; get; ++; get; ++; get; add ")"; 
    --; --; --; put;
    clear; add "expr*"; push; .reparse
  }

  # only reduce logic when terminated in ';' or in brackets
  # (x==4) AND (x^2 != y);
  "(*expr*op.and*expr*)*","(*expr*op.or*expr*)*" {
    clear; add "("; ++; get; add " "; ++; get; add " "; ++; get; add ")"; 
    --; --; --; put;
    clear; add "expr*"; push; .reparse
  }

  push; push; push; push; push;

  (eof) {
    pop; pop;
    "expr*;*" {
      clear; get; add ";"; put; 
      clear; add "equation*"; push; .reparse
    }
    push; push;
  }

  (eof) {
    pop; pop;
    "expr*","equation*" {
      clear;  
      # add "Yes, its an expression! \n";
      get; add "\n";
      print; clear; quit;
    }

    push; push;
    add "No, it doesn't look like a single valid arithmetic \n";
    add "'in-fix' expression. The parse stack was: ";
    print; clear; unstack; add "\n"; print;
    quit;
  }