#*
 Very simple, impractical and flawed natural language parsing.

 This script also demonstrates the sever limitations of trying
 to parse natural language with the parse-machine. Actually
 this machine was never designed to parse natural language.
 
 Despite its limitations, it will accept sentences like
  "the small dog eats fish." as a valid english sentence.

STATUS
  works in a limited way

TESTING

  * test the script with a phrase
  >> pep -f eg/natural.language.pss -i "the small dog eats the fish."

  * translate to ruby and test
  >> pep.rbf eg/natural.language.pss -i "the big dog sleeps in the house."

  This actually does
  -----
    pep -f tr/translate.ruby.pss eg/natural.language.pss \
       > eg/ruby/natural.language.rb
    echo "the big dog sleeps ..." | eg/ruby/natural.language.rb
  ,,,

PROBLEMS FOR HUMAN LANGUAGE PARSING 

ATTRIBUTES NEEDED ....

 In order to attempt to parse/translate natural human language 
 it is necessary to add either an array of "attributes" to each 
 tape-cell or else a semantic-attribute object. These attributes
 would contain information such as "plural/singular", "masculine/feminine",
 A simple attribute array would allow dealing with situations like 

   "los tres amigos van al mercado" -->
   "the three friends go to the market".
 So:
   "los"
      is parsed as token="article*", tapecell="los", 
      and attributes=plural+masculine (2 attributes)
   "amigos"
      is parsed as token="noun*", tapecell="amigos", 
      and attributes=plural+masculine (2 attributes)
   "van" 
      is parsed as token="verb*", tapecell="van"
      attributes=plural
      In this case "van" can be masculine or feminine, so that 
      attribute is not marked.

  Then, when we try to reduce "article*noun*" to "nounphrase*" 
  we see that we need to "match" each attribute. For example
    "article(los)*noun(amigos)*" can be reduced to 
    "nounphrase(los amigos)*" because
     "los" has attributes "plural+masuline", and 
     "amigos" has the same attributes. However, we cannot parse
     "las amigos" because "las" has attributes "plural+feminine".
     This means the "las amigos" is a grammatical error.

  Also, consider the "unmarked" situation:
    "ellas van al mercado" 
    we have, 
       "ellas" (attributes=plural+feminine)
       "van"   (attributes=plural)
    So, "van" has 1 less attribute, but the attribute it has (plural)
    matches the subject of the sentence, so the phrase is legal.

  Of course, the set of possible attributes varies from language to 
  language (as a simple example, some languages have a "dual" number,
  as well as singular and plural). But a simple solution may be to
  create a "superset" of all possible attributes in all possible 
  languages and then match them as required.

  This schema requires extending the parse-machine to add the array
  of attributes to each tapecell. But this simple array of attributes
  also has severe limitations because of the semantic complexity of 
  human language. 

  Even better would be an attributes "object" (with heirarchical attributes)
  attached to each tape-cell. This would go some way to dealing with 
  the interdependance of semantics and grammar in human languages. 
  
AMBIGUOUS PARSING ....

  Another complication is that words may be parsed in different ways,
  such as verb or noun (eg "access", "ache"). This means that there
  needs to be some way to order these different parsings, so that
  each different parsing can be tried in turn.

  At some stage I may try to extend the parse-machine in this way, to see how
  effectively we can parse/translate human language.
 
LOOK AHEAD ....

  There is also a significant problem with "look-ahead" although one 
  simple solution would be to require a full-stop at the end of 
  sentences.

HISTORY

  13 march 2020 

    Added some new words and sentence structures.

  2019 began script

*#
 begin { 
   add '
    An attempt at basic natural language parsing. 
    Use the following words in simple sentences: 

     articles: the, this, her, his, a, one, some, 
     preposition: up, in, at, on, with, under, to
     adjectives: simple, big, small, blue, beautiful, small,
     nouns: flower, tree, dog, house, horse, girl, fish, meat,
     verbs: runs, eats, sleeps, is, grows, digs, sings

    End the sentence with a full stop "."
      eg: the small dog eats fish.
      eg: the simple horse runs on the house .
   .\n' ;
   print; clear;
 }
 read;
 [:alpha:] {
   while [:alpha:]; 
   put; 
   "the","this","her","his","a","one","some" {
     clear; add "article*"; push;
     .reparse
   }
   "up","in","at","on","with","under","to" {
     clear; add "preposition*"; push;
     .reparse
   }
   "simple","big","small","blue","beautiful","small" {
     clear; add "adjective*"; push;
     .reparse
   }
   "flower", "tree", "dog", "house", "horse", "girl", "fish", "meat" {
     clear; add "noun*"; push;
     .reparse
   }
   "runs", "eats", "sleeps", "is", "grows","digs","sings" {
     clear; add "verb*"; push;
     .reparse
   }
   put; clear; add "<"; get; add ">";
   add " Sorry, don't understand that word! \n";
   print; clear; quit;
 }

 # use a full-stop to complete sentence
 "." { put; clear; add "dot*"; push; }
 # ignore every thing else
 clear;

parse>

# 2 tokens

 pop; pop;

 "article*noun*" {
   clear; 
   get; add " "; ++; get; --; put; clear;
   add "nounphrase*"; push;
   .reparse
 }

 "verb*preposition*" {
   clear; 
   get; add " "; ++; get; --; put; clear;
   add "verbphrase*"; push;
   .reparse
 }

# 3 tokens

 pop;

 "noun*verb*dot*","nounphrase*verb*dot*",
 "noun*verbphrase*dot*","nounphrase*verbphrase*dot*" {
   clear;
   get; add " "; ++; get; --; put; clear;
   add "sentence*"; push;
   .reparse
 }

 "article*adjective*noun*" {
   clear;
   get; add " "; ++; get; add " "; ++; get; --; --; put; clear;
   add "nounphrase*"; push;
   .reparse
 }

 # 4 tokens
 pop; 

 "nounphrase*verb*noun*dot*","noun*verb*noun*dot*",
 "nounphrase*verb*nounphrase*dot*","noun*verb*nounphrase*dot*", 
 "nounphrase*verbphrase*nounphrase*dot*","noun*verbphrase*nounphrase*dot*", 
 "nounphrase*verbphrase*noun*dot*","noun*verbphrase*noun*dot*" {
   clear;
   get; add " "; ++; get; add " "; ++; get; --; --; put; clear;
   add "sentence*"; push;
   .reparse
 }

 push; push; push; push;

 (eof) {
   pop; pop; 
   "sentence*" {
      clear; 
      add "It's an english sentence! \n("; get; add ") \n";
      add "But it may not make sense! \n";
      print; clear; quit;
   }
   "nounphrase*" {
      clear; add "its a noun-phrase! ("; get; add ") \n";
      print; clear; quit;
   }
   "verbphrase*" {
      clear; add "its a verb-phrase! ("; get; add ") \n";
      print; clear; quit;
   }
   push; push;
   add "nope, not a sentence. \n"; print; clear; 
   add "The parse stack was: \n  "; print; clear;
   unstack; add "\n"; print; quit;
 }