language and parsing
bumble.sf.net/books/pars
Pep, nom or pep/nom is a virtual machine and script language for parsing text patterns (languages). Pep/nom is designed as an alternative to tools such as Lex, Yacc, Flex, Bison or ANTLR.
This folder contains various files and folders relating to the pattern-parser virtual machine and script language ("pep" - Parsing Engine for Patterns). This is an experimental, but powerful, and I believe, original approach to parsing context-free languages (text patterns). It may seem an outlandish claim, but this tool could revolutionise the way that software is created, and patterns are recognised.
You can download a .tar.gz file of the entire system (plain c machine and
interpreter with debugger, translation scripts in the tr/ folder, and example
scripts in the eg/ folder).
Current download url:
https://sourceforge.net/projects/bumble/
The README.txt contains instructions for compiling the code.
[last update to tar.gz: 13 sept 2021]
I need to put this onto a Git host.
The main documentation file about the machine and language is
/books/pars/pars-book.txt (somewhat disorganised). An html version of that
document can be seen at /books/pars/pars-book.html which is generated by
the "pep" script /books/pars/eg/mark.html.pss. The source code file
/books/pars/object/pep.c also contains documentation. The executable
file /books/pars/pep also contains documentation about the machine, which
can be accessed by using the help commands in interactive mode (the -I
switch). For example, the "Com" command in interactive mode lists and
describes all machine commands.
28 aug 2022
working on the mark.latex.pss script which now supports most
syntax including images. The script is quite complex. It should
be straightforward to translate it to other targets such as
"markdown", html, man groff etc.
19 aug 2022
Made a magical interpret() method in the perl translator which
will allow running of scripts.
Working on a simplified grammar for tr/translate.perl.pss which
I hope to use in all the translator scripts. So far so good. Also
introducing a new expression grammar for tests eg
(B"a",B"b").E"z" { ... }
This allows mixing AND and OR logic in tests. Also, a pep script
that extracts all unique tokens from a script would be useful.
17 aug 2022
Looking at ANTLR example grammars, for ideas of simple languages
such as "logo", "abnf", "bnf", "lambda", "tiny basic".
Reforming grammars of the translators, writing good "unescape"
and "escape" functions that actually walk and transform the
workspace string.
Converting perl translator to a parse method.
Need an "esc" command to change the escape char in all translators.
The perl translator is almost ready to be an interpreter.
13 august 2022
Debugged the tcl translator; it appears to be working well except for
second generation scripts.
current tasks: finish translators, perl/c++/rust/tcl
start translators: lisp/haskell/R (maybe)
Write a new command "until" with no arguments (done in some translators).
Make the translators use a "run" or "parse" method, which
can read and write to a variety of sources.
Make the tape in object/pep.c dynamically allocated.
See if begin { ++; } creates space for a variable, and use this
strategy for variable scope.
28 july 2022
Starting to create date-lists in eg/mark.latex.pss to render lists
such as this one. Also, had the idea of a new test
F:file.txt:"int" { ... }
This would test if the file "file.txt" contains a line starting with
"int" and ending with ":" + workspace. This test would allow checking
variable types and declarations. It would also allow better natural
language parsing, because a list of nouns/adj/verbs etc could be stored
in a simple text file and looked up. Also, variable scope could be
included in the file eg
int.global:x
int.fn:x
string.global:name
string.local:name
etc
Also, another test
F:name.txt: { ... }
would check the file name.txt for a line which begins with the tape
and ends with the workspace.
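The lookup that these two proposed tests describe can be sketched outside of pep. The Python below is only an illustration of the matching rule, not pep code: the function names `file_test` and `file_test_tape`, and the choice to pass the file contents as a list of lines, are assumptions made for the example.

```python
def file_test(lines, prefix, workspace):
    """Sketch of F:file.txt:"int" { ... }: true if some line starts
    with prefix and ends with ":" + workspace."""
    suffix = ":" + workspace
    return any(
        line.startswith(prefix) and line.endswith(suffix)
        for line in (raw.rstrip("\n") for raw in lines)
    )


def file_test_tape(lines, tape, workspace):
    """Sketch of F:name.txt: { ... }: true if some line begins with the
    current tape cell and ends with the workspace."""
    return any(
        line.startswith(tape) and line.endswith(workspace)
        for line in (raw.rstrip("\n") for raw in lines)
    )


# Hypothetical declarations file in the type.scope:name layout.
declarations = [
    "int.global:x\n",
    "int.fn:x\n",
    "string.global:name\n",
    "string.local:name\n",
]

print(file_test(declarations, "int", "x"))       # is x declared as an int?
print(file_test(declarations, "int", "name"))    # is name declared as an int?
print(file_test_tape(declarations, "string.local", "name"))
```

With this layout a type check becomes a plain text search, which is what would make the test cheap to add to a machine that already works on strings.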
21 july 2022
A lot of work on the javascript translator tr/translate.js.pss.
1st gen tests are working. Working on the rust translator and
the eg/sed.tojava.pss translator.
13 july 2022
new ideas: create a lisp parser, create a brainf*** compiler (done),
create a "commonmark" markdown translator. This should not be
too hard, using the ideas in eg/mark.latex.pss.
Will create a 'date list' format for mark.latex.pss and mark.html.pss
7 july 2022
Started a lisp parser eg/lisp.pss
Worked on eg/mark.latex.pss which is now producing
reasonable pdf output (from .tex via pdflatex). Also realised
that the accumulator could be used to simplify the grammar
by counting words.
5 july 2022
Developed a sed to java script, "eg/sed.tojava.pss", which has
progressed well. Still lacking branching commands and some other
gnu sed extensions.
30 june 2022
wrote a simple sed parser and formatter/explainer at
eg/sed.parse.pss (commands a,i,c not parsed yet).
24 june 2022
Some work on the javascript and perl translators.
18 june 2022
Introducing an 'increment' method into the various machine classes
in the target languages. This allows the 'tape' and 'marks' arrays
to grow if required.
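As an illustration of the idea, here is a minimal Python sketch of a tape that grows when the pointer moves past its end. The class name `Machine`, the field names and the grow-by-doubling policy are assumptions for the example; each translator has its own Machine class and may grow the arrays differently.

```python
class Machine:
    """Toy machine with a tape and a parallel marks array."""

    def __init__(self, size=4):
        self.tape = [""] * size      # one string per tape cell
        self.marks = [""] * size     # one mark name per tape cell
        self.cell = 0                # current tape pointer

    def increment(self):
        """Move the tape pointer right, growing both arrays if needed."""
        self.cell += 1
        if self.cell >= len(self.tape):
            extra = len(self.tape)   # assumption: double the capacity
            self.tape.extend([""] * extra)
            self.marks.extend([""] * extra)


m = Machine(size=2)
for _ in range(5):
    m.increment()
m.tape[m.cell] = "value"             # safe: the tape grew to fit cell 5
print(m.cell, len(m.tape), len(m.marks))
```

Keeping the growth inside one method means the rest of the generated code can index the tape without any bounds checks of its own.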
17 june 2022
Looking at translation scripts. Changing tape and mark arrays
to be dynamically growable in various target languages.
14 sept 2021
reviewing documentation, tidying.
9 sept 2021
Working on the pl/0 scripts eg/plzero.pss and eg/plzero.ruby.pss.
eg/plzero.pss now checks and formats a valid pl/0 program.
4 sept 2021
Working on the palindrome scripts eg/pal.words.pss and
eg/palindrome.pss. Both are working well and can be translated
to various languages (go, ruby, python, c, java).
I would like to add hyphen lists to mark.latex.pss and date
lists (such as this one).
28 aug 2021
Go translator now working well. I would like to write translators for Kotlin, R (the statistical language), Swift and Rust. The script function pep.tt (in helpers.pars.txt) greatly helps debugging translation scripts.
20 aug 2021
More progress. A number of the translation scripts are now
quite bug free and can be tested with the helper function pep.tt
15 july 2021
Continuing work. Starting many translation scripts such as
14 july 2021
working on
5 July 2021
working on the Ruby translator in
17 June 2021
Some work on the Makefile. renamed gh.c to pep.c
Made pep look for asm.pp in the current folder or else in the
folder pointed to by the "ASMPP" environment variable.
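The lookup order described here (current folder first, then the folder named by ASMPP) can be sketched as follows. This is Python rather than the C in pep.c, and the function name `find_asm` and its parameters are hypothetical; it only illustrates the search order, not the real implementation.

```python
import os
import tempfile


def find_asm(environ=None, cwd="."):
    """Return the path to asm.pp, or None if it cannot be found.

    Looks in the current folder first, then in the folder named by
    the ASMPP environment variable.
    """
    environ = os.environ if environ is None else environ
    local = os.path.join(cwd, "asm.pp")
    if os.path.isfile(local):
        return local
    folder = environ.get("ASMPP")
    if folder:
        candidate = os.path.join(folder, "asm.pp")
        if os.path.isfile(candidate):
            return candidate
    return None


# Small demonstration using two temporary folders.
with tempfile.TemporaryDirectory() as work, tempfile.TemporaryDirectory() as lib:
    open(os.path.join(lib, "asm.pp"), "w").close()
    from_env = find_asm({"ASMPP": lib}, cwd=work)    # only in the ASMPP folder
    open(os.path.join(work, "asm.pp"), "w").close()
    from_cwd = find_asm({"ASMPP": lib}, cwd=work)    # current folder wins
    missing = find_asm({}, cwd=lib + "-nope")        # nowhere to be found

print(from_env == os.path.join(lib, "asm.pp"))
print(from_cwd == os.path.join(work, "asm.pp"))
print(missing)
```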
Need to add "upper" "lower" and "cap" to the translation scripts
in pars/tr/
15 June 2021
things done:
u/- implemented "nochars" "nolines" "upper" "lower" "cap" (capital case
for workspace) in machine.interp.c. nochars and nolines are already
in a number of translation scripts.
- clean up the pars folder (get rid of stray
Here are some immediate tasks to make the pep engine more complete.
u/- write a "make configure" script to install pep somewhere
- fix up the website at www.peptool.org and include some docs there
- try to write an html translator for the commonmark spec and contact
jgm - the pandoc guy to try to generate some interest in pep.
- write some code on rosettacode site. (done) Send to linguists.
- write a go translator for a modern compiled script engine. (done)
- finish tcl translator
8 June 2021
Have made some more good progress over the last few days. Modified
the script
Fixed
The script
18 April 2021
Having another look at this system. I still see enormous potential
for the system, but don't know how to attract anyone's attention!
I updated the
I became distracted by a bootable x86 forth stack-machine system
I was coding at
I think the best idea would be to edit the
15 December 2020
I have not done any work on this project since about August 2020 but the
idea remains interesting. Finishing the "translate.c.pss" script would be
good (done: sept 2021), make "translate.go.pss" for a more modern audience
(done: sept 2021).
27 august 2020
Working on the script "translate.c.pss" to create c code from a
pep script. I may try to eliminate dependency files and include
all the required structures and functions in the script. That
should facilitate converting the output to wide chars "wchar".
11 august 2020
Ideas: write a bash script to test each script
translator (such as translate.tcl.pss translate.java.pss ....)
[done: the pep.tt function]
In the java translator, make the parse/compile script a
method of the class, with the input stream as a parameter.
So that the same method can be used to parse/compile a string,
a file, or stdin, among other things.
This technique can be used for any language but is easier with
languages that support data-structures/classes/objects.
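A minimal sketch of this design, written in Python rather than java: the `parse` method takes any text stream, so the caller decides whether the input is a string, a file or stdin. The toy `Machine` below only counts characters and lines; its name and fields are assumptions for the example and stand in for a real translated script.

```python
import io


class Machine:
    """Toy stand-in for a translated script: counts characters and lines."""

    def __init__(self):
        self.chars = 0
        self.lines = 0

    def parse(self, stream):
        """Run the 'script' over any text stream, one character at a time."""
        while True:
            ch = stream.read(1)
            if ch == "":             # end of input
                break
            self.chars += 1
            if ch == "\n":
                self.lines += 1
        return self


# The same method handles a string wrapped in a stream ...
from_string = Machine().parse(io.StringIO("abc\ndef\n"))
print(from_string.chars, from_string.lines)

# ... a file opened in text mode:
#     with open("input.txt") as f: Machine().parse(f)
# ... or standard input:
#     import sys; Machine().parse(sys.stdin)
```

Because the method depends only on `read(1)`, nothing in the machine needs to know where the text comes from.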
7 august 2020
Continuing to work on the scripts translate.py.pss and
translate.tcl.pss. Had the idea to split the pars-book.txt
into separate manpages just like the tcl system
"man 3tcl string" etc.
24 july 2020
Made great progress on the script "translate.java.pss" which
could become a template for a whole set of scripts for
translating to other languages.
23 july 2020
continuing to work on translate.java.pss
Still need to convert the push/pop code and test and debug.
Many methods have been in-lined and the Machine class code
is now in the script.
22 july 2020
Rethinking the translation scripts
20 july 2020
Wrote the script
3 july 2020
Working on the script
15 june 2020
Cleaning up the files in the /books/
14 june 2020
I will rename the tool and executable to "pep" which would stand for "parsing
engine for patterns". I think it is a better name than "pp" and only seems to
conflict with "python enhancement proposal" in the unix/linux world.
Wrote a substantial part of the script
I will try to improve the mark.html.pss "markdown" transform
script. I would still like to promote this parsing VM since
I think it is a good and original idea.
23 august 2019
Did some work on mark.html.pss
20 august 2019
Cleaned up memory leaks (with valgrind). Also some
off-by-one errors and invalid read/writes. The double-free segmentation
fault seems to be fixed. Still need to fix a couple of memory bugs
in
17 august 2019
Trying to clean up the
Posted on comp.compilers and comp.lang.c to see if anyone might
find this useful or interesting...
16 august 2019
The implementation at http://bumble.sourceforge.net/books/pars/object
has arrived at a usable beta stage (barring a segmentation fault
when running big scripts).
22 feb 2015
(approximately)
Started the current implementation in the c language. I created
a simple loop to test each new command as it was added to the
machine, and this proved a successful strategy as it motivated
me to keep going and debug as I went.
2009
Wrote an incomplete c version of this machine called "chomski".
2006 - 2014
Wrote incomplete versions in c++ and java. The java Machine object at
It will be interesting to see how much slower the java version is.
2005
Started to think about a tape/stack parsing machine.
I am keen to try to publish this language and idea further, because I
think that it has great potential. Here is a list of things which I will
try to do, to make the system more credible.
u/- finish translation scripts for rust/haskell/c++ ...
- add an exit code to quit;
- make "w" take a filename argument
- escape should escape all chars in string eg "escape '${}'"
- fix the escape and unescape code in machine.c
- collect some artwork/screenshots/diagrams etc to go into the 'pars-book.txt'
documentation file.
- comprehensively edit and proof-read the 'pars-book.txt' file
- using mark.latex.pss create a pdf version of the booklet.
- print and bind a limited edition of the booklet. Send it to people who may
be interested.
- work on all the translation scripts so that I
can translate scripts into many other languages (and so, support unicode)
[aug 2022: java/go/ruby/python/c/tcl/js done. perl/rust/ etc need to
be debugged]
- solve the problem of attribute grammars, how do we do type checking
and variable definition validity checks? There are a number of
possible solutions, including a string 'type' stack which works just
like the token stack (but with no accompanying tape array).
Another solution is just to use the "mark" and "here" commands to
check tape cells. But we need a test that checks if the tape
is contained in the workspace or vice-versa.
u/- added a list syntax to
There appears to be a problem in growProgram in program.c, called
by machine.c, which throws a seg fault.
In the
The file
d/- peplib
which compiles the object c files into a static library
/books/pars/object/*.c
implementation of the machine and program c objects
/books/pars/eg/
some pep scripts which demonstrate uses of the language
and virtual machine.
Or it can be executed directly with
The problem of "attribute" grammars is an important one, and needs
to be solved in order to make pep a viable option for compiling
computer languages. Let us say that gender, or number are attributes
of adjectives or nouns. Also, the type of a variable or expression
is an attribute of that expression.
These attributes need to "agree" when tokens are resolved/reduced: that is
"los mujeres" is grammatically incorrect because "los" has a masculine
attribute and "mujeres" has a feminine attribute.
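The agreement check can be illustrated with a small Python sketch. It is not a proposal for how pep should store attributes: the `LEXICON` table, the attribute names and the function `reduce_noun_phrase` are all hypothetical, and only show what "agreeing when tokens are reduced" means in practice.

```python
# Hypothetical lexicon: each word maps to (token type, attributes).
LEXICON = {
    "los": ("article", {"gender": "m", "number": "pl"}),
    "las": ("article", {"gender": "f", "number": "pl"}),
    "mujeres": ("noun", {"gender": "f", "number": "pl"}),
    "hombres": ("noun", {"gender": "m", "number": "pl"}),
}


def reduce_noun_phrase(article, noun):
    """Reduce article + noun to a noun phrase, or raise if attributes clash."""
    a_type, a_attrs = LEXICON[article]
    n_type, n_attrs = LEXICON[noun]
    if (a_type, n_type) != ("article", "noun"):
        raise ValueError("not an article followed by a noun")
    for key in ("gender", "number"):
        if a_attrs[key] != n_attrs[key]:
            raise ValueError(f"{key} mismatch: {article!r} vs {noun!r}")
    # The reduced token inherits the attributes of the noun.
    return ("nounphrase", dict(n_attrs))


print(reduce_noun_phrase("las", "mujeres"))
try:
    reduce_noun_phrase("los", "mujeres")
except ValueError as err:
    print("rejected:", err)
```

The same shape applies to type checking: replace gender and number with the declared type of a variable and the type of the expression assigned to it.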
The solution may be a type stack with an item on the stack for
each "scope" (procedure, subprocedure etc).
No, a fake stack can be made in a tape cell.
Make the following commands:
Maybe organise better the c code: the struct Program could be removed as
a member of the struct Machine.
tr/translate.cpp.pss
and trying to debug and complete others.
tr/translate.c.pss
good progress. Simple scripts are translating,
compiling and running. Did not eliminate dependencies, so
scripts need to be compiled with libmachine.a in the object/ folder.
tr/translate.ruby.pss
Should try to make a 'brew' package with ruby for pep.
gh.c
s etc).
- fixed the add "\\" bug which was caused by a bad implementation of
until in machine.interp.c (need to count preceding escape chars).
Need to fix the same in the translation scripts.
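Counting the preceding escape characters can be sketched like this. The Python below is an illustration of the rule, not the code in machine.interp.c: the function names `ends_unescaped` and `read_until` are hypothetical, and the sketch assumes a terminator counts as escaped only when an odd number of escape characters comes directly before it.

```python
def ends_unescaped(workspace, terminator, escape="\\"):
    """True if workspace ends with terminator and the terminator is not escaped.

    The terminator is treated as escaped only when an odd number of
    escape characters comes immediately before it.
    """
    if not terminator or not workspace.endswith(terminator):
        return False
    i = len(workspace) - len(terminator) - 1
    count = 0
    while i >= 0 and workspace[i] == escape:
        count += 1
        i -= 1
    return count % 2 == 0


def read_until(text, terminator, escape="\\"):
    """Sketch of 'until': consume text up to an unescaped terminator."""
    workspace = ""
    for ch in text:
        workspace += ch
        if ends_unescaped(workspace, terminator, escape):
            break
    return workspace


# One backslash before the first quote: it is escaped, so reading continues.
print(read_until('a\\"b" rest', '"'))
# Two backslashes: the backslash itself is escaped, so the quote terminates.
print(read_until('a\\\\" rest', '"'))
```

Checking only the single character before the terminator gets the second case wrong, which matches the kind of bug described in the list item.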
/books/pars/eg/json.check.pss
so that it recognises
all json numbers.
/books/pars/tr/translate.py.pss
so that it can translate scripts as
well as itself. Started to fix /books/pars/tr/translate.tcl.pss.
Still have an infinite loop when .restart is translated, and this is a general
problem with the "run-once" loop technique (used to implement .reparse and
.restart in languages that don't have labelled loops or goto statements).
The solution is a flag variable that gets set by .restart before the
parse> label (see translate.ruby.pss).
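One possible shape of the flag technique is sketched below in Python, which has neither labelled loops nor goto. This is a reading of the description above, not the output of any translator: the function `run`, the toy rule that a space triggers .restart, and the trace list are all assumptions made for the example.

```python
def run(text):
    """Toy script loop: a space triggers .restart, other chars get parsed."""
    trace = []
    pos = 0
    while pos < len(text):          # script loop: one pass per character
        restart = False
        while True:                 # run-once loop: code before parse>
            ch = text[pos]
            pos += 1
            if ch == " ":
                # .restart: a bare 'continue' here would only repeat this
                # inner loop, so set a flag and leave the block instead.
                restart = True
                break
            trace.append("lex:" + ch)
            break
        if restart:
            continue                # skip the parse> block, start again
        while True:                 # run-once loop: the parse> block
            trace.append("parse:" + ch)
            break
    return trace


print(run("a b"))
```

The flag is checked between the two blocks, so the `continue` that implements the restart runs in the outer script loop, where it has the intended meaning.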
eg/mark.latex.pss
is progressing well. It transforms a
markdown-ish format (like the current doc) into LaTeX. Need to do
lists/images/tables/dates
eg/json.check.pss
script to provide helpful
error messages with line+character numbers. Also, that script
incorporates the scientific number format (crockford) in
eg/json.number.pss.
However, Crockford's grammar for scientific numbers
seems much stricter than what is often allowed by json parsers
such as the "jq" utility.
/books/osdev/os.asm
That was also interesting, and
I had the idea of somehow combining it with this. Hopefully these ideas
will come to fruition.
/books/pars/pars-book.txt
document, generate a pdf, print it out, and send it to someone
who might be interested. This parsing/compiling system is
revolutionary (I think), but nobody knows about it!!
/books/pars/tr/translate.java.pss
/books/pars/tr/translate.js.pss
These scripts can be greatly simplified. I will remove all
trivial methods from the Machine object and use the script to
emit code instead. Hopefully translate.java.pss will become a
template for other similar scripts. Also, I will include the
Machine object within the script output so that there will be no
dependency on external code.
/books/pars/eg/json.number.pss
which parses
and checks numbers in json scientific format (Eg -0.00012e+012)
This script can be included in the script eg/json.parse.pss to
provide a reasonably complete json parser/checker.
/books/pars/eg/mark.html.pss
The script is working reasonably well for transforming
the pars-book.txt into html.
It can be run with:
pep -f eg/mark.html.pss pars-book.txt > pars-book.html
pars/
tree.
Renaming the executable to "pep" from "pp". I think "pep" will be
the tool's definitive name.
/books/pars/eg/json.parse.pss
which can parse and check the
json file format. However, the parser is incomplete because at
the moment it only accepts integer numbers. Recursive object
and array parsing is working.
interpret()
(one is in the UNTIL command).
pars-book.txt
which is the primary
documentation file for the project.
/books/pars/object.java/
got to a useful stage and will be a useful target
for a script, very similar to /books/pars/tr/translate.c.pss (and will be
called "translate.java.pss"). This script creates compilable java code
using the java Machine object. In fact, we will be able to run this script
on itself (!). In other words we can run:
pep -f tr/translate.java.pss tr/translate.java.pss
The output will be compilable java code that can compile any parse machine
script into compilable java code. Having this java system we are
able to use unicode characters in scripts.
Roadmap
Tasks
Things done
/books/pars/eg/mark.html.pss
and mark.latex.pss
- Also a definition list
syntax. So a paragraph with lines starting with 'o-' or d/- u/-
- wrote a bash script that extracts all unique tokens from script.
- got a domain name such as peptool.org, (and peplang.org
or pepnomlang.org or nomlang.org)
- converted mark.html.pss into a version that generates LaTeX
(eg/mark.latex.pss)
Bugs
Compiling the code
object/
there is a Makefile which can be used to
compile the c interpreter code.
/books/pars/helpers.pars.sh
contains bash functions to compile
the c source code. The most important bash functions are
/books/pars/object/libmachine.a
(for linking into executable
compiled scripts)
- ppco
compiles all c source files into the executable "pp"
- ppcl
compiles standalone executable scripts generated by
compilable.c.pss
- ppjjff
compile or translate a script into java which can be run
with the code in object.java
- pep.tt
allows comprehensive testing of the translation scripts
- etc etc
Important files
/books/pars/object/pep.c
implementation of the interactive script interpreter and debugger
This version uses only plain 8 bit characters (char). However
this problem can be overcome by using a translation script
such as translate.java.pss into a language which supports
unicode.
/books/pars/compile.pss
implementation (compiler) of the script language in the
script language itself. This was originally "bootstrapped"
by ar.compile/asm.handcode.pp
/books/pars/asm.pp
implementation of the script language in "assembler" format
This is now generated from the compile.pss script by running
pep -f compile.pss compile.pss > asm.new.pp; cp asm.new.pp asm.pp
The original "bootstrap" script compiler can be seen at
/books/pars/ar.compile/asm.handcode.pp
/books/pars/helpers.pars.sh
various bash functions to run and compile the c code and
scripts.
/books/pars/tr/translate.java.pss
a script which generates compilable java code for any script
(including itself). This script shows great potential
but needs to be more completely debugged (as of 25 july 2020)
/books/pars/eg/exp.tolisp.pss
A script which converts arithmetic expressions into a lisp-like format
/books/pars/eg/natural.language.pss
a very simple and limited natural language (english)
recogniser.
/books/pars/eg/mark.html.pss
Converts a "mark-down"-like text document format into html
This is used to generate the file "pars-book.html"
/books/pars/eg/json.parse.pss
A script that recognises and checks a subset of the json
format (only integer numbers recognised until I integrate
the script json.number.pss into it). This can be translated to
(for example) java and executed with
pep -f translate.java.pss eg/json.parse.pss > Machine.java
javac Machine.java
echo "[1,2,[0,0],{'name':'bob', 'age':22}]" | java Machine
or run directly in the interpreter with
pep -f eg/json.parse.pss -i "[1,2,[0,0],{'name':'bob', 'age':22}]"
Attribute grammars and pep
Changes that need to be made
w "name.txt"; write the workspace to the file "name.txt"
W "name.txt"; append the workspace to the file "name.txt"
W; append the workspace to the file name in the tape cell.
q 4; exit with code 4