/*

  http://bumble.sourceforge.net/books/gh/gh.c

OVERVIEW

  This file is an attempt to create a virtual machine which is
  particularly apt for parsing context-free languages. The machine
  consists of a (text) stack, a (text) workspace, a "tape" (array)
  structure, 1 "look-ahead" character (called the "peep"), an integer
  accumulator (for simple counting), and 1 flag (true or false).
  In general all instructions operate on the "workspace" buffer.

  The essential idea is a stack/tape combination which can handle
  the nested structures of context-free languages. It is such a simple
  idea that, if it does work, it's hard to think why it hasn't been
  thought of and implemented before.

  The general aim of the machine is to create unix-style stream filters
  which can parse and transform context-free language patterns.

  This file also includes a testing "interpreter" with help information
  and interactive commands for inspecting the machine and program state.

PARSING WITH THE MACHINE

  The general procedure for parsing is to 'put' the attribute into the
  tape, then clear the workspace, create a token and push that token
  onto the stack. Then a series of tokens are popped off the stack,
  compared for sequences, and 'reduced' as in a bnf (backus-naur form)
  grammar. At the same time the values or attributes of the tokens are
  got from the tape and transformed as required, and then put back into
  the tape or printed to stdout.

  Tokens are written with the asterisk at the end of the token:
    tokenA*tokenB*

PARSING CHALLENGES

  * Simple arithmetic expression parser/compiler
    eg: n = 4*y+(6-x)/2
    Actually, associativity (or operator precedence) is difficult, if
    not impossible, for the parse machine.
  * parse palindromes: see test.palindromes for an incomplete example
  * natural language parsing: see test.natural.language.pss for a
    limited example
  * toy forth parser/compiler
  * toy lisp parser/compiler

  ideas:
    a machineInfo array to explain each part of the machine...
    stack, peep, tape, accumulator etc

  The syntax of an argument indicates what type of argument it is.
  Eg: "abc" is text; [a-z], in brackets, is a range; [abcd] is a list;
  [:space:] is a character class. This is incorporated into the
  compile() function.

WORK DONE

  * done: changed /restart/ syntax to .restart
  * done: changed parse label to just "parse:"
  * done: changed begintest/endtest syntax to b"text" {...} and
    e"text" {}. This is also to make way for the multiple quote test
    syntax below.
  * done: added multiple quote test syntax like this
      "c*palindrome*c*","c*c*c*" { #* block commands. *# }
    This is important because often two bnf rules have the same effect.
    Otherwise the block commands have to be repeated in their entirety.
  * done: added an -i switch for inline text input. Useful for testing,
    and also because when we are working interactively we cannot pipe
    input into the program.

TASKS

  * possibly expand the negation operator to include equals tests
    (not just class tests). eg: !"tree" { nop; }
  * make a help switch that prints out a list of machine commands and
    descriptions to stdout. Then include that in gh-book.txt

  Gave some thought to how to reorganise this, now that the machine is
  getting to a really useful stage. Here are ideas:

  * reorganise the code files. A program needs a machine but a machine
    does not need a program, so the machine doesn't need a
    struct Program member variable.
      machine.c - just the machine structure and methods (program, etc)
      machine.interp.c - the interactive prog (included in gh.c at the
        moment)
      machine.prog - this will be the actual unix style stream filter,
        and will not include any interactive testing code.
      machine.methods.c - these are just the parts of execute()
        rewritten as functions, one for each machine command. The point
        of this is that it allows us to compile scripts!!
      machine.execute - which just contains the execute() function,
        because this is not required for compiled scripts.
  * allow scripts to be compiled!
    We do this by writing a script which will convert any script into
    a series of c "method" calls on the machine, and then compile that
    program.
  * write pretty.pss which will pretty print a script with indentation
  * write explain.pss which will explain a script by listing each
    command with a short description of what it does. (It will also
    convert abbreviated command names to their full names.)

COMPILING AND RUNNING THIS CODE

  tools: bash shell (or any other shell for compiling etc),
    some unixy OS (makes things easier), sed, date, vim, asciidoctor,
    enscript - for formatting and printing source code

  * create a printable pdf of the code in 2 columns landscape format
  >> enscript -o - -2r -f Courier7 gh.c | ps2pdf - os.pdf

  All the code is in one file, so compilation should be straightforward.
  I call the executable "pp" for "pattern parser". The file is called
  gh.c because I already had a folder called "pp".

  * compile code
  >> gcc gh.c -o pp

  * a bash alias to run the code
  >> alias pp='cd ~/sf/htdocs/books/gh/; ./pp'

  * a bash function to compile the source code gh.c and add a date stamp
  ---------------------
    ppc() {
      echo "Compiling 'gh.c' as executable 'pp'"
      echo "Datestamp: $(date +%d%b%Y-%I%P)"
      # stop here if the source folder is missing
      cd ~/sf/htdocs/books/gh || return 1
      cp gh.c gh.pre.c
      # The line below adds the compile time and date to the version
      sed -i "/v31415\"; *$/s/v31415/version.$(date +%d%b%Y-%I%P)/" gh.pre.c
      gcc gh.pre.c -o pp
    }
  ,,,

COMPILATION OF SCRIPTS

  In a somewhat reflexive manner, a machine program (asm.pp) compiles a
  script language into machine programs. More explicitly, the
  compilation of a script happens as follows. The machine loads an
  assembly-code file called "asm.pp". It then uses that program to
  compile a given script into the equivalent assembly-code (and saves
  the code in a text file called something like "script.pp"). Then it
  loads that assembly-code file and uses it to process the input.
  In order to achieve this I removed line numbers from assembly saving
  and loading, made jumps relative, and allowed labels in assembly
  scripts. This made hand-coding of assembly programs feasible. But it
  is still easier to use the script language instead.

SEE ALSO

  compile.pss
    A script which can compile parse-scripts into an assembler format.
    This does just the same job as 'asm.pp' but is more readable and
    compact.
  machine.methods.c
    a set of functions which correspond to each instruction of the
    machine.
  asm.pp
    a parse machine assembler program to compile scripts.
  test.commands.pss
    contains a set of valid syntax patterns for the parse script
    language and can be used as a kind of syntax reference
  gh-book.txt
    The beginnings of a booklet describing the parse machine and engine.
  pp -Ie "read; print; clear;" -i "otto"
    explore the machine interactively (view registers and step through
    programs and input) using the simplest possible program and input.
  helpers.gh.sh
    A set of bash functions to make compiling the code easier, among
    other things.

TO DO

  * write function: commandHelpHtml(...) which will display a summary
    of commands and help in the 'markdown' or asciidoctor format, which
    can then be formatted into html5, docbook, latex, or pdf.

FIXED BUGS

  * The following is not working
      add '#include "machine.methods.c" \n';
  * range had a <= >= bug
  * the "until" bug. This was caused by assigning the char * lastc
    before a growBuffer() call. So when realloc was called by
    growBuffer() there was no problem until realloc() actually assigned
    a new memory block (usually when the buffer size was about 950
    bytes). When realloc() assigned a new memory block then suddenly
    char * lastc was not pointing at the actual data. I found it quite
    hard to track this bug down.

BUGS

  * saveAssembledProgram will not deal with the second parameter
    currently (aug 2019) and so will not save the "replace" command
    correctly.
  * classtests return true for an empty workspace!!
  * when the program grows, it often creates a segmentation fault.
    Also, the first instruction of the program becomes 'undefined'
  * some label error while loading asm.pp: ".token" label not found.
    This was after writing parameterFromText(). Basically after the
    first call to parameterFromText() the scan position was not updated
    properly to the end of the text. So the system thought there was a
    second label (for the same jump instruction) which it attempted to
    resolve into a line number. Obviously a jump cannot have two target
    addresses. I worked around the bug by ignoring the second parameter
    for any jump instruction, but this may rear its head later.
  * eg: b interactively, requires argument, then 'n', segmentation fault.
  * segmentation fault when growing the program.
  * if labels have trailing space, they don't work.
  * all scripts need an extra space at the end, otherwise asm.pp can't
    read the final character.

HISTORY

  I have been working on this off and on for a number of years (since
  about 2003) with many stops and starts in the meantime as well as
  many dead ends. In fact I don't even know if the idea in itself is
  sound. Nevertheless I have a hunch that this approach to parsing
  context-free languages may be very interesting. We shall see.
  The system may now be useful (2019).

  The idea of a parsing machine derived from thinking about the sed
  stream editor and its limitations, bnf grammars, context-free
  languages, human languages, and the limitations of regular
  expressions and regular languages. At the very least this virtual
  machine should be able to handle nested structures which "sed" is
  unable to handle. So lisp-like syntax (+ (* n n) m) should be easily
  parsable.

  The coding of this version was begun around mid-2014. A number of
  other versions have been written in the past but none was successful
  or complete.

  12 august 2019

    Made the files in the gh/object folder the canonical source code
    for the machine. This means I need to make ppc etc compile with
    these files.
    Discovered a bug in classtests. An empty workspace returns true
    for a range test.

    Because eg/expression.pss to parse arithmetic expressions such as
    "(7 + 100) * -100". Need to arrange the grammar so that it has a
    "lookahead" of 1 token so that operator precedence can be handled.

    Also thought that "/" would be a better token delimiter. Need a
    command to set the token delimiter character on the machine.
    Also need a way to give statements to a script that are only
    executed once, when the script starts. Perhaps the (eof)
    section/test should work in the same way (be a script section,
    rather than a state-test).

    Also, thought that the machine needs a "testhas" test, which would
    return true if the workspace currently contains the text in the
    current tape cell. This would allow parsing strings such as "eeee",
    "fffff". Also a "testtapeend" which returns true if the workspace
    currently ends with the text in the current tape cell.

    Also, maybe need an "untiltape" command which reads until the
    workspace ends with the text in the current tape cell. This would
    allow parsing sed syntax "s#...##" or "s/...//" where the delimiter
    character can be anything which occurs after the "s" character.

  10 august 2019

    Trying to organise the gh.c source code into separate objects in
    the gh/object/ folder.

  8 august 2019

    Continued working on compile.ccode.pss. Split the class* token
    into charclass*, range* and list* with corresponding negated
    classes.

  7 august 2019

    Worked on compile.ccode.pss

  6 august 2019

    Would be handy to have multiline quotes....
    Working on compile.ccode.pss

  4 august 2019

    I think I finally tracked down the "until" bug, which was actually
    a bug in readc(). A character pointer lastc was assigned before a
    growBuffer() call (which calls realloc()). When realloc() assigned
    a new memory block the character pointer was no longer valid.

  3 august 2019

    Still looking at the "until" bug. Basically the problem occurs
    when the text read with until is greater than about 950 bytes.
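A minimal sketch of the "until" bug pattern noted in these entries, and of the fix. Only the names growBuffer() and lastc come from these notes. struct Buffer, initBuffer() and appendChar() are hypothetical stand-ins and are not the code in gh.c. The point is that a pointer into the buffer should be taken after any call that may realloc(), never before.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical growable text buffer (not the real structure in gh.c).
struct Buffer {
  char * text;       // NUL terminated text
  size_t length;     // number of characters stored
  size_t capacity;   // bytes allocated for text
};

// Allocate an empty buffer. Returns 1 on success, 0 on failure.
int initBuffer(struct Buffer * b, size_t capacity) {
  assert(b != NULL && capacity > 0);
  b->text = malloc(capacity);
  if (b->text == NULL) return 0;
  b->text[0] = '\0';
  b->length = 0;
  b->capacity = capacity;
  return 1;
}

// Grow the buffer by "extra" bytes. realloc() may move the block,
// so every pointer into the old b->text is invalid after this call.
int growBuffer(struct Buffer * b, size_t extra) {
  assert(b != NULL);
  char * moved = realloc(b->text, b->capacity + extra);
  if (moved == NULL) return 0;
  b->text = moved;
  b->capacity += extra;
  return 1;
}

// Append one character. The buggy pattern computed lastc BEFORE
// growBuffer(), so lastc could point into the old memory block:
//   char * lastc = b->text + b->length;   // too early
//   growBuffer(b, 256);
//   *lastc = c;                           // may write to the old block
int appendChar(struct Buffer * b, char c) {
  assert(b != NULL && b->text != NULL);
  if (b->length + 2 > b->capacity) {       // room for c and the NUL
    if (!growBuffer(b, 256)) return 0;
  }
  char * lastc = b->text + b->length;      // taken after any realloc()
  *lastc = c;
  *(lastc + 1) = '\0';
  b->length++;
  return 1;
}
```

Storing an offset (b->length) instead of a pointer across the growBuffer() call would be an equally valid fix.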
    The bug only appears above roughly 950 bytes because below that
    size realloc() basically did nothing, hence no problem!

  30 july 2019

    A useful command for calculating jumps: "+int" which will add the
    given integer to all integers in the workspace. This command may be
    necessary when certain forward jumps are not known during
    compilation.

    Maybe it could be useful to have a very basic pattern matching
    syntax for tests, similar to a filename match:
    eg /word*?\*\* / where ? matches any one character, * matches
    multiple, and \* is a literal asterisk. This could be useful in
    error handling blocks, so as not to have to write out every single
    combination of tokens. However, it would not be very readable.

    compile.pss appears to be working. It is more readable and
    maintainable than asm.pp but in the case of quoteset* it compiles
    rather inefficient code (multiple jumps where asm.pp compiles only
    one). See the asm.pp file for a much better error handling idea.
      compile.pss 664 lines
      asm.pp 1485 lines

    Had the idea for an "expand" command in which the machine will
    convert an abbreviated command into its full form in the workspace.
    Probably not.

    Converting asm.pp into compile.pss which is much more compact and
    readable. Finished converting, but not debugged.

    Creating notclass* syntax in asm.pp. eg ![a-z] { nop; }

    Realised that I can just directly translate asm.pp into a compiling
    script. It will be convenient to have ![class] {} syntax. We can
    implement this in asm.pp quite easily. eg:
      notclass* <- !*class*
      command* <- notclass*{*commandset*}*

    Started translating asm.pp into parse-script language. It seems
    quite straightforward. Also, we could write a script that compiles
    "recognisers", just like the 2 bnf grammar rules above, eg:
      notclass <- ! class
      command <- notclass { commandset }

  29 july 2019

    Continued converting execute() into functions in machine.methods.c
    Realised that I have to modify how jumps and tests work when
    creating executable scripts.
    In fact it may be necessary to use the c "goto" instruction in
    order to implement ".reparse" and ".restart".

    file sizes:
      gh.c 187746 bytes
      pp 99432 bytes
      machine.methods.c 16761 bytes

  28 july 2019

    Created some machine methods in machine.methods.c by copying code
    from execute(). The process seems straightforward.

    Added an -i switch to make it easier to provide input when running
    interactively. (We will be able to do
      echo "abcd" | pp -f palindrome.pss
    eventually.)

    Looking again at the test.palindrome.pss script, which doesn't
    quite work because of ".restart" on eof.

  27 july 2019, in Bogota, Colombia

    Wrote a palindrome detector which seems very complicated for the
    simple task that it does, and also it does not actually work in all
    cases.

    I implemented "quotesets" with a few nifty tricks. Quotesets allow
    multiple equals tests for a given block. The difficulty is that
    they are parsed before the braces are encountered in the stream, so
    it is not possible to resolve the forward jump. But there was a
    solution to this, best understood by looking at the source code in
    "asm.pp".

    So multiple tests for one block are possible with "quotesets" which
    are implemented in asm.pp and resolve into tests for blocks. They
    are very useful because they allow syntax like this:
      "noun*verb*object*", "article*verb*object*", "verb*object" {
        # translate here
      }

  26 july 2019

    Discovered that the "until" instruction was not growing the
    workspace buffer properly, leading to bugs. The same bug will apply
    to "while". See the bugs: section for more information. For some
    reason readc() is not growing the workspace properly at the right
    time. The bug became apparent when parsing test.commands.pss and
    trying to read past a large multiline comment block.
    eg: pp -If test.commands.pss input.txt

  25 july 2019

    Worked on test.commands.pss which acts like a kind of syntax check
    and demonstration for all commands and structures implemented in
    asm.pp

    Working on the asm.pp compiler.
    Wrote the .reparse keyword and the "parse>" parse label.
    Finished endtest, begintest and blocks.

    Implemented the "replace" machine instruction but not really
    debugged. Added replace to the asm.pp compiler so that it can be
    used in scripts as well.

  24 july 2019

    Writing the parameterFromText() function. This will allow parsing
    multiple parameters to an instruction. The tricky bit is that
    parameterFromText() has to return the last character scanned so
    that the next call to it will start at the current scan position.

    Once I have multiple parameters, then I can write the "replace"
    command: eg replace "one" "two";

    Realised that I need a replace command, and this requires the use
    of 2 parameters. Maybe a bit of infrastructure will have to be
    written. An example of the use of "replace" is converting c
    multiline comments into bash style comments. It would be possible
    to parse line by line and achieve this without "replace" but it is
    a lot more work.

  23 july 2019

    Various bits of tidying up. Still can't accept input from standard
    input for some reason (program hangs and waits for console input).

  22 july 2019

    Implemented the swap instruction (x) to swap the current tape cell
    and the workspace buffer. Fixed a bug in the get command which did
    not allocate enough memory for the stack/workspace buffer.

  20 july 2019

    It's all working, more or less! We can write
      pp -f script.pss input.txt
    and the system compiles the script to assembler, loads it, and runs
    it against the input stream in input.txt. No doubt there are a
    number of bugs, but the general idea works.

    Made progress with "asm.pp". Class blocks seem to be working. Some
    nested blocks now work. asm.pp is at a useful stage. It can now
    compile many scripts. Still need to work out how to implement the
    -f switch (segmentation fault at the moment). In theory the process
    is simple... load asm.pp, run it on the script file (-f), then load
    sav.pp (output of asm.pp) and run it on the input stream.

  19 july 2019

    Bug!
    when the program grows during loading, a segmentation fault occurs.

    Created test.commands.pss which contains simple commands which can
    be parsed and compiled by the asm.pp script. Also, realised that
    the compilation from assembler should stop with errors when an
    undefined instruction is found.

    Dealt with a great many warnings that arise when one uses
    "gcc -Wall".

    Implemented command 'cc' which adds the input stream character
    count to the workspace buffer. Also made an automatic newline
    counter, which is incremented every time a \n character is
    encountered, and the 'll' command which appends the newline counter
    as a string onto the end of the workspace buffer.

    Since the main function of this parse-machine is to compile
    "languages" from a text source, the commands above are very useful
    because they allow the compilation script to give error messages
    when the source document is not in the correct format (with line
    number and possibly character count).

    Did some work on "asm.pp" which is the assembler file which
    compiles scripts. Sounds very circular but it works. Realised that
    after applying bnf rules, need to jump back to the "parse:" label
    in case other previous rules apply.

  18 july 2019

    Discovered a bug when running "asm.pp" in unix filter mode:
    "Abort trap: 6" which means writing to some memory location that I
    should not be. Strangely, when I run the same script interactively
    (with "rr") it works and doesn't cause the abort.

    Created a "write" command, on the machine, which writes the current
    workspace to a file called "sav.pp". This has a parallel in sed
    (which also has a 'w' write command). This command should be useful
    when compiling scripts and then running them (since they are
    compiled to an intermediate "assembler" phase, and then loaded into
    the machine).

    Made some progress towards using the pattern-machine as a
    unix-style filter program. Added some command line options with
    getopt().
    The parser should be usable (in the future) like sed: eg
      cat somefile | pp -sf script.pp > result.txt
    or
      cat somefile | pp -sa script.ppa > result.txt
    where script.ppa is an "assembler" listing which can be loaded into
    the machine.

  16 july 2019

    Working on parsing with asm.pp. Seem to have basic commands parsing
    and compiling, eg: add "this"; pop; push; etc
    Simple blocks are parsing and compiling. There are still some
    complications concerning the order of shift-reductions.

    Made execute() have a return value, eg:
      0: success, no problems
      1: end of stream reached
      2: undefined instruction
      3: quit/crash executed (exit script)
      4: write command could not open file sav.pp for writing

    More work. Some aesthetic fixes to make it easier to see what the
    machine is doing. Wrote showMachineTapeProgram() to give a nice
    view of pretty much everything that is going on in the machine at
    once. Working on how to collate "attributes" in the tape array
    register.

    Made an optional parameter to printSomeTape() that escapes \n \r
    etc in the tape cells, which makes the output less messy.

  15 july 2019

    A lot of progress. Starting to work on asm.pp again. Have basic
    shift-reduction of stack tokens working. Now to get the tape
    "compiling" attributes as well.

    The bug seems to be that JUMP is not treated as a relative jump by
    execute() but is being compiled as a relative jump by
    instructionFromText(). So, either make JUMPs relative or ...

    Made the "labelTable" (or jumpTable) a property of the program.
    This is a good idea. Also made the command 'jj' print out the label
    table. Still using the "jumptable" phrase but this is not a good
    name for this.

    I should organise this file: first structure definitions, then
    prototype declarations, and then functions. I haven't done this
    because it was convenient to write the function immediately after
    the structure def (so I could look at the properties). But if I
    rearrange, then it will be easier to put everything in a header
    file, if that is a good idea.
    Lots of minor modifications. Made searchHelp also search the help
    command name, for example. Added a compileTime (milliseconds)
    property to the Program structure, and a compileDate (time_t).

    81 instructions (which is how many instructions are in asm.pp at
    the moment) are taking 4 milliseconds to compile, which seems
    pretty slow really.

    sizes:
      gh.c 138430 bytes
      pp 80880 bytes

    Trying to eliminate warnings from the gcc compiler, which are
    actually very handy. Also seem to have uncovered a bug where the
    "getJump" function was actually defined after where it was used
    (and this gh.c does not use any header files, which is very
    primitive). So the label jumptable code should not have been
    working at all...

    Changing lots of %d to %ld for long integers. Also, on BSD unix the
    ansi colour escape code for "grey" appears to be black.

  13 july 2019

    Looking at this on an OSX macbook. The code compiles (with a number
    of warnings) and seems to run. The colours in this bash environment
    are different.

  12 Dec 2018

    After stepping through the "asm" program I discovered that
    unconditional jump targets are not being correctly encoded. This
    probably explains why the script was not running properly. Also I
    may put labels into the disassembled listings so that the listings
    are more readable.

  19 sept 2018

    Revisiting. Need to create command line switches: eg -a for loading
    an assembler script, and -f to load a script file.

    Need to step through the asm.pp script and work out why stack
    reduction is not working... (see above for the answer). An infinite
    loop is occurring.

    Also, need to write the treemap app for iphone/android, not related
    to this. Also, need to write a script that converts this file and
    book files to an asciidoctor format for publishing in html and pdf.
    Then send all this to someone more knowledgeable.

  5 sept 2018

    gh.c 133423 bytes
    pp 78448 bytes

    Would be handy to have a "run until 10 more chars read" function.
    This would help to debug problematic scripts.
    Segmentation fault probably caused by trying to "put" to a
    non-existent tape cell (past the end). Need to check the tape size
    before putting, and grow the tape if necessary.

    Could try to make a palindrome parser. Getting a segmentation fault
    when running the asm.pp program right through.

    Wrote an INTERPRET mode for testing, where commands are executed on
    the machine but not compiled into the current program.

    Wrote runUntilWorkspaceIs() and added a testing command to invoke
    it. This should make it easier to test particular parts of a
    script. Found and fixed a problem with how labels are resolved;
    this was caused by buildJumpTable() not ignoring multiline
    comments.

  4 sept 2018

    Made multiline comments (#* ... *#) work in assembler scripts. Made
    the machine.delimiter character visible and used by push and pop in
    execute(). There is no way to set the delimiter char or the escape
    char in scripts.

  3 sept 2018

    Added multiline comments to asm.pp (eg #* ... *#) as well as single
    line comments with #.

    Idea: make gh.c produce internal docs in asciidoctor format so we
    can publish to html5/docbook/pdf etc.

    Working on the asm.pp script. Made the "asm" command reset the
    machine and program and input stream. Added quoted text and
    comments to the asm.pp script parsing, but no stack parsing yet.
    Need to add multiline comments to the loadAssembledProgram()
    function.

    while and whilenot cannot use a single char: eg: whilenot "\n"
    doesn't work. So, write 'whilenot [\n]' instead.

    Also should write checkInstruction() called by
    instructionFromText() to make sure that the instruction has the
    correct parameter types. Eg: add should have parameter type text
    delimited by quotes, not a list [...] or a range [a-z].

    If the jumptable is a global variable then we can test jump
    calculations interactively. Although it's not really necessary.

    Would be good to time how long the machine takes to load assembler
    files, and also how long it takes to parse and transform files.
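A minimal sketch of the label table idea that runs through these notes: a first pass records where each "name:" label sits in an assembly listing, and a second step turns a label into a relative jump offset. All names here (struct LabelTable, buildLabelTable(), relativeJump()) are hypothetical and are not the buildJumpTable() or getJump() code in gh.c. The sketch assumes one instruction per line, skips only single-line # comments, and ignores trailing spaces after a label. It does not handle multiline comments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LABELS 64
#define MAX_NAME 32

// One entry of the (hypothetical) label table: a label name and the
// index of the instruction that follows the label in the listing.
struct Label {
  char name[MAX_NAME];
  int target;
};

struct LabelTable {
  struct Label items[MAX_LABELS];
  int count;
};

// Length of text once trailing spaces and tabs are ignored.
static size_t trimmedLength(const char * text) {
  size_t n = strlen(text);
  while (n > 0 && (text[n - 1] == ' ' || text[n - 1] == '\t')) n--;
  return n;
}

// First pass: record every "name:" line with the index of the next
// instruction. Label lines, blank lines and '#' comment lines use no
// instruction slot. Returns the number of instructions counted, or -1
// if a label does not fit in the table.
int buildLabelTable(struct LabelTable * table,
                    const char * lines[], int lineCount) {
  assert(table != NULL && lines != NULL);
  int instruction = 0;
  table->count = 0;
  for (int i = 0; i < lineCount; i++) {
    const char * line = lines[i];
    while (*line == ' ' || *line == '\t') line++;
    size_t n = trimmedLength(line);
    if (n == 0 || line[0] == '#') continue;
    if (line[n - 1] == ':') {
      if (table->count >= MAX_LABELS || n - 1 >= MAX_NAME) return -1;
      struct Label * label = &table->items[table->count++];
      memcpy(label->name, line, n - 1);
      label->name[n - 1] = '\0';
      label->target = instruction;
      continue;
    }
    instruction++;
  }
  return instruction;
}

// Second step: offset from instruction "from" to the named label.
// Returns 1 and stores the offset on success, 0 if the label is unknown.
int relativeJump(const struct LabelTable * table, const char * name,
                 int from, int * offset) {
  assert(table != NULL && name != NULL && offset != NULL);
  for (int i = 0; i < table->count; i++) {
    if (strcmp(table->items[i].name, name) == 0) {
      *offset = table->items[i].target - from;
      return 1;
    }
  }
  return 0;
}
```

A jump placed before its label gets a positive offset and a jump back to an earlier label gets a negative one, which is what makes relative jumps independent of line numbers.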
  2 sept 2018

    Wrote getJump() and made instructionFromText() look up the label
    jump table and calculate the relative jump. It appears to be
    working, which removes perhaps the last obstacle to actually
    writing the script parser. Need to make program listings "page" so
    I can see long listings.

  1 Sept 2018

    Writing printJumpTable() and trying to progress. Need to add a
    "struct label table[]" jumptable parameter to instructionFromText()
    and compile(). Looking at asciidoctor.

  31 aug 2018

    Continued to work on buildJumpTable. Will write printJumpTable.
    Renamed the script assembler to "asm.pp". Made a bash function to
    insert a timestamp. Created an "asm" command in the test loop to
    load the asm.pp file into the program. Started a buildTable
    function for a label jump table. These label offsets could be
    applied by the "compile" function.

  30 August 2018

    "gh.c" source file is 117352 bytes. Compiled code is 72800 bytes.
    I could reduce this dramatically by separating the test loop from
    the machine code.

    Revisiting this after taking a long detour via a forth bytecode
    machine which currently boots on x86 in real mode
    (see http://bumble.sourceforge.net/books/osdev/os.asm ) and then
    trying to port it to the atmega328p architecture (ie arduino) at
    http://bumble.sf.net/books/arduino/os.avr.asm

    The immediate task seems to be to write code to create a label
    table for assembly listings, and then use that code to replace
    labels with relative jump offsets. After that, we can start to
    write the actual code (in asm.pp) which will parse and compile
    scripts.

    So the process is: the machine loads the script parser code (in
    "asm" format) from a text file. The machine uses that program to
    parse a given script and convert it to text "asm" format. The
    machine then loads the new text asm script and uses it to parse and
    transform ("compile") an input text stream.

  20 December 2017

    Allowed assembly listings with no line numbers as default.
    It would be a good idea to allow labels in assembly listings, eg
    'here:', to make it easier to hand code assembly. So, need a label
    table. Look at the info arrays for the syntax...

    Made conditional jumps relative so that they would be easier to
    "hand-code" as integers (although labels are really needed).
    Also, need to add a loadAsm() function which is shorthand to load
    the script assembler.

  17 December 2017

    For some reason, the code was left in a non-compilable state in
    2016. I think the compile() and instructionFromText() functions
    could be rewritten but seem to be working at the moment.

  13 dec 2017

    The code is not compiling because the parameter to the "compile()"
    function is wrong.

    When we display instructions, it would be good to always indicate
    the data type of the parameter (eg: text, int, range etc).
    Modify "test" to use different parameter types, eg list, range,
    class.

  29 september 2016

    Used instructionFromText() within the compile() function and
    changed compile to accept raw instruction text (not command +
    arguments). Wrote scanParameter which is a useful little function
    to grab an argument up to a delimiter. It works out the delimiter
    by looking at the first char of the argument and unescapes all
    special chars. Now need to change loadAssembled to use compile().

  28 sept 2016

    Added a help-search / and a command help search //. Added
    escapeText() and escapeSpecial(), and printEscapedInstruction().
    Added writeInstruction() which escapes and writes an instruction to
    file. Added instructionFromText() and a test command which tests
    that function. Worked on loadAssembledProgram() to properly load
    formats such as "while [a-z]" and "while [abc\] \\ \r \t]" etc.
    All this work is moving towards having the same parse routine
    loading assembled scripts from text files as well as interactively
    in the test loop.

  26 sept 2016

    Discovered that swap is not implemented.

  22 sept 2016

    Added loadlast, savelast, runzero etc. A few convenience functions
    in the interpreter.
    One hurdle: I need to be able to write testis "\n" etc where \n
    indicates a newline, so that we can test for non-printing
    characters. So this needs to go into the machine as its ascii code.
    Also, when showing program listings, these special characters
    \n \t \r should be shown in a different colour to make it obvious
    that they are special chars...
    Also: loadprogram is broken at the moment.... need to deal with
    datatypes.

  21 Sept 2016

    When in interpreter mode, reading the last character should not
    exit; it should return to the command prompt for testing purposes.

  15 August 2016

    Wrote an "int read" function which reads one character from stdin
    and simplifies the code greatly. Still need to fix "escaping".
    Need to make ss give better output, configurable.
    Escaping in the 'until' command seems to be working.

  13 August 2016

    Added a couple more interpreter commands to allow the manipulation
    of the program and change the ip pointer. Now it is possible to
    jump the ip pointer to a particular instruction. Also, looked at
    the loadAssembledProgram and saveAssembledProgram functions to try
    to rewrite them correctly.

    The loadAssembledProgram function needs to be completely cleaned up
    and the logic revised. My current idea is to write a program which
    transforms a pp script into a text assembly format, and then use
    'loadAssembledProgram' to load that script into the machine.

    Wrote the 'runUntilTrue' function which executes program
    instructions until the machine flag is set to true (by one of the
    test instructions, such as testis, testbegins, testends...).
    This should be useful for debugging complex machine programs.

  7 Jan 2016

    Wrote a cursory freeMachine function with supporting functions.

  4 Jan 2016

    Tidying up the help system. Had the idea of a program browser, ie
    browse the 'prog' subfolder and load a selected program into the
    machine. Need to write the actual script compilation code.

  3 Jan 2016

    Writing a compile function which compiles one instruction given
    command and args.
    changed the cells array in Tape to dynamic. Since we can use
    array subscripts with pointers the code hardly changes.
    Added the testclass test.
    Made program.listing and tape.cells pointers with dynamic memory
    allocation.

  1 Jan 2016

    working on compiling function pointers for the character class
    tests with the while and testis instructions. Creating reflection
    arrays for class and testing.

  late Dec 2015

    Continued work. Trying to resolve all "malloc" and "realloc"
    problems. Using a program with instruction listing within the
    machine. Each command executed interactively gets added to this.

  26 Dec 2015

    Saving and loading programs as assembler listings.
    validate program function.
    "until" & "pop" more or less working. "testends" working ...

  19 Dec 2015

    Lots of small changes. The code has been polished up to an almost
    useable stage. The machine can be used interactively. Several
    instructions still need to be implemented. Push and pop need to be
    written properly. Need to realloc() strings when required.
    The info structure will include "number of parameter fields" so
    that the code knows how many parameters a given instruction needs.
    This is useful for error checking when compiling.

  16 Dec 2015

    Revisiting this after a break. Got rid of function pointers, and
    individual instruction functions. Just have one executing function
    "execute()" with a big switch statement. Same with the test
    (interpreter) loop. A big switch statement to process user
    commands. Start with the 'read' command. Small test file.

    The disadvantage of not having individual instruction functions (eg
      void pop(struct Machine * mm)
      void push(struct Machine * mm)
    etc) is that we cannot implement the script compiler as a series of
    function calls. However the "read" instruction does have a
    dedicated function.

  23 Feb 2015

    The development strategy has been to incrementally add small bits
    to the machine and concurrently add test commands to the
    interpreter.
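A sketch of how push and pop might move tokens between the workspace and a text stack, written as the individual instruction functions mentioned in these notes. It assumes that push moves the first token of the workspace (up to and including the delimiter) onto the end of the stack, that pop moves the last token of the stack back to the front of the workspace, and that the tape pointer moves in step with them. These assumptions, the fixed-size buffers and the struct layout are illustrative only and are not taken from gh.c.

```c
#include <assert.h>
#include <string.h>

#define TEXT_MAX 256

// Hypothetical, much reduced machine: only the parts push/pop touch.
struct Machine {
  char stack[TEXT_MAX];      // tokens, each ending in the delimiter
  char workspace[TEXT_MAX];  // text that the instructions operate on
  int tapePointer;           // index of the current tape cell
  char delimiter;            // token delimiter, '*' in these notes
};

// Move the first token of the workspace onto the end of the stack.
// Returns 1 if a token was moved, 0 if there was nothing to move or
// no room left on the stack.
int push(struct Machine * mm) {
  assert(mm != NULL);
  size_t wlen = strlen(mm->workspace);
  if (wlen == 0) return 0;
  char * end = strchr(mm->workspace, mm->delimiter);
  size_t n = (end != NULL) ? (size_t)(end - mm->workspace) + 1 : wlen;
  size_t slen = strlen(mm->stack);
  if (slen + n >= TEXT_MAX) return 0;
  memcpy(mm->stack + slen, mm->workspace, n);
  mm->stack[slen + n] = '\0';
  // close the gap left in the workspace (the NUL moves too)
  memmove(mm->workspace, mm->workspace + n, wlen - n + 1);
  mm->tapePointer++;
  return 1;
}

// Move the last token of the stack to the front of the workspace.
// Returns 1 if a token was moved, 0 if the stack was empty or the
// workspace has no room.
int pop(struct Machine * mm) {
  assert(mm != NULL);
  size_t slen = strlen(mm->stack);
  if (slen == 0) return 0;
  // walk back from the final character to the start of the last token
  size_t start = slen - 1;
  while (start > 0 && mm->stack[start - 1] != mm->delimiter) start--;
  size_t n = slen - start;
  size_t wlen = strlen(mm->workspace);
  if (wlen + n >= TEXT_MAX) return 0;
  memmove(mm->workspace + n, mm->workspace, wlen + 1);
  memcpy(mm->workspace, mm->stack + start, n);
  mm->stack[start] = '\0';
  if (mm->tapePointer > 0) mm->tapePointer--;
  return 1;
}
```

With functions of this shape a compiled script could, in principle, become a plain sequence of calls such as push(&mm); pop(&mm); rather than a pass through one large switch statement.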
22 Feb 2015 Had the idea to create a separate test loop file (a command interpreter) with a separate help info array. show create showTapeHtml to print the tape in html. These functions will allow the code to document itself, more or less. Changes to make: The conditional jumps should be relative, not absolute. This will make it easier to hand write the compiler in "assembly language". Line numbers are not necessary in the assembly listings. The unconditional jump eg jump 0 can still be an absolute line number. Put a field width in the help output. Change help output colours. Make "pp" help command "ls" make p.x -> px or just "." 2006 - 2014 Attempted to write various versions of this machine, in java, perl, c++ etc, but none was completed successfully see http://bumble.sf.net/pp/cpp for an incomplete implementation in c++. But better to look at the current version, which is much much better. 2005 approximately I started to think about this parsing machine while living in Almetlla de Mar. My initial ideas were prompted by trying to write parsing scripts in sed and then reading snippets of Compilerbau by N. Wirth, thinking about compilers and grammars TERMINOLOGY * tape pointer: * stack: a text buffer that can be manipulated like a stack. * command: one of the permitable instructions for the VM (eg push pop etc) * instruction: a command & parameter (maybe) compiled into a 'program' (array) which can be executed by the Virtual Machine * */ // #include #include #include #include #include #include #include "charclass.h" #include "command.h" #include "parameter.h" #include "instruction.h" #include "colours.h" #include "tapecell.h" #include "tape.h" #include "buffer.h" /* ----------------------------------------- stuff that could go in a header file. But I cant be bother moving all my structures into a different file. function prototypes. But some prototypes must go under the relevant structure. */ //void escapeSpecial(char *, char *); /* execute a compiled instruction. 
Possible return values might be 0: success no problems 1: end of stream reached (tried to read eof) 2: trying to execute undefined instruction 3: quit/crash command executed (exit script) 4: could not open 'sav.pp' for writing (from write command) 5: tried to execute unimplemented command */ /* the enum, array of structures, and associated functions are to provide readable exit and error codes for functions such as execute(), run(), compile(), loadScript(), etc EXECQUIT is actually a success code, where as BADQUIT (returned by the "bail" command, is an error. */ enum ExitCode { SUCCESS=0, EXECQUIT, ENDOFSTREAM, EXECUNDEFINED, BADQUIT, READSAVERROR, WRITESAVERROR, UNIMPLEMENTED }; struct { enum ExitCode error; char * description; } exitCodes[] = { { SUCCESS, "success, no errors" }, { EXECQUIT, "quit was executed (exit script)" }, { ENDOFSTREAM, "tried to read past end of stream (eof)" }, { EXECUNDEFINED, "tried to execute undefined machine instruction" }, { BADQUIT, "program exited with an error" }, { READSAVERROR, "could not open 'sav.pp' for reading" }, { WRITESAVERROR, "could not open 'sav.pp' for writing" }, { UNIMPLEMENTED, "executed an unimplemented machine instruction" } }; // print the description for an error void printExitCode(enum ExitCode error) { printf("(%d) %s\n", exitCodes[error].error, exitCodes[error].description); } // commands to test and analyse the machine // these enumerations are in the same order as the informational // array below for convenience enum TestCommand { HELP=0, COMMANDHELP, SEARCHHELP, SEARCHCOMMAND, LISTMACHINECOMMANDS, DESCRIBEMACHINECOMMANDS, MACHINECOMMANDDOC, LISTCLASSES, LISTCOLOURS, MACHINEPROGRAM, MACHINESTATE, MACHINETAPESTATE, MACHINEMETA, BUFFERSTATE, STACKSTATE, TAPESTATE, TAPEINFO, TAPECONTEXTLESS, TAPECONTEXTMORE, RESETINPUT, RESETMACHINE, STEPMODE, PROGRAMMODE, MACHINEMODE, COMPILEMODE, IPCOMPILEMODE, ENDCOMPILEMODE, INTERPRETMODE, LISTPROGRAM, LISTSOMEPROGRAM, LISTPROGRAMWITHLABELS, PROGRAMMETA, SAVEPROGRAM, 
SHOWJUMPTABLE, LOADPROGRAM, LOADASM, LOADLAST, LOADSAVED, LISTSAVFILE, SAVELAST, CHECKPROGRAM, CLEARPROGRAM, CLEARLAST, INSERTINSTRUCTION, EXECUTEINSTRUCTION, PARSEINSTRUCTION, TESTWRITEINSTRUCTION, STEPCODE, RUNCODE, RUNZERO, RUNCHARSLESSTHAN, RUNTOLINE, RUNTOTRUE, RUNTOWORK, RUNTOENDSWITH, IPZERO, IPEND, IPGO, IPPLUS, IPMINUS, SHOWSTREAM, EXIT, UNKNOWN }; // stepcode and executeinstruction below seem to be the // same exactly struct { enum TestCommand c; char * names[2]; char * argText; // eg char * description; } testInfo[] = { { HELP, {"hh", ""}, "", "list all interactive commands" }, { COMMANDHELP, {"H", ""}, "", "show help for a given machine command" }, { SEARCHHELP, {"/", "h/"}, "", "searches help system for an interpreter command containing search term" }, { SEARCHCOMMAND, {"//", "//"}, "", "searches help for a machine command containing search term" }, { LISTMACHINECOMMANDS, {"com", ""}, "", "list all machine commands" }, { DESCRIBEMACHINECOMMANDS, {"Com", ""}, "", "list and describe machine commands" }, { MACHINECOMMANDDOC, {"doc", ""}, "", "output machine commands in a documentation style format." 
}, { LISTCLASSES, {"class", "cl"}, "", "list all valid character classes for testclass and while" }, { LISTCOLOURS, {"col", "colours"}, "", "list all ansi colours" }, { MACHINEPROGRAM, {"m", ""}, "", "show state of machine, tape and current program instructions" }, { MACHINESTATE, {"M", ""}, "", "show the state of the machine" }, { MACHINETAPESTATE, {"s", ""}, "", "show state of the machine buffers with some tape cells" }, { MACHINEMETA, {"Mm", ""}, "", "show some meta information about the machine" }, { BUFFERSTATE, {"bu", ""}, "", "show the state of the machine buffer (stack/workspace)" }, { STACKSTATE, {"S", "stack"}, "", "show the state of the machine stack" }, { TAPESTATE, {"T", "t.."}, "", "show the state of the machine tape" }, { TAPEINFO, {"TT", "tapeinfo"}, "", "show detailed info of the state of the machine tape" }, { TAPECONTEXTLESS, {"tcl", "lesstape"}, "", "reduce ammount of tape that will be displayed by printSomeTapeInfo()" }, { TAPECONTEXTMORE, {"tcm", "moretape"}, "", "increase ammount of tape to be displayed by printSomeTapeInfo()" }, { RESETINPUT, {"i.r", ""}, "", "reset the input stream" }, { RESETMACHINE, {"M.r", ""}, "", "reset the machine to original state" }, { STEPMODE, {"m.s", ""}, "", "make step through instructions" }, { PROGRAMMODE, {"m.p", ""}, "", "make display program state" }, { MACHINEMODE, {"m.m", ""}, "", "make display machine state" }, { COMPILEMODE, {"m.c", ""}, "", "compile mode: entered instructions are compiled but " "not executed" }, { IPCOMPILEMODE, {"m.ipc", ""}, "", "entered instructions are compiled at current ip position" }, { ENDCOMPILEMODE, {"m.ec", ""}, "", "entered instructions are compiled at end of program" }, { INTERPRETMODE, {"m.int", "interpret"}, "", "entered instructions are executed but not compiled" }, { LISTPROGRAM, {"ls", "p.ls"}, "", "list all instructions in the machines current program" }, { LISTSOMEPROGRAM, {"l", "list"}, "", "list current instructions in the machines program" }, { 
LISTPROGRAMWITHLABELS, {"pl", "p.ll"}, "", "list all instructions in program with labels (and jump labels)" }, { PROGRAMMETA, {"pm", "pi"}, "", "show some meta information about the current program" }, { SAVEPROGRAM, {"wa", "p.w"}, "", "save the current program as 'assembler'" }, { SHOWJUMPTABLE, {"jj", "showjumps"}, "", "Show the jumptable generated by buildJumpTable()" }, { LOADPROGRAM, {"l.asm", "l.a"}, "", "load machine assembler commands from text file" }, { LOADASM, {"asm", "as"}, "", "load 'asm.pp' (the script parser) and reset the machine" }, { LOADLAST, {"last", "p.ll"}, "", "load 'last.pp' (the program automatically saved on exit)" }, { LOADSAVED, {"sav", "l.sav"}, "", "load 'sav.pp' (output of the 'write' command.)" }, { LISTSAVFILE, {"lss", "ls.sav"}, "", "list the contents of 'sav.pp' (output of the 'write' command.)" }, { SAVELAST, {"ww", "p.ww"}, "", "save 'last.pp' (the program automatically saved on exit)" }, { CHECKPROGRAM, {"p.v", ""}, "", "validate or check the machines compiled program " }, { CLEARPROGRAM, {"p.dd", "dd"}, "", "delete the machines compiled program " }, { CLEARLAST, {"p.dl", "pdl"}, "", "delete the last instruction in the compiled program " }, { INSERTINSTRUCTION, {"pi", "p.i"}, "", "insert an instruction at the current program ip " }, { EXECUTEINSTRUCTION, {"n", "."}, "", "execute the next (current) compiled instruction in program" }, { PARSEINSTRUCTION, {"pi:", "pi"}, "", "parse some example text into a compiled instruction" }, { TESTWRITEINSTRUCTION, {"twi", "twi"}, "", "shows how the current instruction will be written by writeInstruction()" }, { STEPCODE, {"p.s", "ps"}, "", "step through the next instruction in compiled program" }, { RUNCODE, {"rr", "p.r"}, "", "run the whole compiled program from the current instruction" }, { RUNZERO, {"r0", "p.r0"}, "", "run the whole compiled program from instruction zero" }, { RUNCHARSLESSTHAN, {"rrc", "runchars"}, "", "run program while characters read less than " }, { RUNTOLINE, 
{"rrl", "p.rl"}, "", "run the compiled program until given input stream line number" }, { RUNTOTRUE, {"rrt", "p.rt"}, "", "run the compiled program until flag is set to true" }, { RUNTOWORK, {"rrw", "runwork"}, "", "run program until the workspace is exactly the given text" }, { RUNTOENDSWITH, {"rre", "runworkendswith"}, "", "run program until the workspace ends with the given text" }, { IPZERO, {"p<<", "p0"}, "", "set the instruction pointer to zero" }, { IPEND, {"p>>", "p.e"}, "", "set the instruction pointer to the end of the program" }, { IPGO, {"pg", "p.g"}, "", "set the instruction pointer to the given instruction" }, { IPPLUS, {"p>", "p.>"}, "", "increment the instruction pointer without executing" }, { IPMINUS, {"p<", "p.<"}, "", "decrement the instruction pointer without executing" }, { SHOWSTREAM, {"ss", "ss"}, "", "shows the next few characters from the input stream" }, { EXIT, {"X", "exit"}, "", "exit the maching testing program" }, { UNKNOWN, {"", ""}, "", ""} }; /* display help for one interactive help command (not a machine command. 
which may be should be referred to as instructions to avoid confusion) */ void printHelpCommand(int command, int comColour, int helpColour) { int ii = command; printf("%s%4s: %s%s%s ", colourInfo[comColour].ansi, testInfo[ii].names[0], colourInfo[helpColour].ansi, testInfo[ii].description, NORMAL); } /* Display help just for core help commands to assist the user to start to use the interactive system */ void printUsefulCommands() { int commands[] = { HELP, MACHINEPROGRAM, LISTMACHINECOMMANDS, EXECUTEINSTRUCTION, LISTPROGRAM, RUNCODE}; printf("\nUseful interactive commands: \n" "--------------- \n"); int nn; for (nn = 0; nn < 6; nn++) { printHelpCommand(commands[nn], YELLOWc, WHITEc); printf("\n"); } } // one item in the label jump table (to convert asm labels to // instruction/line numbers struct Label { char text[64]; // label maximum 64 characters int instructNumber; // line/instruction number equivalent for label }; // a declaration void printJumpTable(struct Label []); // the compiled program, which is a set of instructions // the instruction listing really needs to be malloced dynamically // not a static array #define PROGRAMCAPACITY 500 struct Program { // date and time when compiled time_t compileDate; // how long program took to compile (milliseconds), timed with clock() ? clock_t compileTime; // starting to execute int startExecute; // time ended executing int endExecute; // whether new compiled instructions are appended to the // program (after count), inserted after IP or overwrite from ip // ??? necessary. 
enum CompileMode {APPEND, INSERT, OVERWRITE} compileMode; // how much room for instructions size_t capacity; // how many times has program memory been reallocated int resizings; //how many instructions are in the program int count; //current instruction (next to be executed) int ip; // the set of program instructions (dynamically allocated) struct Instruction * listing; // if applicable, name of assembly file containing instructions char source[128]; // an array of line labels and line numbers from assembly listing struct Label labelTable[256]; // static allocation of memory for instructions // struct Instruction listing[PROGRAMCAPACITY]; }; /* a prototype declaration, just to stop a gcc compiler warning, because I am using compile() before I define it. */ int compile(struct Program *, char *, int, struct Label[]); void newProgram(struct Program * program, size_t capacity) { program->resizings = 0; program->count = 0; program->ip = 0; program->compileTime = -1; program->compileDate = -1; program->startExecute = -1; program->endExecute = -1; program->listing = malloc(capacity * sizeof(struct Instruction)); if(program->listing == NULL) { fprintf(stderr, "couldnt allocate memory for program listing newProgram()\n"); exit(EXIT_FAILURE); } program->capacity = capacity; int ii; for (ii = 0; ii < capacity; ii++) { newInstruction(&program->listing[ii], UNDEFINED); } strcpy(program->source, "?"); memset(program->labelTable, 0, sizeof(program->labelTable)); } void freeProgram(struct Program * pp) { int ii; for (ii = 0; ii < pp->capacity; ii++) { // If instruction parameters are malloced, as they // should be, then we will have to free the associated // memory. But at the moment it is static memory allocation. //freeInstruction(&program->listing[ii]); } free(pp->listing); } // increase program capacity // there is a bug in the capacity arithmetic so that // on the second realloc 2 instructions are wiped. // also segmentation fault !!! 
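/*
A note on the growing problem mentioned above. The cause of the
bug is not diagnosed here. This is only a minimal sketch of the usual
realloc() pattern, using a hypothetical "struct Listing" that stands in
for struct Program (the names Listing, items and growListing are made up
for this example). It assumes that every slot below the old capacity is
already initialised, so only the new slots are touched. The result of
realloc() is kept in a temporary pointer so that the old block is not
lost on failure.

```c
#include <stdlib.h>

// Hypothetical stand-in for struct Program; only the fields used here.
struct Listing {
  int * items;       // stands in for struct Instruction * listing
  size_t capacity;   // how many items there is room for
  int resizings;     // how many times the memory was reallocated
};

// Grow the listing by "increase" items. Returns 0 on success or -1 on
// failure, in which case the old block is left untouched.
int growListing(struct Listing * list, size_t increase) {
  size_t newCapacity = list->capacity + increase;
  int * bigger = realloc(list->items, newCapacity * sizeof *bigger);
  if (bigger == NULL) { return -1; }
  // initialise only the newly added slots
  for (size_t ii = list->capacity; ii < newCapacity; ii++) {
    bigger[ii] = 0;
  }
  list->items = bigger;
  list->capacity = newCapacity;
  list->resizings++;
  return 0;
}
```

The caller decides what to do when -1 comes back, for example print
an error and exit as newProgram() does.
*/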
void growProgram(struct Program * program, size_t increase) { printf("Program listing is growing!!\n"); program->capacity = program->capacity + increase; program->listing = realloc(program->listing, program->capacity * sizeof(struct Instruction)); if(program->listing == NULL) { fprintf(stderr, "couldnt allocate more memory for program listing in growProgram()\n"); exit(EXIT_FAILURE); } int ii; for (ii = program->count; ii < program->capacity; ii++) { newInstruction(&program->listing[ii], UNDEFINED); } } // insert an instruction at the current ip position void insertInstruction(struct Program * program) { int ii; struct Instruction * thisInstruction; if (program->count == program->capacity) { printf("no more room in program... \n"); return; } for (ii = program->count-1; ii >= program->ip; ii--) { thisInstruction = &program->listing[ii]; if (commandType(thisInstruction->command) == JUMPS) { // bug!! actually we only increment jumps if // the jump target is after the instruction if (thisInstruction->a.number > program->ip) { thisInstruction->a.number++; } } //printInstruction(thisInstruction); printf(" \n"); copyInstruction(&program->listing[ii+1], thisInstruction); } newInstruction(&program->listing[program->ip], NOP); program->count++; } /* print meta information about the program This information is part of the Program structure, so it is not really meta information, technically. 
*/
void printProgramMeta(struct Program * program) {
  char Colour[30];
  char date[30];
  char time[30];
  strcpy(date, "?");
  strcpy(time, "?");
  strcpy(Colour, BROWN);
  printf("%s Program source:%s %s \n", Colour, NORMAL, program->source);
  // capacity is a size_t, so it is printed with %zu
  printf("%sCapacity (Instructions):%s %zu \n", Colour, NORMAL, program->capacity);
  printf("%s Memory reallocation:%s %d \n", Colour, NORMAL, program->resizings);
  printf("%s How many instructions:%s %d \n", Colour, NORMAL, program->count);
  printf("%s Current instruction:%s %d instruction:", Colour, NORMAL, program->ip);
  printInstruction(&program->listing[program->ip]);
  printf("\n");
  if (program->compileDate != -1) {
    strcpy(date, ctime(&program->compileDate));
  }
  if (program->compileTime != -1) {
    // clock_t has no printf format of its own, so cast to long
    sprintf(time, "%ld", (long) program->compileTime);
  }
  printf("%s Compiled at:%s %s \n", Colour, NORMAL, date);
  printf("%s Compile time:%s %s milliseconds \n", Colour, NORMAL, time);
}

/* given a text label get the line/instruction number from the table
   or else return -1 */
int getJump(char * label, struct Label table[]) {
  int ii;
  // 256 is the size of labelTable in struct Program; the old limit
  // of 1000 could read past the end of the table
  for (ii = 0; ii < 256; ii++) {
    // if not found
    if (table[ii].text[0] == 0) return -1;
    // return number if label found
    if (strcmp(table[ii].text, label) == 0) {
      return table[ii].instructNumber;
    }
  }
  return -1;
}

/* return the label for the given instruction.
   an empty string means "not found" */
char * getLabel(int instruction, struct Label table[]) {
  int ii = 0;
  // 256 is the size of labelTable in struct Program
  for (ii = 0; ii < 256; ii++) {
    // if not found
    if (table[ii].text[0] == 0) break;
    if (table[ii].instructNumber == instruction) {
      return table[ii].text;
    }
  }
  // not found: do not index the table here, because ii may be
  // one past the last entry when the table is full
  return "";
}

// print the instructions in a program
void printProgram(struct Program * program) {
  int ii;
  int key;   // getc() returns an int, not a char
  struct Instruction * thisInstruction;
  printf("%sFull Program Listing %s", CYAN, NORMAL);
  printf(" %s(%ssize:%s%d%s ip:%s%d%s cap:%s%zu%s)%s \n",
    GREY, GREEN, NORMAL, program->count, GREEN, NORMAL, program->ip,
    GREEN, NORMAL, program->capacity, GREY, NORMAL);
  for (ii = 0; ii <
program->count; ii++) { // page the listing if ( (ii+1) % 16 == 0 ) { printf("\n%s%s for more (%sq%s = exit):", AQUA, NORMAL, AQUA, NORMAL); key = getc(stdin); if (key == 'q') { return; } // go back a page? lets not worry about it // if (key == 'b') { ii = (((ii-16)>(0))?(ii-16):(0)); } } thisInstruction = &program->listing[ii]; if (ii == program->ip) { printf("%s%3d> ", YELLOW, ii); printInstruction(thisInstruction); printf("%s\n", NORMAL); } else { printf("%s%3d:%s ", WHITE, ii, NORMAL); printInstruction(thisInstruction); printf("%s\n", NORMAL); } } } /* print all the instructions in a compiled program along with the labels which were parse from the assembly listing This may be handy for debugging jump target problems. Some jump targets may be relative, no?? */ void printProgramWithLabels(struct Program * program) { int ii; char key; char * label; struct Instruction * thisInstruction; printf("%s[Full Program Listing] %s", YELLOW, NORMAL); printf(" %s(%ssize:%s%d%s ip:%s%d%s cap:%s%lu%s)%s \n", GREY, GREEN, NORMAL, program->count, GREEN, NORMAL, program->ip, GREEN, NORMAL, program->capacity, GREY, NORMAL); /* to do better paging, we can change this to a while loop and increment ii or set ii=0 (g) ii=max (G) etc */ for (ii = 0; ii < program->count; ii++) { // page the listing if ( (ii+1) % 16 == 0 ) { printf("\n%s%s for more (%sq%s = exit) ...", AQUA, NORMAL, AQUA, NORMAL); key = getc(stdin); // make 'g' go to top, 'G' go to bottom etc if (key == 'q') { return; } // go back a page here, but how to do it? 
// if (key == 'b') { ii = (((ii-16)>(0))?(ii-16):(0)); } } thisInstruction = &program->listing[ii]; // check if there is a label label = getLabel(ii, program->labelTable); if (strlen(label) > 0) { printf("%s:\n", label); } if (ii == program->ip) { printf("%s%3d> ", YELLOW, ii); printInstruction(thisInstruction); printf("%s\n", NORMAL); } else { printf("%s%3d:%s ", WHITE, ii, NORMAL); printInstruction(thisInstruction); printf("%s\n", NORMAL); } } } // show the program around the ip instruction pointer void printSomeProgram(struct Program * program, int context) { int ii; struct Instruction * thisInstruction; int start = program->ip - context; int end = program->ip + context; if (start < 0) start = 0; if (end > program->capacity) end = program->capacity; //if (program->ip > end) end = program->ip + 2; printf("%sPartial Program Listing%s", CYAN, NORMAL); printf(" %s(%ssize:%s%d%s ip:%s%d%s cap:%s%lu%s)%s \n", GREY, GREEN, NORMAL, program->count, GREEN, NORMAL, program->ip, GREEN, NORMAL, program->capacity, GREY, NORMAL); for (ii = start; ii < end; ii++) { thisInstruction = &program->listing[ii]; if (ii == program->ip) { printf(" %s%d> ", YELLOW, ii); printInstruction(thisInstruction); printf("%s\n", NORMAL); } else { printf(" %s%d%s: ", CYAN, ii, NORMAL); printInstruction(thisInstruction); printf("%s\n", NORMAL); } } } // save the compiled program as assembler to a text file // we need to escape quotes in arguments !! // also need to handle other parameter datatypes such as class, // range, list etc. // Line numbers may make modifying the assembly laborious. void saveAssembledProgram(struct Program * program, FILE * file) { int ii; fprintf(file, "# Assembly listing \n"); for (ii = 0; ii < program->count; ii++) { //fprintf(file, "%d: ", ii); writeInstruction(&program->listing[ii], file); fprintf(file, "\n"); } fprintf(file, "# End of program. \n"); printf("%s%d%s instructions written to file. 
\n", BLUE, ii, NORMAL);
}

// in some cases it may be useful to have a numbered code listing
void saveNumberedProgram(struct Program * program, FILE * file) {
  int ii;
  fprintf(file, "# Numbered Assembly listing \n");
  for (ii = 0; ii < program->count; ii++) {
    fprintf(file, "%d: ", ii);
    writeInstruction(&program->listing[ii], file);
    fprintf(file, "\n");
  }
  fprintf(file, "# End of program. \n");
  printf("%s%d%s instructions written to file. \n", BLUE, ii, NORMAL);
}

// clear the machines compiled program
void clearProgram(struct Program * program) {
  size_t ii;
  // clear every allocated slot. The old bound (count + 50) could run
  // past the end of the listing when count was close to capacity.
  for (ii = 0; ii < program->capacity; ii++) {
    program->listing[ii].command = UNDEFINED;
    program->listing[ii].a.number = 0;
    program->listing[ii].a.datatype = UNSET;
    program->listing[ii].b.number = 0;
    program->listing[ii].b.datatype = UNSET;
  }
  program->count = 0;
  program->ip = 0;
  program->compileTime = -1;
  // -1 means "never compiled", as in newProgram() and printProgramMeta()
  program->compileDate = -1;
}

/* scans an instruction parameter and stops at the delimiter.
   also converts escape sequences such as \n \r to their
   actual character. The unescaped sequence is stored starting at
   pointer param. Returns the new position in the scan stream.
   The datatype of the parameter is set by the instructionFromText()
   function */
char * scanParameter(char * param, char * text, int line) {
  char * nextChar;
  char delimiter;
  char * arg = text;
  if (*arg == 0) return arg;
  // lets skip leading spaces here.
  while(isspace((unsigned char)*arg)) arg++;
  if (*arg == 0) return arg;
  switch (* arg) {
    case '[': delimiter = ']'; break;
    case '{': delimiter = '}'; break;
    case '\'': delimiter = '\''; break;
    case '"': delimiter = '"'; break;
    default: return arg;
  }
  nextChar = arg+1;
  // an empty parameter such as "" gives an empty string
  *param = 0;
  while ((*nextChar != '\0') && (*nextChar != delimiter)) {
    if (*nextChar == '\\') {
      nextChar++;
      // a lone backslash at the end of the text: stop here so that
      // we do not scan past the terminator
      if (*nextChar == '\0') break;
      // check \n \t \] \\ etc
      switch (*nextChar) {
        case 'n': *param = '\n'; break;
        case 't': *param = '\t'; break;
        case 'r': *param = '\r'; break;
        case 'f': *param = '\f'; break;
        case 'v': *param = '\v'; break;
        default:
          // handle all other \\ slash escapes eg \\ \] \"
          *param = *nextChar;
          break;
      } // switch
    } else {
      *param = *nextChar;
    }
    *(param+1) = 0;
    nextChar++;
    param++;
  } // while not delimiter
  if (*nextChar != delimiter) {
    printf("%s Warning: %s on line %s%d%s. \n",
      PURPLE, NORMAL, YELLOW, line, NORMAL);
    printf("%s >> %s%s %s\n", BLUE, GREEN, arg, NORMAL);
    printf(" No terminating %s%c%s character for parameter \n",
      YELLOW, delimiter, NORMAL);
  }
  // step over the closing delimiter. This must test the character,
  // not the pointer (the old test "nextChar != '\0'" was always true)
  if (*nextChar != '\0') { nextChar++; }
  return nextChar;
} // scanParameter

/* print out the label table, just to make sure that it is getting
   built correctly. this table can be, and is, a property of the
   Program structure */
void printJumpTable(struct Label table[]) {
  int ii;
  printf("%s[Program label table]%s:\n", YELLOW, NORMAL);
  // 256 is the size of labelTable in struct Program
  for (ii = 0; ii < 256; ii++) {
    if (table[ii].text[0] == 0) break;
    printf("%s%15s:%s %d \n",
      BROWN, table[ii].text, NORMAL, table[ii].instructNumber);
  }
}

/* extracts the correct values for a parameter from a text argument
   it returns a pointer to the last character scanned in "args" or else
   NULL if there is no parameter.
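
   The "pointer to the last character scanned" idea can be sketched with
   the %n conversion of sscanf(). This is only an illustration: the
   helper name scanIntArgument is made up and is not part of this file.

```c
#include <stdio.h>

// Hypothetical helper: scan one integer argument and return a pointer
// just past it, or NULL if the text does not start with an integer.
const char * scanIntArgument(const char * args, int * value) {
  int position = 0;
  // %n stores how many characters have been consumed so far.
  // It is not counted in the value returned by sscanf().
  if (sscanf(args, "%d%n", value, &position) == 1) {
    return args + position;
  }
  return NULL;
}
```

   The returned pointer can then be handed to the next scan, which is
   how a second parameter would be found after the first.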
*/
char * parameterFromText(
  FILE * file, struct Instruction * instruction,
  struct Parameter * param, char * text, int lineNumber,
  struct Label table[]) {

  char label[64] = "";       // possible jump label
  char charclass[20] = "";   // eg space, alnum, print
  int position;              // calculate how many chars were scanned
  char * next;
  char * args = text;
  // here check if instruction is a jump (ie jumptrue/false/jump)
  // now check if it already has an integer argument. If not
  // check if 1st argument is a label. If so, convert to line number
  // using jumpTable[] and then convert to offset (line number - current
  // line/instruction number)

  // nothing to scan: a NULL pointer or an empty string
  if (args == NULL || *args == '\0') { return args; }
  while(isspace((unsigned char)*args)) args++;
  // jumps only have one jump parameter. But this is a kludge
  // to suppress an error.
  if ((commandType(instruction->command) == JUMPS) &&
      (instruction->a.datatype != INT)) {
    // format "jump 32"
    if (1 == sscanf(args, "%d%n", &param->number, &position)) {
      param->datatype = INT;
      return args + position;
    }
    // format "jump label"
    // label[] holds 64 bytes, so read at most 63 characters
    else if (1 == sscanf(args, "%63s%n", label, &position)) {
      // for debugging see the label jump table
      // printJumpTable(table);
      // find the label,
      int jump = -1;
      jump = getJump(label, table);
      // printf(" \n", label, jump);
      if (jump == -1) {
        fprintf(file, "%s Parse Error: %s at instruction %s%d%s. \n",
          PURPLE, NORMAL, YELLOW, lineNumber, NORMAL);
        fprintf(file, "%s >> %s%s %s\n", BLUE, GREEN, args, NORMAL);
        fprintf(file, " Label '%s%s%s' not found. \n", YELLOW, label, NORMAL);
        fprintf(file, " Labels should start a line and end with a \n"
          " colon (:). They are case sensitive. Jump targets should\n"
          " not include the colon.
Labels cannot start with a number \n"
          " (because it is parsed as a line number).\n"
          " Type '%s%s%s' in interactive mode to see the label table\n",
          GREEN, testInfo[SHOWJUMPTABLE].names[0], NORMAL);
        return NULL;
      } else {
        //printf("command: %s \n", info[instruction->command].name);
        //printf("jump target=%d, linenumber=%d\n", jump, lineNumber);
        param->datatype = INT;
        // an unconditional jump takes an absolute target, the
        // conditional jumps take an offset from the current instruction
        if (instruction->command == JUMP) { param->number = jump; }
        else { param->number = jump - lineNumber; }
      }
      return args + position;
    }
  }

  // format "jump 32"
  // we need to use this position variable and return it
  // this is because we will call parameterFromText() again, to see
  // if there are any more parameters.
  //int pos
  //(1 == sscanf(expression, "%lf%n", &value, &pos))
  if (1 == sscanf(args, "%d%n", &param->number, &position)) {
    param->datatype = INT;
    return args + position;
  }
  // format "while [:space:]
  else if ((args[0] == '[') && (args[1] == ':')) {
    // charclass[] holds 20 bytes, so read at most 19 characters
    if (sscanf(args+2, "%19[^:]%n", charclass, &position) == 1) {
      if (textToClass(charclass) == NOCLASS) {
        fprintf(file, "%s Error: %s on line %s%d%s of source file. \n",
          PURPLE, NORMAL, YELLOW, lineNumber, NORMAL);
        fprintf(file, "%s >> %s%s %s\n", BLUE, GREEN, args, NORMAL);
        fprintf(file, " The character class %s%s%s is not valid \n",
          YELLOW, charclass, NORMAL);
        return NULL;
      }
      param->datatype = CLASS;
      param->classFn = classInfo[textToClass(charclass)].classFn;
      // advance scan position past [:text:] format
      return args + position + 4;
    } else {
      fprintf(file, "%s Error: %s on line %s%d%s.
\n", PURPLE, NORMAL, YELLOW, lineNumber, NORMAL); fprintf(file, "%s >> %s%s %s\n", BLUE, GREEN, args, NORMAL); fprintf(file, " In argument %s%s%s, no character class given \n", YELLOW, args, NORMAL); return NULL; } } // format "whilenot [a-z] else if ((args[0] == '[') && (args[2] == '-')) { param->datatype = RANGE; param->range[0] = args[1]; param->range[1] = args[3]; return args+5; } else if (args[0] == '[') { // bracket delimited parameter param->datatype = LIST; next = scanParameter(param->list, args, lineNumber); return next; } else if (args[0] == '"') { // quote delimited parameter param->datatype = TEXT; next = scanParameter(param->text, args, lineNumber); return next; } else if (args[0] == '{') { // brace delimited parameter param->datatype = TEXT; next = scanParameter(param->text, args, lineNumber); return next; } // what sort of argument // print warnings if, for example, jump does not have an integer target // checkInstruction(instruction, stdout); return args; } /* fills out an instruction given valid text. Also handles unescaping of special characters and delimiter characters eg: add " this \r \t \" \: \\ " eg: while [ghi \];;\\ ] the line/instruction number parameter is for error messages returns -1 when an error occurs add struct label table[] jumptable parameter. 
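
   The unescaping described above can be sketched on its own as
   follows. This is an illustration only: unescapeText is a made up
   name, and the real work in this file is done by scanParameter(),
   which also deals with the delimiters.

```c
#include <stddef.h>

// Hypothetical helper: copy src to dest, converting backslash escapes.
// dest must have room for at least strlen(src) + 1 bytes.
// Returns the number of characters written, not counting the terminator.
size_t unescapeText(char * dest, const char * src) {
  size_t n = 0;
  while (*src != '\0') {
    if (*src == '\\' && src[1] != '\0') {
      src++;
      switch (*src) {
        case 'n': dest[n++] = '\n'; break;
        case 't': dest[n++] = '\t'; break;
        case 'r': dest[n++] = '\r'; break;
        // any other escaped character stands for itself
        default: dest[n++] = *src; break;
      }
    } else {
      // ordinary character, or a lone backslash at the end of the text
      dest[n++] = *src;
    }
    src++;
  }
  dest[n] = '\0';
  return n;
}
```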
*/
int instructionFromText(
  FILE * file, struct Instruction * instruction, char * text,
  int lineNumber, struct Label table[]) {

  char command[1001] = "";
  char args[1001] = "";
  //char label[64] = ""; // possible jump label
  //char charclass[20] = ""; // eg space, alnum, print

  // both buffers hold 1001 bytes, so neither field width may exceed
  // 1000 (the old width of 2000 for args could overflow the buffer)
  sscanf(text, "%1000s %1000[^\n]", command, args);
  //printf("parsed - command=%s, args=%s \n", command, args);
  if (command[0] == '\0') {
    fprintf(file, "%s Parse Error: %s on line %s%d%s \n",
      PURPLE, NORMAL, YELLOW, lineNumber, NORMAL);
    fprintf(file, "%s >> %s%s %s\n", BLUE, GREEN, text, NORMAL);
    fprintf(file, " Instruction command missing \n");
    return -1;
  }
  if (textToCommand(command) == UNDEFINED) {
    fprintf(file, "%s Parse Error: %s on line %s%d%s. \n",
      PURPLE, NORMAL, YELLOW, lineNumber, NORMAL);
    fprintf(file, "%s >> %s%s %s\n", BLUE, GREEN, text, NORMAL);
    fprintf(file, " Command %s%s%s is not a valid instruction command \n",
      YELLOW, command, NORMAL);
    return -1;
  }
  instruction->command = textToCommand(command);
  if ((info[instruction->command].args > 0) && (args[0] == '\0')) {
    fprintf(file, "%s Error: %s on line %s%d%s of source file. \n",
      PURPLE, NORMAL, YELLOW, lineNumber, NORMAL);
    fprintf(file, "%s >> %s%s %s\n", BLUE, GREEN, text, NORMAL);
    fprintf(file, " Command %s%s%s requires %s%d%s argument(s) but none \n"
      " is given in the assembly file \n",
      YELLOW, command, NORMAL, YELLOW,
      info[instruction->command].args, NORMAL);
  }
  char * remainder = parameterFromText(
    file, instruction, &instruction->a, args, lineNumber, table);
  // also check here if the command/instruction actually needs
  // a second parameter. That should save time while compiling.
  if (instruction->a.datatype != UNSET) {
    //printf("args:%s \n", args);
    //printf("remainder args:%s \n", remainder);
    remainder = parameterFromText(
      file, instruction, &instruction->b, remainder, lineNumber, table);
  }
  if ((info[instruction->command].args == 2) &&
      (instruction->b.datatype == UNSET)) {
    fprintf(file, "%s Error: %s on line %s%d%s of source file.
\n", PURPLE, NORMAL, YELLOW, lineNumber, NORMAL); fprintf(file, "%s >> %s%s %s\n", BLUE, GREEN, text, NORMAL); fprintf(file, " Command %s%s%s requires %s%d%s argument(s) but none \n" " is given in the assembly file \n", YELLOW, command, NORMAL, YELLOW, info[instruction->command].args, NORMAL); return 1; } /* */ return 0; } /* just build an array with label line numbers returns the number of labels found */ int buildJumpTable(struct Label table[], FILE * file) { char buffer[1000]; char * line; int lineNumber; char text[2001]; char label[64]; // hold the asm label int ii = 0; // instruction number int ll = 0; // label number while (fgets(buffer, 999, file) != NULL) { // printf("%s", line); line = buffer; line[strlen(line) - 1] = '\0'; text[0] = '\0'; lineNumber = -1; // Trim leading space while (isspace((unsigned char)*line)) line++; // skip blank lines if (*line == 0) continue; // lines starting with # are comments // if (line[0] == '#') continue; // lines starting with # are 1 line comments // lines starting with #* are multiline comments (to *#) if (line[0] == '#') { if (strlen(line) == 1) continue; if (line[1] == '*') { line = line + 2; //printf("multiline comment! %s", line); // multiline comment #* ... 
// need to search for next *#
        if(strstr(line, "*#") != NULL) {
          line = strstr(line, "*#") + 2;
        } else {
          // search next lines for *#
          while (fgets(buffer, 999, file) != NULL) {
            line = buffer;
            // remove the newline, if there is one
            line[strcspn(line, "\n")] = '\0';
            if (strstr(line, "*#") != NULL) {
              line = strstr(line, "*#") + 2;
              break;
            }
          }
        }
      } else continue;
    }
    sscanf(line, "%d: %2000[^\n]", &lineNumber, text);
    //debug: check that the arguments are getting parsed properly
    //printf("parsed - lineNumber=%d, text=%s \n", lineNumber, text);
    if (lineNumber == -1) {
      // if no line number, parse anyway
      sscanf(line, "%s", text);
      // only whitespace on line, so skip it
      if (text[0] == '\0') { continue; }
      sscanf(line, "%2000[^\n]", text);
    }
    // skip empty lines
    if (text[0] == '\0') { continue; }
    // handle assembler labels using the jumpTable
    // labels are lines ending in ':'
    if (text[strlen(text) - 1] == ':') {
      // keep the last table entry empty as the end-of-table marker
      // (255 assumes the 256 entry labelTable in struct Program)
      if (ll >= 255) {
        fprintf(stderr, "label table is full, ignoring label '%s'\n", text);
        continue;
      }
      // label[] holds 64 bytes, so read at most 63 characters
      label[0] = '\0';
      sscanf(text, "%63[^:]:", label);
      // printf("New Label '%s' at instruction %d \n", label, ii);
      table[ll].instructNumber = ii;
      strcpy(table[ll].text, label);
      ll++; // increment label number
      continue;
    }
    // increment instruction number
    ii++;
  } // while
  // terminate the table, so that labels left over from an earlier
  // load are not seen by getJump() and getLabel()
  table[ll].text[0] = '\0';
  return ll;
}

/* load a program from an assembler listing in a text file
   uses compile() >> instructionFromText() or parseInstruction()
   maybe change this to ...(struct Machine * mm, ...) ??.
   The compiled instructions are inserted in the program at
   position "start". */
void loadAssembledProgram(struct Program * program, FILE * file, int start) {
  clearProgram(program);
  // this is not robust, we need to deal with very long
  // instructions using malloc (eg "add ")
  char buffer[1000];
  char * line;
  int lineNumber;
  char text[2001];
  // start the clock here, so that compileTime measures the loading
  // and not the time since the process started
  clock_t startTime = clock();
  clock_t endTime = 0;
  //int ii = 0;
  int ii = start;
  // build a table of label line numbers
  // printf("Put %s%d%s labels into the jump table.
\n", YELLOW, ll, NORMAL); // this returns the number of labels found buildJumpTable(program->labelTable, file); // printJumpTable(program->labelTable); rewind(file); //struct Instruction * instruction; while (fgets(buffer, 999, file) != NULL) { // printf("%s", line); line = buffer; line[strlen(line) - 1] = '\0'; text[0] = '\0'; lineNumber = -1; // Trim leading space while (isspace((unsigned char)*line)) line++; // skip blank lines if (*line == 0) continue; // lines starting with # are 1 line comments // lines starting with #* are multiline comments (to *#) if (line[0] == '#') { if (strlen(line) == 1) continue; if (line[1] == '*') { line = line + 2; //printf("multiline comment! %s", line); // multiline comment #* ... need to search for next *# if(strstr(line, "*#") != NULL) { line = strstr(line, "*#") + 2; } else { // search next lines for *# while (fgets(buffer, 999, file) != NULL) { line = buffer; line[strlen(line) - 1] = '\0'; if (strstr(line, "*#") != NULL) { line = strstr(line, "*#") + 2; break; } } } } else continue; } // skip label lines. 
improve this to trim spaces if (line[strlen(line)-1] == ':') continue; sscanf(line, "%d: %2000[^\n]", &lineNumber, text); //debug: check that the arguments are getting parsed properly //printf("parsed - lineNumber=%d, text=%s \n", lineNumber, text); if (lineNumber == -1) { // if no line number, parse anyway sscanf(line, "%s", text); // only whitespace on line, so skip it if (text[0] == '\0') { continue; } sscanf(line, "%2000[^\n]", text); } // empty line or just line number, just skip it if (*text == 0) continue; compile(program, text, ii, program->labelTable); ii++; } // while program->ip = 0; endTime = clock(); program->compileTime = (double)((endTime - startTime)*1000) / CLOCKS_PER_SEC; program->compileDate = time(NULL); /* printf("Compiled %s%d%s instructions from '%s%s%s' in about %s%ld%s milliseconds \n", CYAN, program->count, NORMAL, CYAN, program->source, NORMAL, CYAN, program->compileTime, NORMAL); */ } // check datatype of instruction etc int checkInstruction(struct Instruction * ii, FILE * file) { if ((info[ii->command].args > 0) && (ii->a.datatype == UNSET)) { fprintf(file, "error: missing argument for command \n"); } if ((commandType(ii->command) == JUMPS) && (ii->a.datatype != INT)) { //printf(file, "error: non integer \n"); } if ((info[ii->command].args == 0) && (ii->a.datatype != UNSET)) { fprintf(file, "warning: superfluous argument for command %s%s%s \n", YELLOW, info[ii->command].name, NORMAL); } switch (ii->command) { case ADD: if (ii->a.datatype != TEXT) { fprintf(file, "Error: ADD requires 'text' datatype \n"); } if (*ii->a.text == 0) { fprintf(file, "No text parameter for ADD \n"); } break; case CLIP: break; case CLOP: break; case CLEAR: break; case PRINT: break; case POP: break; case PUSH: break; case PUT: break; case GET: break; case SWAP: break; case INCREMENT: break; case DECREMENT: break; case READ: break; case UNTIL: case WHILE: case WHILENOT: break; case JUMP: // validate all jump targets... 
is target undefined, in // range of program etc etc. if (ii->a.datatype != INT) { fprintf(file, "error: Non integer target for jump instruction"); } break; case JUMPTRUE: if (ii->a.datatype != INT) { fprintf(file, "error: Non integer target for jump instruction"); } break; case JUMPFALSE: if (ii->a.datatype != INT) { fprintf(file, "error: Non integer target for jump instruction"); } break; case TESTIS: if (ii->a.datatype != TEXT) { fprintf(file, "wrong datatype for parameter for TESTIS \n"); } break; case TESTBEGINS: if (ii->a.datatype != TEXT) { fprintf(file, "wrong datatype for parameter for TESTBEGINS \n"); } break; case TESTENDS: if (ii->a.datatype != TEXT) { fprintf(file, "wrong datatype for parameter for TESTENDS \n"); } break; case TESTEOF: break; case TESTTAPE: break; case COUNT: break; case INCC: break; case DECC: break; case ZERO: break; case CHARS: break; case STATE: break; case QUIT: break; case WRITE: break; case NOP: break; case UNDEFINED: break; default: break; } // switch return -1; } // checkInstruction /* check whether an instruction has the right sort of parameters etc. It is better to do this once before a program is executed rather than in the "execute" routine. Returns positive integer if the instruction is not valid s should be "validateProgram" in order to check jump targets etc. And also to make sure jumptrue, jumpfalse have a test before them. check for infinite loops. Check if there is a branch zero at the end. 
Check if there is a read at the beginning */
int validateProgram(struct Program * program) {
  struct Instruction * instruction;
  int ii;
  int warnings = 0;
  int errors = 0;
  for (ii = 0; ii < program->count; ii++) {
    instruction = &program->listing[ii];
    if (ii == program->count-1) {
      if ((instruction->command != JUMP) ||
          (instruction->a.datatype != INT) ||
          (instruction->a.number != 0)) {
        numberedInstruction(instruction, ii, stdout);
        fprintf(stdout,
          "Normally the last instruction of the program \n"
          "should be an unconditional jump to the first instruction \n"
          "eg: jump 0 \n"
          "This is because the parse machine is designed to run in a \n"
          "loop for each character read (somewhat similar to sed, except \n"
          "for characters, not lines) \n");
        warnings++;
      }
    }
    if (ii == 0) {
      if (instruction->command != READ) {
        numberedInstruction(instruction, ii, stdout);
        fprintf(stdout,
          "Normally the first instruction of the program \n"
          "should be a READ, which reads one character from the \n"
          "input source \n");
        warnings++;
      }
    }
  } // for loop
  printf(
    "Checked %s%d%s instructions and found %s%d%s errors and %s%d%s warnings \n",
    BLUE, ii, NORMAL, YELLOW, errors, NORMAL, YELLOW, warnings, NORMAL);
  return 0;
}

// the virtual machine which parses
struct Machine {
  FILE * inputstream;     //source of characters
  int peep;               // next char in the stream, may have EOF
  struct Buffer buffer;   //workspace & stack
  struct Tape tape;
  int accumulator;        // used for counting
  long charsRead;         // how many characters read from input stream
  long lines;             // how many lines already read from input stream
  enum Bool flag;         // used for tests
  char delimiter;         // eg *, to separate tokens on the stack
  char escape;            // escape character
  struct Program program; // compiled instructions + ip counter
  char version[64];       // machine version string (eg: "0.1 campania" etc)
};

// all functions relating to the machine

// initialise the machine with an input stream
void newMachine(struct Machine * machine, FILE * input,
                int tapeCells, int cellSize) {
  // set
the input stream // read the first character into peep?? machine->inputstream = input; machine->charsRead = 0; machine->lines = 0; machine->accumulator = 0; newTape(&machine->tape, tapeCells, cellSize); newBuffer(&machine->buffer, 40); newProgram(&machine->program, 1500); machine->peep = fgetc(machine->inputstream); machine->flag = FALSE; machine->escape = '\\'; machine->delimiter = '*'; strcpy(machine->version, "0.1 campania"); } // reset the machine without destroying program void resetMachine(struct Machine * machine) { // rewind the input stream rewind(machine->inputstream); machine->charsRead = 0; machine->lines = 0; machine->accumulator = 0; clearTape(&machine->tape); resetBuffer(&machine->buffer); // read the first character into peep machine->peep = fgetc(machine->inputstream); machine->flag = FALSE; machine->escape = '\\'; machine->delimiter = '*'; machine->program.ip = 0; } // free all memory associated with the machine. void freeMachine(struct Machine * mm) { freeTape(&mm->tape); freeProgram(&mm->program); free(mm->buffer.stack); return; } // read one character from the input stream and update // the peep and workspace and other machine registers // this fuction is used in commands while, whilenot, until, read etc // returns 0 if end of stream is reached void showStackWorkPeep(struct Machine * mm, enum Bool escape); int readc(struct Machine * mm) { if (mm->peep == EOF) return 0; // not the problem //if (feof(mm->inputstream)) return 0; //if (ferror(mm->inputstream)) return 0; if (strlen(mm->buffer.stack) == mm->buffer.capacity - 2) { growBuffer(&mm->buffer, 100); } // this was a bug when it was placed before growBuffer() char * lastc = mm->buffer.workspace + strlen(mm->buffer.workspace); *lastc = mm->peep; *(lastc + 1) = '\0'; mm->peep = fgetc(mm->inputstream); mm->charsRead++; // start line counting at 1, because that is what is expected if (mm->lines == 0) { mm->lines = 1; } if (mm->peep == '\n') { mm->lines++; } return 1; } /* display some 
meta-information about the machine such as version, capacites etc */ void printMachineMeta(struct Machine * machine) { char Colour[30]; char date[30]; char time[30]; strcpy(Colour, BROWN); strcpy(date, "?"); strcpy(time, "?"); printf("%s Machine Version:%s %s \n", Colour, NORMAL, machine->version); printf("%s Character read:%s %ld \n", Colour, NORMAL, machine->charsRead); printf("%s Escape character:%s %c \n", Colour, NORMAL, machine->escape); printf("%s Peep character: %c %s \n", Colour, machine->peep, NORMAL); // cant really use escapeSpecial here, because it converts text, not // one character... // escapeSpecial(, PINK); printf("\n"); printf("%s Stack token delimiter:%s %c \n", Colour, NORMAL, machine->delimiter); /* printf("%s Flag:%s %d \n", Colour, NORMAL, machine->flag); if (machine->flag == FALSE) { //strcpy(date, ctime(&machine->compileDate)); } */ } void printBufferAndPeep(struct Machine * mm) { printf("[%ld] Buff:%s Peep:%c \n", mm->buffer.capacity, mm->buffer.stack, mm->peep ); } void showBufferAndPeep(struct Machine * mm) { char peep[20]; if (mm->peep == EOF) { strcpy(peep, "EOF"); } else if (mm->peep == '\n') { strcpy(peep, "\\n"); } else if (mm->peep == '\r') { strcpy(peep, "\\r"); } else { peep[0] = mm->peep; peep[1] = '\0'; } printf("[%s%ld%s] Buff[%s%s%s] P[%s%s%s] \n", GREEN, mm->buffer.capacity, NORMAL, YELLOW, mm->buffer.stack, NORMAL, BLUE, peep, NORMAL ); } /* show core registers of the parsing machine: the stack, the "workspace" (like a text accumulator), and the "peep" (the next character in the input stream */ void showStackWorkPeep(struct Machine * mm, enum Bool escape) { char peep[20]; if (mm->peep == EOF) { strcpy(peep, "EOF"); } else if (mm->peep == '\n') { strcpy(peep, "\\n"); } else if (mm->peep == '\r') { strcpy(peep, "\\r"); } else { peep[0] = mm->peep; peep[1] = '\0'; } // uses the %.*s, len, buffer trick to print the stack int stackwidth = mm->buffer.workspace - mm->buffer.stack; if (escape == TRUE) { printf("%s(Buff:%s%ld/%ld 
+r:%d%s)%s Stack[%s%.*s%s] Work[%s", WHITE, BROWN, strlen(mm->buffer.stack), mm->buffer.capacity, mm->buffer.resizings, WHITE, NORMAL, YELLOW, stackwidth, mm->buffer.stack, NORMAL, PURPLE); escapeSpecial(mm->buffer.workspace, CYAN); printf("%s] Peep[%s%s%s] \n", NORMAL, BLUE, peep, NORMAL ); } else { // uses the %.*s, len, buffer trick to print the stack printf("%s(Buff:%s%ld%s)%s Stack[%s%.*s%s] Work[%s%s%s] Peep[%s%s%s] \n", WHITE, BROWN, mm->buffer.capacity, WHITE, NORMAL, YELLOW, stackwidth, mm->buffer.stack, NORMAL, PURPLE, mm->buffer.workspace, NORMAL, BLUE, peep, NORMAL ); } } /* display the state of registers of the machine using colours This does not show the tape (which is part of the machine) or the program which is also part of the machine. See showMachineTapeProgram() or showMachineWithTape() for that. The "escape" parameter determines if whitespace (newlines, carriage returns, tabs etc) will be displayed in the format "\n \r \t" etc, or just printed normally. If there are newlines in the workspace then the display gets messy. So its just a matter of aesthetics. 
*/
void showMachine(struct Machine * mm, enum Bool escape) {
  showStackWorkPeep(mm, escape);
  // the flag is compared against TRUE by name, so that the display
  // does not depend on the numeric values of enum Bool
  printf("Acc:%s%d%s Flag:%s%s%s Esc:%s%c%s "
    "Delim:%s%c%s Chars:%s%ld%s Lines:%s%ld%s \n",
    GREEN, mm->accumulator, NORMAL,
    YELLOW, (mm->flag == TRUE ? "TRUE" : "FALSE"), NORMAL,
    YELLOW, mm->escape, NORMAL,
    YELLOW, mm->delimiter, NORMAL,
    BLUE, mm->charsRead, NORMAL,
    BLUE, mm->lines, NORMAL);
}

// show state of machine buffers with tape cells
void showMachineWithTape(struct Machine * mm) {
  printf("%s--------- Machine State -----------%s \n", BROWN, NORMAL);
  showMachine(mm, TRUE);
  printf("%s--------- Tape --------------------%s \n", BROWN, NORMAL);
  // true means: escape special character (newlines mainly)
  printSomeTapeInfo(&mm->tape, TRUE, 2);
}

// show the program listing, then the state of machine buffers
// with tape cells
void showMachineTapeProgram(struct Machine * mm, int tapeContext) {
  printSomeProgram(&mm->program, 3);
  printf("%s--------- Machine State -----------%s \n", PURPLE, NORMAL);
  showMachine(mm, TRUE);
  printf("%s--------- Tape --------------------%s \n", PURPLE, NORMAL);
  printSomeTapeInfo(&mm->tape, TRUE, tapeContext);
}

/* this is mainly for debugging.
It shows the capacity of the buffer and its string length */ void printBufferCapacity(struct Machine * mm) { //incomplete printf("buff cap:"); } enum ExitCode execute(struct Machine * mm, struct Instruction * ii) { struct TapeCell * thisCell; long newCapacity; FILE * saveFile; // where workspace is written by 'write' command char * temp; // a temporary string for x swaps char acc[100]; // a text version of the accumulator char * buffer; // store the workspace when escaping (needs to be malloc) size_t len; int count; // count escapable chars for malloc char * lastc; // points to last char in workspace char * lastw; // points to last char in workspace int (* fn)(int); // a function pointer for the ctype.h functions size_t cellLength; // how long tapecell text is int difference = 0; // substring size difference for "replace" switch (ii->command) { case ADD: if (strlen(mm->buffer.stack) + strlen(ii->a.text) > mm->buffer.capacity) { growBuffer(&mm->buffer, strlen(ii->a.text) + 50); } strcat(mm->buffer.workspace, ii->a.text); break; case CLIP: if (*mm->buffer.workspace == 0) break; mm->buffer.workspace[strlen(mm->buffer.workspace)-1] = '\0'; break; case CLOP: if (*mm->buffer.workspace == 0) break; len = strlen(mm->buffer.workspace); memmove(mm->buffer.workspace, mm->buffer.workspace+1, len-1); mm->buffer.workspace[len-1] = 0; break; case CLEAR: mm->buffer.workspace[0] = '\0'; break; case REPLACE: difference = strlen(ii->b.text) - strlen(ii->a.text); // but growBuffer takes an increase, not a minimum size size_t newSize = strlen(mm->buffer.workspace) + difference; char * result = malloc((newSize + 100) * sizeof(char)); *result = 0; replaceString(result, mm->buffer.workspace, ii->a.text, ii->b.text); if (newSize > workspaceCapacity(&mm->buffer)) { growBuffer(&mm->buffer, difference + 100); } strcpy(mm->buffer.workspace, result); free(result); break; case PRINT: printf("%s", mm->buffer.workspace); break; case POP: // pop a token from the stack, so skip the first delim * and 
// read back to the next delim * in the stack buffer. // // this pop routine seems unnecessary complicated. // basically given s:a*b* w: // pop should give s:a* w:b* // if (mm->buffer.workspace == mm->buffer.stack) break; mm->buffer.workspace--; if (mm->buffer.workspace == mm->buffer.stack) { if (mm->tape.currentCell > 0) mm->tape.currentCell--; break; } while ((*(mm->buffer.workspace-1) != mm->delimiter) && (mm->buffer.workspace-1 != mm->buffer.stack)) mm->buffer.workspace--; if (mm->buffer.workspace == mm->buffer.stack) { if (mm->tape.currentCell > 0) mm->tape.currentCell--; break; } if (*(mm->buffer.workspace-1) != mm->delimiter) mm->buffer.workspace--; // dec current tape cell if (mm->tape.currentCell > 0) mm->tape.currentCell--; break; case PUSH: // could use strchr instead, or better strchrnul // but strchrnul is not standard C. if (mm->buffer.workspace[0] == '\0') break; while ((*mm->buffer.workspace != '\0') && (*mm->buffer.workspace != mm->delimiter)) mm->buffer.workspace++; if (mm->buffer.workspace[0] == mm->delimiter) mm->buffer.workspace++; // increment current tape cell // star at end of token - not beginning // look for limits !! 
      if (mm->tape.currentCell < mm->tape.capacity)
        mm->tape.currentCell++;
      else {
        printf("Push: Out of tape bounds");
        exit(1);
      }
      break;
    case PUT:
      // I could make this a function, but the only place the
      // tape cell can get resized is here in the PUT command
      // if not enough space in tape cell, malloc here
      thisCell = &mm->tape.cells[mm->tape.currentCell];
      if (strlen(mm->buffer.workspace) > thisCell->capacity) {
        newCapacity = strlen(mm->buffer.workspace)+100;
        thisCell->text = malloc(newCapacity * sizeof(char));
        if (thisCell->text == NULL) {
          fprintf(stderr,
            "PUT: couldnt allocate memory for cell->text (execute) \n");
          exit(EXIT_FAILURE);
        }
        thisCell->capacity = newCapacity - 1;
        thisCell->resizings++;
      }
      strcpy(thisCell->text, mm->buffer.workspace);
      break;
    case GET:
      cellLength = strlen(mm->tape.cells[mm->tape.currentCell].text);
      if ((strlen(mm->buffer.stack) + cellLength) > mm->buffer.capacity) {
        growBuffer(&mm->buffer, cellLength + 100);
      }
      strcat(mm->buffer.workspace, mm->tape.cells[mm->tape.currentCell].text);
      break;
    case SWAP:
      // allocate room for the whole workspace plus some slack. The
      // "+ 10" must be added to the length, not to the pointer.
      temp = malloc((strlen(mm->buffer.workspace) + 10) * sizeof(char));
      if (temp == NULL) {
        fprintf(stderr,
          "SWAP: couldnt allocate memory for temp (execute) \n");
        exit(EXIT_FAILURE);
      }
      strcpy(temp, mm->buffer.workspace);
      cellLength = strlen(mm->tape.cells[mm->tape.currentCell].text);
      // this is a bug, we need a function mm.workspaceCapacity()
      // because some room in the buffer is taken up by the stack
      // and we dont know how long that is because the stack is
      // not zero terminated (it is combined with the workspace).
if (cellLength > workspaceCapacity(&mm->buffer)) { growBuffer(&mm->buffer, cellLength + 20); } strcpy(mm->buffer.workspace, mm->tape.cells[mm->tape.currentCell].text); thisCell = &mm->tape.cells[mm->tape.currentCell]; if (strlen(temp) > thisCell->capacity) { newCapacity = strlen(temp)+20; thisCell->text = malloc(newCapacity * sizeof(char)); if (thisCell->text == NULL) { fprintf(stderr, "PUT: couldnt allocate memory for cell->text (execute) \n"); exit(EXIT_FAILURE); } thisCell->capacity = newCapacity - 1; thisCell->resizings++; } strcpy(thisCell->text, temp); free(temp); break; case INCREMENT: if (mm->tape.currentCell < mm->tape.capacity) mm->tape.currentCell++; else { printf("++: Out of tape bounds"); exit(1); } break; case DECREMENT: if (mm->tape.currentCell > 0) mm->tape.currentCell--; break; case READ: if (mm->peep == EOF) { return ENDOFSTREAM; // end of file } readc(mm); break; case UNTIL: // until workspace ends with ii->a.text // this seems to be working. if (mm->peep == EOF) break; if (!readc(mm)) break; len = strlen(ii->a.text); size_t worklen = strlen(mm->buffer.workspace); // below is a general "endswith" function... // strcmp(mm->buffer.workspace+worklen-len, ii->a.text) != 0) { // this should not be necessary because buffer grows in // readc !!!!! /* if (strlen(mm->buffer.stack) == mm->buffer.capacity - 1) { growBuffer(&mm->buffer, 100); } */ char * suffix = mm->buffer.workspace+worklen-len; while (readc(mm)) { // this should not be necessary because buffer grows in // readc !!!!! /* if (strlen(mm->buffer.stack) == mm->buffer.capacity - 1) { growBuffer(&mm->buffer, 100); } */ worklen++; suffix = mm->buffer.workspace+worklen-len; // if ((strcmp(suffix, ii->a.text) == 0) && (*(suffix-1) != '\\')) { // deal with escape character. 
if ((strcmp(suffix, ii->a.text) == 0) && (*(suffix-1) != mm->escape)) { // printf("%s\n", suffix); break; } } break; case WHILE: if (mm->peep == EOF) break; // while and whilenot handle classes eg :space: ranges eg [a-z] // and lists eg [abxy] [.] etc // a function pointer: fn = &isblank; int res = (*fn)('1'); if (ii->a.datatype == CLASS) { // get character class function pointer from the instruction fn = ii->a.classFn; while ((*fn)(mm->peep)) { if (!readc(mm)) break; } } else if (ii->a.datatype == RANGE) { // compare peep to a range of characters eg a-z while ((mm->peep >= ii->a.range[0]) && (mm->peep <= ii->a.range[1])) { if (!readc(mm)) break; } } else if (ii->a.datatype == LIST) { // read input while peep is in a list of chars while (strchr(ii->a.list, mm->peep) != NULL) { if (!readc(mm)) break; } } break; case WHILENOT: // do we really need a whilenot. We could just write // while ![:space:] etc // why not use a switch here??? Its a switch within a switch... /* switch (ii->a.datatype) { case CLASS: break; case RANGE: break; } */ if (ii->a.datatype == CLASS) { // get character class function pointer from the instruction fn = ii->a.classFn; while (!(*fn)(mm->peep)) { if (!readc(mm)) break; } } else if (ii->a.datatype == RANGE) { // read input while peep is not in a range (eg b-f) while ((mm->peep < ii->a.range[0]) || (mm->peep > ii->a.range[1])) { if (!readc(mm)) break; } } else if (ii->a.datatype == LIST) { // read input while peep is not in a char list eg "abxy" while (strchr(ii->a.list, mm->peep) == NULL) { if (!readc(mm)) break; } } break; case JUMP: // update program counter. non-relative jump mm->program.ip = ii->a.number; break; case JUMPTRUE: if (mm->flag == TRUE) // relative jump. easier to assemble code // mm->program.ip = ii->a.number; mm->program.ip = mm->program.ip + ii->a.number; else mm->program.ip++; break; case JUMPFALSE: if (mm->flag == FALSE) // relative jump. 
        mm->program.ip = mm->program.ip + ii->a.number;
      else mm->program.ip++;
      break;
    case TESTIS:
      if (strcmp(mm->buffer.workspace, ii->a.text) == 0) {
        mm->flag = TRUE;
      } else { mm->flag = FALSE; }
      break;
    case TESTCLASS:
      // handle class, range, text, etc
      // depending on the parameter type.
      if (ii->a.datatype == CLASS) {
        // get character class function pointer from the instruction
        fn = ii->a.classFn;
        lastc = mm->buffer.workspace;
        //while ((*fn)(mm->peep)) {
        mm->flag = TRUE;
        while (*lastc != 0) {
          // the ctype.h functions expect an unsigned char value
          if (!fn((unsigned char)*lastc)) { mm->flag = FALSE; break; }
          lastc++;
        }
      }
      else if (ii->a.datatype == RANGE) {
        // compare ws to a range of characters eg a-z
        mm->flag = TRUE;
        lastc = mm->buffer.workspace;
        while (*lastc != 0) {
          if ((*lastc < ii->a.range[0]) || (*lastc > ii->a.range[1])) {
            mm->flag = FALSE; break;
          }
          lastc++;
        }
      }
      else if (ii->a.datatype == LIST) {
        // compare ws to a list of chars
        //while (strchr(ii->a.list, mm->peep) != NULL) {
        mm->flag = TRUE;
        lastc = mm->buffer.workspace;
        while (*lastc != 0) {
          if (strchr(ii->a.list, *lastc) == NULL) {
            mm->flag = FALSE; break;
          }
          lastc++;
        }
      }
      break;
    case TESTBEGINS:
      if (strncmp(mm->buffer.workspace, ii->a.text, strlen(ii->a.text)) == 0) {
        mm->flag = TRUE;
      } else { mm->flag = FALSE; }
      break;
    case TESTENDS:
      // a suffix longer than the workspace can never match; checking
      // the lengths first stops the pointer arithmetic from reaching
      // back before the start of the workspace
      len = strlen(ii->a.text);
      if ((strlen(mm->buffer.workspace) >= len) &&
          (strcmp(mm->buffer.workspace + strlen(mm->buffer.workspace) - len,
                  ii->a.text) == 0))
        mm->flag = TRUE;
      else mm->flag = FALSE;
      break;
    case TESTEOF:
      if (mm->peep == EOF) { mm->flag = TRUE; }
      else { mm->flag = FALSE; }
      break;
    case TESTTAPE:
      if (strcmp(mm->buffer.workspace,
                 mm->tape.cells[mm->tape.currentCell].text) == 0) {
        mm->flag = TRUE;
      } else { mm->flag = FALSE; }
      break;
    case COUNT:
      sprintf(acc, "%d", mm->accumulator);
      if (strlen(mm->buffer.stack) + strlen(acc) > mm->buffer.capacity) {
        growBuffer(&mm->buffer, strlen(acc) + 50);
      }
      strcat(mm->buffer.workspace, acc);
      break;
    case INCC:
      mm->accumulator++;
      break;
    case DECC:
      mm->accumulator--;
      break;
    case ZERO:
      mm->accumulator = 0;
      break;
    case CHARS:
      sprintf(acc, "%ld", mm->charsRead);
      if
          (strlen(mm->buffer.stack) + strlen(acc) > mm->buffer.capacity) {
        growBuffer(&mm->buffer, strlen(acc) + 50);
      }
      strcat(mm->buffer.workspace, acc);
      break;
    case LINES:
      sprintf(acc, "%ld", mm->lines);
      if (strlen(mm->buffer.stack) + strlen(acc) > mm->buffer.capacity) {
        growBuffer(&mm->buffer, strlen(acc) + 50);
      }
      strcat(mm->buffer.workspace, acc);
      break;
    case ESCAPE:
      // count escapable
      // but if the escapable is already preceded with a
      // backslash should we reescape it
      if (ii->a.text[0] == '\0') break;   // nothing to escape
      count = 0;
      lastc = strchr(mm->buffer.workspace, ii->a.text[0]);
      while (lastc != NULL) {
        count++;
        lastc = strchr(lastc+1, ii->a.text[0]);
      }
      buffer = malloc((strlen(mm->buffer.workspace)+count+10) * sizeof(char));
      if (buffer == NULL) {
        fprintf(stderr,
          "ESCAPE: couldnt allocate memory for buffer (execute) \n");
        exit(EXIT_FAILURE);
      }
      strcpy(buffer, mm->buffer.workspace);
      // also grow workspace if needed.
      if (strlen(buffer) + count > workspaceCapacity(&mm->buffer)) {
        growBuffer(&mm->buffer, count + 50);
      }
      // walk the copy with lastc, so that "buffer" still points to the
      // start of the allocation and can be freed afterwards
      lastc = buffer;
      lastw = mm->buffer.workspace;
      while (*lastc != 0) {
        if (*lastc == ii->a.text[0]) {
          *lastw = mm->escape;
          lastw++;
        }
        *lastw = *lastc;
        lastc++; lastw++;
      }
      *lastw = 0;
      free(buffer);
      break;
    case UNESCAPE:
      /* how should this work? should unescape deescape all
         backslash letters or only one or a list, or a class :blank:
         or a range [a-z] */
      lastc = mm->buffer.workspace;
      lastw = mm->buffer.workspace;
      while (*lastc != 0) {
        if ((*lastc == mm->escape) && (*(lastc+1)==ii->a.text[0])) {
          lastc++;
        }
        *lastw = *lastc;
        lastc++; lastw++;
      }
      *lastw = 0;
      // ...
      break;
    case STATE:
      showMachineTapeProgram(mm, 3);
      break;
    case QUIT:
      // return EXECQUIT;
      // script must now exit
      break;
    case BAIL:
      // return BADQUIT;
      // script must now exit with error code
      break;
    case WRITE:
      // write workspace to file 'sav.pp'
      if ((saveFile = fopen("sav.pp", "w")) == NULL) {
        printf ("Cannot open file %s'sav.pp'%s for writing\n",
          YELLOW, NORMAL);
        return WRITESAVERROR;
      }
      fputs(mm->buffer.workspace, saveFile);
      fclose(saveFile);
      break;
    case NOP:
      break;
    case UNDEFINED:
      fprintf(stderr, "Executing undefined command! 
" "at instruction: %d \n ", mm->program.ip); return EXECUNDEFINED; // error code break; } // if a jump command, dont increment the instruction pointer if (strncmp(info[ii->command].name, "jump", 4) != 0) { mm->program.ip++; } return SUCCESS; } // steps through one instruction of the machines program void step(struct Machine * mm) { //int result = execute(mm, &mm->program.listing[mm->program.ip]); } /* runs the compiled program in the machine but this will exit when the last read is performed... execute() has the following exit codes which we might need to handle: 0: success no problems 1: end of stream reached (tried to read eof) 2: trying to execute undefined instruction 3: quit/crash command executed (exit script) */ enum ExitCode run(struct Machine * mm) { int result; for (;;) { result = execute(mm, &mm->program.listing[mm->program.ip]); if (result != 0) break; } return result; } void runDebug(struct Machine * mm) { int result; long ii = 0; // a counter for (;;) { printf("%6ld: ip=%3d T(n)=%3d", ii, mm->program.ip, mm->tape.currentCell); printInstruction(&mm->program.listing[mm->program.ip]); result = execute(mm, &mm->program.listing[mm->program.ip]); printf("\n"); if (result != 0) { printf("execute() returned error code (%d)\n", result); break; } ii++; } } /* another debugging tool: run while the number of characters read by the machine is less than maximum */ void runWhileCharsLessThan(struct Machine * mm, long maximum) { int result; while ((mm->peep != EOF) && (mm->charsRead < maximum)) { result = execute(mm, &mm->program.listing[mm->program.ip]); if (result != 0) break; } } /* another debugging tool: run until the input stream line number is equal to the number */ void runToLine(struct Machine * mm, long maximum) { int result; while ((mm->peep != EOF) && (mm->lines < maximum)) { result = execute(mm, &mm->program.listing[mm->program.ip]); if (result != 0) break; } } // runs the compiled program in the machine until the // flag register is set to true void 
runUntilTrue(struct Machine * mm) {
  int result;
  while (mm->flag == FALSE) {
    result = execute(mm, &mm->program.listing[mm->program.ip]);
    if (result != 0) break;
  }
}

/* runs the compiled program in the machine until the
   workspace is exactly the specified text */
void runUntilWorkspaceIs(struct Machine * mm, char * text) {
  int result;
  while ((mm->peep != EOF) && (strcmp(mm->buffer.workspace, text) != 0)) {
    result = execute(mm, &mm->program.listing[mm->program.ip]);
    if (result != 0) break;
  }
}

// returns 1 if "str" ends with "suffix", otherwise 0
int endsWith(const char *str, const char *suffix) {
  if (!str || !suffix) return 0;
  size_t lenstr = strlen(str);
  size_t lensuffix = strlen(suffix);
  if (lensuffix > lenstr) return 0;
  return strncmp(str + lenstr - lensuffix, suffix, lensuffix) == 0;
}

// note: this returns the strncmp() result, so 0 means that "text"
// does start with "prefix" (the opposite convention to endsWith())
int startsWith(const char * text, const char * prefix) {
  return strncmp(text, prefix, strlen(prefix));
}

/* runs the compiled program until the
   workspace ends with the specified text */
void runUntilWorkspaceEndsWith(struct Machine * mm, char * text) {
  int result;
  while ((mm->peep != EOF) && (!endsWith(mm->buffer.workspace, text))) {
    result = execute(mm, &mm->program.listing[mm->program.ip]);
    if (result != 0) break;
  }
}

// given some instruction text (eg: add "this") compile an instruction into
// the machines program listing. Returns zero on success or a positive integer
// if arguments are invalid. maybe this should return a pointer to an
// instruction upon success. The job of compile is to housekeep the machine.
// the job of instructionFromText() is actually to parse the text into an
// instruction
// change this to struct Program * pp ??
// or change loadAssembledProgram to take a machine
int compile(struct Program * program, char * text, int pos,
            struct Label table[]) {
  struct Instruction * ii;
  char command[200];
  char args[1000];
  // make room before the listing fills up
  if (program->count >= program->capacity - 1) {
    growProgram(program, 50);
  }
  // pos is where in the program list to put the compiled instruction
  // (the pointer is taken after growProgram(), in case growing moves
  // the listing in memory)
  ii = &program->listing[pos];
  // the field width leaves room for the terminating zero in command[200]
  command[0] = '\0';
  args[0] = '\0';
  sscanf(text, "%199s %200[^\n]", command, args);
  enum Command com = textToCommand(command);
  if (com == UNDEFINED) {
    printf("Unknown command name %s%s%s \n", BROWN, command, NORMAL);
    printf("on line: %s%s%s at program position %d \n",
      BROWN, text, NORMAL, pos);
    return 1;
  }
  instructionFromText(stdout, ii, text, pos, table);
  program->count++;
  return 0;
  // 2 argument compilation... to do
} // compile

// given user input text return the test command
enum TestCommand textToTestCommand(const char * text) {
  int ii;
  if (*text == 0) return UNKNOWN;
  for (ii = 0; ii < UNKNOWN; ii++) {
    if ((strcmp(text, testInfo[ii].names[0]) == 0) ||
        (strcmp(text, testInfo[ii].names[1]) == 0)) {
      return (enum TestCommand)ii;
    }
  }
  return UNKNOWN;
}

void showTestHelp() {
  int ii;
  int key;
  printf("%s[ALL HELP COMMANDS]%s:\n", YELLOW, NORMAL);
  for (ii = 0; ii < UNKNOWN; ii++) {
    printf("%s%5s%s %s %s-%s %s%s \n",
      GREEN, testInfo[ii].names[0], PALEGREEN, testInfo[ii].argText,
      NORMAL, WHITE, testInfo[ii].description, NORMAL);
    if ( (ii+1) % 14 == 0 ) {
      pagePrompt();
      key = getc(stdin);
      if (key == 'q') { return; }
    }
  }
}

// information about different modes the test program can be in
// eg step where enter steps through the next instruction.
// Or maybe 2 mode variables enterMode and programMode.
enterMode // determines what the key does, and programMode determines // whether instructions are compiled or executed or both enum TestMode { COMPILE=0, IPCOMPILE, ENDCOMPILE, INTERPRET, MACHINE, PROGRAM, STEP, RUN } mode; // contain information about all commands struct { enum TestMode mode; char * name; char * description; } modeInfo[] = { { COMPILE, "compile", "enter displays the compiled program. Instructions entered are \n" "compiled into the machines program listing, but are not executed "}, { IPCOMPILE, "ipcompile", "Instructions are compiled into the machines program listing \n" "at the current instruction pointer position instead of at \n" "the end of the program. This is a slightly clunky way of \n" "modifying the in-memory program, but possibly easier than trying \n" "to manually modify the text assembler listing. "}, { ENDCOMPILE, "endcompile", "Instructions are compiled into the machines program listing \n" "at the end of the program \n" }, { INTERPRET, "interpret", "Instructions entered are executed but not compiled into the \n" "machine's program. \n" }, { MACHINE, "machine", "enter displays the state of the machine"}, { PROGRAM, "", ""}, { STEP, "step", "When enter is pressed the next instruction is executed and " "the state of the machine is displayed "}, { RUN, "run", "When enter is pressed the current program is run. 
"} }; // searches testInfo array for commands matching a search term void searchHelp(char * text) { int ii; int jj = 0; char key; printf("%s[Searching help commands]%s:\n", YELLOW, NORMAL); for (ii = 0; ii < UNKNOWN; ii++) { if ((strstr(testInfo[ii].description, text) != NULL) || (strstr(testInfo[ii].names[1], text) != NULL) || (strstr(testInfo[ii].names[0], text) != NULL)) { jj++; printf("%s%5s%s %s %s-%s %s%s \n", PURPLE, testInfo[ii].names[0], YELLOW, testInfo[ii].argText, NORMAL, BROWN, testInfo[ii].description, NORMAL); if ( (jj+1) % 14 == 0 ) { pagePrompt(); key = getc(stdin); if (key == 'q') { return; } } } // if search term found } // for if (jj == 0) { printf("No results found for '%s'\n", text); } } /* load a script from file and install in the machines program at instruction number "position". This allows us to append one script at the end of another which may be useful. */ enum ExitCode loadScript( struct Program * program, FILE * scriptFile, int position) { /* the procedure is: load asm.pp, run it on the script-file (assembly is saved to sav.pp) load sav.pp, run it on the input stream. */ FILE * asmFile; if ((asmFile = fopen("asm.pp", "r")) == NULL) { printf("Could not open assembler %sasm.pp%s \n", YELLOW, NORMAL); return 1; } FILE * savFile; // first delete contents of the sav.pp file to avoid confusion // later. if ((savFile = fopen("sav.pp", "w")) == NULL) { printf("Could not open script file %s'sav.pp'%s for writing \n", YELLOW, NORMAL); return(1); } fputs("add \"no script\" \n", savFile); fputs("quit \n", savFile); fclose(savFile); struct Machine new; newMachine(&new, scriptFile, 100, 10); loadAssembledProgram(&new.program, asmFile, 0); int result = 0; result = run(&new); /* check if the compilation was successful. (which means ExitCode SUCCESS of EXECQUIT) If not successful, do not proceed. 
     because the script file was not properly compiled by asm.pp */
  if (result > EXECQUIT) {
    // try to give a more informative error message here
    showMachineTapeProgram(&new, 3);
  }
  //runDebug(&m);
  fclose(asmFile);
  freeMachine(&new);
  if (result > EXECQUIT) { return result; }

  // asm.pp has created a new sav.pp file, which is the
  // script in a "compiled" form (a type of assembly language
  // for the parse virtual machine). We can now open it, load
  // it and run it.
  if ((savFile = fopen("sav.pp", "r")) == NULL) {
    printf("Could not open script file %s'sav.pp'%s for reading. \n",
      YELLOW, NORMAL);
    return READSAVERROR;
  }
  // note: "position" is not used yet; the assembled program is always
  // loaded at instruction 0
  loadAssembledProgram(program, savFile, 0);
  fclose(savFile);
  return SUCCESS;
}

void printUsageHelp() {
  fprintf(stdout,
    "'pp' a pattern parsing machine \n"
    "Usage: pp [-shI] [-i 'text'] [-e 'snippet'] [-f script-file] [-a file] [inputfile] \n"
    " \n"
    "  -f script-file   load the parse script from this file \n"
    "  -e expression    add inline script commands to script \n"
    "  -i text          use 'text' as input for the script \n"
    "  -h               print this help \n"
    "  -s               run in unix filter mode (the default).\n"
    "  -a [file]        load script-assembly source file \n"
    "  -I               run in interactive mode (with shell) \n");
    //" -M print the state of the machine after compiling \n"
    //" a script. This option is useful for debugging. \n"
}

// The main testing loop for the machine. This accepts interactive
// commands and executes them, among other things. Allows the state
// of the machine to be observed, including the compiled program.
int main (int argc, char *argv[]) { int c; // name of the file with script assembly commands char * asmFileName = NULL; // name of file to serve as input stream char * inputFile = NULL; // name of the file with parse script commands char * scriptFileName = NULL; // script commands on the command line with the -e switch char * inlineScript = NULL; // inline input (instead of an input file) char * inlineInput = NULL; char source[64] = "input.txt"; // if asm.pp generate an error, then the machine parse state will // be printed. enum Bool printState = FALSE; /* filtermode means the program is used as a unix filter, like sed, grep etc intermode means the program starts an interactive shell, so that the user can debug scripts and learn about the machine */ enum { FILTERMODE, INTERMODE } progmode = FILTERMODE; // determines how much of the tape will be shown by printSomeTapeInfo() // and showMachineTapeProgram() int tapeContext = 3; /* //A mystery, this takes input from a pipe in test.stdin.c but //not here, only looks at console char word[128]; FILE * fp = stdin; while (fscanf(fp, "%127s", word) == 1) { printf("%s\n", word); } exit(0); */ opterr = 0; while ((c = getopt (argc, argv, "i:f:e:IsMha:")) != -1) { switch (c) { // a text input file, but this should be a non-option argument // (just the file name). 
      case 'f':
        scriptFileName = optarg;
        break;
      case 'e':
        inlineScript = optarg;
        break;
      case 'i':
        inlineInput = optarg;
        break;
      case 'I':
        progmode = INTERMODE;
        break;
      case 's':
        progmode = FILTERMODE;
        break;
      case 'M':
        printState = TRUE;
        break;
      case 'h':
        // print the help and stop, instead of falling through to the
        // "No input given" error below
        printUsageHelp();
        exit(0);
      case 'a':
        asmFileName = optarg;
        break;
      case '?':
        switch (optopt) {
          case 'a':
            fprintf (stderr, "Option -%c requires an argument.\n", optopt);
            fprintf (stderr, " -a scriptAssemblyFile \n");
            break;
          case 'f':
            fprintf (
              stderr, "%s: option -%c requires an argument.\n",
              argv[0], optopt);
            break;
          case 'e':
          case 'i':
            fprintf (
              stderr, "%s: option -%c requires an argument.\n",
              argv[0], optopt);
            break;
          default:
            fprintf (stderr, "%s: unknown option -- %c \n", argv[0], optopt);
            break;
        }
        printUsageHelp();
        exit(1);
      default:
        abort ();
    }
  } // while

  if (argv[optind] != NULL) {
    inputFile = argv[optind];
  }
  if ((asmFileName != NULL) && (scriptFileName != NULL)) {
    printf("cannot load assembly and script at the same time (-a and -f)\n");
    printUsageHelp();
    exit(1);
  }

  char version[64] = "v31415";
  if (progmode != FILTERMODE) {
    banner() ;
    printf("Compiled: %s%s%s \n", YELLOW, version, NORMAL);
  }
  struct Machine m;
  //struct Instruction i;
  if (inputFile != NULL) {
    // source is a fixed 64 byte buffer, so check the length before copying
    if (strlen(inputFile) >= sizeof(source)) {
      fprintf(stderr, "Input file name is too long: %s\n", inputFile);
      exit(1);
    }
    strcpy(source, inputFile);
  }
  if ((inlineInput == NULL) && (inputFile == NULL)) {
    fprintf(stdout, "No input given to script (use -i or inputFile)\n");
    printUsageHelp();
    exit(1);
  }
  if ((inlineInput != NULL) && (inputFile != NULL)) {
    fprintf(stdout, "cannot use -i switch and inputFile together\n");
    printUsageHelp();
    exit(1);
  }

  /* use input given to the -i switch. This is a convenience
     for testing scripts */
  if (inlineInput != NULL) {
    FILE * temp = NULL;
    if ((temp = fopen("tempInput.txt", "w")) == NULL) {
      printf("Could not open temporary file %stempInput.txt%s for writing \n",
        YELLOW, NORMAL);
      exit(1);
    }
    fputs(inlineInput, temp);
    fclose(temp);
    strcpy(source, "tempInput.txt");
  }

  // mode determines how the enter key behaves.
Also, in compile mode // instructions are compiled into the machines program but not automatically // executed enum TestMode mode = INTERPRET; enum TestCommand testCom; FILE * inputStream; if((inputStream = fopen (source, "r")) == NULL) { printf ("Cannot open file %s%s%s \n", YELLOW, source, NORMAL); printf ("Try %s%s -i %s \n", CYAN, argv[0], NORMAL); exit (1); } //else { inputStream = stdin; } if (progmode != FILTERMODE) { printf("Using source file %s%s%s as input stream \n", YELLOW, source, NORMAL); printUsefulCommands(); } FILE * asmFile; FILE * scriptFile; // machine, input stream, tape cells, tape cells size, and program listing newMachine(&m, inputStream, 100, 10); // now we can use the machine to parse etc if (asmFileName != NULL) { if ((asmFile = fopen(asmFileName, "r")) == NULL) { printf("Could not open script-assembler file %s%s%s \n", YELLOW, asmFileName, NORMAL); exit(1); } if (progmode != FILTERMODE) { printf("\n"); printf("Loading assembler file %s%s%s \n", YELLOW, asmFileName, NORMAL); } strcpy(m.program.source, "asm.pp"); loadAssembledProgram(&m.program, asmFile, 0); if (progmode != FILTERMODE) { printf("Compiled %s%d%s instructions " "from '%s%s%s' in about %s%ld%s milliseconds \n", CYAN, m.program.count, NORMAL, CYAN, m.program.source, NORMAL, CYAN, m.program.compileTime, NORMAL); } fclose(asmFile); } if (scriptFileName != NULL) { if ((scriptFile = fopen(scriptFileName, "r")) == NULL) { printf("Could not open script file %s%s%s \n", YELLOW, scriptFileName, NORMAL); exit(1); } if (progmode != FILTERMODE) { printf("\n"); printf("Loading script file %s%s%s \n", YELLOW, scriptFileName, NORMAL); } /* the procedure is: load asm.pp, run it on the script-file (assembly is saved to sav.pp) load sav.pp, run it on the input stream. */ int result = loadScript(&m.program, scriptFile, 0); fclose(scriptFile); /* if (printState == TRUE) { showMachineTapeProgram(&m, 3); } */ if (result > EXECQUIT) { fprintf(stderr, "The script file '%s' could not be compiled. 
\n", scriptFileName);
      printExitCode(result);
      exit(result);
    }
  }

  /* add commands given to the -e switch to the current program */
  if (inlineScript != NULL) {
    /* the procedure is:
         save the -e script commands to a temporary file
         compile the commands to sav.pp with asm.pp
         load sav.pp, run it on the input stream. */
    FILE * temp = NULL;
    if ((temp = fopen("temp.pp", "w")) == NULL) {
      printf("Could not open temporary file %stemp.pp%s for writing \n",
        YELLOW, NORMAL);
      exit(1);
    }
    fputs(inlineScript, temp);
    // this is a kludge because asm.pp doesn't deal with
    // the last character of input.
    fputs(" ", temp);
    fclose(temp);
    if ((temp = fopen("temp.pp", "r")) == NULL) {
      printf("Could not open temporary file %stemp.pp%s for reading \n",
        YELLOW, NORMAL);
      exit(1);
    }
    if ((asmFile = fopen("asm.pp", "r")) == NULL) {
      printf("Could not open assembler %sasm.pp%s \n", YELLOW, NORMAL);
      exit(1);
    }
    // this only checks that asm.pp is readable: loadScript() opens its
    // own handle, so close this one instead of leaking it
    fclose(asmFile);
    if (progmode != FILTERMODE) {
      printf("Loading assembler %sasm.pp%s \n", YELLOW, NORMAL);
    }
    // need a parameter to loadScript if we want to show machine
    // state after a compilation error
    int result = loadScript(&m.program, temp, m.program.count);
    fclose(temp);
    if (result > EXECQUIT) {
      fprintf(stderr, "The inline script in -e '%s' "
        "was not compiled. \n", inlineScript);
      printExitCode(result);
      exit(result);
    }
  }

  if (progmode == FILTERMODE) {
    //or runDebug(&m);
    run(&m);
    exit(0);
  }

  char line[401];
  char command[201];
  char argA[201];
  char argB[201];
  char args[300];
  while (1) {
    printf(">");
    // leave the loop at end of input instead of reading an unset buffer
    if (fgets(line, sizeof(line), stdin) == NULL) { break; }
    // remove newline (safe even when the line is empty or has no newline)
    line[strcspn(line, "\n")] = '\0';
    //printf("input[%s]\n", line);
    command[0] = args[0] = argA[0] = argB[0] = '\0';
    // int result;
    sscanf(line, "%200s %200[^\n]", command, args);
    // we also need to deal with quoted arguments such as
    // a "many words"
    // eg ranges [a-z] character classes :alpha: etc.
    /* A good example of sscanf with different cases
    res = sscanf(buf,
      " \"%5[^\"]\" \"%19[^\"]\" \"%29[^\"]\" %lf %d",
      p->number, p->name, p->description, &p->price, &p->qty);
    if (res < 5) {
      static const char *where[] = {
        "number", "name", "description", "price", "quantity"};
      if (res < 0) res = 0;
      fprintf(stderr, "Error while reading %s in line %d.\n",
        where[res], nline);
      break;
    }
    */
    // eg sscanf(line, "%200s '%200[^']'", command, argA)
    // printf("c=%s, argA=%s, argB=%s \n", command, argA, argB);
    testCom = textToTestCommand(command);
    //printf("testcom= %s \n", testInfo[testCom].description);
    if (testCom == HELP) {
      showTestHelp();
    }
    else if (testCom == SEARCHCOMMAND) {
      if (strlen(args) == 0) {
        printf("No command search term specified. \n");
        printf("Try: // \n");
        continue;
      }
      searchCommandHelp(args);
    }
    else if (testCom == SEARCHHELP) {
      if (strlen(args) == 0) {
        printf("No search term specified. \n");
        printf("Try: / \n");
        continue;
      }
      searchHelp(args);
    }
    else if (testCom == COMMANDHELP) {
      if (strlen(args) == 0) {
        printf("No machine command specified. 
\n"); continue; } if (textToCommand(args) == UNDEFINED) { printf("Not a valid machine command %s%s%s \n", YELLOW, args, NORMAL); continue; } enum Command thisCom = textToCommand(args); printf("name: %s %s(%s%c%s)%s \n", info[thisCom].name, GREY, GREEN, info[thisCom].abbreviation, GREY, NORMAL); printf(" %s \n", info[thisCom].shortDesc); printf(" %s \n", info[thisCom].longDesc); printf("example: %s \n", info[thisCom].example); printf("takes %s%d%s argument(s) \n", YELLOW, info[thisCom].args, NORMAL); } else if (testCom == LISTMACHINECOMMANDS) { showCommandNames(); } else if (testCom == DESCRIBEMACHINECOMMANDS) { machineCommandHelp(); } else if (testCom == MACHINECOMMANDDOC) { machineCommandAsciiDoc(stdout); } else if (testCom == LISTCLASSES) { printClasses(); } else if (testCom == LISTCOLOURS) { showColours(); } else if (testCom == MACHINEPROGRAM) { showMachineTapeProgram(&m, tapeContext); } else if (testCom == MACHINESTATE) { showMachine(&m, TRUE); } else if (testCom == MACHINETAPESTATE) { showMachineWithTape(&m); } else if (testCom == MACHINEMETA) { printMachineMeta(&m); } else if (testCom == BUFFERSTATE) { showBufferInfo(&m.buffer); } else if (testCom == STACKSTATE) { showStackWorkPeep(&m, TRUE); } else if (testCom == TAPESTATE) { printSomeTape(&m.tape, TRUE); } else if (testCom == TAPEINFO) { printTapeInfo(&m.tape); } else if (testCom == TAPECONTEXTLESS) { if (tapeContext > 0) tapeContext--; printf("Tape display variable set to %d\n", tapeContext); } else if (testCom == TAPECONTEXTMORE) { tapeContext++; printf("Tape display variable set to %d\n", tapeContext); } else if (testCom == RESETINPUT) { rewind(inputStream); printf ("%s Rewound the input stream %s \n", YELLOW, NORMAL); } else if (testCom == RESETMACHINE) { colourPrint("reinitializing machine ...\n"); resetMachine(&m); } else if (testCom == STEPMODE) { mode = STEP; printf( "%sSTEP%s mode ( steps next instruction)\n", YELLOW, NORMAL); } else if (testCom == PROGRAMMODE) { mode = PROGRAM; printf( 
"%sPROGRAM%s mode ( displays program listing)\n", YELLOW, NORMAL); } else if (testCom == MACHINEMODE) { mode = MACHINE; printf( "%sMACHINE%s mode ( displays machine state)\n", YELLOW, NORMAL); } else if (testCom == COMPILEMODE) { printf ("Not implemented ... \n"); } else if (testCom == IPCOMPILEMODE) { mode = IPCOMPILE; printf ("Mode set to %sIPCOMPILE%s \n", YELLOW, NORMAL); printf ( " Instructions will be compiled into the program \n" " at the current instruction pointer position, instead \n" " of at the end of the program. \n"); } else if (testCom == ENDCOMPILEMODE) { mode = ENDCOMPILE; printf ("Mode set to %sENDCOMPILE%s \n", YELLOW, NORMAL); } else if (testCom == INTERPRETMODE) { mode = INTERPRET; printf("Mode set to %sINTERPRET%s \n", YELLOW, NORMAL); printf(" %smachine commands will be executed but not compiled \n" " to the internal program.%s \n", WHITE, NORMAL); } // ENTER key pressed .... else if (strcmp(command, "") == 0) { if (mode == MACHINE) { // to do: show the stream with ftell and fseek showMachine(&m, TRUE); } else if (mode == STEP) { step(&m); printSomeProgram(&m.program, 5); showMachine(&m, TRUE); } else if (mode == PROGRAM) { printSomeProgram(&m.program, 5); } } else if (strcmp(command, "BB") == 0) { showBufferAndPeep(&m); } else if (testCom == LISTPROGRAM) { printProgram(&m.program); } else if (testCom == LISTSOMEPROGRAM) { printSomeProgram(&m.program, 7); } else if (testCom == LISTPROGRAMWITHLABELS) { printProgramWithLabels(&m.program); } else if (testCom == PROGRAMMETA) { printProgramMeta(&m.program); } else if (testCom == SAVEPROGRAM) { FILE * savefile; if (strlen(args) == 0) { printf ("%s No save file given! 
%s \n", CYAN, NORMAL); printf ("%s Try %sP.wa %s \n", GREEN, YELLOW, NORMAL); continue; } if ((savefile = fopen (args, "w")) == NULL) { printf ("Could not open file %s%s%s for writing \n", YELLOW, args, NORMAL); continue; } saveAssembledProgram(&m.program, savefile); fclose(savefile); } else if (testCom == SHOWJUMPTABLE) { printJumpTable(m.program.labelTable); } else if (testCom == LOADPROGRAM) { FILE * loadfile; if (strlen(args) == 0) { printf ("%s No assembly text file given to load %s \n", GREEN, NORMAL); continue; } if ((loadfile = fopen (args, "r")) == NULL) { printf ("Could not open file %s%s%s for reading\n", YELLOW, args, NORMAL); continue; } loadAssembledProgram(&m.program, loadfile, 0); strcpy(m.program.source, args); fclose(loadfile); } else if (testCom == LOADASM) { FILE * loadfile; if ((loadfile = fopen ("asm.pp", "r")) == NULL) { printf ("Could not open file %sasm.pp%s for reading\n", YELLOW, NORMAL); continue; } printf("%sresetting machine and loading '%sasm.pp%s'... \n", BROWN, PINK, NORMAL); rewind(inputStream); resetMachine(&m); clearProgram(&m.program); strcpy(m.program.source, "asm.pp"); loadAssembledProgram(&m.program, loadfile, 0); fclose(loadfile); } else if (testCom == LOADLAST) { FILE * loadfile; if ((loadfile = fopen ("last.pp", "r")) == NULL) { printf ("Could not open file %slast.pp%s for reading\n", YELLOW, NORMAL); continue; } strcpy(m.program.source, "last.pp"); loadAssembledProgram(&m.program, loadfile, 0); fclose(loadfile); } else if (testCom == LOADSAVED) { FILE * loadfile; if ((loadfile = fopen ("sav.pp", "r")) == NULL) { printf ("Could not open file %ssav.pp%s for reading\n", YELLOW, NORMAL); continue; } strcpy(m.program.source, "sav.pp"); loadAssembledProgram(&m.program, loadfile, 0); fclose(loadfile); } else if (testCom == LISTSAVFILE) { FILE * loadfile; char line[200]; char key; int ii = 0; if ((loadfile = fopen("sav.pp", "r")) == NULL) { printf ("Could not open file %ssav.pp%s for reading\n", YELLOW, NORMAL); continue; } while 
          (fgets(line, sizeof(line), loadfile)) {
        printf("%s", line);
        ii++;  // count the printed lines so that the page prompt can trigger
        if ((ii+1) % 14 == 0) {
          pagePrompt();
          key = getc(stdin);
          if (key == 'q') { break; }
        }
      }
      fclose(loadfile);
    }
    else if (testCom == SAVELAST) {
      FILE * savefile;
      if ((savefile = fopen ("last.pp", "w")) == NULL) {
        printf ("Could not open file %slast.pp%s for writing \n",
          YELLOW, NORMAL);
        continue;
      }
      saveAssembledProgram(&m.program, savefile);
      fclose(savefile);
    }
    else if (testCom == CHECKPROGRAM) {
      validateProgram(&m.program);
    }
    else if (testCom == CLEARPROGRAM) {
      clearProgram(&m.program);
    }
    else if (testCom == CLEARLAST) {
      // nothing to clear in an empty program (avoids listing[-1])
      if (m.program.count == 0) {
        printf("The program is empty \n");
        continue;
      }
      // need to set last instruction to undefined
      newInstruction(&m.program.listing[m.program.count-1], UNDEFINED);
      m.program.count--;
      printSomeProgram(&m.program, 7);
    }
    else if (testCom == INSERTINSTRUCTION) {
      // insert an instruction at the current program ip
      // need to recalculate jump address (add 1)
      printf ("Inserting instruction at %s%d%s \n",
        YELLOW, m.program.ip, NORMAL);
      insertInstruction(&m.program);
      printProgram(&m.program);
    }
    else if (testCom == EXECUTEINSTRUCTION) {
      execute(&m, &m.program.listing[m.program.ip]);
      showMachineTapeProgram(&m, tapeContext);
      /*
      printSomeProgram(&m.program, 6);
      printf("%s--------- Machine State ----------%s \n", YELLOW, NORMAL);
      showMachine(&m);
      */
    }
    else if (testCom == PARSEINSTRUCTION) {
      // a real struct, not an uninitialised pointer, so that
      // newInstruction() has valid memory to write to
      struct Instruction ii;
      struct Label table[100]; // void jumptable
      printf("testing instructionFromText()...\n");
      if (strlen(args) == 0) {
        printf ("%s No example instruction given... %s \n"
                " Try %spi: add {this}%s \n",
                GREEN, NORMAL, YELLOW, NORMAL);
        continue;
      }
      newInstruction(&ii, UNDEFINED);
      instructionFromText(stdout, &ii, args, -1, table);
      printEscapedInstruction(&ii);
      printf("\n");
    }
    else if (testCom == TESTWRITEINSTRUCTION) {
      printf("Testing writeInstruction()... 
\n"); writeInstruction(&m.program.listing[m.program.ip], stdout); } else if (testCom == STEPCODE) { printf("step through next compiled instruction:\n"); step(&m); printSomeProgram(&m.program, 6); } else if (testCom == RUNCODE) { printf("%sRunning program from instruction %s%d%s...%s\n", GREEN, CYAN, m.program.ip, GREEN, NORMAL); run(&m); printf("\n%s--------- Machine State ----------%s \n", YELLOW, NORMAL); showMachine(&m, TRUE); } else if (testCom == RUNZERO) { printf("%sRunning program from ip 0%s\n", GREEN, NORMAL); m.program.ip = 0; run(&m); } else if (testCom == RUNCHARSLESSTHAN) { int maximum; if (!sscanf(args, "%d", &maximum) || (strlen(args) == 0)) { printf ("%sNo number argument given%s \n", GREEN, NORMAL); printf ("%sUsage: rrc %s%s \n", GREEN, YELLOW, NORMAL); continue; } printf("%sRunning while characters less than %d%s\n", GREEN, maximum, NORMAL); runWhileCharsLessThan(&m, maximum); showMachineTapeProgram(&m, 2); } else if (testCom == RUNTOLINE) { int maximum; if (!sscanf(args, "%d", &maximum) || (strlen(args) == 0)) { printf ("%sNo line number argument given%s \n", GREEN, NORMAL); printf ("%sUsage: rrc %s%s \n", GREEN, YELLOW, NORMAL); continue; } printf("%sRunning until input line number is %d%s\n", GREEN, maximum, NORMAL); runToLine(&m, maximum); showMachineTapeProgram(&m, 2); } else if (testCom == RUNTOTRUE) { printf("run program until the flag is set to true\n"); runUntilTrue(&m); showMachineTapeProgram(&m, 2); } else if (testCom == RUNTOWORK) { if (strlen(args) == 0) { printf ("%sNo text given to compare%s \n", GREEN, NORMAL); printf ("%sUsage: rrw %s%s \n", GREEN, YELLOW, NORMAL); printf (" runs the current program until workspace is \n"); continue; } printf("Running program until workspace is \"%s%s%s\" \n", YELLOW, args, NORMAL); runUntilWorkspaceIs(&m, args); // printf("%s--------- Machine State ----------%s \n", // YELLOW, NORMAL); showMachineWithTape(&m); } else if (testCom == RUNTOENDSWITH) { if (strlen(args) == 0) { printf ("%sNo text 
given to compare%s \n", GREEN, NORMAL);
        printf ("%sUsage: rrw %s%s \n", GREEN, YELLOW, NORMAL);
        printf (" runs the program until workspace ends with \n");
        continue;
      }
      printf("Running program until workspace ends with \"%s%s%s\" \n",
        YELLOW, args, NORMAL);
      runUntilWorkspaceEndsWith(&m, args);
      showMachineTapeProgram(&m, tapeContext);
    }
    else if (testCom == IPZERO) {
      m.program.ip = 0;
      printSomeProgram(&m.program, 4);
    }
    else if (testCom == IPEND) {
      m.program.ip = m.program.count-1;
      printSomeProgram(&m.program, 4);
    }
    else if (testCom == IPGO) {
      if (strlen(args) == 0) {
        printf ("%s No number given to jump to %s \n", GREEN, NORMAL);
        continue;
      }
      int ipnumber = 0;
      int res = sscanf(args, " %d", &ipnumber);
      if ((res < 1) || (ipnumber < 0)) {
        printf ("%s Couldn't parse a non-negative number %s \n",
          GREEN, NORMAL);
        continue;
      }
      m.program.ip = ipnumber;
      printSomeProgram(&m.program, 4);
    }
    else if (testCom == IPPLUS) {
      m.program.ip++;
      printSomeProgram(&m.program, 4);
    }
    else if (testCom == IPMINUS) {
      // do not move the instruction pointer before the first instruction
      if (m.program.ip > 0) { m.program.ip--; }
      printSomeProgram(&m.program, 4);
    }
    else if (testCom == SHOWSTREAM) {
      // show some upcoming chars from stream and then
      // reset stream position
      long pos = ftell(m.inputstream);
      int num = 40;
      int c;  // int rather than char so the comparison with EOF is reliable
      int ii = 0;
      printf ("%sNext %s%d%s chars in input stream...%s\n",
        GREEN, YELLOW, num, GREEN, BLUE);
      while (((c = fgetc(m.inputstream)) != EOF) && (ii < num)) {
        printf("%c", c);
        ii++;
      }
      printf("%s\n", NORMAL);
      fseek(m.inputstream, pos, SEEK_SET);
    }
    // deal with pure machine commands (not testing commands)
    else if (textToCommand(command) != UNDEFINED) {
      void * p = NULL; // no jump table here
      if (mode == IPCOMPILE) {
        compile(&m.program, line, m.program.ip, p);
      }
      else if (mode == INTERPRET) {
        // execute the entered command but don't insert it
        // in the program.
        struct Instruction ii;
        instructionFromText(stdout, &ii, line, 0, p);
        execute(&m, &ii);
        // execute automatically advances the ip pointer, which we
        // don't want in this case. Jumps will probably not be entered
        // interactively, so this should not cause trouble here.
        m.program.ip--;
        showMachineTapeProgram(&m, tapeContext);
        continue;
      }
      else {
        compile(&m.program, line, m.program.count, p);
      }
      // check if the instruction is ok
      // checkInstruction(&m.program.listing[m.program.count-1], stdout);
      if (mode == COMPILE) {
        showMachineTapeProgram(&m, tapeContext);
      }
      else { step(&m); }
      m.program.ip = m.program.count;
      showMachineTapeProgram(&m, tapeContext);
    }
    else if (testCom == EXIT) {
      printf("%sSaving program to '%slast.pp%s'...\n",
        GREEN, YELLOW, GREEN);
      FILE * savefile;
      if ((savefile = fopen ("last.pp", "w")) == NULL) {
        printf ("Could not open file %slast.pp%s for writing \n",
          YELLOW, NORMAL);
        continue;
      }
      saveAssembledProgram(&m.program, savefile);
      fclose(savefile);
      fclose(inputStream);
      colourPrint("Freeing Machine memory...\n");
      freeMachine(&m);
      colourPrint("Goodbye !!\n");
      exit(0);  // a normal exit, so report success to the shell
    }
    else {
      printf("%sUnrecognised command:%s %s %s \n",
        BROWN, GREEN, command, NORMAL);
    }
  } // while

  fclose(inputStream);
  return(0);
}
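/*
  The interactive loop in main() reads a line with fgets(), strips the
  newline and splits the line into a command word and its argument text.
  Below is a minimal sketch of those two steps as stand-alone helpers.
  The names readLine and splitCommandLine are hypothetical and do not
  exist in this file; the "%200s %200[^\n]" format is the one used by the
  sscanf() call in main(). This is an illustration under those
  assumptions, not the program's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: read one line from a stream and strip the
// trailing newline. Returns 0 at end of input, 1 otherwise.
int readLine(FILE * in, char * buffer, size_t size) {
  assert(in != NULL && buffer != NULL && size > 0);
  if (fgets(buffer, (int) size, in) == NULL) { return 0; }
  // strcspn gives the index of the first newline, or of the final
  // '\0' when there is none, so this is safe for empty input too
  buffer[strcspn(buffer, "\n")] = '\0';
  return 1;
}

// Hypothetical helper: split a line into a command word and the rest
// of the line. Both output buffers must hold at least 201 bytes.
// Returns the number of fields that were filled: 0, 1 or 2.
int splitCommandLine(const char * line, char * command, char * args) {
  assert(line != NULL && command != NULL && args != NULL);
  command[0] = args[0] = '\0';
  int n = sscanf(line, "%200s %200[^\n]", command, args);
  // sscanf returns EOF (a negative value) when the line is empty
  return (n < 0) ? 0 : n;
}
```

  In the loop this could be used roughly as
    while (readLine(stdin, line, sizeof(line))) {
      splitCommandLine(line, command, args);
      ...
    }
  so that end of input on stdin ends the loop cleanly.
*/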