/* http://bumble.sourceforge.net/books/gh/gh.c OVERVIEW This file is an attempt to create a virtual machine which is particularly apt for parsing context free languages. The machine consists of a (text) stack, (text) workspace, a "tape" (array) structure, 1 "look-ahead" character (called the "peep"), an integer accumulator (for simple counting), and 1 flag (true or false). In general all instructions operate on the "workspace" buffer. The essential idea is a stack/tape combination which can handle the nested structures of context-free languages. It is such a simple idea, that if it does work, its hard to think why it hasnt been thought of and implemented before. The general aim of the machine is to create unix-style stream filters which can parse and transform context-free language patterns. This file also includes a testing "interpreter" with help information and interactive commands for inspecting the machine and program state. PARSING WITH THE MACHINE The general procedure for parsing is to 'put' the attribute into the tape, the clear the workspace, create a token and push that token onto the stack. Then a series of tokens are popped off the stack, compared for sequences, and 'reduced' as in a bnf (backus-naur form) grammar. At the same time the values or attributes of the tokens are got from the tape and transformed as required, and then put back into the tape or printed to stdout. PARSING CHALLENGES simple arithmetic expression parser/compiler eg: n = 4*y+(6-x)/2 actually associativity (or operator precedence) is difficult if not impossible for the parse machine. parse palindromes. toy forth parser compiler toy lisp parser/compiler token*token* the star comes at the end of the token. ideas: a machineInfo array to explain each part of the machine... stack, peep, tape, accumulator etc The syntax for the argument indicates what type of argument it is. Eg: "abc" is text; [a-z] in brackets, is a range; [abcd] is a list; [:space:] is a character class. This is incorporated into the compile() function. COMPILING AND RUNNING THIS CODE tools: bash shell (or any other shell for compiling etc) linux (makes things easier), sed, date, vim, asciidoctor, enscript - for formatting and printing source code * create a printable pdf of the code in 2 columns landscape format >> enscript -o - -2r -f Courier7 gh.c | ps2pdf - os.pdf All the code is in one file, so compilation should be straight-forward I call the executable "pp" for "pattern parser". The file is called gh.c because I already had a folder called "pp" * compile code >> gcc gh.c -o pp * a bash alias to run the code >> alias pp='cd ~/sf/htdocs/books/gh/; ./pp' * a bash function to compile the source code gh.c and add a date stamp --------------------- ppc() { echo "Compiling 'gh.c' as executable 'pp'" echo "Datestamp: $(date +%d%b%Y-%I%P)" # The line below adds the compile time and date to the version cd ~/sf/htdocs/books/gh cp gh.c gh.pre.c sed -i "/v31415\"; *$/s/v31415/version.$(date +%d%b%Y-%I%P)/" gh.pre.c gcc gh.pre.c -o pp } ,,, COMPILATION OF SCRIPTS In a somewhat reflexive manner the machine will compile itself. Or more explicitly the compilation of a script happens as follows. The machine loads an assembly-code file called "asm.pp". It then uses that script to compile a given script into the equivalent assembly-code (and saves the code in a text file called something like "script.pp" Then it loads that assembly-code file and uses it to process the input. In order to achieve this I removed line numbers from assembly saving and loading, and made jumps relative and allowed labels in assembly scripts. This will make hand-coding of assembly code feasible TO DO * write function: commandHelpHtml(...) which will display a summary of commands and help in the 'markdown' or asciidoctor format, which can then be formatted into html5, docbook, latex, or pdf. * it will be necesssary to have a way of looping back to the start of the syntactical analysis section (shift-reduce operation on the stack corresponding to backus-naur grammar rules) This is so that all shift-reduction are applied. This in turn requires a flag reduce? which is set to true when one reduction has occurred, thereby triggering the loop. * improve checkInstruction() function which tests for appropriate datatypes of parameters (eg jumps must have integer jump targets, add cannot have class/list/range datatype parameters...) - write a display tape/stack function which shows stack elements paired with tape elements. - include a negation operator (!) which inverts the following test eg: ! "tree" { ... } or "tree"!{d;} do something if the workspace is not "tree". This can be implemented, I think, by using a jumpTrue instead of a jumpFalse instruction immediately after the test. - implement swap command - implement 'substitute' command. (do we need this?) - set the input stream to some text which the user enters. - implement "r/ " load a program by searching by name - make "r/" list all programs BUGS * segmentation fault when growing the program. * segmentation fault when reload "asm.pp" * calling "run()" on the -f switch crashes the program HISTORY I have been working on this off and on for a number of years (since about 2003) with many stops and starts in the meantime as well as many deadends. In fact I dont even know if the idea in itself is sound. Nevertheless I have a hunch that this approach to parsing context-free languages may be very interesting. We shall see. The code is quite close to being useful (2018) The idea of a parsing machine derived from thinking about the sed stream editor and its limitations. At the very least this virtual machine should be able to handle nested structures which "sed" is unable to handle. So lisp-like syntax (+ (* n n) m) should be easily parsable. The coding of this version was begun around mid-2014. A number of other versions have been written in the past but none was successful or complete. 19 july 2019 Bug! when program grows during loading a segmentation fault occurs. created test.commands.pss which contains simple commands which can be parsed and compiled by the asm.pp script. Also, realised that the compilation from assembler should stop with errors when an undefined instruction is found. Dealt with a great many warnings that arise when one uses "gcc -Wall" implemented command 'cc' adds the input stream character count to the workspace buffer Also made an automatic newline counter, which is incremented every time a \n character is encountered. And the 'll' command which appends the newline counter as a string onto the end of the workspace buffer. Since the main function of this parse-machine is to compile "languages" from a text source, the commands above are very useful because they allow the compilation script to give error messages when the source document is not in the correct format (with line number and possibly character count). Did some work on "asm.pp" which is the assembler file which compiles scripts. Sounds very circular but it works. Realised that after applying bnf rules, need to jump back to the "parse:" label in case other previous rules apply. 18 july 2019 Discovered a bug when running "asm.pp" in unix filter mode "Abort trap: 6" which means writing to some memory location that I should not be. Strangely, when I run the same script interactively (with "rr") it works and doesnt cause the abort. Created a "write" command, on the machine, which writes the current workspace to a file called "sav.pp". This has a parallel in sed (which also has a 'w' write command). This command should be useful when compiling scripts and then running them (since they are compiled to an intermediate "assembler" phase, and then loaded into the machine). made some progress to use the pattern-machine as a unix-style filter program. Added some command line options with getopt(). The parser should be usable (in the future) like sed: eg cat somefile | pp -sf script.pp > result.txt or cat somefile | pp -sa script.ppa > result.txt where script.ppa is an "assembler" listing which can be loaded into the machine. 16 july 2019 Working on parsing with asm.pp. Seem to have basic commands parsing and compiling eg: add "this"; pop; push; etc Simple blocks are parsing and compiling. There are still some complications concerning the order of shift-reductions. Made execute() have a return value eg: 0: success no problems 1: end of stream reached 2: undefined instruction 3: quit/crash executed (exit script) 4: write command could not open file sav.pp for writing more work. some aesthetic fixes to make it easier to see what the machine is doing. wrote showMachineTapeProgram() to give a nice view of pretty much everything that is going on in the machine at once. Working on how to collate "attributes" in the tape array register. Made an optional parameter to printSomeTape() that escapes \n \r etc in the tape cells which makes the output less messy. 15 july 2019 A lot of progress. Starting to work on asm.pp again. Have basic shift-reduction of stack tokens working. Now to get the tape "compiling" attributes as well. The bug seems to be: that JUMP is not treated as a relative jump by execute() but is being compiled as a relative jump by instructionFromText(). So, either make, JUMPs relative or ... Made the "labelTable" (or jumpTable) a property of the program. This is a good idea. Also made the command 'jj' print out the label table. Still using "jumptable" phrase but this is not a good name for this. I should organise this file: first structure definitions. then prototype declarations, and then functions. I havent done this because it was convenient to write the function immediately after the structure def (so I could look at the properties). But if I rearrange, then it will be easier to put everything in a header file, if that is a good idea. Lots of minor modifications. made searchHelp also search the help command name, for example. Added a compileTime (milliseconds) property to the Program structure, and a compileDate (time_t). 81 instructions (which is how many instructions in asm.pp at the moment) are taking 4 milliseconds to compile. which seems pretty slow really. sizes: gh.c 138430 bytes pp 80880 bytes trying to eliminate warnings from the gcc compiler, which are actually very handy. Also seem to have uncovered a bug where the "getJump" function was actually after where it was used (and this gh.c does not use any header files, which is very primitive). So the label jumptable code should not have been working at all... changing lots of %d to %ld for long integers. Also, on BSD unix the ansi colour escape code for "grey" appears to be black. 13 july 2019 Looking at this on an OSX macbook. The code compiles (with a number of warnings) and seems to run. The colours in this bash environment are different. 12 Dec 2018 After stepping through the "asm" program I discovered that unconditional jump targets are not being correctly encoded. This probably explains why the script was not running properly. Also I may put labels into the deassembled listings so that the listings are more readable. 19 sept 2018 revisiting. Need to create command line switches: eg -a for loading an assembler script. and -f to load a script file. Need to step through the asm.pp script and work out why stack reduction is not working... (see above for the answer). An infinite loop is occurring. Also, need to write the treemap app for iphone android, not related to this. Also, need to write a script that converts this file and book files to an asciidoctor format for publishing in html and pdf. Then send all this to someone more knowledgeable. 5 sept 2018 gh.c 133423 bytes pp 78448 bytes Would be handy to have a "run until 10 more chars read" function. This would help to debug problematic scripts. Segmentation fault probably caused by trying to "put" to non-existant tape cell (past the end). Need to check tape size before putting, and grow the tape if necessary. could try to make a palindrome parser. Getting a segmentation fault when running the asm.pp program right through. Wrote an INTERPRET mode for testing- where commands are executed on the machine but not compiled into the current program. Wrote runUntilWorkspaceIs() and adding a testing command to invoke this. This should make is easier to test particular parts of a script. found and fixed a problem with how labels are resolved, this was cause by buildJumpTable() not ignoring multiline comments. 4 sept 2018 Made multiline comments (#* ... *#) work in assembler scripts. Made the machine.delimiter character visible and used by push and pop in execute(). There is no way to set the delimiter char or the escape char in scripts 3 sept 2018 Added multiline comments to asm.pp (eg #* ... *#) as well as single line comments with #. Idea: make gh.c produce internal docs in asciidoctor format so we can publish to html5/docbook/pdf etc. working on the asm.pp script. Made "asm" command reset the machine and program and input stream. Added quoted text and comments to the asm.pp script parsing, but no stack parsing yet. Need to add multiline comments to the loadAssembledProgram() function. while and whilenot cannot use a single char: eg: whilenot "\n" doesnt work. So, write 'whilenot [\n]' instead Also should write checkInstruction() called by instructionFromText() to make sure that the instruction has the correct parameter types. Eg: add should have parameter type text delimited by quotes. Not a list [...] or a range [a-z] If the jumptable is a global variable then we can test jump calculations interactively. Although its not really necessary. Would be good to time how long the machine takes to load assembler files, and also how long it takes to parse and transform files. 2 sept 2018 wrote getJump() and made instructionFromText() lookup the label jump table and calculate the relative jump. It appears to be working. Which removes perhaps the last obstacle to actually writing the script parser. Need to make program listings "page" so I can see long listings. 1 Sept 2018 writing printJumpTable() and trying to progress. Looking at Need to add "struct label table[]" jumptable parameter to instructionFromText(), and compile(). asciidoctor. 31 aug 2018 Continued to work on buildJumpTable. Will write printJumpTable. Renamed the script assembler to "asm.pp" Made a bash function to insert a timestamp. Created an "asm" command in the test loop to load the asm.pp file into the program. Started a buildTable function for a label jump table. These label offsets could be applied by the "compile" function. 30 August 2018 "gh.c" source file is 117352 bytes. Compiled code is 72800 bytes. I could reduce this dramatically by separating the test loop from the machine code. Revisiting this after taking a long detour via a forth bytecode machine which currently boots on x86 in real mode (see http://bumble.sourceforge.net/books/osdev/os.asm ) and then trying to port it to the atmega328p architecture (ie arduino) at http://bumble.sf.net/books/arduino/os.avr.asm The immediate task seems to be to write code to create a label table for assembly listings, and then use that code to replace labels with relative jump offsets. After that, we can start to write the actual code (in asm.pp) which will parse and compile scripts. So the process is: the machine loads the script parser code (in "asm" format) from a text file. The machine uses that program to parse a given script and convert to text "asm" format. The machine then loads the new text asm script and uses it to parse and transform ("compile") an input text stream. 20 decembre 2017 Allowed assembly listings with no line numbers as default. It would be good idea to allow labels in assembly listings, eg 'here:' to make it easier to hand code assembly. So, need a label table. Look at the info arrays for the syntax... Made conditional jumps relative so that they would be easier to "hand-code" as integers (although labels are really needed). Also, need to add a loadAsm() function which is shorthand to load the script assembler. 17 decembre 2017 For some reason, the code was left in a non compilable state in 2016. I think the compile() and instructionFromText() functions could be rewritten but seem to be working at the moment. 13 dec 2017 The code is not compiling because the parameter to the "compile()" function is wrong. When we display instructions, it would be good to always indicate the data type of the parameter (eg: text, int, range etc) Modify "test" to use different parameter types, eg list, range, class. 29 september 2016 used instructionFromText() within the compile() function and changed compile to accept raw instruction text (not command + arguments) wrote scanParameter which is a usefull little function to grab an argument up to a delimiter. It works out the delimiter by looking at the first char of the argument and unescapes all special chars. Now need to change loadAssembled to use compile(). 28 sept 2016 Added a help-search / and a command help search //. Added escapeText() and escapeSpecial(), and printEscapedInstruction(). add writeInstruction() which escapes and writes an instruction to file. Added instructionFromText() and a test command which tests that function. Worked on loadAssembledProgram() to properly load formats such as "while [a-z]" and "while [abc\] \\ \r \t]" etc. All this work is moving towards having the same parse routine loading assembled scripts from text files as well as interactively in the test loop. 26 sept 2016 Discovered that swap is not implemented. 22 sept 2016 Added loadlast, savelast, runzero etc. a few convenience functions in the interpreter. One hurdle: I need to be able to write testis "\n" etc where \n indicates a newline so that we can test for non printing characters. So this needs to go into the machine as its ascii code. Also, when showing program listings, these special characters \n \t \r should be shown in a different colour to make it obvious that they are special chars... Also: loadprogram is broken at the moment.... need to deal with datatypes. 21 Sept 2016 When in interpreter mode, reading the last character should not exit, it should return to the command prompt for testing purposes. 15 August 2016 Wrote an "int read" function which reads one character from stdin and simplifies the code greatly. Still need to fix "escaping". need to make ss give better output, configurable Escaping in 'until' command seems to be working. 13 August 2016 Added a couple more interpreter commands to allow the manipulation of the program and change the ip pointer. Now it is possible to jump the ip pointer to a particular instruction. Also, looked at the loadAssembledProgram and saveAssembledProgram functions to try to rewrite them correctly. The loadAssembledProgram needs to be completely cleaned up and the logic revised. My current idea is to write a program which transforms a pp script into a text assembly format, and then use the 'loadAssembledProgram' to load that script into the machine. Wrote 'runUntilTrue' function which executes program instructions until the machine flag is set to true (by one of the test instructions, such as testis testbegins, testends... This should be useful for debugging complex machine programs. 7 Jan 2016 wrote a cursory freeMachine function with supporting functions 4 Jan 2016 tidying up the help system. had the idea of a program browser, ie browse 'prog' subfolder and load selected program into the machine. Need to write the actual script compilation code. 3 Jan 2016 Writing a compile function which compiles one instruction given command and args. changed the cells array in Tape to dynamic. Since we can use array subscripts with pointers the code hardly changes. Added the testclass test Made program.listing and tape.cells pointers with dynamic memory allocation. 1 Jan 2016 working on compiling function pointers for the character class tests with the while and testis instructions. Creating reflection arrays for class and testing. late Dec 2015 Continued work. Trying to resolve all "malloc" and "realloc" problems. Using a program with instruction listing within the machine. Each command executed interactively gets added to this. 26 Dec 2015 Saving and loading programs as assembler listings. validate program function. "until" & "pop" more or less working. "testends" working ... 19 Dec 2015 Lots of small changes. The code has been polished up to an almost useable stage. The machine can be used interactively. Several instructions still need to be implemented. Push and pop need to be written properly. Need to realloc() strings when required. The info structure will include "number of parameter fields" so that the code knows how many parameters a given instruction needs. This is useful for error checking when compiling. 16 Dec 2015 Revisiting this after a break. Got rid of function pointers, and individual instruction functions. Just have one executing function "execute()" with a big switch statement. Same with the test (interpreter) loop. A big switch statement to process user commands. Start with the 'read' command. Small test file. The disadvantage of not having individual instruction functions (eg void pop(struct Machine * mm) void push(struct Machine * mm) etc) is that we cannot implement the script compiler as a series of function calls. However the "read" instruction does have a dedicated function. 23 Feb 2015 The development strategy has been to incrementally add small bits to the machine and concurrently add test commands to the interpreter. 22 Feb 2015 Had the idea to create a separate test loop file (a command interpreter) with a separate help info array. show create showTapeHtml to print the tape in html. These functions will allow the code to document itself, more or less. Changes to make: The conditional jumps should be relative, not absolute. This will make it easier to hand write the compiler in "assembly language". Line numbers are not necessary in the assembly listings. The unconditional jump eg jump 0 can still be an absolute line number. Put a field width in the help output. Change help output colours. Make "pp" help command "ls" make p.x -> px or just "." 2006 - 2014 Attempted to write various versions of this machine, in java, perl, c++ etc, but none was completed successfully see http://bumble.sf.net/pp/cpp for an incomplete implementation in c++. But better to look at the current version, which is much much better. 2005 approximately I started to think about this parsing machine while living in Almetlla de Mar. My initial ideas were prompted by trying to write parsing scripts in sed and then reading snippets of Compilerbau by N. Wirth, thinking about compilers and grammars TERMINOLOGY * tape pointer: * stack: a text buffer that can be manipulated like a stack. * command: one of the permitable instructions for the VM (eg push pop etc) * instruction: a command & parameter (maybe) compiled into a 'program' (array) which can be executed by the Virtual Machine * */ // #include #include #include #include #include #include // not actually used for anything. #include "gh.h" /* ----------------------------------------- stuff that could go in a header file. But I cant be bother moving all my structures into a different file. function prototypes. But some prototypes must go under the relevant structure. */ void escapeSpecial(char *, char *); // --------------------------- // information about colours, which may be useful since I use // them alot to display the machine. enum Bool {TRUE=0, FALSE}; enum Colour { GREYc=0, REDc, PALEREDc, GREENc, PALEGREENc, BROWNc, YELLOWc, PALEBLUEc, BLUEc, PURPLEc, PINKc, AQUAc, CYANc, WHITEc, BLACKc, NORMALc }; struct { enum Colour colour; char * name; char * ansi; } colourInfo[] = { { GREYc, "grey", "\x1B[1;30m"}, { REDc, "red", "\x1B[0;31m"}, { PALEREDc, "palered", "\x1B[1;31m"}, { GREENc, "green", "\x1B[0;32m"}, { PALEGREENc, "palegreen", "\x1B[1;32m"}, { BROWNc, "brown", "\x1B[0;33m"}, { YELLOWc, "yellow", "\x1B[1;33m"}, { PALEBLUEc, "paleblue", "\x1B[0;34m"}, { BLUEc, "blue", "\x1B[1;34m"}, { PURPLEc, "purple", "\x1B[0;35m"}, { PINKc, "pink", "\x1B[1;35m"}, { AQUAc, "aqua", "\x1B[0;36m"}, { CYANc, "cyan", "\x1B[1;36m"}, { WHITEc, "white", "\x1B[37m"}, { BLACKc, "black", "\x1B[0;30m"}, { NORMALc, "normal", "\x1B[0m"} }; #define BLACK "\x1B[0;30m" #define PALERED "\x1B[0;31m" #define RED "\x1B[1;31m" #define PALEGREEN "\x1B[0;32m" #define GREEN "\x1B[1;32m" #define BROWN "\x1B[0;33m" #define YELLOW "\x1B[1;33m" #define PALEBLUE "\x1B[0;34m" #define BLUE "\x1B[1;34m" #define PALEMAGENTA "\x1B[0;35m" #define PURPLE "\x1B[0;35m" #define PINK "\x1B[1;35m" #define PALECYAN "\x1B[0;36m" #define AQUA "\x1B[0;36m" #define CYAN "\x1B[1;36m" #define WHITE "\x1B[37m" // on BSD (Mac OSX, this grey seems to be black) // define GREY "\x1B[1;30m" #define GREY "\x1B[37m" #define NORMAL "\x1B[0m" //strcpy(GREY, colourInfo[GREYc].name); //char * = colourInfo[c].ansi; //char * Colours[] = // {RED, GREEN, YELLOW, BLUE, PURPLE, CYAN, WHITE, NORMAL}; /* show a table of colours and their names and numbers */ void showColours() { int ii; printf("Colours and colour values: \n"); for (ii = 0; ii <= NORMALc; ii++) { //if (ii % 5 == 0) printf("\n"); printf("%s%2d: %s%9s [%s%s%s]\n", GREEN, ii, WHITE, colourInfo[ii].name, colourInfo[ii].ansi, colourInfo[ii].name, WHITE); } printf("\n"); } /* just displays some text at the bottom of the screen when the output needs to be 'paged' (one screen at a time) */ void pagePrompt() { printf("\n%s%s for more (%sq%s = exit):", AQUA, NORMAL, AQUA, NORMAL); } // print text in lots of colours! .... void colourPrint(char * text) { int ii; int length = strlen(text); for (ii = 0; ii < length; ii++) { printf("%s%c", colourInfo[ii%13 + 1].ansi, *text); text++; } printf(NORMAL); } // Print the text as a row in various colours void colourRow(char * text) { int ii; int length = strlen(text); for (ii = 0; ii < (40/length); ii++) { printf("%s%s", colourInfo[ii%13 + 1].ansi, text); } printf("\n"); printf(NORMAL); } // print a banner void banner() { printf("\n"); colourRow("== "); printf("%s pp:%s A Pattern Parsing Machine%s\n", PINK, GREEN, NORMAL); colourRow("== "); printf("\n"); } // One cell in the 'tape' or array of strings. The cells of the // tape have a 1-to-1 correspondance to the cells of the stack // show capacity be characters or bytes??? for new 'bytes' because // its easier to program #define TAPECELLSIZE 50 struct TapeCell { int resizings; // how many mallocs, for performance issues size_t capacity; // how many characters in this cell? char * text; // the text content of the cell }; // functions relating to TapeCells // the sizeof(char) below is unnecessary but will be important // if this is rewritten for wchar_t /* initialise and allocate text memory for a tape cell this function could also allocate memory for the TapeCell itself, but should it? eg cell = malloc(sizeof(struct TapeCell)) Well... if the Tape is not dynamically resized, then it is not necessary. eg: struct Tape { int capacity; struct TapeCell cells[1000]; } will allocate the required memory for the tapecells */ void newCell(struct TapeCell * cell, int cellSize) { cell->text = malloc(cellSize * sizeof(char)); if(cell->text == NULL) { fprintf(stderr, "couldnt allocate memory for cell->text in newCell()\n"); exit(EXIT_FAILURE); } cell->capacity = cellSize; cell->resizings = 0; } // not really necessary this func /* void freeCell(struct TapeCell * cell) { free(cell); } */ // make the tape cell bigger, preserving text (realloc not malloc) // bug! this function should have a 'minimum' parameter because // cells contents are set, not added to. // another bug: the capacity should not be multiplied by // sizeof char or wchar because cell-capacity should indicate // how many chars it can hold, not bytes. (but for realloc its ok) #define TAPECELLGROWFACTOR 50 void growCell(struct TapeCell * cell) { long newCapacity = cell->capacity + TAPECELLGROWFACTOR*sizeof(char); cell->text = realloc(cell->text, newCapacity); if(cell->text == NULL) { fprintf(stderr, "couldnt allocate more memory for cell->text in growCell()\n"); exit(EXIT_FAILURE); } cell->capacity = newCapacity; cell->resizings++; } void printTapeCellInfo(struct TapeCell * tc) { printf("%s, %ld ", tc->text, tc->capacity); } void printTapeCellLongInfo(struct TapeCell * tc) { printf("%s ( capacity=%ld, resizings=%d )", tc->text, tc->capacity, tc->resizings); } // the tape: should this be a separate structure or // just an array. For a dynamically resizable tape a separate // structure seems useful, but adds complexity. // tape capacity refers to number of tape cells not bytes #define TAPESTARTSIZE 1000 struct Tape { int resizings; //how many times re-malloced this tape? long capacity; //how many cells in the tape struct TapeCell * cells; // dynamic allocation //struct TapeCell cells[TAPESTARTSIZE]; //non dynamic allocation int currentCell; }; // functions relating to tapes // initialise a tape and initialise each tapecell // but we dont really need a pointer to a tape to be // returned from this function, since there will only be // one tape per machine. void newTape(struct Tape * tape, int cells, int cellSize) { tape->capacity = cells; tape->resizings = 0; tape->cells = malloc(tape->capacity * sizeof(struct TapeCell)); if(tape->cells == NULL) { fprintf(stderr, "couldnt allocate memory for cells in tape in newTape()\n"); exit(EXIT_FAILURE); } int ii; for (ii = 0; ii < tape->capacity; ii++) { newCell(&tape->cells[ii], cellSize); } tape->currentCell = 0; } void freeTape(struct Tape * tape) { int ii; for (ii = 0; ii < tape->capacity; ii++) { free(tape->cells[ii].text); } free(tape->cells); } /* // if the tape can get more cells, this has to be implemented struct Tape * growTape() { } */ // set all cells in the tape to empty void clearTape(struct Tape * tape) { int ii; tape->currentCell = 0; for (ii = 0; ii < tape->capacity; ii++) { tape->cells[ii].text[0] = '\0'; } } // show the contents of the tape, cells, with > showing current cell void printTape(struct Tape * tape) { int ii; for (ii = 0; ii < tape->capacity; ii++) { if (ii == tape->currentCell) printf("%4d> %s\n", ii, tape->cells[ii].text); else printf("%4d: %s\n", ii, tape->cells[ii].text); } } // show the contents of the tape around current cell // add a "context" variable, to see more or less of the tape void printSomeTape(struct Tape * tape, enum Bool escape) { int ii; int start = tape->currentCell - 3; int end = tape->currentCell + 3; if (start < 0) start = 0; if (end > tape->capacity) end = tape->capacity; printf("Tape Size: %ld \n", tape->capacity); for (ii = start; ii < end; ii++) { if (escape == TRUE) { if (ii == tape->currentCell) { printf(" %s%4d> [", YELLOW, ii); escapeSpecial(tape->cells[ii].text, CYAN); printf("%s]%s\n", YELLOW, NORMAL); } else { printf(" %s%4d%s [", BLUE, ii, NORMAL); escapeSpecial(tape->cells[ii].text, CYAN); printf("%s]\n", NORMAL); } } else { if (ii == tape->currentCell) { printf(" %s%4d> [%s]%s\n", YELLOW, ii, tape->cells[ii].text, NORMAL); } else { printf(" %s%4d%s [%s]%s\n", BLUE, ii, NORMAL, tape->cells[ii].text, NORMAL); } } } } // same as above, but shows cell capacity. void printSomeTapeInfo(struct Tape * tape) { int ii; int start = tape->currentCell - 7; int end = tape->currentCell + 7; if (start < 0) start = 0; if (end > tape->capacity) end = tape->capacity; printf("Tape Size: %ld \n", tape->capacity); for (ii = start; ii < end; ii++) { if (ii == tape->currentCell) printf(" %s%s%3ld%s) %s%4d> [%s]%s\n", GREY, BROWN, tape->cells[ii].capacity, GREY, YELLOW, ii, tape->cells[ii].text, NORMAL); else printf(" %s%s%3ld%s) %s%4d%s [%s]\n", GREY, BROWN, tape->cells[ii].capacity, GREY, BLUE, ii, NORMAL, tape->cells[ii].text); } } // show a detailed display of the tape void printTapeInfo(struct Tape * tape) { int ii; char key; printf("Tape Size: %ld \n", tape->capacity); for (ii = 0; ii < tape->capacity; ii++) { if (ii == tape->currentCell) printf(" %s%s%3ld%s) %s%4d> [%s]%s\n", GREY, BROWN, tape->cells[ii].capacity, GREY, YELLOW, ii, tape->cells[ii].text, NORMAL); else printf(" %s%s%3ld%s) %s%4d%s [%s]\n", GREY, BROWN, tape->cells[ii].capacity, GREY, BLUE, ii, NORMAL, tape->cells[ii].text); if ( (ii+1) % 14 == 0 ) { pagePrompt(); key = getc(stdin); if (key == 'q') { return; } } } } // The buffer holds the stack and the workspace // pushing and popping the stack just involves shifting the // workspace pointer left or right. struct Buffer { int resizings; size_t capacity; char * workspace; char * stack; }; #define BUFFERSTARTSIZE 1000 void newBuffer(struct Buffer * buffer, int size) { buffer->stack = malloc(size * sizeof(char)); if(buffer->stack == NULL) { fprintf(stderr, "couldnt allocate memory for stack/workspace in newBuffer()\n"); exit(EXIT_FAILURE); } buffer->workspace = buffer->stack; buffer->stack[0] = '\0'; buffer->resizings = 0; buffer->capacity = size - 1; // one less for \0 } void clearBuffer(struct Buffer * buffer) { buffer->workspace = buffer->stack; buffer->stack[0] = '\0'; buffer->resizings = 0; } // make the workspace/stack buffer bigger in the machine void growBuffer(struct Buffer * buffer, int increase) { int offset = buffer->workspace - buffer->stack; long newCapacity = buffer->capacity + increase + 1*sizeof(char); buffer->stack = realloc(buffer->stack, newCapacity); if(buffer->stack == NULL) { fprintf(stderr, "couldnt allocate more memory for buffer in growBuffer()\n"); exit(EXIT_FAILURE); } buffer->workspace = buffer->stack + offset; buffer->resizings++; buffer->capacity = newCapacity - 1; } // show the state of the buffer for debugging void showBufferInfo(struct Buffer * buffer) { printf("%scapacity: %s%ld%s \n", GREEN, PURPLE, buffer->capacity, NORMAL ); printf("%sresizings: %s%d%s \n", GREEN, PURPLE, buffer->resizings, NORMAL ); //%.*s // not sure how to do this //printf("%sstack: %s%.*s%s \n", // GREEN, PURPLE, buffer.workspace - buffer.stack, NORMAL ); printf("%sworkspace: %s%s%s \n", GREEN, PURPLE, buffer->workspace, NORMAL ); // show capacity, resizings, value of stack, of workspace // etc } // display buffer void showBuffer(struct Buffer * buffer) { printf("capacity: %ld \n", buffer->capacity); // show capacity, resizings, value of stack, of workspace // etc } // valid commands on the machine. The structures in the info[] // array should be in the same order for easy access enum Command { // workspace commands ADD=0, CLIP, CLOP, CLEAR, PRINT, // stack commands POP, PUSH, // tape commands PUT, GET, SWAP, INCREMENT, DECREMENT, // read commands READ, UNTIL, WHILE, WHILENOT, // jumps JUMP, JUMPTRUE, JUMPFALSE, // tests TESTIS, TESTCLASS, TESTBEGINS, TESTENDS, TESTEOF, TESTTAPE, // accumulator commands COUNT, INCC, DECC, ZERO, // append character counter and newline counter CHARS, LINES, // escape commands ESCAPE, UNESCAPE, // system commands STATE, QUIT, // write workspace to file WRITE, NOP, // not a recognised command UNDEFINED }; // the following may be usefull commands on the machine // INDENT, /* indents each line of the workspace */ // or just make a "sub" string substitution function instead // REPLACE, or sub // NEWLINE,/* adds a newline to the workspace */ // CHECK, /* a shift reduce jump */ // LABEL, /* */ // contains help information about all machine commands struct { enum Command c; char * name; char abbreviation; int args; // how many parameters for this command char * shortDesc; char * longDesc; char * example; } info[] = { { ADD, "add", 'a', 1, "adds a given text to the workspace buffer", "appends text to the workspace buffer.", ":'tree' { put; clear; add 'noun*'; push; }" }, { CLIP, "clip", 'k', 0, "removes one character from the end of the workspace", "One character is removed from the end of the workspace \n" "register. The peek register and all other registers are unchanged. \n" "This instruction is useful for obtaining the value of a token when \n" "the end delimiter of the token is not wanted. ", "'\"' { clear; until '\"'; clip; print; } clear; \n " " prints all text within quotes but first removing the \n" " the quotes. The above example could also have been \n" " done with whilenot '\"' and without the clip but in cases \n" " where the end delimiter is more than one character, until \n" " seems to be required." }, { CLOP, "clop", 'K', 0, "removes one character from the beginning of the workspace", "One character is deleted from the beginning of the workspace \n" "register. All other registers are unchanged. ", "d; add 'abc'; clop; print; \n" "prints 'bc' many times. " }, { CLEAR, "clear", 'd', 0, "clears the workspace", "This instruction deletes the contents of the workspace \n" "register. All other registers are unchanged. The parsing machine \n " "is designed so that all text manipulation should take place in \n" "the workspace register (which is like a text 'accumulator' or an \n" "AX/EAX register in the x86 architecture. The clear instruction is \n" "one of a set of pp instructions which modifies the workspace. \n" , "clear;" }, { PRINT, "print", 't', 0, "prints the workspace to stdout", "prints the content of the workspace register to standard-out. \n" "Unlike the sed stream editor, the pattern parser has no implicit \n" "print instruction compiled by default. ", "t;t;t;d; \n" " print every character in the input stream 3 times" }, { POP, "pop", 'p', 0, "pops from the stack to the start of the workspace", "Pop scans the content of the stack starting from its end \n" "and skipping the '*' delimiter character if it is the \n" "last char on the stack \n" "and scanning backwards until it encounters \n" "another * delimiter character \n" "(or whatever is the stack token delimiter). It then adjusts the \n " "workspace register pointer so that it points either to the next '*' \n" "on the stack, or else to the beginning of the stack. Pop also \n" "decrements the tape pointer by one (or does nothing if the \n" "tape pointer == 0). If the stack is empty when 'pop' is executed \n" "then the state of the machine is unchanged (including the tape \n" "pointer)", "pop; \n" "if the stack == 'adj*noun*' before pop is executed and \n" " the workspace == 'verb*fullstop*' then, after the pop \n" " the stack will be 'adj*' and the workspace will be \n" " noun*verb*fullstop* and the tapepointer will be one less \n" }, { PUSH, "push", 'P', 0, "push (*) delimited token from start of the workspace to the stack", "Push appends a delimited token from the beginning of the \n" "workspace register to the end of the stack register. If there is \n" "no text in the workspace then no action is taken and the machine \n" "state is unchanged. The amount \n" "of text appended is determined by the placement of the token \n" "delimiter character (usually *). Push also increments the current \n" "tape cell by one, but only if the workspace is not empty. \n", "push; \n" " If the workspace was adj*noun*verb* before the push, then after \n" " the push it will be 'noun*verb*'. If the stack was sentence* \n" " before the push, then it will be 'sentence*adj*' after the push. \n" " It should be noted that the 'stack' is really just a character \n" " buffer which can be manipulated in stack-like ways with push and pop" }, { PUT, "put", 'G', 0, "puts the workspace into the current tape cell", "copies the content of the machine workspace register \n" "into the tape cell currently indicated by the tape pointer. \n" "The previous contents of this tape cell are overwritten. \n" "The contents of the workspace register are unchanged by this \n" "instruction.", "put;" }, { GET, "get", 'g', 0, "Appends current tape cell to the workspace", "This appends the contents of the current tape cell to the workspace \n" "register. The contents of the tape cell are unchanged by this \n" "instruction.", "put; get; print; clear; \n" " prints every character in the input stream twice.\n" " [note that there is an implicit 'read' at the beginning \n" " of every script, and an implicit 'jump 0' looping instruction \n" " at the end of every script. This is because the pattern parsing \n" " machine is designed to operate on streams, one character at a \n" " time]. " }, { SWAP, "swap", 'x', 0, "swaps the current tape cell and the workspace", "The contents of the workspace register and the current \n" "tape cell are swapped.", "swap; print; clear; \n" " prints each pair of characters in the input stream in \n" " reverse order. That is, if the input stream is 'abcdef' \n" " then the script will print 'badcfe'" }, { INCREMENT, "++", '>', 0, "increments the current tape cell by one", "When a push operation on the stack is performed, auto ++", "++;" }, { DECREMENT, "--", '<', 0, "decrements the current tape cell by one", "The tape cell pointer is decreased by 1. If the tape cell pointer \n" "is already zero, then this instruction does nothing. This instruction \n" "is useful to access the values/ attributes of parsing tokens on the \n" "stack, but the script writer needs to realign the tape pointer, so \n" "that subsequent pushes and pops will work correctly. In general, this \n" "means that ++ instructions should be matched by -- instructions and \n" "vice-versa. ", "get; --; get; ++; \n" " gets last 2 attributes from the tape \n" " ... \n" }, { READ, "read", 'r', 0, "read a character from the input stream", "The character in the peep register is appended to the end of the \n" "workspace register, and then the next character from the input stream \n" "is placed in the peep register. If the peep register already holds \n" "the end-of-file token, then the program will normally exit. \n\n" "All scripts have an implicit 'read' instruction at the very beginning \n" "of the program. (The internal compiler places a 'read' instruction \n" "at position 0). This is because the pattern parser is designed to \n" "cycle through input streams one character at a time. A program which \n" "is hand-assembled in the interpreter or a text file does not need to \n" "include the initial 'read' instruction, but the program may not be \n" "very useful without it." , "t;r;d; \n" " deletes every second character from the input stream \n" " This script would be compiled to 'assembly' as: \n" " read \n" " print \n" " read \n" " clear \n" " jump 0 \n" }, { UNTIL, "until", 'R', 1, "reads the input stream until the workspace ends with given text", "While the workspace buffer does not end with the given \n" "text, the input stream is read through the peep register and \n" "appended to the workspace. However, the 'until' command \n" "takes note of the character in the escape register and ignores \n" "text ending with characters escaped with that character. \n\n" "This instruction is useful for parsing tokens which have a \n" "multiple-character end delimiter, such as, html comments which \n" "end with -->. Or c comments ending in */ ", "[:space:] {d;} '/' { r; '*' {until '*/'; d; } t;d;} t;d; \n" " deletes c style comments from the input stream \n" " [this script is untested. Also, note that unlike sed \n" " the steam editor in pp there is no implicit 'print' instruction] "}, { WHILE, "while", 'w', 1, "reads io while peek is something", "While continues to read the input stream while \n" "the character in the 'peek' register is in a particular \n" "character class (eg, alphanumeric, integer, whitespace etc) \n" "character range (eg p-z) or a character list (eg: 678abc^&*). \n\n" "This instruction is useful for parsing tokens which are defined \n" "by a set of legal characters, for example c identifiers \n", "while :alnum:; w a-f; while abxy; w \"" }, { WHILENOT, "whilenot", 'W', 1, "reads io while peek is not something", "The whilenot machine instruction is the complementary \n" "instruction to the while instruction. It reads the input stream \n" "while the character in the peek register is not a specified \n" "character class, character range or character list. \n" "", "W :punct: \n" " keep reading the input stream and append to the workspace \n" " register as long as the peek register is not a punctuation \n" " character. " }, { JUMP, "jump", ',', 1, "unconditional jump to absolute instruction number", "Transfer program control to the given instruction number. \n" "This command is automatically compiled into a program as the last \n" "instruction in the form JUMP 0. This is because, pp, like sed has \n" "an implicit character reading loop. The command is not designed to \n" "be used in a script, but can be used in an assembler code text file. ", "jump 0; " }, { JUMPTRUE, "jumptrue", 'j', 1, "jump to relative instruction +/-N if flag is set to true ", "Jumps if the flag is set to true. This flag is set by \n" "instructions such as testis, testbegins etc. The parameter to \n" "the instruction is a relative offset of the next \n" "instruction to be executed by the machine. \n" "This is designed only to be used in assembler code text files. \n" "", "jumptrue 10 \n" " jumps forward 10 instructions if the flag register is set to true " }, { JUMPFALSE, "jumpfalse", 'J', 1, "jump to relative instruction +/-N if flag is set to false ", "If the flag register is set to false, transfer program control \n" "to the specified instruction at offset. If the flag register is true \n" "the state of the machine is unchanged and the next instruction in the \n" "program will be executed. \n\n" "This instruction is used in conjuction with the 'test' instructions \n" "to implement condition behaviour. In particular, it allows certain \n" "instructions to be skipped if certain conditions in the workspace \n" "register are met. This is a relative jump", "J -10 \n" " jumps back 10 instructions if the flag register is set to false" }, { TESTIS, "testis", '=', 1, "Test if the workspace equals the given text", "If the workspace buffer is exactly the same as the text \n" "then set the machine flag to true. \n", " \"abc\" { pop; } \n " " pop one item off the machine stack if the workspace \n" " is currently equal to 'abc'." }, { TESTCLASS, "testclass", '?', 1, "test if all characters in the workspace are in given class/list/range", "[note: on reflection, this instruction could be amalgamated \n" "with the testis instruction. The only difference is the datatype \n" "of the parameter- range a-z, list abcd, class :alnum:, text blah etc] \n" "This command allows \n" "character class tests eg [:space:] tests if \n" "the workspace consists entirely of space characters \n" "[:digit:] tests that the workspace only contains numeric digits \n" "Also range tests, eg: '[a-z]' workspace only has characters " "Also list tests, eg: '[abxy]' workspace must contain only the " "listed characters ", "[:alnum:] { add 'X' } \n " " adds X to the workspace if the workspace consists only of \n" " alphanumeric characters." }, { TESTBEGINS, "testbegins", 'b', 1, "tests if the workspace begins with the given text", "If the workspace buffer ends with the specified text " "then the flag register is set to false, otherwise it is " "set to true. This command is used in conjuction with the " "jump instructions to control program flow.", ".\"abc\" { clear; } \n" " deletes the content of the workspace if it begins with \n" " 'abc' " }, { TESTENDS, "testends", 'B', 1, "tests if the workspace ends with the given text", "Check if the workspace register ends with the given text \n" "and set the machine flag register to true if so, and to false \n" "otherwise. ", ",\"abc\" { clear; } \n" " deletes the content of the workspace if it ends with \n" " 'abc' " }, { TESTEOF, "testeof", 'E', 0, "tests if the peep register contains the end of file character", "long desc...", "EOF { print 'end of file'; } \n" " prints 'end of file' when the end of the input stream has \n" " been reached." }, { TESTTAPE, "testtape", '*', 0, "tests if the workspace is the same as the content of the current cell", "If the current cell (the cell pointed to by the tape pointer) \n" "of the machine tape structure is exactly equal \n" "to the content of the workspace register, then the flag register is \n" "set to true, otherwise the flag is set to false.", "** { print 'equal'; } \n" " print the word equal to standard-out if the tape cell is \n" " the same as the workspace." }, { COUNT, "count", 'n', 0, "appends the value of the accumulator to the workspace", "The (long?) integer accumulator is appended as text to the \n" "workspace buffer with no preceding space. ", "add 'chars read'; count; " }, { INCC, "a+", '+', 0, "increments the machine accumulator by one", "increments the integer accumulator in the parsing machine \n" "Hopefully this accumulator will be useful in implementing address \n" "calculation when 'assembling' programs. This type of calculation \n" "is required when converting if and loop blocks in highish level \n" "languages to forward jumps in assembler type languages. \n" "Since the pattern parsing machine is designed to transform, translate \n" "compile and assemble, this is an important function.", "a+ ; EOF { add 'total chars = '; count; print; } \n" " for every character read from the input stream, increment \n" " the machine accumulator. At the end of the input stream, \n" " display the total number of characters in the stream/ file. " }, { DECC, "a-", '-', 0, "decrements the accumulator by one", "Decrements the integer accumulator in the parsing machine. \n" "incc and decc could be used, for example to ensure that opening \n " "and closing braces in some programming language are 'balanced'. ", "a-; \n" " decrement the accumulator by one." }, { ZERO, "zero", '0', 0, "sets the numerical accumulator to zero", "resets the machine accumulator back to zero. ", "incc; print; 0; print; " }, { CHARS, "cc", 'c', 0, "Appends number of characters read to end of workspace buffer", "Appends number of characters read to end of workspace buffer", "add 'chars read='; cc; print; clear; " }, { LINES, "ll", 'l', 0, "Appends number of lines read to end of workspace buffer", "Appends number of lines read to end of workspace buffer", "add 'lines read='; ll; print; clear; " }, { ESCAPE, "escape", '^', 1, "prefixes the given character with the escape character ", "This command replaces a character with itself preceded by \n" "the machines escape char (usually \\). This is important in \n" "many parsing situations because it allows, for example, a double \n" "quote character to be included within double quotes.\n" "[note: I need to think through the tricky issues of escaping the \n" " escape character itself].", "esc ';'; \n" " replace all occurrences of the semi-colon in the workspace \n" " with \\; "}, { UNESCAPE, "unescape", 'v', 1, "converts chars to unescaped equivalent", "removes the escape char prefix from the given character ", "unescape x;" }, { STATE, "state", 'S', 0, "prints the current state of the machine. unimplemented sept 2018", "state prints to stdout the values of the registers of " "the parsing machine, such as the workspace, the stack, the peep " "the tape and the numerical accumulator", "state; " }, { QUIT, "quit", 'q', 0, "immediately exits the script", "This command ceases all commands and exits the parsing machine \n" "process. This command should just be called 'exit' ...", "crash; " }, { WRITE, "write", 's', 0, "writes the workspace to file 'sav.pp'", "The contents of the workspace are saved to the file 'sav.pp' \n" "This is also used to convert a script to assembler and then run it", "write; " }, { NOP, "nop", 'o', 0, "no operation", "this command performs no operation. The program ip is incremented \n" "by 1 and the state of the machine is otherwise unchanged. \n" "I am not actually convinced \n" "that it is necessary, but since most real machines have a nop \n" "I will kowtow to the tyranny of consensus.", "nop; \n" " the state of the machine is unchanged except that the \n" " instruction pointer is incremented by 1." }, { UNDEFINED, "undefined", 'e', 0, "undefined command", "The undefined command is used as a 'bookend' programmatically " "and to catch all invalid commands. When a program is cleared " "all instruction commands are set to undefined.", "n/a" } }; // given the abbreviated command, this returns the command // number (in the array and enumeration) enum Command abbrevToCommand(char c) { int ii; for (ii = ADD; ii < UNDEFINED; ii++) { if (info[ii].abbreviation == c) return ii; } return UNDEFINED; } enum CommandType {JUMPS=0, TESTS, STACK, WORK, TAPE, ACCUMULATOR, OTHER}; // what type of command is it, a jump, test, stack, etc enum CommandType commandType(enum Command com) { if (strncmp(info[com].name, "jump", 4) == 0) return JUMPS; if (strncmp(info[com].name, "test", 4) == 0) return TESTS; if (strcmp(info[com].name, "push") == 0) return STACK; if (strcmp(info[com].name, "pop") == 0) return STACK; if (strcmp(info[com].name, "get") == 0) return TAPE; if (strcmp(info[com].name, "put") == 0) return TAPE; if (strcmp(info[com].name, "a+") == 0) return ACCUMULATOR; if (strcmp(info[com].name, "a-") == 0) return ACCUMULATOR; if (strcmp(info[com].name, "zero") == 0) return ACCUMULATOR; if (strcmp(info[com].name, "cc") == 0) return ACCUMULATOR; if (strcmp(info[com].name, "ll") == 0) return ACCUMULATOR; if (strcmp(info[com].name, "until") == 0) return WORK; if (strcmp(info[com].name, "while") == 0) return WORK; if (strcmp(info[com].name, "whilenot") == 0) return WORK; if (strcmp(info[com].name, "read") == 0) return WORK; return OTHER; } // given a short or long machine command name, return the // enum command value enum Command textToCommand(const char * text) { int ii; for (ii = ADD; ii < UNDEFINED; ii++) { if ((strlen(text) == 1) && (info[ii].abbreviation == text[0])) return ii; else if (strcmp(text, info[ii].name) == 0) return ii; } return UNDEFINED; } // info functions: void fprintCommandNames(FILE * file) { printf("Valid machine commands are [name (abbrev)]:\n "); int ii; for (ii = ADD; ii < UNDEFINED; ii++) { printf("%s ", info[ii].name); } printf("\n"); } // display help for all commands, one per line void commandHelp() { printf("%s[All machine instructions]%s:\n", YELLOW, NORMAL); int ii; char key; for (ii = 0; ii < UNDEFINED; ii++) { printf(" %s%c%s - %s: %s%s%s \n", WHITE, info[ii].abbreviation, BLUE, info[ii].name, GREEN, info[ii].shortDesc, NORMAL); if ( (ii+1) % 14 == 0 ) { pagePrompt(); key = getc(stdin); if (key == 'q') { return; } } } printf("\n"); } // searches info array for machine commands matching a search term void searchCommandHelp(char * text) { int ii; int jj = 0; printf("%s[Searching machine commands]%s:\n", YELLOW, NORMAL); for (ii = 0; ii < UNDEFINED; ii++) { if ((strstr(info[ii].shortDesc, text) != NULL) || (strstr(info[ii].longDesc, text) != NULL) || (strstr(info[ii].name, text) != NULL)) { jj++; printf(" %s%c%s - %s: %s \n", GREEN, info[ii].abbreviation, NORMAL, info[ii].name, info[ii].shortDesc ); if ( (jj+1) % 14 == 0 ) { pagePrompt(); getc(stdin); } } // if search term found } // for if (jj == 0) { printf("No results found for '%s'\n", text); } } void showCommandNames() { printf("%s Valid machine commands are [name (abbrev)]: \n%s", BLUE, NORMAL); int ii; for (ii = ADD; ii < UNDEFINED; ii++) { printf(" %s %s(%s%c%s)%s", info[ii].name, GREY, YELLOW, info[ii].abbreviation, GREY, NORMAL); if ( (ii+1) % 4 == 0 ) { printf ("\n"); } } printf("\n"); } // display each machine command and a one line description // of what it does using bash colours void fprintCommandSummary(FILE * file) { printf("A Summary of All Machine Commands:\n"); printf("Format: name (abbreviation) title\n"); int ii; for (ii = ADD; ii < UNDEFINED; ii++) { printf("%s\n %s", info[ii].name, info[ii].shortDesc); } printf("\n"); } // display each machine command and a one line description // of what it does using bash colours and paging a few lines at // a time void showCommandSummary() { printf("%sA Summary of All Machine Commands:\n%s ",BLUE,NORMAL); printf("%sFormat: name (abbreviation) title\n%s ",CYAN,NORMAL); int ii; for (ii = ADD; ii < UNDEFINED; ii++) { printf("%s%s (%c)%s\n %s\n%s", BLUE, info[ii].name, info[ii].abbreviation, GREEN, info[ii].shortDesc, NORMAL); if ( (ii+1) % 10 == 0 ) { printf("any key to continue..."); getc(stdin); } } printf("\n"); } // print all information about a given command void printCommandInfo(enum Command command) { printf( "command name: %s\n" "abbreviation: %c\n" "number of arguments: %d\n" "short description: %s\n" "long description: %s\n" "example: %s\n", info[command].name, info[command].abbreviation, info[command].args, info[command].shortDesc, info[command].longDesc, info[command].example ); } // old void printCommandNamesAndDescriptions() { printf("Valid commands are:\n "); int ii; for (ii = ADD; ii < NOP; ii++) { printf("%s;\n %s\n", info[ii].name, info[ii].shortDesc); } printf("\n"); } // the following enumeration and structure provide information // about the standard c character classes, defined in ctype.h // these classes will be used with the TESTCLASS command and // also with the WHILE command. Parameters for Instructions will // have a function pointer to the correct class function compiled // into them, to speed up execution. // each class corresponds to a "isUpper, isLower etc" function. // fn = &isalnum; enum Class { ALNUM=0, ALPHA, BLANK, CNTRL, DIGIT, GRAPH, LOWER, PRINTABLE, PUNCT, SPACE, UPPER, XDIGIT, NOCLASS }; // contain information about different character classes struct { enum Class class; char * name; char * description; int (* classFn)(int); // a function pointer for the ctype.h functions } classInfo[] = { { ALNUM, "alnum", "alphanumeric like [0-9a-zA-Z]", isalnum}, { ALPHA, "alpha", "alphabetic like [a-zA-Z]", isalpha}, { BLANK, "blank", "blank chars, space and tab", isblank}, { CNTRL, "cntrl", "control chars, ascii 000 to 037 and 177 (del)", iscntrl}, { DIGIT, "digit", "digits 0-9", isdigit}, { GRAPH, "graph", "graphical chars same as :alnum: and :punct:", isgraph}, { LOWER, "lower", "lower case letters [a-z]", islower}, { PRINTABLE, "print", "printable chars ie :graph: + space", isprint}, { PUNCT, "punct", "punctuation ie !\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~.", ispunct}, { SPACE, "space", "all whitespace, eg \\n\\r\\t vert tab, space, \\f", isspace}, { UPPER, "upper", "upper case letters [A-Z]", isupper}, { XDIGIT, "xdigit", "hexadecimal digit ie [0-9a-fA-F]", isxdigit,}, { NOCLASS, "noclass", "", NULL}, }; // given a character class text return the enumerated constant enum Class textToClass(const char * text) { int ii; for (ii = 0; ii < NOCLASS; ii++) { if (strncmp(text, classInfo[ii].name, strlen(classInfo[ii].name)) == 0) return ii; } return NOCLASS; } // given a pointer to a class function return the class // a bit tricky .... // a func pointer as argument to a function. //int add2to3(int (*functionPtr)(int, int)) { enum Class fnToClass(void * func) { int ii; for (ii = 0; ii < NOCLASS; ii++) { if (func == classInfo[ii].classFn) return ii; } return NOCLASS; } // display all valid character classes void showClasses() { printf("%sValid classes are: \n%s", CYAN, NORMAL); int ii; for (ii = 0; ii < NOCLASS; ii++) { if ( ii % 4 == 0 ) printf("\n "); printf("%s%s%s - ", GREEN, classInfo[ii].name, GREY); } printf("\n"); printf("%sClasses are used by the commands : \n%s", AQUA, NORMAL); printf("%s while, whilenot and testclass \n%s", AQUA, NORMAL); printf("%seg: %s w [:space:] \n%s", BLUE, CYAN, NORMAL); printf(" reads the input stream while the peek register \n"); printf(" is a whitespace character (space, tab, newline etc) \n"); } // display help for all character classes, one per line void printClasses() { printf("%sValid classes are: \n%s", CYAN, NORMAL); int ii; for (ii = 0; ii < NOCLASS; ii++) { printf(" %s%s%s - %s \n", GREEN, classInfo[ii].name, NORMAL, classInfo[ii].description); if ( (ii+1) % 14 == 0 ) { printf("\n%spress to continue...%s", CYAN, NORMAL); getc(stdin); } } printf("\n"); printf("%sClasses are used by the commands : \n%s", AQUA, NORMAL); printf("%s while, whilenot and testclass \n%s", AQUA, NORMAL); printf("%seg: %s w :space: \n%s", BLUE, CYAN, NORMAL); printf(" reads the input stream while the peek register \n"); printf(" is a whitespace character (space, tab, newline etc) \n"); } // parameters within a compiled instruction. They may have // various data types; text numeric etc (eg: a jump to another instruction) // hence the need for a union structure. // struct Parameter { //int datatype; //the datatype determines how the parameter will be used. //eg: int as a jump target, text as something to add to workspace, // range in a test etc enum Type { TEXT, // some string INT, // integer number CLASS, // a character class (eg alnum, alpha, digit, xdigit) RANGE, // range of characters eg a-z LIST, // list of characters eg abxy UNSET // no value } datatype; union { char text[200]; // probably should be char * text, with malloc int number; // eg for jumps (jump 6 - jump to instruction 6) int (* classFn)(int); // a function pointer for the ctype.h functions char range[2]; // first, last char list[30]; // should be malloced }; }; // one item in the label jump table (to convert asm labels to // instruction/line numbers struct Label { char text[64]; // label maximum 64 characters int instructNumber; // line/instruction number equivalent for label }; // a declaration void printJumpTable(struct Label []); // a compiled instruction parsed from the script text, or // user input // eg: add 'xx' struct Instruction { enum Command command; struct Parameter a; struct Parameter b; }; //struct Instruction * newInstruction(struct Instruction * ii, enum Command cc) { void newInstruction(struct Instruction * ii, enum Command cc) { ii->command = cc; ii->a.datatype = UNSET; ii->b.datatype = UNSET; } // probably a memcpy of old and new would do the trick // struct Instruction * copyInstruction( void copyInstruction( struct Instruction * new, struct Instruction * old) { //memcpy(new, old, sizeof(new)); new->command = old->command; memcpy(&new->a, &old->a, sizeof(struct Parameter)); memcpy(&new->b, &old->b, sizeof(struct Parameter)); } /* display the instruction with parameters and parameter types. */ void debugInstruction(struct Instruction * ii) { printf("%s%9s %s[%s", AQUA, info[ii->command].name, GREY, CYAN); // also display "unset" datatype, with empty value /* switch (ii->a.datatype) { case INT: printf("%sint:%s%d", BROWN, CYAN, ii->a.number); break; case TEXT: } */ if (ii->a.datatype == INT) printf("%sint:%s%d", BROWN, CYAN, ii->a.number); else if (ii->a.datatype == TEXT) { printf("%stext:%s", BROWN, CYAN); escapeSpecial(ii->a.list, CYAN); } else if (ii->a.datatype == CLASS) printf("%sclass:%s%s", BROWN, CYAN, classInfo[fnToClass(ii->a.classFn)].name); else if (ii->a.datatype == RANGE) printf("%srange:%s%c-%c", BROWN, CYAN, ii->a.range[0], ii->a.range[1]); else if (ii->a.datatype == LIST) { printf("%slist:%s", BROWN, CYAN); escapeSpecial(ii->a.list, CYAN); } else if (ii->a.datatype == UNSET) printf("%sunset%s", BROWN, CYAN); else printf("%s??error??:%s", BROWN, CYAN); printf("%s]%s ", GREY, NORMAL); // second parameter, but this is not used currently. printf("%s[%s", GREY, CYAN); if (ii->b.datatype == INT) printf("%d", ii->b.number); else if (ii->b.datatype == TEXT) printf("%s", ii->b.text); else if (ii->b.datatype == CLASS) printf("class:%s", classInfo[fnToClass(ii->b.classFn)].name); else if (ii->b.datatype == RANGE) printf("range:%c-%c", ii->a.range[0], ii->b.range[1]); else if (ii->b.datatype == LIST) printf("list:%s", ii->b.list); // no parameter so to nothing //else printf(""); printf("%s]%s ", GREY, NORMAL); } /* show the instruction in a format suitable for program listings Use the escapeSpecial() function below to to display special chars (eg \n \r ...) as escaped sequences in a different colour to the rest of the text. todo:!! write function printParameter() to display a parameter with its datatype */ void printInstruction(struct Instruction * ii) { printf("%s%9s", AQUA, info[ii->command].name); // could change this to a switch statement /* switch (ii->a.datatype) { case INT: printf("%sint:%s%d", BROWN, CYAN, ii->a.number); break; case TEXT: } */ // if the parameter is not used, dont print it if (ii->a.datatype != UNSET) { printf(" %s[%s", GREY, CYAN); if (ii->a.datatype == INT) printf("%sint:%s%d", BROWN, CYAN, ii->a.number); else if (ii->a.datatype == TEXT) { printf("%stext:%s", BROWN, CYAN); escapeSpecial(ii->a.list, CYAN); } else if (ii->a.datatype == CLASS) printf("%sclass:%s%s", BROWN, CYAN, classInfo[fnToClass(ii->a.classFn)].name); else if (ii->a.datatype == RANGE) printf("%srange:%s%c-%c", BROWN, CYAN, ii->a.range[0], ii->a.range[1]); else if (ii->a.datatype == LIST) { printf("%slist:%s", BROWN, CYAN); escapeSpecial(ii->a.list, CYAN); } else printf("%s??error??:%s", BROWN, CYAN); printf("%s]%s ", GREY, NORMAL); } } // printInstruction // convert special characters in a string to their escaped // equivalent eg \n \r \v \f \t \? etc // also escapes the given delimiter eg " or ] // // this has to be done char by char and is used in the // saveAssembledProgram() function, writeInstruction() function etc void escapeText(FILE * file, char * text) { char * cc = text; while (*cc != '\0') { switch (*cc) { case '"': fprintf(file, "\\\""); break; case ']': fprintf(file, "\\]"); break; case '\\': fprintf(file, "\\\\"); break; case '\n': fprintf(file, "\\n"); break; case '\t': fprintf(file, "\\t"); break; case '\r': fprintf(file, "\\r"); break; case '\f': fprintf(file, "\\f"); break; default: fprintf(file, "%c", *cc); break; } // switch cc cc++; } // while } // fn // this escapes special chars such as \n \r \\ and colours // them differently and displays on stdout. This will be used // by the printInstruction() function. This does not escape delimiter // chars such as " and ] void escapeSpecial(char * text, char * colour) { char * cc = text; // dont need this line, since normal characters // are handled below in the switch printf("%s", colour); while (*cc != '\0') { switch (*cc) { case '\n': printf("%s\\n%s", YELLOW, colour); break; case '\t': printf("%s\\t%s", YELLOW, colour); break; case '\r': printf("%s\\r%s", YELLOW, colour); break; case '\f': printf("%s\\f%s", YELLOW, colour); break; case '\v': printf("%s\\v%s", YELLOW, colour); break; default: printf("%s%c", colour, *cc); break; } // switch cc cc++; } // while printf("%s", NORMAL); } // show the instruction in a format suitable for program listings // with special chars \n \r etc shown escaped in different colour void printEscapedInstruction(struct Instruction * ii) { printf(" %s%s %s[%s", BLUE, info[ii->command].name, GREY, CYAN); if (ii->a.datatype == INT) printf("%d", ii->a.number); else if (ii->a.datatype == TEXT) escapeSpecial(ii->a.text, CYAN); else if (ii->a.datatype == CLASS) printf("%sclass:%s%s", BROWN, CYAN, classInfo[fnToClass(ii->a.classFn)].name); else if (ii->a.datatype == RANGE) printf("%srange:%s%c-%c", BROWN, CYAN, ii->a.range[0], ii->a.range[1]); else if (ii->a.datatype == LIST) { printf("%slist:", BROWN); escapeSpecial(ii->a.list, CYAN); } // else printf(""); printf("%s]%s ", GREY, NORMAL); // deal with 2nd parameter ii->b ?? } // this is used by saveAssembledProgram() or writeAssembledProgram() // special and delim chars are escaped, eg: \n \r \] \" // void writeInstruction(struct Instruction * ii, FILE * file) { fprintf(file, "%s ", info[ii->command].name); switch (ii->a.datatype) { case INT: fprintf(file, "%d", ii->a.number); break; case TEXT: // special chars \n \r \t \\ and " need to be escaped fprintf(file, "\""); escapeText(file, ii->a.text); fprintf(file, "\""); break; case CLASS: fprintf(file, "[:%s:]", classInfo[fnToClass(ii->a.classFn)].name); break; case RANGE: // todo: check range arguments (escaped characters???) // no I dont think ranges should have escaped characters fprintf(file, "[%c-%c]", ii->a.range[0], ii->a.range[1]); break; case LIST: // special chars \n \r \t \\ etc and ] need to be escaped fprintf(file, "["); escapeText(file, ii->a.list); fprintf(file, "]"); break; default: // no parameter // fprintf(file, ""); break; } // deal with 2nd parameter, but this is not currently used .... } // print to file the instruction with a line number and colours // this function is used in the validateProgram function to display // or log program errors. We also need something like this in // saveAssembledProgram() // void numberedInstruction( struct Instruction * instruction, int lineNumber, FILE * file) { fprintf(file, "%d: %s ", lineNumber, info[instruction->command].name); if (instruction->a.datatype == INT) fprintf(file, "[%d] ", instruction->a.number); else if (instruction->a.datatype == TEXT) fprintf(file, "[%s] ", instruction->a.text); else printf("[] "); if (instruction->b.datatype == INT) fprintf(file, "[%d] ", instruction->b.number); else if (instruction->b.datatype == TEXT) fprintf(file, "[%s] ", instruction->b.text); else printf("[] "); printf("\n"); } // the compiled program, which is a set of instructions // the instruction listing really needs to be malloced dynamically // not a static array #define PROGRAMCAPACITY 500 struct Program { // date and time when compiled time_t compileDate; // how long program took to compile (milliseconds), timed with clock() ? clock_t compileTime; // starting to execute int startExecute; // time ended executing int endExecute; // whether new compiled instructions are appended to the // program (after count), inserted after IP or overwrite from ip // ??? necessary. enum CompileMode {APPEND, INSERT, OVERWRITE} compileMode; // how much room for instructions size_t capacity; // how many times has program memory been reallocated int resizings; //how many instructions are in the program int count; //current instruction (next to be executed) int ip; // the set of program instructions (dynamically allocated) struct Instruction * listing; // if applicable, name of assembly file containing instructions char source[128]; // an array of line labels and line numbers from assembly listing struct Label labelTable[256]; // static allocation of memory for instructions // struct Instruction listing[PROGRAMCAPACITY]; }; /* a prototype declaration, just to stop a gcc compiler warning, because I am using compile() before I define it. */ int compile(struct Program *, char *, int, struct Label[]); void newProgram(struct Program * program, size_t capacity) { program->resizings = 0; program->count = 0; program->ip = 0; program->compileTime = -1; program->compileDate = -1; program->startExecute = -1; program->endExecute = -1; program->listing = malloc(capacity * sizeof(struct Instruction)); if(program->listing == NULL) { fprintf(stderr, "couldnt allocate memory for program listing newProgram()\n"); exit(EXIT_FAILURE); } program->capacity = capacity; int ii; for (ii = 0; ii < capacity; ii++) { newInstruction(&program->listing[ii], UNDEFINED); } strcpy(program->source, "?"); memset(program->labelTable, 0, sizeof(program->labelTable)); } void freeProgram(struct Program * pp) { int ii; for (ii = 0; ii < pp->capacity; ii++) { // If instruction parameters are malloced, as they // should be, then we will have to free the associated // memory. But at the moment it is static memory allocation. //freeInstruction(&program->listing[ii]); } free(pp->listing); } // increase program capacity // there is a bug in the capacity arithmetic so that // on the second realloc 2 instructions are wiped. void growProgram(struct Program * program, size_t increase) { printf("Program listing is growing!!\n"); program->capacity = program->capacity + increase; program->listing = realloc(program->listing, program->capacity * sizeof(struct Instruction)); if(program->listing == NULL) { fprintf(stderr, "couldnt allocate more memory for program listing in growProgram()\n"); exit(EXIT_FAILURE); } int ii; for (ii = program->count; ii < program->capacity; ii++) { newInstruction(&program->listing[ii], UNDEFINED); } } // insert an instruction at the current ip position void insertInstruction(struct Program * program) { int ii; struct Instruction * thisInstruction; if (program->count == program->capacity) { printf("no more room in program... \n"); return; } for (ii = program->count-1; ii >= program->ip; ii--) { thisInstruction = &program->listing[ii]; if (commandType(thisInstruction->command) == JUMPS) { // bug!! actually we only increment jumps if // the jump target is after the instruction if (thisInstruction->a.number > program->ip) { thisInstruction->a.number++; } } //printInstruction(thisInstruction); printf(" \n"); copyInstruction(&program->listing[ii+1], thisInstruction); } newInstruction(&program->listing[program->ip], NOP); program->count++; } /* print meta information about the program This information is part of the Program structure, so it is not really meta information, technically. */ void printProgramMeta(struct Program * program) { char Colour[30]; char date[30]; char time[30]; strcpy(date, "?"); strcpy(time, "?"); strcpy(Colour, BROWN); printf("%s Program source:%s %s \n", Colour, NORMAL, program->source); printf("%sCapacity (Instructions):%s %ld \n", Colour, NORMAL, program->capacity); printf("%s Memory reallocation:%s %d \n", Colour, NORMAL, program->resizings); printf("%s How many instructions:%s %d \n", Colour, NORMAL, program->count); printf("%s Current instruction:%s %d instruction:", Colour, NORMAL, program->ip); printInstruction(&program->listing[program->ip]); printf("\n"); if (program->compileDate != -1) { strcpy(date, ctime(&program->compileDate)); } if (program->compileTime != -1) { sprintf(time, "%ld", program->compileTime); } printf("%s Compiled at:%s %s \n", Colour, NORMAL, date); printf("%s Compile time:%s %s milliseconds \n", Colour, NORMAL, time); } /* given a text label get the line/instruction number from the table or else return -1 */ int getJump(char * label, struct Label table[]) { int ii; for (ii = 0; ii < 1000; ii++) { // if not found if (table[ii].text[0] == 0) return -1; // return number if label found if (strcmp(table[ii].text, label) == 0) { return table[ii].instructNumber; } } return -1; } /* return label for given instruction, or else null an empty string means "not found" */ char * getLabel(int instruction, struct Label table[]) { int ii = 0; for (ii = 0; ii < 1000; ii++) { // if not found if (table[ii].text[0] == 0) break; if (table[ii].instructNumber == instruction) { return table[ii].text; } } return table[ii].text; } // print the instructions in a program void printProgram(struct Program * program) { int ii; char key; struct Instruction * thisInstruction; printf("%sFull Program Listing %s", CYAN, NORMAL); printf(" %s(%ssize:%s%d%s ip:%s%d%s cap:%s%lu%s)%s \n", GREY, GREEN, NORMAL, program->count, GREEN, NORMAL, program->ip, GREEN, NORMAL, program->capacity, GREY, NORMAL); for (ii = 0; ii < program->count; ii++) { // page the listing if ( (ii+1) % 16 == 0 ) { printf("\n%s%s for more (%sq%s = exit):", AQUA, NORMAL, AQUA, NORMAL); key = getc(stdin); if (key == 'q') { return; } // go back a page? lets not worry about it // if (key == 'b') { ii = (((ii-16)>(0))?(ii-16):(0)); } } thisInstruction = &program->listing[ii]; if (ii == program->ip) { printf("%s%3d> ", YELLOW, ii); printInstruction(thisInstruction); printf("%s\n", NORMAL); } else { printf("%s%3d:%s ", WHITE, ii, NORMAL); printInstruction(thisInstruction); printf("%s\n", NORMAL); } } } /* print all the instructions in a compiled program along with the labels which were parse from the assembly listing This may be handy for debugging jump target problems. Some jump targets may be relative, no?? */ void printProgramWithLabels(struct Program * program) { int ii; char key; char * label; struct Instruction * thisInstruction; printf("%s[Full Program Listing] %s", YELLOW, NORMAL); printf(" %s(%ssize:%s%d%s ip:%s%d%s cap:%s%lu%s)%s \n", GREY, GREEN, NORMAL, program->count, GREEN, NORMAL, program->ip, GREEN, NORMAL, program->capacity, GREY, NORMAL); /* to do better paging, we can change this to a while loop and increment ii or set ii=0 (g) ii=max (G) etc */ for (ii = 0; ii < program->count; ii++) { // page the listing if ( (ii+1) % 16 == 0 ) { printf("\n%s%s for more (%sq%s = exit) ...", AQUA, NORMAL, AQUA, NORMAL); key = getc(stdin); // make 'g' go to top, 'G' go to bottom etc if (key == 'q') { return; } // go back a page here, but how to do it? // if (key == 'b') { ii = (((ii-16)>(0))?(ii-16):(0)); } } thisInstruction = &program->listing[ii]; // check if there is a label label = getLabel(ii, program->labelTable); if (strlen(label) > 0) { printf("%s:\n", label); } if (ii == program->ip) { printf("%s%3d> ", YELLOW, ii); printInstruction(thisInstruction); printf("%s\n", NORMAL); } else { printf("%s%3d:%s ", WHITE, ii, NORMAL); printInstruction(thisInstruction); printf("%s\n", NORMAL); } } } // show the program around the ip instruction pointer void printSomeProgram(struct Program * program, int context) { int ii; struct Instruction * thisInstruction; int start = program->ip - context; int end = program->ip + context; if (start < 0) start = 0; if (end > program->capacity) end = program->capacity; //if (program->ip > end) end = program->ip + 2; printf("%sPartial Program Listing%s", CYAN, NORMAL); printf(" %s(%ssize:%s%d%s ip:%s%d%s cap:%s%lu%s)%s \n", GREY, GREEN, NORMAL, program->count, GREEN, NORMAL, program->ip, GREEN, NORMAL, program->capacity, GREY, NORMAL); for (ii = start; ii < end; ii++) { thisInstruction = &program->listing[ii]; if (ii == program->ip) { printf(" %s%d> ", YELLOW, ii); printInstruction(thisInstruction); printf("%s\n", NORMAL); } else { printf(" %s%d%s: ", CYAN, ii, NORMAL); printInstruction(thisInstruction); printf("%s\n", NORMAL); } } } // save the compiled program as assembler to a text file // we need to escape quotes in arguments !! // also need to handle other parameter datatypes such as class, // range, list etc. // Line numbers may make modifying the assembly laborious. void saveAssembledProgram(struct Program * program, FILE * file) { int ii; fprintf(file, "# Assembly listing \n"); for (ii = 0; ii < program->count; ii++) { //fprintf(file, "%d: ", ii); writeInstruction(&program->listing[ii], file); fprintf(file, "\n"); } fprintf(file, "# End of program. \n"); printf("%s%d%s instructions written to file. \n", BLUE, ii, NORMAL); } // in some cases it may be useful to have a numbered code listing void saveNumberedProgram(struct Program * program, FILE * file) { int ii; fprintf(file, "# Numbered Assembly listing \n"); for (ii = 0; ii < program->count; ii++) { fprintf(file, "%d: ", ii); writeInstruction(&program->listing[ii], file); fprintf(file, "\n"); } fprintf(file, "# End of program. \n"); printf("%s%d%s instructions written to file. \n", BLUE, ii, NORMAL); } // clear the machines compiled program void clearProgram(struct Program * program) { int ii; for (ii = 0; ii < program->count + 50; ii++) { program->listing[ii].command = UNDEFINED; program->listing[ii].a.number = 0; program->listing[ii].a.datatype = UNSET; program->listing[ii].b.number = 0; program->listing[ii].b.datatype = UNSET; } program->count = 0; program->ip = 0; program->compileTime = -1; program->compileDate = 0; } // scans an instruction parameter and stops at the delimiter // also converts escape sequences such as \n \r to their // actual character. The unescaped sequence is stored starting // at pointer param. // return the new position in the scan stream char * scanParameter(char * param, char * arg, int line) { char * nextChar; char delimiter; if (*arg == '[') delimiter = ']'; else if (*arg == '(') delimiter = ')'; else if (*arg == '{') delimiter = '}'; else delimiter = *arg; // eg: " or ' etc nextChar = arg+1; while ((*nextChar != '\0') && (*nextChar != delimiter)) { if (*nextChar == '\\') { nextChar++; // check \n \t \] \\ etc switch (*nextChar) { case 'n': *param = '\n'; break; case 't': *param = '\t'; break; case 'r': *param = '\r'; break; case 'f': *param = '\f'; break; case 'v': *param = '\v'; break; default: // handle all other \\ slash escapes eg \\ \] \" *param = *nextChar; break; } // switch } else { *param = *nextChar; } *(param+1) = 0; nextChar++; param++; } // while not delimiter if (*nextChar != delimiter) { printf("%s Warning: %s on line %s%d%s. \n", PURPLE, NORMAL, YELLOW, line, NORMAL); printf("%s >> %s%s %s\n", BLUE, GREEN, arg, NORMAL); printf(" No terminating %s%c%s character for parameter \n", YELLOW, delimiter, NORMAL); } return nextChar; } // scanParameter /* print out the label table, just to make sure that it is getting built correctly. this table can be, and is, a property of the Program structure */ void printJumpTable(struct Label table[]) { int ii; printf("%s[Program label table]%s:\n", YELLOW, NORMAL); for (ii = 0; ii < 1000; ii++) { if (table[ii].text[0] == 0) break; printf("%s%15s:%s %d \n", BROWN, table[ii].text, NORMAL, table[ii].instructNumber); } } /* fills out an instruction given valid text. Also handles unescaping of special characters and delimiter characters eg: add " this \r \t \" \: \\ " eg: while [ghi \];;\\ ] the line/instruction number parameter is for error messages returns -1 when an error occurs add struct label table[] jumptable parameter. */ int instructionFromText( FILE * file, struct Instruction * instruction, char * text, int lineNumber, struct Label table[]) { char command[1001] = ""; char args[1001] = ""; char label[64] = ""; // possible jump label char charclass[20] = ""; // eg space, alnum, print sscanf(text, "%1000s %2000[^\n]", command, args); //printf("parsed - command=%s, args=%s \n", command, args); if (command[0] == '\0') { fprintf(file, "%s Parse Error: %s on line %s%d%s \n", PURPLE, NORMAL, YELLOW, lineNumber, NORMAL); fprintf(file, "%s >> %s%s %s\n", BLUE, GREEN, text, NORMAL); fprintf(file, " Instruction command missing \n"); return -1; } if (textToCommand(command) == UNDEFINED) { fprintf(file, "%s Parse Error: %s on line %s%d%s. \n", PURPLE, NORMAL, YELLOW, lineNumber, NORMAL); fprintf(file, "%s >> %s%s %s\n", BLUE, GREEN, text, NORMAL); fprintf(file, " Command %s%s%s is not a valid instruction command \n", YELLOW, command, NORMAL); return -1; } instruction->command = textToCommand(command); if ((info[instruction->command].args > 0) && (args[0] == '\0')) { fprintf(file, "%s Error: %s on line %s%d%s of source file. \n", PURPLE, NORMAL, YELLOW, lineNumber, NORMAL); fprintf(file, "%s >> %s%s %s\n", BLUE, GREEN, text, NORMAL); fprintf(file, " Command %s%s%s requires %s%d%s argument(s) but none \n" " is given in the assembly file \n", YELLOW, command, NORMAL, YELLOW, info[instruction->command].args, NORMAL); } // here check if instruction is a jump (ie jumptrue/false/jump) // now check if it already has an integer argument. If not // check if 1st argument is a label. If so, convert to line number // using jumpTable[] and then convert to offset (line number - current // line/instruction number) if (commandType(instruction->command) == JUMPS) { // format "jump 32" if (sscanf(args, "%d", &instruction->a.number) > 0) { instruction->a.datatype = INT; return 0; } // format "jump label" else if (sscanf(args, "%s", label) > 0) { // for debugging see the label jump table // printJumpTable(table); // find the label, int jump = -1; jump = getJump(label, table); // printf(" \n", label, jump); if (jump == -1) { fprintf(file, "%s Parse Error: %s on line %s%d%s. \n", PURPLE, NORMAL, YELLOW, lineNumber, NORMAL); fprintf(file, "%s >> %s%s %s\n", BLUE, GREEN, text, NORMAL); fprintf(file, " Label '%s%s%s' not found. \n", YELLOW, label, NORMAL); fprintf(file, " Labels should start line and end with a \n" " colon (:). They are case sensitive. \n"); return -1; } else { //printf("command: %s \n", info[instruction->command].name); //printf("jump target=%d, linenumber=%d\n", jump, lineNumber); instruction->a.datatype = INT; if (instruction->command == JUMP) { instruction->a.number = jump; } else { instruction->a.number = jump - lineNumber; } } return 0; } } // format "jump 32" if (sscanf(args, "%d", &instruction->a.number) > 0) { instruction->a.datatype = INT; return 0; } // format "while [:space:] else if ((args[0] == '[') && (args[1] == ':')) { if (sscanf(args+2, "%20[^:]", charclass) == 0) { fprintf(file, "%s Error: %s on line %s%d%s. \n", PURPLE, NORMAL, YELLOW, lineNumber, NORMAL); fprintf(file, "%s >> %s%s %s\n", BLUE, GREEN, text, NORMAL); fprintf(file, " In argument %s%s%s, no character class given \n", YELLOW, args, NORMAL); return 1; } else if (textToClass(charclass) == NOCLASS) { fprintf(file, "%s Error: %s on line %s%d%s of source file. \n", PURPLE, NORMAL, YELLOW, lineNumber, NORMAL); fprintf(file, "%s >> %s%s %s\n", BLUE, GREEN, text, NORMAL); fprintf(file, " The character class %s%s%s is not valid \n", YELLOW, charclass, NORMAL); return 1; } instruction->a.datatype = CLASS; instruction->a.classFn = classInfo[textToClass(charclass)].classFn; return 0; } // format "whilenot [a-z] else if ((args[0] == '[') && (args[2] == '-')) { instruction->a.datatype = RANGE; instruction->a.range[0] = args[1]; instruction->a.range[1] = args[3]; return 0; } else if (args[0] == '[') { // bracket delimited parameter instruction->a.datatype = LIST; scanParameter(instruction->a.list, args, lineNumber); } else if (args[0] == '"') { // quote delimited parameter instruction->a.datatype = TEXT; scanParameter(instruction->a.list, args, lineNumber); } else if (args[0] == '{') { // brace delimited parameter instruction->a.datatype = TEXT; scanParameter(instruction->a.list, args, lineNumber); } // what sort of argument // print warnings if, for example, jump does not have an integer target // checkInstruction(instruction, stdout); return 0; } /* just build an array with label line numbers returns the number of labels found */ int buildJumpTable(struct Label table[], FILE * file) { char buffer[1000]; char * line; int lineNumber; char text[2001]; char label[64]; // hold the asm label int ii = 0; // instruction number int ll = 0; // label number while (fgets(buffer, 999, file) != NULL) { // printf("%s", line); line = buffer; line[strlen(line) - 1] = '\0'; text[0] = '\0'; lineNumber = -1; // Trim leading space while (isspace((unsigned char)*line)) line++; // skip blank lines if (*line == 0) continue; // lines starting with # are comments // if (line[0] == '#') continue; // lines starting with # are 1 line comments // lines starting with #* are multiline comments (to *#) if (line[0] == '#') { if (strlen(line) == 1) continue; if (line[1] == '*') { line = line + 2; //printf("multiline comment! %s", line); // multiline comment #* ... need to search for next *# if(strstr(line, "*#") != NULL) { line = strstr(line, "*#") + 2; } else { // search next lines for *# while (fgets(buffer, 999, file) != NULL) { line = buffer; line[strlen(line) - 1] = '\0'; if (strstr(line, "*#") != NULL) { line = strstr(line, "*#") + 2; break; } } } } else continue; } sscanf(line, "%d: %2000[^\n]", &lineNumber, text); //debug: check that the arguments are getting parsed properly //printf("parsed - lineNumber=%d, text=%s \n", lineNumber, text); if (lineNumber == -1) { // if no line number, parse anyway sscanf(line, "%s", text); // only whitespace on line, so skip it if (text[0] == '\0') { continue; } sscanf(line, "%2000[^\n]", text); } // skip empty lines if (text[0] == '\0') { continue; } // handle assembler labels using the jumpTable // labels are lines ending in ':' if (text[strlen(text) - 1] == ':') { sscanf(text, "%64[^:]:", label); // printf("New Label '%s' at instruction %d \n", label, ii); table[ll].instructNumber = ii; strcpy(table[ll].text, label); ll++; // increment label number continue; } // increment instruction number ii++; } // while return ll; } // load a program from an assembler listing in a text file // uses compile() >> instructionFromText() or parseInstruction() // maybe change this to ...(struct Machine * mm, ...) ?? void loadAssembledProgram(struct Program * program, FILE * file) { clearProgram(program); // this is not robust, we need to deal with very long // instructions using malloc (eg "add ") char buffer[1000]; char * line; int lineNumber; char text[2001]; clock_t startTime = 0; clock_t endTime = 0; int ii = 0; // build a table of label line numbers // printf("Put %s%d%s labels into the jump table. \n", YELLOW, ll, NORMAL); // this returns the number of labels found buildJumpTable(program->labelTable, file); // printJumpTable(program->labelTable); rewind(file); //struct Instruction * instruction; while (fgets(buffer, 999, file) != NULL) { // printf("%s", line); line = buffer; line[strlen(line) - 1] = '\0'; text[0] = '\0'; lineNumber = -1; // Trim leading space while (isspace((unsigned char)*line)) line++; // skip blank lines if (*line == 0) continue; // lines starting with # are 1 line comments // lines starting with #* are multiline comments (to *#) if (line[0] == '#') { if (strlen(line) == 1) continue; if (line[1] == '*') { line = line + 2; //printf("multiline comment! %s", line); // multiline comment #* ... need to search for next *# if(strstr(line, "*#") != NULL) { line = strstr(line, "*#") + 2; } else { // search next lines for *# while (fgets(buffer, 999, file) != NULL) { line = buffer; line[strlen(line) - 1] = '\0'; if (strstr(line, "*#") != NULL) { line = strstr(line, "*#") + 2; break; } } } } else continue; } // skip label lines. improve this to trim spaces if (line[strlen(line)-1] == ':') continue; sscanf(line, "%d: %2000[^\n]", &lineNumber, text); //debug: check that the arguments are getting parsed properly //printf("parsed - lineNumber=%d, text=%s \n", lineNumber, text); if (lineNumber == -1) { // if no line number, parse anyway sscanf(line, "%s", text); // only whitespace on line, so skip it if (text[0] == '\0') { continue; } sscanf(line, "%2000[^\n]", text); } // empty line or just line number, just skip it if (*text == 0) continue; compile(program, text, ii, program->labelTable); ii++; } // while program->ip = 0; endTime = clock(); program->compileTime = (double)((endTime - startTime)*1000) / CLOCKS_PER_SEC; program->compileDate = time(NULL); /* printf("Compiled %s%d%s instructions from '%s%s%s' in about %s%ld%s milliseconds \n", CYAN, program->count, NORMAL, CYAN, program->source, NORMAL, CYAN, program->compileTime, NORMAL); */ } // check datatype of instruction etc int checkInstruction(struct Instruction * ii, FILE * file) { if ((info[ii->command].args > 0) && (ii->a.datatype == UNSET)) { fprintf(file, "error: missing argument for command \n"); } if ((commandType(ii->command) == JUMPS) && (ii->a.datatype != INT)) { //printf(file, "error: non integer \n"); } if ((info[ii->command].args == 0) && (ii->a.datatype != UNSET)) { fprintf(file, "warning: superfluous argument for command %s%s%s \n", YELLOW, info[ii->command].name, NORMAL); } switch (ii->command) { case ADD: if (ii->a.datatype != TEXT) { fprintf(file, "Error: ADD requires 'text' datatype \n"); } if (*ii->a.text == 0) { fprintf(file, "No text parameter for ADD \n"); } break; case CLIP: break; case CLOP: break; case CLEAR: break; case PRINT: break; case POP: break; case PUSH: break; case PUT: break; case GET: break; case SWAP: break; case INCREMENT: break; case DECREMENT: break; case READ: break; case UNTIL: case WHILE: case WHILENOT: break; case JUMP: // validate all jump targets... is target undefined, in // range of program etc etc. if (ii->a.datatype != INT) { fprintf(file, "error: Non integer target for jump instruction"); } break; case JUMPTRUE: if (ii->a.datatype != INT) { fprintf(file, "error: Non integer target for jump instruction"); } break; case JUMPFALSE: if (ii->a.datatype != INT) { fprintf(file, "error: Non integer target for jump instruction"); } break; case TESTIS: if (ii->a.datatype != TEXT) { fprintf(file, "wrong datatype for parameter for TESTIS \n"); } break; case TESTBEGINS: if (ii->a.datatype != TEXT) { fprintf(file, "wrong datatype for parameter for TESTBEGINS \n"); } break; case TESTENDS: if (ii->a.datatype != TEXT) { fprintf(file, "wrong datatype for parameter for TESTENDS \n"); } break; case TESTEOF: break; case TESTTAPE: break; case COUNT: break; case INCC: break; case DECC: break; case ZERO: break; case CHARS: break; case STATE: break; case QUIT: break; case WRITE: break; case NOP: break; case UNDEFINED: break; default: break; } // switch return -1; } // checkInstruction /* check whether an instruction has the right sort of parameters etc. It is better to do this once before a program is executed rather than in the "execute" routine. Returns positive integer if the instruction is not valid s should be "validateProgram" in order to check jump targets etc. And also to make sure jumptrue, jumpfalse have a test before them. check for infinite loops. Check if there is a branch zero at the end. Check if there is a read at the begining */ int validateProgram(struct Program * program) { struct Instruction * instruction; int ii; int warnings = 0; int errors = 0; for (ii = 0; ii < program->count; ii++) { instruction = &program->listing[ii]; if (ii == program->count-1) { if ((instruction->command != JUMP) || (instruction->a.datatype != INT) || (instruction->a.number != 0)) { numberedInstruction(instruction, ii, stdout); fprintf(stdout, "Normally the last instruction of the program \n" "should be an unconditional jump to the first instruction \n" "eg: jump 0 \n" "This is because the parse machine is designed to run in a \n" "loop for each character read (somewhat similar to sed, except \n" "for characters, not lines \n"); warnings++; } } if (ii == 0) { if (instruction->command != READ) { numberedInstruction(instruction, ii, stdout); fprintf(stdout, "Normally the first instruction of the program \n" "should be a READ, which reads one character from the \n" "input source \n"); warnings++; } } } // for loop printf( "Checked %s%d%s instructions and found %s%d%s errors and %s%d%s warnings \n", BLUE, ii, NORMAL, YELLOW, errors, NORMAL, YELLOW, warnings, NORMAL); return 0; } // the virtual machine which parses struct Machine { FILE * inputstream; //source of characters int peep; // next char in the stream, may have EOF struct Buffer buffer; //workspace & stack struct Tape tape; int accumulator; // used for counting long charsRead; // how many characters read from input stream long lines; // how many lines already read from input stream enum Bool flag; // used for tests char delimiter; // eg *, to separate tokens on the stack char escape; // escape character struct Program program; // compiled instructions + ip counter char version[64]; // machine version string (eg: "0.1 campania" etc) }; // all functions relating to the machine // initialise the machine with an input stream void newMachine(struct Machine * machine, FILE * input, int tapeCells, int cellSize) { // set the input stream // read the first character into peep?? machine->inputstream = input; machine->charsRead = 0; machine->lines = 0; machine->accumulator = 0; newTape(&machine->tape, tapeCells, cellSize); newBuffer(&machine->buffer, 40); newProgram(&machine->program, 1000); machine->peep = fgetc(machine->inputstream); machine->flag = FALSE; machine->escape = '\\'; machine->delimiter = '*'; strcpy(machine->version, "0.1 campania"); } // reset the machine without destroying program void resetMachine(struct Machine * machine) { // rewind the input stream rewind(machine->inputstream); machine->charsRead = 0; machine->lines = 0; machine->accumulator = 0; clearTape(&machine->tape); clearBuffer(&machine->buffer); // read the first character into peep machine->peep = fgetc(machine->inputstream); machine->flag = FALSE; machine->escape = '\\'; machine->delimiter = '*'; machine->program.ip = 0; } // free all memory associated with the machine. void freeMachine(struct Machine * mm) { freeTape(&mm->tape); freeProgram(&mm->program); free(mm->buffer.stack); return; } // read one character from the input stream and update // the peep and workspace and other machine registers // this fuction is used in commands while, whilenot, until, read etc // returns 0 if end of stream is reached int readc(struct Machine * mm) { if (mm->peep == EOF) return 1; if (feof(mm->inputstream)) return 2; char * lastc = mm->buffer.workspace + strlen(mm->buffer.workspace); if (strlen(mm->buffer.stack) >= mm->buffer.capacity) growBuffer(&mm->buffer, 50); *lastc = mm->peep; *(lastc + 1) = '\0'; mm->peep = fgetc(mm->inputstream); mm->charsRead++; if (mm->peep == '\n') { mm->lines++; } return 0; } /* display some meta-information about the machine such as version, capacites etc */ void printMachineMeta(struct Machine * machine) { char Colour[30]; char date[30]; char time[30]; strcpy(Colour, BROWN); strcpy(date, "?"); strcpy(time, "?"); printf("%s Machine Version:%s %s \n", Colour, NORMAL, machine->version); printf("%s Character read:%s %ld \n", Colour, NORMAL, machine->charsRead); printf("%s Escape character:%s %c \n", Colour, NORMAL, machine->escape); printf("%s Peep character: %c %s \n", Colour, machine->peep, NORMAL); // cant really use escapeSpecial here, because it converts text, not // one character... // escapeSpecial(, PINK); printf("\n"); printf("%s Stack token delimiter:%s %c \n", Colour, NORMAL, machine->delimiter); /* printf("%s Flag:%s %d \n", Colour, NORMAL, machine->flag); if (machine->flag == FALSE) { //strcpy(date, ctime(&machine->compileDate)); } */ } void printBufferAndPeep(struct Machine * mm) { printf("[%ld] Buff:%s Peep:%c \n", mm->buffer.capacity, mm->buffer.stack, mm->peep ); } void showBufferAndPeep(struct Machine * mm) { char peep[20]; if (mm->peep == EOF) { strcpy(peep, "EOF"); } else if (mm->peep == '\n') { strcpy(peep, "\\n"); } else if (mm->peep == '\r') { strcpy(peep, "\\r"); } else { peep[0] = mm->peep; peep[1] = '\0'; } printf("[%s%ld%s] Buff[%s%s%s] P[%s%s%s] \n", GREEN, mm->buffer.capacity, NORMAL, YELLOW, mm->buffer.stack, NORMAL, BLUE, peep, NORMAL ); } /* show core registers of the parsing machine: the stack, the "workspace" (like a text accumulator), and the "peep" (the next character in the input stream */ void showStackWorkPeep(struct Machine * mm, enum Bool escape) { char peep[20]; if (mm->peep == EOF) { strcpy(peep, "EOF"); } else if (mm->peep == '\n') { strcpy(peep, "\\n"); } else if (mm->peep == '\r') { strcpy(peep, "\\r"); } else { peep[0] = mm->peep; peep[1] = '\0'; } // uses the %.*s, len, buffer trick to print the stack int stackwidth = mm->buffer.workspace - mm->buffer.stack; if (escape == TRUE) { printf("%s(Buff:%s%ld%s)%s Stack[%s%.*s%s] Work[%s", WHITE, BROWN, mm->buffer.capacity, WHITE, NORMAL, YELLOW, stackwidth, mm->buffer.stack, NORMAL, PURPLE); escapeSpecial(mm->buffer.workspace, CYAN); printf("%s] Peep[%s%s%s] \n", NORMAL, BLUE, peep, NORMAL ); } else { // uses the %.*s, len, buffer trick to print the stack printf("%s(Buff:%s%ld%s)%s Stack[%s%.*s%s] Work[%s%s%s] Peep[%s%s%s] \n", WHITE, BROWN, mm->buffer.capacity, WHITE, NORMAL, YELLOW, stackwidth, mm->buffer.stack, NORMAL, PURPLE, mm->buffer.workspace, NORMAL, BLUE, peep, NORMAL ); } } /* display the state of registers of the machine using colours This does not show the tape (which is part of the machine) or the program which is also part of the machine. See showMachineTapeProgram() or showMachineWithTape() for that. The "escape" parameter determines if whitespace (newlines, carriage returns, tabs etc) will be displayed in the format "\n \r \t" etc, or just printed normally. If there are newlines in the workspace then the display gets messy. So its just a matter of aesthetics. */ void showMachine(struct Machine * mm, enum Bool escape) { showStackWorkPeep(mm, escape); printf("Acc:%s%d%s Flag:%s%s%s Esc:%s%c%s " "Delim:%s%c%s Chars:%s%ld%s Lines:%s%ld%s \n", GREEN, mm->accumulator, NORMAL, YELLOW, (mm->flag==0?"TRUE":"FALSE"), NORMAL, YELLOW, mm->escape, NORMAL, YELLOW, mm->delimiter, NORMAL, BLUE, mm->charsRead, NORMAL, BLUE, mm->lines, NORMAL); } // show state of machine buffers with tape cells void showMachineWithTape(struct Machine * mm) { printf("%s--------- Machine State -----------%s \n", BROWN, NORMAL); showMachine(mm, TRUE); printf("%s--------- Tape --------------------%s \n", BROWN, NORMAL); printSomeTape(&mm->tape, TRUE); } // show state of machine buffers with tape cells void showMachineTapeProgram(struct Machine * mm) { printSomeProgram(&mm->program, 3); printf("%s--------- Machine State -----------%s \n", PURPLE, NORMAL); showMachine(mm, TRUE); printf("%s--------- Tape --------------------%s \n", PURPLE, NORMAL); printSomeTape(&mm->tape, TRUE); } /* execute a compiled instruction. Possible return values might be 0: success no problems 1: end of stream reached (tried to read eof) 2: trying to execute undefined instruction 3: quit/crash command executed (exit script) 4: could not open 'sav.pp' for writing (from write command) 5: tried to execute unimplemented command */ int execute(struct Machine * mm, struct Instruction * ii) { struct TapeCell * thisCell; long newCapacity; FILE * saveFile; // where workspace is written by 'write' command char * temp; // a temporary string for x swaps char acc[100]; // a text version of the accumulator char * buffer; // store the workspace when escaping (needs to be malloc) size_t len; int count; // count escapable chars for malloc char * lastc; // points to last char in workspace char * lastw; // points to last char in workspace int (* fn)(int); // a function pointer for the ctype.h functions switch (ii->command) { case ADD: if (strlen(mm->buffer.stack) + strlen(ii->a.text) > mm->buffer.capacity) { growBuffer(&mm->buffer, strlen(ii->a.text) + 50); } strcat(mm->buffer.workspace, ii->a.text); break; case CLIP: if (*mm->buffer.workspace == 0) break; mm->buffer.workspace[strlen(mm->buffer.workspace)-1] = '\0'; break; case CLOP: if (*mm->buffer.workspace == 0) break; len = strlen(mm->buffer.workspace); memmove(mm->buffer.workspace, mm->buffer.workspace+1, len-1); mm->buffer.workspace[len-1] = 0; break; case CLEAR: mm->buffer.workspace[0] = '\0'; break; case PRINT: printf("%s", mm->buffer.workspace); break; case POP: // pop a token from the stack, so skip the first delim * and // read back to the next delim * in the stack buffer. // // this pop routine seems unnecessary complicated. // basically given s:a*b* w: // pop should give s:a* w:b* // if (mm->buffer.workspace == mm->buffer.stack) break; mm->buffer.workspace--; if (mm->buffer.workspace == mm->buffer.stack) { if (mm->tape.currentCell > 0) mm->tape.currentCell--; break; } while ((*(mm->buffer.workspace-1) != mm->delimiter) && (mm->buffer.workspace-1 != mm->buffer.stack)) mm->buffer.workspace--; if (mm->buffer.workspace == mm->buffer.stack) { if (mm->tape.currentCell > 0) mm->tape.currentCell--; break; } if (*(mm->buffer.workspace-1) != mm->delimiter) mm->buffer.workspace--; // dec current tape cell if (mm->tape.currentCell > 0) mm->tape.currentCell--; break; case PUSH: // could use strchr instead, or better strchrnul // but strchrnul is not standard C. if (mm->buffer.workspace[0] == '\0') break; while ((*mm->buffer.workspace != '\0') && (*mm->buffer.workspace != mm->delimiter)) mm->buffer.workspace++; if (mm->buffer.workspace[0] == mm->delimiter) mm->buffer.workspace++; // increment current tape cell // star at end of token - not beginning // look for limits !! if (mm->tape.currentCell < mm->tape.capacity) mm->tape.currentCell++; else { printf("Push: Out of tape bounds"); exit(1); } break; case PUT: // I could make this a function, but the only place the // tape cell can get resized is here in the PUT command // if not enough space in tape cell, malloc here thisCell = &mm->tape.cells[mm->tape.currentCell]; if (strlen(mm->buffer.workspace) > thisCell->capacity) { newCapacity = strlen(mm->buffer.workspace)+100; thisCell->text = malloc(newCapacity * sizeof(char)); if (thisCell->text == NULL) { fprintf(stderr, "PUT: couldnt allocate memory for cell->text (execute) \n"); exit(EXIT_FAILURE); } thisCell->capacity = newCapacity - 1; thisCell->resizings++; } strcpy(thisCell->text, mm->buffer.workspace); break; case GET: if (strlen(mm->buffer.stack) + strlen(mm->tape.cells[mm->tape.currentCell].text) > mm->buffer.capacity) { growBuffer(&mm->buffer, strlen(ii->a.text) + 50); } strcat(mm->buffer.workspace, mm->tape.cells[mm->tape.currentCell].text); break; case SWAP: temp = malloc(strlen(mm->buffer.workspace + 10) * sizeof(char)); // ... todo! implement this printf("Swap: Not implemented!!! (yet) \n"); return 5; // unimplemented break; case INCREMENT: if (mm->tape.currentCell < mm->tape.capacity) mm->tape.currentCell++; else { printf("++: Out of tape bounds"); exit(1); } break; case DECREMENT: if (mm->tape.currentCell > 0) mm->tape.currentCell--; break; case READ: if (mm->peep == EOF) { return 1; // end of file } readc(mm); break; case UNTIL: // until workspace ends with ii->a.text // this seems to be working. if (mm->peep == EOF) break; if (!readc(mm)) break; len = strlen(ii->a.text); size_t worklen = strlen(mm->buffer.workspace); // below is a general "endswith" function... // strcmp(mm->buffer.workspace+worklen-len, ii->a.text) != 0) { char * suffix = mm->buffer.workspace+worklen-len; //while (strcmp(mm->buffer.workspace+worklen-len, ii->a.text) != 0) { while (readc(mm)) { // zero false, non-zero true. So if strings are equal stop looping // and no escape of first char in suffix worklen++; suffix = mm->buffer.workspace+worklen-len; // if ((strcmp(suffix, ii->a.text) == 0) && (*(suffix-1) != '\\')) { // deal with escape character. if ((strcmp(suffix, ii->a.text) == 0) && (*(suffix-1) != mm->escape)) { // printf("%s\n", suffix); break; } } break; case WHILE: if (mm->peep == EOF) break; // while and whilenot handle classes eg :space: ranges eg [a-z] // and lists eg [abxy] [.] etc // a function pointer: fn = &isblank; int res = (*fn)('1'); if (ii->a.datatype == CLASS) { // get character class function pointer from the instruction fn = ii->a.classFn; while ((*fn)(mm->peep)) { if (!readc(mm)) break; } } else if (ii->a.datatype == RANGE) { // compare peep to a range of characters eg a-z while ((mm->peep >= ii->a.range[0]) && (mm->peep <= ii->a.range[1])) { if (!readc(mm)) break; } } else if (ii->a.datatype == LIST) { // read input while peep is in a list of chars while (strchr(ii->a.list, mm->peep) != NULL) { if (!readc(mm)) break; } } break; case WHILENOT: // why not use a switch here??? Its a switch within a switch... /* switch (ii->a.datatype) { case CLASS: break; case RANGE: break; } */ if (ii->a.datatype == CLASS) { // get character class function pointer from the instruction fn = ii->a.classFn; while (!(*fn)(mm->peep)) { if (!readc(mm)) break; } } else if (ii->a.datatype == RANGE) { // read input while peep is not in a range (eg b-f) while ((mm->peep < ii->a.range[0]) || (mm->peep > ii->a.range[1])) { if (!readc(mm)) break; } } else if (ii->a.datatype == LIST) { // read input while peep is not in a char list eg "abxy" while (strchr(ii->a.list, mm->peep) == NULL) { if (!readc(mm)) break; } } break; case JUMP: // update program counter. non-relative jump mm->program.ip = ii->a.number; break; case JUMPTRUE: if (mm->flag == TRUE) // relative jump. easier to assemble code // mm->program.ip = ii->a.number; mm->program.ip = mm->program.ip + ii->a.number; else mm->program.ip++; break; case JUMPFALSE: if (mm->flag == FALSE) // relative jump. mm->program.ip = mm->program.ip + ii->a.number; else mm->program.ip++; break; case TESTIS: if (strcmp(mm->buffer.workspace, ii->a.text) == 0) { mm->flag = TRUE; } else { mm->flag = FALSE; } break; case TESTCLASS: // handle class, range, text, etc // depending on the parameter type. if (ii->a.datatype == CLASS) { // get character class function pointer from the instruction fn = ii->a.classFn; lastc = mm->buffer.workspace; //while ((*fn)(mm->peep)) { mm->flag = TRUE; while (*lastc != 0) { if (!fn(*lastc)) { mm->flag = FALSE; break; } lastc++; } } else if (ii->a.datatype == RANGE) { // compare ws to a range of characters eg a-z mm->flag = TRUE; lastc = mm->buffer.workspace; while (*lastc != 0) { if ((*lastc <= ii->a.range[0]) || (*lastc >= ii->a.range[1])) { mm->flag = FALSE; break; } lastc++; } } else if (ii->a.datatype == LIST) { // compare ws to a list of chars //while (strchr(ii->a.list, mm->peep) != NULL) { mm->flag = TRUE; lastc = mm->buffer.workspace; while (*lastc != 0) { if (strchr(ii->a.list, *lastc) == NULL) { mm->flag = FALSE; break; } lastc++; } } break; case TESTBEGINS: if (strncmp(mm->buffer.workspace, ii->a.text, strlen(ii->a.text)) == 0) { mm->flag = TRUE; } else { mm->flag = FALSE; } break; case TESTENDS: if (strcmp(mm->buffer.workspace + strlen(mm->buffer.workspace) - strlen(ii->a.text), ii->a.text) == 0) mm->flag = TRUE; else mm->flag = FALSE; break; case TESTEOF: if (mm->peep == EOF) { mm->flag = TRUE; } else { mm->flag = FALSE; } break; case TESTTAPE: if (strcmp(mm->buffer.workspace, mm->tape.cells[mm->tape.currentCell].text) == 0) { mm->flag = TRUE; } else { mm->flag = FALSE; } break; case COUNT: sprintf(acc, "%d", mm->accumulator); if (strlen(mm->buffer.stack) + strlen(acc) > mm->buffer.capacity) { growBuffer(&mm->buffer, strlen(acc) + 50); } strcat(mm->buffer.workspace, acc); break; case INCC: mm->accumulator++; break; case DECC: mm->accumulator--; break; case ZERO: mm->accumulator = 0; break; case CHARS: sprintf(acc, "%ld", mm->charsRead); if (strlen(mm->buffer.stack) + strlen(acc) > mm->buffer.capacity) { growBuffer(&mm->buffer, strlen(acc) + 50); } strcat(mm->buffer.workspace, acc); break; case LINES: sprintf(acc, "%ld", mm->lines); if (strlen(mm->buffer.stack) + strlen(acc) > mm->buffer.capacity) { growBuffer(&mm->buffer, strlen(acc) + 50); } strcat(mm->buffer.workspace, acc); break; case ESCAPE: // count escapable // but if the escapable is already preceded with a // backslash should we reescape it count=0; lastc = strchr(mm->buffer.workspace, ii->a.text[0]); while (lastc != NULL) { count++; lastc = strchr(lastc+1, ii->a.text[0]); } buffer = malloc((strlen(mm->buffer.workspace)+count+10) * sizeof(char)); // also grow workspace if needed. strcpy(buffer, mm->buffer.workspace); lastw = mm->buffer.workspace; while (*buffer != 0) { if (*buffer == ii->a.text[0]) { *lastw = mm->escape; lastw++; } *lastw = *buffer; buffer++; lastw++; } *lastw = 0; break; case UNESCAPE: /* how should this work? should unescape deescape all backlash letters or only one or a list, or a class :blank: or a range [a-z] */ lastc = mm->buffer.workspace; lastw = mm->buffer.workspace; while (*lastc != 0) { if ((*lastc == mm->escape) && (*(lastc+1)==ii->a.text[0])) { lastc++; } *lastw = *lastc; lastc++; lastw++; } *lastw = 0; // ... break; case STATE: showMachineTapeProgram(mm); break; case QUIT: // return 3; // script must now exit break; case WRITE: // write workspace to file 'sav.pp' if ((saveFile = fopen("sav.pp", "w")) == NULL) { printf ("Cannot open file %s'sav.pp'%s \n", YELLOW, NORMAL); return 4; } fputs(mm->buffer.workspace, saveFile); fclose(saveFile); break; case NOP: break; case UNDEFINED: fprintf(stderr, "Executing undefined command! " "at instruction: %d \n ", mm->program.ip); return 2; // error code break; } // if a jump command, dont increment the instruction pointer if (strncmp(info[ii->command].name, "jump", 4) != 0) { mm->program.ip++; } return 0; } // steps through one instruction of the machines program void step(struct Machine * mm) { //int result = execute(mm, &mm->program.listing[mm->program.ip]); } /* runs the compiled program in the machine but this will exit when the last read is performed... execute() has the following exit codes which we might need to handle: 0: success no problems 1: end of stream reached (tried to read eof) 2: trying to execute undefined instruction 3: quit/crash command executed (exit script) */ void run(struct Machine * mm) { int result; for (;;) { result = execute(mm, &mm->program.listing[mm->program.ip]); if (result != 0) break; } } void runDebug(struct Machine * mm) { int result; long ii = 0; // a counter for (;;) { printf("%6ld: ip=%3d T(n)=%3d", ii, mm->program.ip, mm->tape.currentCell); printInstruction(&mm->program.listing[mm->program.ip]); result = execute(mm, &mm->program.listing[mm->program.ip]); printf("\n"); if (ii == 3473) { // showMachineTapeProgram(mm); } if (result != 0) { printf("execute() returned error code (%d)\n", result); break; } ii++; } } // runs the compiled program in the machine until the // flag register is set to true void runUntilTrue(struct Machine * mm) { int result; while (mm->flag == FALSE) { result = execute(mm, &mm->program.listing[mm->program.ip]); if (result != 0) break; } } /* runs the compiled program in the machine until the workspace is exactly the specified text */ void runUntilWorkspaceIs(struct Machine * mm, char * text) { int result; while ((mm->peep != EOF) && (strcmp(mm->buffer.workspace, text) != 0)) { result = execute(mm, &mm->program.listing[mm->program.ip]); if (result != 0) break; } } // given some instruction text (eg: add "this") compile an instruction into // the machines program listing. Returns zero on success or a positive integer // if arguments are invalid. maybe this should return a pointer to an // instruction upon success the job of compile is to housekeep the machine. // the job of instructionFromText() is actually to parse the text into an // instruction // change this to struct Program * pp ?? or change // loadAssembledProgram to take a machine int compile(struct Program * program, char * text, int pos, struct Label table[]) { struct Instruction * ii; char command[200]; char args[1000]; // pos is where in the program list to put the compiled instruction ii = &program->listing[pos]; if (program->capacity == program->count + 1) { growProgram(program, 10); } // int result = sscanf(text, "%200s %200[^\n]", command, args); enum Command com = textToCommand(command); if (com == UNDEFINED) { printf("Unknown command name %s%s%s \n", BROWN, command, NORMAL); return 1; } instructionFromText(stdout, ii, text, pos, table); program->count++; return 0; // 2 argument compilation... to do } // compile // commands to test and analyse the machine // these enumerations are in the same order as the informational // array below for convenience enum TestCommand { HELP=0, COMMANDHELP, SEARCHHELP, SEARCHCOMMAND, LISTMACHINECOMMANDS, DESCRIBEMACHINECOMMANDS, LISTCLASSES, LISTCOLOURS, MACHINEPROGRAM, MACHINESTATE, MACHINETAPESTATE, MACHINEMETA, BUFFERSTATE, STACKSTATE, TAPESTATE, TAPEINFO, RESETINPUT, RESETMACHINE, STEPMODE, PROGRAMMODE, MACHINEMODE, COMPILEMODE, IPCOMPILEMODE, ENDCOMPILEMODE, INTERPRETMODE, LISTPROGRAM, LISTSOMEPROGRAM, LISTPROGRAMWITHLABELS, PROGRAMMETA, SAVEPROGRAM, SHOWJUMPTABLE, LOADPROGRAM, LOADASM, LOADLAST, LOADSAVED, SAVELAST, CHECKPROGRAM, CLEARPROGRAM, CLEARLAST, INSERTINSTRUCTION, EXECUTEINSTRUCTION, PARSEINSTRUCTION, TESTWRITEINSTRUCTION, STEPCODE, RUNCODE, RUNZERO, RUNTOTRUE, RUNTOWORK, IPZERO, IPEND, IPGO, IPPLUS, IPMINUS, SHOWSTREAM, EXIT, UNKNOWN }; // stepcode and executeinstruction below seem to be the // same exactly struct { enum TestCommand c; char * names[2]; char * argText; // eg char * description; } testInfo[] = { { HELP, {"hh", ""}, "", "Show this help. List interpreter commands" }, { COMMANDHELP, {"H", ""}, "", "show help for a particular machine command" }, { SEARCHHELP, {"/", "h/"}, "", "searches help system for an interpreter command containing search term" }, { SEARCHCOMMAND, {"//", "//"}, "", "searches help for a machine command containing search term" }, { LISTMACHINECOMMANDS, {"com", ""}, "", "list all machine commands" }, { DESCRIBEMACHINECOMMANDS, {"Com", ""}, "", "list and describe machine commands" }, { LISTCLASSES, {"class", "cl"}, "", "list all valid character classes for testclass and while" }, { LISTCOLOURS, {"col", "colours"}, "", "list all ansi colours" }, { MACHINEPROGRAM, {"m", ""}, "", "show state of machine, tape and current program instructions" }, { MACHINESTATE, {"M", ""}, "", "show the state of the machine" }, { MACHINETAPESTATE, {"s", ""}, "", "show state of the machine buffers with some tape cells" }, { MACHINEMETA, {"Mm", ""}, "", "show some meta information about the machine" }, { BUFFERSTATE, {"bu", ""}, "", "show the state of the machine buffer (stack/workspace)" }, { STACKSTATE, {"S", "stack"}, "", "show the state of the machine stack" }, { TAPESTATE, {"T", "t.."}, "", "show the state of the machine tape" }, { TAPEINFO, {"TT", "tapeinfo"}, "", "show detailed info of the state of the machine tape" }, { RESETINPUT, {"i.r", ""}, "", "reset the input stream" }, { RESETMACHINE, {"M.r", ""}, "", "reset the machine to original state" }, { STEPMODE, {"m.s", ""}, "", "make step through instructions" }, { PROGRAMMODE, {"m.p", ""}, "", "make display program state" }, { MACHINEMODE, {"m.m", ""}, "", "make display machine state" }, { COMPILEMODE, {"m.c", ""}, "", "entered instructions are compiled but not executed" }, { IPCOMPILEMODE, {"m.ipc", ""}, "", "entered instructions are compiled at current ip position" }, { ENDCOMPILEMODE, {"m.ec", ""}, "", "entered instructions are compiled at end of program" }, { INTERPRETMODE, {"m.int", "interp"}, "", "entered instructions are executed but not compiled" }, { LISTPROGRAM, {"ls", "p.ls"}, "", "list all instructions in the machines current program" }, { LISTSOMEPROGRAM, {"l", "p."}, "", "list current instructions in the machines program" }, { LISTPROGRAMWITHLABELS, {"pl", "p.ll"}, "", "list all instructions in program with labels (and jump labels)" }, { PROGRAMMETA, {"pm", "pi"}, "", "show some meta information about the current program" }, { SAVEPROGRAM, {"p.wa", "p.w"}, "", "save the current program as 'assembler'" }, { SHOWJUMPTABLE, {"jj", "showjumps"}, "", "Show the jumptable generated by buildJumpTable()" }, { LOADPROGRAM, {"l.asm", "l.a"}, "", "load machine assembler commands from text file" }, { LOADASM, {"asm", "as"}, "", "load 'asm.pp' (the script parser) and reset the machine" }, { LOADLAST, {"last", "p.ll"}, "", "load 'last.pp' (the program automatically saved on exit)" }, { LOADSAVED, {"sav", "l.sav"}, "", "load 'sav.pp' (output of the 'write' command.)" }, { SAVELAST, {"ww", "p.ww"}, "", "save 'last.pp' (the program automatically saved on exit)" }, { CHECKPROGRAM, {"p.v", ""}, "", "validate or check the machines compiled program " }, { CLEARPROGRAM, {"p.dd", "dd"}, "", "delete the machines compiled program " }, { CLEARLAST, {"p.dl", "pdl"}, "", "delete the last instruction in the compiled program " }, { INSERTINSTRUCTION, {"pi", "p.i"}, "", "insert an instruction at the current program ip " }, { EXECUTEINSTRUCTION, {"n", "."}, "", "execute the next (current) compiled instruction in program" }, { PARSEINSTRUCTION, {"pi:", "pi"}, "", "parse some example text into a compiled instruction" }, { TESTWRITEINSTRUCTION, {"twi", "twi"}, "", "shows how the current instruction will be written by writeInstruction()" }, { STEPCODE, {"p.s", "ps"}, "", "step through the next instruction in compiled program" }, { RUNCODE, {"rr", "p.r"}, "", "run the whole compiled program from the current instruction" }, { RUNZERO, {"r0", "p.r0"}, "", "run the whole compiled program from instruction zero" }, { RUNTOTRUE, {"rrt", "p.rt"}, "", "run the whole compiled program until flag is set to true" }, { RUNTOWORK, {"rrw", "runwork"}, "", "run program until the workspace is exactly the given text" }, { IPZERO, {"p<<", "p0"}, "", "set the instruction pointer to zero" }, { IPEND, {"p>>", "p.e"}, "", "set the instruction pointer to the end of the program" }, { IPGO, {"pg", "p.g"}, "", "set the instruction pointer to the given instruction" }, { IPPLUS, {"p>", "p.>"}, "", "increment the instruction pointer without executing" }, { IPMINUS, {"p<", "p.<"}, "", "decrement the instruction pointer without executing" }, { SHOWSTREAM, {"ss", "ss"}, "", "shows the next few characters from the input stream" }, { EXIT, {"X", "exit"}, "", "exit the maching testing program" }, { UNKNOWN, {"", ""}, "", ""} }; // given user input text return the test command enum TestCommand textToTestCommand(const char * text) { int ii; if (*text == 0) return UNKNOWN; for (ii = 0; ii < UNKNOWN; ii++) { if ((strcmp(text, testInfo[ii].names[0]) == 0) || (strcmp(text, testInfo[ii].names[1]) == 0)) { return (enum TestCommand)ii; } } return UNKNOWN; } void showTestHelp() { int ii; int key; printf("%s[ALL HELP COMMANDS]%s:\n", YELLOW, NORMAL); for (ii = 0; ii < UNKNOWN; ii++) { printf("%s%5s%s %s %s-%s %s%s \n", GREEN, testInfo[ii].names[0], PALEGREEN, testInfo[ii].argText, NORMAL, WHITE, testInfo[ii].description, NORMAL); if ( (ii+1) % 14 == 0 ) { pagePrompt(); key = getc(stdin); if (key == 'q') { return; } } } } // information about different modes the test program can be in // eg step where enter steps through the next instruction. // Or maybe 2 mode variables enterMode and programMode. enterMode // determines what the key does, and programMode determines // whether instructions are compiled or executed or both enum TestMode { COMPILE=0, IPCOMPILE, ENDCOMPILE, INTERPRET, MACHINE, PROGRAM, STEP, RUN } mode; // contain information about all commands struct { enum TestMode mode; char * name; char * description; } modeInfo[] = { { COMPILE, "compile", "enter displays the compiled program. Instructions entered are \n" "compiled into the machines program listing, but are not executed "}, { IPCOMPILE, "ipcompile", "Instructions are compiled into the machines program listing \n" "at the current instruction pointer position instead of at \n" "the end of the program. This is a slightly clunky way of \n" "modifying the in-memory program, but possibly easier than trying \n" "to manually modify the text assembler listing. "}, { ENDCOMPILE, "endcompile", "Instructions are compiled into the machines program listing \n" "at the end of the program \n" }, { INTERPRET, "interpret", "Instructions entered are executed but not compiled into the \n" "machine's program. \n" }, { MACHINE, "machine", "enter displays the state of the machine"}, { PROGRAM, "", ""}, { STEP, "step", "When enter is pressed the next instruction is executed and " "the state of the machine is displayed "}, { RUN, "run", "When enter is pressed the current program is run. "} }; // searches testInfo array for commands matching a search term void searchHelp(char * text) { int ii; int jj = 0; char key; printf("%s[Searching help commands]%s:\n", YELLOW, NORMAL); for (ii = 0; ii < UNKNOWN; ii++) { if ((strstr(testInfo[ii].description, text) != NULL) || (strstr(testInfo[ii].names[1], text) != NULL) || (strstr(testInfo[ii].names[0], text) != NULL)) { jj++; printf("%s%5s%s %s %s-%s %s%s \n", PURPLE, testInfo[ii].names[0], YELLOW, testInfo[ii].argText, NORMAL, BROWN, testInfo[ii].description, NORMAL); if ( (jj+1) % 14 == 0 ) { pagePrompt(); key = getc(stdin); if (key == 'q') { return; } } } // if search term found } // for if (jj == 0) { printf("No results found for '%s'\n", text); } } void printUsageHelp() { printf("'pp' a pattern parsing machine \n"); printf("Usage: pp [-shI] [-f script-file] [-a file] [-i file] \n" " \n" " -i input-file \n" " text input file \n" " -f script-file \n" " text input file \n" " -h \n" " print this help \n" " -s \n" " run in unix filter mode \n" " -a [file] \n" " load script-assembly source file \n" " -I \n" " run in interactive mode (with shell) \n"); } // The main testing loop for the machine. This accepts interactive // commands and executes them, among other things. Allows the state // of the machine to be observed, including the compiled program. int main (int argc, char *argv[]) { int c; // a file containing script-assembler to be compiled and run char * asmFileName = NULL; char * inputFile = NULL; char * scriptFileName = NULL; char source[64] = "input.txt"; fflush(stdout); // just checking stdin and stdoutjfk printf("stdin!"); while (!feof(stdin)) { if (ferror(stdin)) { return 1; } fputc(tolower(fgetc(stdin)), stdout); } exit(0); /* filtermode means the program is used as a unix filter, like sed, grep etc intermode means the program starts an interactive shell, so that the user can debug scripts and learn about the machine */ enum { FILTERMODE, INTERMODE } progmode = INTERMODE; opterr = 0; while ((c = getopt (argc, argv, "i:f:Isha:")) != -1) { switch (c) { // a text input file, but this should be a non-option argument // (just the file name). case 'i': inputFile = optarg; break; case 'f': scriptFileName = optarg; break; case 'I': progmode = INTERMODE; break; break; case 's': progmode = FILTERMODE; break; break; case 'h': printUsageHelp(); break; case 'a': asmFileName = optarg; break; case '?': if (optopt == 'a') { fprintf (stderr, "Option -%c requires an argument.\n", optopt); fprintf (stderr, " -a scriptAssemblyFile \n"); } else if (optopt == 'i') { fprintf (stderr, "Option -%c requires an argument.\n", optopt); fprintf (stderr, " -i inputFile \n"); } else if (optopt == 'f') { fprintf (stderr, "%s: option -%c requires an argument.\n", argv[0], optopt); fprintf (stderr, " -i scriptFile \n"); } else if (isprint (optopt)) { fprintf (stderr, "%s: unknown option -- %c \n", argv[0], optopt); printUsageHelp(); } else { printUsageHelp(); fprintf (stderr, "%s: unknown option char -- '\\x%x' ", argv[0], optopt); } return 1; default: abort (); } } // while /* for (index = optind; index < argc; index++) printf ("Non-option argument %s\n", argv[index]); return 0; */ if ((asmFileName != NULL) && (scriptFileName != NULL)) { printf("cannot load assembly and script at the same time (-a and -f)"); printUsageHelp(); exit(1); } if (inputFile != NULL) { strcpy(source, inputFile); } char version[64] = "v31415"; if (progmode != FILTERMODE) { banner() ; printf("Compiled: %s%s%s \n", YELLOW, version, NORMAL); } struct Machine m; //struct Instruction i; // mode determines how the enter key behaves. Also, in compile mode // instructions are compiled into the machines program but not automatically // executed enum TestMode mode = INTERPRET; enum TestCommand testCom; FILE * fp; FILE * inputStream; if((inputStream = fopen (source, "r")) == NULL) { printf ("Cannot open file %s%s%s \n", YELLOW, source, NORMAL); printf ("Try %s%s -i %s \n", CYAN, argv[0], NORMAL); exit (1); } //else { fp = stdin; } if (progmode != FILTERMODE) { printf("Using source file %s%s%s as input stream \n", YELLOW, source, NORMAL); printf("Type %shh%s for a list of commands\n", YELLOW, NORMAL); } FILE * asmFile; FILE * scriptFile; // machine, input stream, tape cells, tape cells size, and program listing newMachine(&m, inputStream, 100, 10); if (scriptFileName != NULL) { if ((scriptFile = fopen(scriptFileName, "r")) == NULL) { printf("Could not open script file %s%s%s \n", YELLOW, scriptFileName, NORMAL); exit(1); } if (progmode != FILTERMODE) { printf("\n"); printf("Loading script file %s%s%s \n", YELLOW, scriptFileName, NORMAL); } /* the procedure is: load asm.pp, run it on the script-file (assembly is saved to sav.pp) load sav.pp, run it on the input stream. */ if ((asmFile = fopen("asm.pp", "r")) == NULL) { printf("Could not open assembler %sasm.pp%s \n", YELLOW, NORMAL); exit(1); } if (progmode != FILTERMODE) { printf("Loading assembler %sasm.pp%s \n", YELLOW, NORMAL); } strcpy(m.program.source, "asm.pp"); loadAssembledProgram(&m.program, asmFile); //m.inputstream = scriptFile; // this run is crashing the system. no feof() check? run(&m); //runDebug(&m); fclose(asmFile); // now open sav.pp as "asmFile" and load it. fclose(scriptFile); //m.inputstream = inputStream; } // now we can use the machine to parse etc if (asmFileName != NULL) { if ((asmFile = fopen(asmFileName, "r")) == NULL) { printf("Could not open script-assembler file %s%s%s \n", YELLOW, asmFileName, NORMAL); exit(1); } if (progmode != FILTERMODE) { printf("\n"); printf("Loading assembler file %s%s%s \n", YELLOW, asmFileName, NORMAL); } strcpy(m.program.source, "asm.pp"); loadAssembledProgram(&m.program, asmFile); if (progmode != FILTERMODE) { printf("Compiled %s%d%s instructions " "from '%s%s%s' in about %s%ld%s milliseconds \n", CYAN, m.program.count, NORMAL, CYAN, m.program.source, NORMAL, CYAN, m.program.compileTime, NORMAL); } fclose(asmFile); } if ((progmode == FILTERMODE) && (asmFileName != NULL)) { //printf("Running script:\n"); //or runDebug(&m); run(&m); exit(0); } char line[401]; char command[201]; char argA[201]; char argB[201]; char args[300]; while (1) { printf(">"); fgets(line, 400, stdin); // remove newline line[strlen(line) - 1] = '\0'; //printf("input[%s]\n", line); command[0] = args[0] = argA[0] = argB[0] = '\0'; // int result; sscanf(line, "%200s %200[^\n]", command, args); // we also need to deal with quoted arguments such as // a "many words" // eg ranges [a-z] character classes :alpha: etc. /* A good example of sscanf with different cases res = sscanf(buf, " \"%5[^\"]\" \"%19[^\"]\" \"%29[^\"]\" %lf %d", p->number, p->name, p->description, &p->price, &p->qty); if (res < 5) { static const char *where[] = { "number", "name", "description", "price", "quantity"}; if (res < 0) res = 0; fprintf(stderr, "Error while reading %s in line %d.\n", where[res], nline); break; } */ // eg sscanf(line, "%200s '%200[^']'", command, argA) // printf("c=%s, argA=%s, argB=%s \n", command, argA, argB); testCom = textToTestCommand(command); //printf("testcom= %s \n", testInfo[testCom].description); if (testCom == HELP) { showTestHelp(); } else if (testCom == SEARCHCOMMAND) { if (strlen(args) == 0) { printf("No command seach term specified. \n"); printf("Try: // \n"); continue; } searchCommandHelp(args); } else if (testCom == SEARCHHELP) { if (strlen(args) == 0) { printf("No seach term specified. \n"); printf("Try: / \n"); continue; } searchHelp(args); } else if (testCom == COMMANDHELP) { if (strlen(args) == 0) { printf("No machine command specified. \n"); continue; } if (textToCommand(args) == UNDEFINED) { printf("Not a valid machine command %s%s%s \n", YELLOW, args, NORMAL); continue; } enum Command thisCom = textToCommand(args); printf("name: %s %s(%s%c%s)%s \n", info[thisCom].name, GREY, GREEN, info[thisCom].abbreviation, GREY, NORMAL); printf(" %s \n", info[thisCom].shortDesc); printf(" %s \n", info[thisCom].longDesc); printf("example: %s \n", info[thisCom].example); printf("takes %s%d%s argument(s) \n", YELLOW, info[thisCom].args, NORMAL); } else if (testCom == LISTMACHINECOMMANDS) { showCommandNames(); } else if (testCom == DESCRIBEMACHINECOMMANDS) { commandHelp(); } else if (testCom == LISTCLASSES) { printClasses(); } else if (testCom == LISTCOLOURS) { showColours(); } else if (testCom == MACHINEPROGRAM) { showMachineTapeProgram(&m); } else if (testCom == MACHINESTATE) { showMachine(&m, TRUE); } else if (testCom == MACHINETAPESTATE) { showMachineWithTape(&m); } else if (testCom == MACHINEMETA) { printMachineMeta(&m); } else if (testCom == BUFFERSTATE) { showBufferInfo(&m.buffer); } else if (testCom == STACKSTATE) { showStackWorkPeep(&m, TRUE); } else if (testCom == TAPESTATE) { printSomeTape(&m.tape, TRUE); } else if (testCom == TAPEINFO) { printTapeInfo(&m.tape); } else if (testCom == RESETINPUT) { rewind(inputStream); printf ("%s Rewound the input stream %s \n", YELLOW, NORMAL); } else if (testCom == RESETMACHINE) { colourPrint("reinitializing machine ...\n"); resetMachine(&m); } else if (testCom == STEPMODE) { mode = STEP; printf( "%sSTEP%s mode ( steps next instruction)\n", YELLOW, NORMAL); } else if (testCom == PROGRAMMODE) { mode = PROGRAM; printf( "%sPROGRAM%s mode ( displays program listing)\n", YELLOW, NORMAL); } else if (testCom == MACHINEMODE) { mode = MACHINE; printf( "%sMACHINE%s mode ( displays machine state)\n", YELLOW, NORMAL); } else if (testCom == COMPILEMODE) { printf ("Not implemented ... \n"); } else if (testCom == IPCOMPILEMODE) { mode = IPCOMPILE; printf ("Mode set to %sIPCOMPILE%s \n", YELLOW, NORMAL); printf ( " Instructions will be compiled into the program \n" " at the current instruction pointer position, instead \n" " of at the end of the program. \n"); } else if (testCom == ENDCOMPILEMODE) { mode = ENDCOMPILE; printf ("Mode set to %sENDCOMPILE%s \n", YELLOW, NORMAL); } else if (testCom == INTERPRETMODE) { mode = INTERPRET; printf("Mode set to %sINTERPRET%s \n", YELLOW, NORMAL); printf(" %smachine commands will be executed but not compiled \n" " to the internal program.%s \n", WHITE, NORMAL); } // ENTER key pressed .... else if (strcmp(command, "") == 0) { if (mode == MACHINE) { // to do: show the stream with ftell and fseek showMachine(&m, TRUE); } else if (mode == STEP) { step(&m); printSomeProgram(&m.program, 5); showMachine(&m, TRUE); } else if (mode == PROGRAM) { printSomeProgram(&m.program, 5); } } else if (strcmp(command, "BB") == 0) { showBufferAndPeep(&m); } else if (testCom == LISTPROGRAM) { printProgram(&m.program); } else if (testCom == LISTSOMEPROGRAM) { printSomeProgram(&m.program, 7); } else if (testCom == LISTPROGRAMWITHLABELS) { printProgramWithLabels(&m.program); } else if (testCom == PROGRAMMETA) { printProgramMeta(&m.program); } else if (testCom == SAVEPROGRAM) { FILE * savefile; if (strlen(args) == 0) { printf ("%s No save file given! %s \n", CYAN, NORMAL); printf ("%s Try %sP.wa %s \n", GREEN, YELLOW, NORMAL); continue; } if ((savefile = fopen (args, "w")) == NULL) { printf ("Could not open file %s%s%s for writing \n", YELLOW, args, NORMAL); continue; } saveAssembledProgram(&m.program, savefile); fclose(savefile); } else if (testCom == SHOWJUMPTABLE) { printJumpTable(m.program.labelTable); } else if (testCom == LOADPROGRAM) { FILE * loadfile; if (strlen(args) == 0) { printf ("%s No assembly text file given to load %s \n", GREEN, NORMAL); continue; } if ((loadfile = fopen (args, "r")) == NULL) { printf ("Could not open file %s%s%s for reading\n", YELLOW, args, NORMAL); continue; } loadAssembledProgram(&m.program, loadfile); strcpy(m.program.source, args); fclose(loadfile); } else if (testCom == LOADASM) { FILE * loadfile; if ((loadfile = fopen ("asm.pp", "r")) == NULL) { printf ("Could not open file %sasm.pp%s for reading\n", YELLOW, NORMAL); continue; } printf("%sresetting machine and loading '%sasm.pp%s'... \n", BROWN, PINK, NORMAL); rewind(inputStream); resetMachine(&m); clearProgram(&m.program); strcpy(m.program.source, "asm.pp"); loadAssembledProgram(&m.program, loadfile); fclose(loadfile); } else if (testCom == LOADLAST) { FILE * loadfile; if ((loadfile = fopen ("last.pp", "r")) == NULL) { printf ("Could not open file %slast.pp%s for reading\n", YELLOW, NORMAL); continue; } strcpy(m.program.source, "last.pp"); loadAssembledProgram(&m.program, loadfile); fclose(loadfile); } else if (testCom == LOADSAVED) { FILE * loadfile; if ((loadfile = fopen ("sav.pp", "r")) == NULL) { printf ("Could not open file %ssav.pp%s for reading\n", YELLOW, NORMAL); continue; } strcpy(m.program.source, "sav.pp"); loadAssembledProgram(&m.program, loadfile); fclose(loadfile); } else if (testCom == SAVELAST) { FILE * savefile; if ((savefile = fopen ("last.pp", "w")) == NULL) { printf ("Could not open file %slast.pp%s for writing \n", YELLOW, NORMAL); continue; } saveAssembledProgram(&m.program, savefile); fclose(savefile); } else if (testCom == CHECKPROGRAM) { validateProgram(&m.program); } else if (testCom == CLEARPROGRAM) { clearProgram(&m.program); } else if (testCom == CLEARLAST) { // need to set last instruction to undefined newInstruction(&m.program.listing[m.program.count-1], UNDEFINED); m.program.count--; printSomeProgram(&m.program, 7); } else if (testCom == INSERTINSTRUCTION) { // insert an instruction at the current program ip // need to recalculate jump address (add 1) printf ("Inserting instruction at %s%d%s \n", YELLOW, m.program.ip, NORMAL); insertInstruction(&m.program); printProgram(&m.program); } else if (testCom == EXECUTEINSTRUCTION) { execute(&m, &m.program.listing[m.program.ip]); showMachineTapeProgram(&m); /* printSomeProgram(&m.program, 6); printf("%s--------- Machine State ----------%s \n", YELLOW, NORMAL); showMachine(&m); */ } else if (testCom == PARSEINSTRUCTION) { struct Instruction * ii; struct Label table[100]; // void jumptable printf("testing instructionFromText()...\n"); if (strlen(args) == 0) { printf ("%s No example instruction given... %s \n" " Try %spi: add {this}%s \n", GREEN, NORMAL, YELLOW, NORMAL); continue; } newInstruction(ii, UNDEFINED); instructionFromText(stdout, ii, args, -1, table); printEscapedInstruction(ii); printf("\n"); } else if (testCom == TESTWRITEINSTRUCTION) { printf("Testing writeInstruction()... \n"); writeInstruction(&m.program.listing[m.program.ip], stdout); } else if (testCom == STEPCODE) { printf("step through next compiled instruction:\n"); step(&m); printSomeProgram(&m.program, 6); } else if (testCom == RUNCODE) { printf("%sRunning program from instruction %s%d%s...%s\n", GREEN, CYAN, m.program.ip, GREEN, NORMAL); run(&m); printf("\n%s--------- Machine State ----------%s \n", YELLOW, NORMAL); showMachine(&m, TRUE); } else if (testCom == RUNZERO) { printf("%sRunning program from ip 0%s\n", GREEN, NORMAL); m.program.ip = 0; run(&m); } else if (testCom == RUNTOTRUE) { printf("run program until the flag is set to true\n"); runUntilTrue(&m); } else if (testCom == RUNTOWORK) { if (strlen(args) == 0) { printf ("%sNo text given to compare%s \n", GREEN, NORMAL); printf ("%sUsage: rrw %s%s \n", GREEN, YELLOW, NORMAL); printf (" runs the current program until workspace is \n"); continue; } printf("Running program until workspace is \"%s%s%s\" \n", YELLOW, args, NORMAL); runUntilWorkspaceIs(&m, args); // printf("%s--------- Machine State ----------%s \n", // YELLOW, NORMAL); showMachineWithTape(&m); } else if (testCom == IPZERO) { m.program.ip = 0; printSomeProgram(&m.program, 4); } else if (testCom == IPEND) { m.program.ip = m.program.count-1; printSomeProgram(&m.program, 4); } else if (testCom == IPGO) { if (strlen(args) == 0) { printf ("%s No number given to jump to %s \n", GREEN, NORMAL); continue; } int ipnumber = 0; int res = sscanf(args, " %d", &ipnumber); if (res < 1) { printf ("%s Couldnt parse number %s \n", GREEN, NORMAL); continue; } m.program.ip = ipnumber; printSomeProgram(&m.program, 4); } else if (testCom == IPPLUS) { m.program.ip++; printSomeProgram(&m.program, 4); } else if (testCom == IPMINUS) { m.program.ip--; printSomeProgram(&m.program, 4); } else if (testCom == SHOWSTREAM) { // show some upcoming chars from stream and then // reset stream position long pos = ftell(m.inputstream); int num = 40; char c; int ii = 0; printf ("%sNext %s%d%s chars in input stream...%s\n", GREEN, YELLOW, num, GREEN, BLUE); while (((c = fgetc(m.inputstream)) != EOF) && (ii < num)) { printf("%c", c); ii++; } printf("%s\n", NORMAL); fseek(m.inputstream, pos, SEEK_SET); } // deal with pure machine commands (not testing commands) else if (textToCommand(command) != UNDEFINED) { void * p = NULL; // no jump table here if (mode == IPCOMPILE) { compile(&m.program, line, m.program.ip, p); } else if (mode == INTERPRET) { // execute the entered command but dont insert it // in the program. struct Instruction ii; instructionFromText(stdout, &ii, line, 0, p); execute(&m, &ii); showMachineTapeProgram(&m); continue; } else { compile(&m.program, line, m.program.count, p); } // check if the instruction is ok // checkInstruction(&m.program.listing[m.program.count-1], stdout); if (mode == COMPILE) { printSomeProgram(&m.program, 4); } else step(&m); m.program.ip = m.program.count; printSomeProgram(&m.program, 5); } else if (testCom == EXIT) { printf("%sSaving program to '%slast.pp%s'...\n", GREEN, YELLOW, GREEN); FILE * savefile; if ((savefile = fopen ("last.pp", "w")) == NULL) { printf ("Could not open file %slast.pp%s for writing \n", YELLOW, NORMAL); continue; } saveAssembledProgram(&m.program, savefile); fclose(savefile); colourPrint("Freeing Machine memory...\n"); freeMachine(&m); colourPrint("Goodbye !!\n"); exit(1); } else { printf("%sUnrecognised command:%s %s %s \n", BROWN, command, GREEN, NORMAL); } } //newTape(&t); printTape(&t); fclose(inputStream); return(0); } // /* OTHER PARSING IDEAS The code below implements a shift-reduce parser and a token lexer in a reasonably simple format. Another idea for parsing: a token/attribute structure and an open stack to hold it. Then compare token sequences lexing phase using fns such as scanargument, scancommand ... for some tokens, standard c scanf() fn is ok parsing phase uses a reduction loop to implement bnf shift-reduce grammar rules... // use a pointer to last item on stack eg // item * last = tokens[end]; struct item { char * attribute; enum Token token; } enum Token = {ARGUMENT, LBRACE, RBRACE, COMMAND ... }; // tokenInfo array with information about each token type // scanArgument etc works just like scanParameter in the machine code // read loop: while (more chars in input stream) { // ---------------- // lexing phase. Lexical analysis. // could use a switch on nextChar if (nextchar == '{') { // no attributes (values for { itemarray[end]->token = LBRACE; // end++ last++ // increment pointer } if (nextchar == '}') { // no attributes (values for { // last->token = RBRACE; itemarray[end]->token = RBRACE; last++ } if (nextchar == '[') { itemarray[end]->token = ARGUMENT; // nextChar = scanArgument(itemarray[end]) itemarray[end]->ttribute = scanArgument(..); end++ } if (nextchar == [a-z]) { itemarray[end]->token = COMMAND; itemarray[end]->ttribute = scanArgument(..); end++ } if (nextchar == [:space:]) { scanSpace(..); } // --------- // parsing phase // the token reduction loop (corresponds to bnf grammar // reduction. // The reduction loop needs to keep going until no more // reductions are found. int reduction = 0; while (reduction) { reduction = 1; // or fn: stackend(COMMAND, ARGUMENT) // or if (stackend(&tokstack, COMMAND, ARGUMENT)) // or if (last[-1]->token == COMMAND && last->token == ARGUMENT) { if (tokenarray[end-1].token == COMMAND && tokenarray[end].token == ARGUMENT) { // get items off stack, manipulate attributes then // put new tokenitem on stack with new attribute reduction = 0; // check all other reductions tokenarray[end-1].token == INSTRUCTION; //... // some string manipulation on attributes // do malloc on attrib if not enough string space strcat(tokstack[end-1].attrib, tokstack[end].attrib); end--; } // ... more reductions } // while more reductions } // read input stream loop This technique could be used to parse the script file ---------------- */