.ifdef 0

  This file is an attempt to port a byte-code forth system from 
  the x86 architecture to the atmega328 chip (usually running on some
  kind of arduino system). The x86 system is at 
     bumble.sf.net/books/osdev/os.asm

  The system uses ijmp to call the opcodes and caches the 
  top-of-stack in r25:r24

 CHALLENGES

   Dealing with the Harvard architecture of the avr chip is going
   to be tricky. However Flashforth contains some solutions to this.
   However, in forth, data words (defined with "var" or "constant" or 
   something else) are kept in the link-list dictionary right next to
   functions or procedures. On an x86 system we allocate the actually
   data space right next to, or after the definition of the data word
   in the dictionary, but in a Harvard archicture system we cannot do
   this. 
   
   The reason, is that a program may write to the data space
   thousands or millions of times during a single program execution and
   this would soon exhaust the flash-memory technology which is used
   to implement to the code space on an avr chip (and on many other 
   microcontrollers). So the data-space for actually storing (for example)
   a "string" will not be contiguous with the definition of the data word.
   The data space will be stored in "data memory" (such as eeprom or other)
   and this data memory is usually extremely limited on microcontrollers
   (2K in size on a atmega328p "arduino"). 

   So "var" will push onto the data-stack a pointer to this data in
   data-memory. The system will then "translate" that address into 

   The an "arduino" atmega328p contains a bootloader which is not necessary
   for a forth system. So I need to learn how to burn the forth system into
   the arduino board (and override the bootloader). 
   
   Also, I need to learn how to use the SPM instruction
   to actually write to flash (program) memory. This may involve using
   a ram buffer, because some flash memory can only be written one page
   at a time. 

   Flash memory should not be written to overly frequently (unlike data
   memory), so it may be better not to recompile code that is already on the
   atmega/avr chip. So "create" could check for a word of the same name and
   abort if it already exists in the dictionary.

  COMPILE UPLOAD AND RUN

    Use some aliases, bash functions to aid development.

   In the "vimrc" file the commands will not have any leading ">>"

   * compile an assembly arduino/atmega/avr program 
   >> map ,A :! avra %;
   
   * compile whole file and upload to the arduino 'uno' board
   >> map ,V :! avra %; sudo avrdude -p atmega328p -carduino -P/dev/ttyACM0 -b115200 -D -Uflash:w:%:r.hex:i

   * compile and upload to the arduino 'uno' board
   >> avra os.avr.asm; sudo avrdude -p atmega328p -carduino -P/dev/ttyACM0 -b115200 -D -Uflash:w:os.avr.hex:i

   * compile and upload to the arduino 'duemilenove' board
   >> avra os.avr.asm; sudo avrdude -p atmega328p -carduino -P/dev/ttyUSB0 -b57600 -D -Uflash:w:os.avr.hex:i

   * start a serial terminal to communicate with the arduino
   >> sudo picocom -b 9600 --echo /dev/ttyACM0

   * a bash function to compile and upload to uno board
   -----
   aos() {
     # cat os.avr.asm | sed -f os.sed > os.avr.pre.asm 
     avra os.avr.asm
     sudo avrdude -p atmega328p -carduino -P/dev/ttyACM0 \
        -b115200 -D -Uflash:w:test.hex:i
   }
   ,,,

  GOTCHAS

     Uneven numbers of bytes in .db sections get padded with
     an extra zero, which may cause absolute havoc in many
     situations...!!!

     "LIT, 132" doesnt do what is expected, because the 132 will
     be sign extended into 2 bytes (and will become a negative number).
     Asci characters seem to work because the "emit" opcode ignores
     the high byte of the stack.

     If there is a serial connection to the micro then avrdude will report
      avrdude: stk500_getsync() attempt 1 of 10: not in sync: resp=0x00

     When using fcall we need to use the label<<1 idiom because rom code
     memory is word addressable. eg
       .db STAR, FCALL, low(proc<<1), high(proc<<1), KEY, EMIT

  HISTORY
  

  * 29 Dec 2018
     The bug in fcall maybe that the "Y" double register is NOT the
     return stack but "fcall" thinks that it is!!

     Compiling with no errors today with "avra".
     I will remove domain prefixes because I will use an interleaved
     link-list dictionary to implement namespaces.

     When I left this in July, I had not solved the Return-stack and 
     Pstack problem. Perhaps I can solve this by writing a ".s" stack
     display word in avr assembler so as to glimpse what is going on
     in the dark recesses of the displayless mini-computer. Once that
     is done, then incorporate the ideas of the interleaved link-list
     dictionary to allow namespaces and universal unique forth names.

     Also, need to think about how to incorporate the avr (arduinos)
     timers into the machine. Then deal with the Harvard archicture
     tangle for separating code and data. (Both code and data spaces can
     be written but with different avr opcodes). One idea is the 
     "memory map" from flash forth. 
     
     In August 2018 I did work on the arduino book bumble.sf.net/books/arduino
     which contains useful but incomplete timer and pwm examples.
     I realised that the Mazidi avr book is less useful than I hoped because
     the instructions need to be rewritten for atmega328p and the Mazidi
     coding style is sometimes not good (He uses hardcoded register numbers
     rather than defined mnemonics which would make the code more portable
     between different avr chips as well as more readable).

     Time to print out this code and look at it closely to sort out the 
     proceedure calling problems.

  * 28 july 2018
     Wrote fcall, exit, pcall. But I think that Rstack and Pstack
     are interfering with each other (since they occupy the same buffer).
     This means that the initialisation of the Rstack is not working
     as it should.

     gotcha! avra adds a zero byte to odd numbered bytes in
     .db sections (because atmega flash is word addressable).
     This makes the "code:" section abort if uneven number of bytes!

     would be good to write a .s(hex) in machine code to make 
     debugging easier. Also, started to write a series of tests to
     make sure each opcode is working properly. 

  * 27 july 2018

     System is currently 1684 bytes. trying to debug jumpnz logic.
     sbiw and adiw are equivalent to sbi + sbci but occupy 
     less space (16bits vs 32 bits). Maybe can just use the Y index
     for the return stack pointer. The return stack could grow up
     in the same buffer as the datastack, but not sure of the advantage.

  * 26 july 2018

     Code is currently 1212 bytes (only opcodes).
     Need to look at the 'neg' code, dont think it is correct.
     Ported more opcodes up to "litw". Found the following way to 
     sign extend 8bits to 16.
     ;signextend:
     ldi   r25, 127      ; can be hoisted out of loops, and any reg is fine.
     cp    r25, r24      ; C = (r24 < 0)
     sbc   r25, r25      ; r25 = (r24 < 0) ? -1 : 0
     ; result in r25:r24

  * 25 july 2018
     Ported more opcodes from x86. Seems fairly straight forward.
     Maybe tosh and tosl should be kept in Z which would make 
     "@", "fetch" and "!" easier and faster. No! because the exec routine
     needs zh:zl to get opcodes.
     Also, the idea of an opcode which pushes the memory map
     onto the stack (eg eeprom, rom, ram). In a von-neumann 
     machine the rom and ram will be the same but in a harvard
     architecture they will not.
     Had the idea of programming the bytecode onto the arduino
     using "key" and ! or c,
  
  * 24 july 2018
     continuing to port opcodes from the x86 version. Grappling
     with a .db problem in the code: section. When I write two
     .db sections on separate lines, the assembler (avra) seems
     to put a zero at the end of the first .db line. Added some
     vim macros to compile and upload whole file to arduino uno.

  * 22 july 2018
     Need to move the exec routine into a word. Then deal with
     jumps. Then recode all of the x86 system in avr assembler.
     Got the offset working but not propagating the carry to pch
     (program counter high byte).
     Struggling with jump to get the offset correct in the byte code
     section. Separated this into its own file to 
     continue development there. This is a direct parallel to
     bumble.sf.net/books/osdev/os.asm  (which is x86 code)

  * 20 july 2018
     The opcode retrieval and indirect procedure call appear to
     be working. These are the essential elements for a bytecode 
     system. using LPM and ICALL. Will convert to IJMP so that the 
     data stack is easier to use; that is, by using ijmp instead 
     of ICALL no return code address is placed on the avr data-stack
     by the calling procedure. So we do not need to "juggle" this
     return address when trying to get access to the data items
     underneath it on the stack (which we are constantly doing because
     forth is a stack-machine).

  * 19 July 2018
     Started writing a bytecode system. 
     Revisiting this to try to create a forth style bytecode
     system using avr assembly. I will use ideas from the x86 
     forth bytecode system at bumble.sf.net/books/osdev/os.asm

  * 8 November 2016
     I combined the arduino book with the avr book, since my 
     main focus now with the arduino will be programming in
     assembler. First I need to learn the avr architecture.

 .endif

  .include "m328Pdef.inc"

    ; or use Y for the return stack
    .def rph = r1   ; return stack pointer, high byte
    .def rpl = r0   ; return stack pointer, low byte
    .def pch = r23  ; (bytecode program counter high-byte)
    .def pcl = r22
    .def tosh = r25 ; (top-of-data-stack pointer high byte)
    .def tosl = r24 
    ; also here define the datastack start location and the 
    ; return stack location 
    ; eg, This allows recursive procedures.
     .equ DSTACKH = high(ramend)
     .equ DSTACKL = low(ramend)
     .equ RSTACKH = high(ramend)-512
     .equ RSTACKL = low(ramend)-512

    ; Opcode constants for the bytecode machine. These are (naturally)
    ; the same as for the x86 machine (hopefully to allow "write once,
    ; run anywhere"). Some mnemonics are just aliases for other bytecodes
    ; such as "count" and "cfetchplus" which is "c@+". Count is also
    ; a standardish forth word and character-fetch-plus is analogous to
    ; real post increment machine fetch opcodes.
    ; Some opcodes may not be necessary or useful on a simple microcontroller
    ; but they should be "installable" depending on what hardware is 
    ; connected and available to the micro. This is an extended idea of
    ; what a virtual machine is or should be (it seems that normally VMs
    ; do not include periferal hardward, sensor, transducers, etc in the 
    ; semantics of the VM).
    .equ DUP = 1 
    .equ DROP = DUP+1
    .equ SWAP = DROP+1
    .equ OVER = SWAP+1
    .equ TWODUP = OVER+1
    .equ FLAGS = TWODUP+1 
    .equ DEPTH = FLAGS+1
    .equ RDEPTH = DEPTH+1 
    .equ RON = RDEPTH+1 
    .equ ROFF = RON+1 
    .equ STORE = ROFF+1
    .equ STOREPLUS = STORE+1
    .equ FETCH = STOREPLUS+1
    .equ FETCHPLUS = FETCH+1
    .equ CSTORE = FETCHPLUS+1
    .equ CSTOREPLUS = CSTORE+1
    .equ CFETCH = CSTOREPLUS+1
    .equ CFETCHPLUS = CFETCH+1
    .equ COUNT = CFETCHPLUS    ; count is an alias for c@+
    .equ EQUALS = CFETCHPLUS+1
    .equ NOTEQUALS = EQUALS+1
    .equ LESSTHAN = NOTEQUALS+1
    .equ ULESSTHAN = LESSTHAN+1
    .equ LIT = ULESSTHAN+1
    .equ LITW = LIT+1
    .equ EMIT = LITW+1 
    .equ KEY = EMIT+1 
    .equ EKEY = KEY+1
    .equ GETXY = EKEY+1
    .equ ATXY = GETXY+1
    .equ PLUS = ATXY+1
    .equ MINUS = PLUS+1
    .equ INCR = MINUS+1
    .equ DECR = INCR+1 
    .equ NEGATE = DECR+1
    .equ DPLUS = NEGATE+1
    .equ LOGOR = DPLUS+1       ; logical or
    .equ LOGXOR = LOGOR+1
    .equ LOGAND = LOGXOR+1
    .equ DIVMOD = LOGAND+1
    .equ TIMESTWO = DIVMOD+1 
    .equ MULT = TIMESTWO+1 
    .equ UMULT = MULT+1 
    .equ FCALL = UMULT+1 
    .equ PCALL = FCALL+1
    .equ EXIT = PCALL+1 
    .equ LJUMP = EXIT+1 
    .equ JUMP = LJUMP+1 
    .equ JUMPZ = JUMP+1 
    .equ JUMPF = JUMPZ     ; "jumpf" jump-false is an alias for jumpz (zero)
    .equ JUMPNZ = JUMPZ+1 
    .equ JUMPT = JUMPNZ    ; "jump-true" is an alias for jump-not-zero
    .equ RLOOP = JUMPNZ+1
    .equ DIVTWO = RLOOP+1
    .equ READ = DIVTWO+1  ; loads sectors from disk
    .equ WRITE = READ+1   ; writes sectors to disk (usb etc)
    .equ FG = WRITE+1   ; foreground colour
    .equ BG = FG+1      ; background colour
    .equ CLS = BG+1     ; clear screen 
    .equ VID = CLS+1    ; video mode
    .equ PIX = VID+1    ; one pixel on screen at xy 
    .equ GLYPH = PIX+1  ; display glyph on screen 
    .equ RTC = GLYPH+1  ; real time clock 
    .equ CLOCK = RTC+1  ; number of clock ticks since midnight 
    .equ NOOP = CLOCK+1 ; no operation & end marker  
    .equ STAR = NOOP+1 
    .equ DIGIT = STAR+1 

    ; A list of symbolic constants for domain name prefixes
    .equ LANG = 1       ; language name 
    .equ CORE = 2       ; core ops or primitives
    .equ MON = 3        ; monitor 
    .equ TIME = 4       ; time and date 
    .equ DISK = 5       ; storage 
    .equ INOUT = 6      ; input output 

   .dseg
   ; ram data segment 
   .cseg
   ; start of flash "rom" code segment

   .org 0x0000

   jmp wakeup

   ; a table of code pointers. The pointers have the 
   ; same offset (*2) in the table as the value of the opcode
   optable:
     .dw 0, dupx, dropx, swapx, overx, twodupx
     .dw flagsx, depthx, rdepthx, ronx, roffx
     .dw storex, storeplusx, fetchx, fetchplusx 
     .dw cstorex, cstoreplusx, cfetchx, cfetchplusx 
     .dw equalsx, notequalsx
     .dw lessthanx, ulessthanx, litx, litwx
     .dw emitx, keyx, ekeyx, getxyx, atxyx
     .dw plusx, minusx, incrx, decrx, negx
     .dw dplusx, logorx, logxorx, logandx
     .dw divmodx, timestwox, multx, umultx
     .dw fcallx, pcallx, exitx
     .dw ljumpx, jumpx, jumpzx, jumpnzx, rloopx
     .dw divtwox, readx, writex
     .dw fgx, bgx, clsx, vidx, pixx
     .dw glyphx
     .dw rtcx, clockx
     ; could put just-in-time opcodes here.
     .dw noopx,  
     .dw starx, digitx, -1

   ; 
   execx:
     jmp nextopcode 

   dupdoc:
     ; db 'Duplicates the top item on the stack.'
     ; dw $-dup.doc
   duph:
     .db LANG         ; domain name prefix for universal naming 
     .dw 0            ; link to previous word 
     .db "dup", 3     ; strings are 'counted' 
   dupx:
     push tosh      ; high byte
     push tosl      ; low byte
     jmp nextopcode 

   dropdoc:
     ; db 'removes the top item on the stack.'
     ; dw $-drop.doc
   droph:
     .db 0           ; domain name prefix for universal naming 
     .dw dupx        ; link to previous word 
     .db "drop", 4   ; name with reverse count 
   dropx:
     pop tosl
     pop tosh
     jmp nextopcode

   swapdoc:
     ; db 'swaps the top 2 items on the stack.'
     ; dw $-swap.doc
   swaph:
     .dw dropx         ; link to previous word 
     .db "swap", 4
   swapx:
     movw r17:r16, tosh:tosl 
     pop tosl
     pop tosh
     push r17
     push r16
     jmp nextopcode

   overdoc:
     ; db ' ( n1 n2 -- n1 n2 n1 ) '
     ; db ' Puts a copy of 2nd stack item on top of stack. '
     ; db ' dont use this, will probably remove. '
     ; dw $-over.doc
   overh:
     .db LANG         ; domain name prefix for universal naming 
     .dw swapx        ; back-link 
     .db "over", 4
   overx:
     pop r16 
     pop r17 
     push r17 
     push r16 
     push tosh 
     push tosl 
     movw tosh:tosl, r17:r16
     jmp nextopcode

   twodupdoc:
     ; db ' ( n1 n2 -- n1 n2 n1 n2 ) '
     ; db ' copies 2 stack items onto stack '
     ; dw $-twodup.doc
   twoduph: 
     .db LANG        ; domain name prefix for universal naming 
     .dw overx       
     .db "2dup", 4
   twodupx:
     pop r16 
     pop r17 
     push r17 
     push r16 
     push tosh 
     push tosl 
     push r17 
     push r16 
     jmp nextopcode
 
   flagsdoc:
     ; db ' ( -- n ) '
     ; db ' push flag register onto the stack  '
     ; db ' execution flags such as carry overflow negate etc '
     ; db ' are pushed onto the stack '
     ; dw $-flags.doc
   flagsh: 
     .db LANG        ; domain name prefix for universal naming 
     .dw twodupx  
     .db "flags", 5
   flagsx:
     ;push dx   ; save NOS on stack
     ;pushf
     ;pop dx    ; get flags into TOS (dx)
     jmp nextopcode
 
   depthdoc:
     ; db ' ( -- n ) '
     ; db ' Puts on the stack the number of stack items '
     ; db ' before this word was executed '
     ; dw $-depth.doc
   depthh: 
     .db LANG        ; domain name prefix for universal naming 
     .dw flagsx   
     .db "depth", 5
   depthx:
     ; untested code
     ldi r17, high(ramend)  
     in r19, sph
     ldi r16, low(ramend)    
     in r18, spl
     sub r16, r18   ; r17:r16 - r19:r18
     sbc r17, r19 
     ; need to divide by 2 
     lsr r17        ; div r17:r16 by 2 
     ror r16
     push tosh      ; high byte
     push tosl      ; low byte (character)
     movw tosh:tosl, r17:r16
     jmp nextopcode
 
   rdepthdoc:
     ; db ' ( -- n ) '
     ; db ' Puts on the stack the number of stack items '
     ; db ' on the return stack before this word '
     ; db ' was executed '
     ; dw $-rdepth.doc
   rdepthh: 
     .db LANG        ; domain name prefix for universal naming 
     .dw depthx  
     .db "rdepth", 6
   rdepthx:
     ;push dx
     ;mov dx, di 
     ;shl dx, 1

     ; where to implement the return stack?
     ; above the data stack?
     jmp nextopcode
 
   rondoc:
     ; db '( S: n -- )( R: -- n ) '
     ; db ' put the top item of the data stack onto the return stack.'
     ; dw $-ron.doc
   ronh: 
     .db LANG        ; domain name prefix 
     .dw rdepthx      
     .db ">r", 2
   ronx:
     st Y+, tosl 
     st Y+, tosh     ; store on return stack, increment 
     pop tosl           
     pop tosh          
     jmp nextopcode
 
   roffdoc:
     ; db '( S: -- n)( R: n -- ) '
     ; db ' put the top item of the return stack onto the data stack.'
     ; dw $-roff.doc
   roffh: 
     .db LANG        ; domain name prefix 
     .dw ronx      
     .db "r>", 2
   roffx:
     push tosh       ; tos becomes nos
     push tosl        
     ld tosh, -Y     ; get top of return stack and decrement
     ld tosl, -Y     ; 
     jmp nextopcode
 
   ; this needs to be "vectored" for eeprom/rom/ram memory types
   storedoc:
     ; db '( w adr -- ) '
     ; db ' place 2 byte value w at address "adr" '
     ; dw $-store.doc
   storeh: 
     .db LANG        ; domain name prefix 
     .dw roffx       
     .db "!", 1
   storex:
     ; here we need to look at the address requested and parlay
     ; or bifurcate that into ST "store data memory" or SPM 
     ; "store program memory" instructions depending where the address
     ; is in the pre-defined "memory map". This may be tricky 
     ; because program-memory (flash memory on the atmega) may need 
     ; to be buffered before writing. Also, the standard arduino bootloader
     ; prevents writing to program memory so it must be zapped before
     ; this will word.
     movw xh:xl, tosh:tosl    ; 
     movw xh:xl, tosh:tosl    ; pointer to address
     pop r16
     pop r17         ; value to store at address
     st X+, r16 
     st X+, r17      ; 2 bytes are stored
     pop tosl           
     pop tosh          
     jmp nextopcode
 
   storeplusdoc:
     ; db '( w adr -- adr+2 ) '
     ; dw $-storeplus.doc
   storeplush: 
     .db LANG        ; domain name prefix 
     .dw storex      ; link to previous word 
     .db "!+", 2
   storeplusx:
     movw xh:xl, tosh:tosl    ; pointer to address
     pop r16
     pop r17         ; value to store at address
     st X+, r16 
     st X+, r17      ; 2 bytes are stored
     movw tosh:tosl, xh:xl 
     jmp nextopcode
 
   ; needs to be vectored for different memory types
   fetchdoc:
     ; .db '( adr -- n ) '
     ; .db ' Replace the top element of the stack with the '
     ; .db ' value of the 16bites at the given memory address '
     ; .dw $-fetch.doc
   fetchh: 
     .db LANG           ; domain name prefix 
     .dw storeplusx     ; link to previous word 
     .db "@", 1
   fetchx:
     ; here we need to look at the address requested and parlay
     ; or bifurcate that into LD "load data memory" or LPM 
     ; "load program memory" instructions depending where the address
     ; is in the pre-defined "memory map". This should not be too hard
     ; but "store" (!) will be trickier because program-memory (flash
     ; memory on the atmega) may need to be buffered before writing. 
     movw xh:xl, tosh:tosl    ; 
     ld tosl, X+
     ld tosh, X+
     jmp nextopcode
 
   ; needs to be vectored for different memory types
   fetchplusdoc:
     ; db '( adr -- adr+2 n ) '
     ; db ' Replace the top element of the stack with the '
     ; db ' value of the 16bites at the given memory address '
     ; db ' and increment the address by 2 bytes. '
     ; dw $-fetchplus.doc
   fetchplush: 
     .db LANG        ; domain name prefix 
     .dw fetchx      ; link to previous word 
     .db "@+", 2
   fetchplusx:
     movw xh:xl, tosh:tosl    ; 
     ld tosl, X+
     ld tosh, X+
     push xh        
     push xl        
     jmp nextopcode
 
   ; needs to be vectored for different memory types
   cstoredoc:
     ; db '( n adr -- ) store the byte value n at address adr.'
     ; dw $-cstore.doc
   cstoreh: 
     .db LANG            ; domain name prefix 
     .dw fetchplusx     ; link to previous word 
     .db "c!", 2
   cstorex:
     movw xh:xl, tosh:tosl    ; pointer to address
     pop r16
     pop r17         ; value to store at address r16
     st X+, r16      ; only lower byte is stored
     pop tosl           
     pop tosh          
     jmp nextopcode
 
   ; needs to be vectored for different memory types
   cstoreplusdoc:
     ; db '( n adr -- adr+1 ) store the byte value n at address adr.'
     ; db ' And increment the address '
     ; dw $-cstoreplus.doc
   cstoreplush:  
     .db LANG            ; domain name prefix 
     .dw cstorex         ; link to previous word 
     .db "c!+", 3
   cstoreplusx:
     movw xh:xl, tosh:tosl    ; pointer to address
     pop r16         ; value to store at adr
     pop r17         ; 
     st X+, r16      ; only lower byte stored
     movw tosh:tosl, xh:xl 
     jmp nextopcode
 
   ; needs to be vectored for different memory types
   cfetchdoc:
     ; db '( adr -- n ) Replace the top element of the stack with the value '
     ; db ' of the byte at the given memory address.'
     ; db ' eg: myvar @ . '
     ; db '  displays the value at the address given by "myvar" '
     ; dw $-cfetch.doc
   cfetchh: 
     .db LANG              ; domain name prefix 
     .dw cstoreplusx      ; link to previous word 
     .db "c@", 2
   cfetchx:
     movw xh:xl, tosh:tosl    ; 
     ld tosl, X
     ; sign extend tosl to tosh:tosl so -1 (8 bits) becomes -1 (16 bits)
     ldi   tosh, 127     ; can be hoisted out of loops, and any reg is fine.
     cp    tosh, tosl    ; C = (tosl < 0)
     sbc   tosh, tosh    ; tosh = (tosl < 0) ? -1 : 0
     jmp nextopcode
 
   ; needs to be vectored for different memory types
   cfetchplusdoc:
     ; db '( adr -- adr+1 n ) '
     ; db ' Replace the top element of the stack with the value '
     ; db ' of the byte at the given memory address and increment the '
     ; db ' address . This is exactly the same as "count"'
     ; dw $-fetchplus.doc
   cfetchplush: 
     .db LANG            ; domain name prefix 
     .dw cfetchx         ; link to previous word 
     .db "c@+", 3
   cfetchplusx:
     movw xh:xl, tosh:tosl    ; 
     ld tosl, X+
     ; sign extend tosl to tosh:tosl so -1 (8 bits) becomes -1 (16 bits)
     ldi   tosh, 127     ; can be hoisted out of loops, and any reg is fine.
     cp    tosh, tosl    ; C = (tosl < 0)
     sbc   tosh, tosh    ; tosh = (tosl < 0) ? -1 : 0
     push xh        
     push xl        
     jmp nextopcode
 
   equalsdoc:
     ; db ' ( n1 n2 -- flag ) '
     ; db 'Puts -1 (true) on the stack if n1==n2 '
     ; db 'otherwise puts zero (false) on the stack. '
     ; dw $-equals.doc
   equalsh: 
     .db LANG             ; domain name prefix 
     .dw cfetchplusx     ; link to previous word 
     .db "=", 1     
   equalsx:
     pop r16
     pop r17        ; NOS high byte 
     cp r16, tosl   ; 
     brne notequal
     cp r17, tosh   ; 
     brne notequal
     ldi tosh, 0
     ldi tosl, 1
     jmp nextopcode
   notequal:
     ldi tosh, 0
     ldi tosl, 0
     jmp nextopcode
 
   notequalsdoc:
     ; db ' ( n1 n2 -- flag ) '
     ; db 'Puts 0 (false) on the stack if n1==n2 '
     ; db 'otherwise puts -1 (true) on the data stack'
     ; dw $-notequals.doc
   notequalsh: 
     .db LANG         ; domain name prefix 
     .dw equalsx     ; link to previous word 
     .db "<>", 2     
   notequalsx:
     pop r16
     pop r17        ; NOS high byte 
     cp r16, tosl   ; 
     brne doesnotequal
     cp r17, tosh   ; 
     brne doesnotequal
     ldi tosh, 0
     ldi tosl, 0
     jmp nextopcode
   doesnotequal:
     ldi tosh, 0
     ldi tosl, 1
     jmp nextopcode
 
   lessthandoc:
     ; db ' ( n1 n2 -- flag ) '
     ; db 'Puts 0 (false) on the stack if not n1<n2 '
     ; db 'otherwise puts 1 (true) on the data stack'
     ; db 'this is a signed comparison'
     ; dw $-lessthan.doc
   lessthanh:  
     .db LANG         ; domain name prefix 
     .dw notequalsx    
     .db "<", 1     
   lessthanx:
     ; but is this really signed??
     pop r16
     pop r17        ; NOS high byte 
     sub tosl, r16  ; 
     sbc tosh, r17
     breq notlessthan
     brmi notlessthan
     ldi tosh, 0
     ldi tosl, 1
     jmp nextopcode
   notlessthan:
     ldi tosh, 0
     ldi tosl, 0
     jmp nextopcode
 
   ulessthandoc:
     ; db ' ( n1 n2 -- flag ) '
     ; db 'Puts true -1 or 1 on the stack if n1<n2 '
     ; db 'otherwise puts false=0 on the data stack'
     ; db 'this is an unsigned comparison'
     ; dw $-ulessthan.doc
   ulessthanh: 
     .db LANG         ; domain name prefix 
     .dw lessthanx    
     .db "u<", 2     
   ulessthanx:
     pop r16
     pop r17        ; NOS high byte 
     sub tosl, r16  ; 
     sbc tosh, r17
     breq unotlessthan
     brmi unotlessthan
     ldi tosh, 0
     ldi tosl, 1
     jmp nextopcode
   unotlessthan:
     ldi tosh, 0
     ldi tosl, 0
     jmp nextopcode
 
   litdoc:
     ; db 'Pushes an 8 bit literal value onto the stack'
     ; dw $-lit.doc
   lith: 
     .db LANG            ; domain name prefix 
     .dw ulessthanx     ; link to previous word 
     .db "lit", 3     
   litx:
     push tosh       ; high byte top-of-stack
     push tosl       ; low byte 
     movw zh:zl, pch:pcl    ; get pc program counter 
     lpm r16, z+            ; get literal character (1 byte)
     movw pch:pcl, zh:zl    ; save pc program counter 
     ; sign extend r16 to r17:r16 
     ldi   r17, 127       ; 
     cp    r17, r16       ; C = (r16 < 0)
     sbc   r17, r17       ; r17 = (r16 < 0) ? -1 : 0
     ; actually dont need this step (could load to TOS direct)
     movw tosh:tosl, r17:r16    ; save in top-of-stack 
     jmp nextopcode
 
   litwdoc:
     ; db 'Pushes an 16 bit literal value onto the stack'
     ; dw $-litw.doc
   litwh: 
     .db LANG      ; domain name prefix 
     .dw litx     ; link to previous word 
     .db "litw", 4     
   litwx:
     push tosh       ; high byte top-of-stack
     push tosl       ; low byte 
     movw zh:zl, pch:pcl    ; get pc program counter 
     lpm r16, z+            ; get literal character (1 byte)
     lpm r17, z+            ; get literal character (1 byte)
     movw pch:pcl, zh:zl    ; save pc program counter 
     ; actually dont need this step (could load to TOS direct)
     movw tosh:tosl, r17:r16    ; save in top-of-stack 
     jmp nextopcode
 
   emith:
     .dw dupx       ; the classic dictionary backlink
     .db "emit", 4
   emitx:
    waitagain:
     lds r16, UCSR0A       ; usart control, status register
     sbrs r16, UDRE0       ; is UDR empty?
     rjmp waitagain        ; if not, then just wait 
     sts UDR0, tosl        ; if empty tx character in TOS (r25:r24)k
     pop tosl              ; low byte of TOS 
     pop tosh              ; high byte of TOS (should be zero)
     jmp nextopcode 

   keyh:
     .db INOUT      ; domain name prefix 
     .dw emitx 
     .db "key", 3
   keyx:

   wait:
     push tosh       ; high byte
     push tosl       ; low byte (character)
     lds r16, UCSR0A    ; get usart status info
     sbrs r16, RXC0     ; has a byte been received?
     rjmp wait          ; if not just wait for one.
     clr tosh           ; set high byte of TOS to 0
     lds tosl, udr0     ; get key char into low byte of TOS 
     jmp nextopcode 

   ekeydoc:
     ; db ' ( -- event flag ) '
     ; db 'Get one keyboard event and place on stack'
     ; db 'This includes arrow keys etc. The flag indicates if the '
     ; db 'code represents an extended character (eg arrow key) '
     ; db 'or just an ordinary asci character '
     ; dw $-ekey.doc
   ekeyh: 
     .db INOUT      ; domain name prefix 
     .dw keyx      ; link to prev
     .db "ekey", 4  ; reverse counted string
   ekeyx:
     push tosh       ; high byte top-of-stack
     push tosl       ; low byte
     ldi r16, 0
     push r16        ; just push 0 0 as a no-op
     push r16 
     push r16 
     push r16 
     jmp nextopcode

   getxydoc:
     ; db ' ( -- x y )
     ; db ' Return the x and y position of the cursor. '
     ; dw $-getxy.doc
   getxyh: 
     .db MON          ; monitor domain name prefix 
     .dw ekeyx        ; link to previous
     .db "getxy", 5   ; reverse counted string
   getxyx:
     ; coded as a no-op at the moment 
     ; (can rs232 report cursor position?)
     push tosh       ; high byte top-of-stack
     push tosl       ; low byte
     ldi r16, 0
     push r16        ; just push 0 0 as a no-op
     push r16 
     push r16 
     push r16 
     jmp nextopcode

   ; coded as a no-op (but consumes 2 stack items)jfk
   atxydoc:
     ; db ' ( x y -- ) '
     ; db ' set the x and y position of the cursor. '
     ; dw $-atxydoc
   atxyh: 
     .db MON          ; monitor domain name prefix 
     .dw getxyx   
     .db "atxy", 4 
   atxyx:
     pop tosl       ; 
     pop tosh       ; discard NOS next-on-stack 
     pop tosl       ; low byte
     pop tosh       ; high byte top-of-stack
     jmp nextopcode

   plusdoc:
     ; db 'add the top 2 elements of the stack.'
     ; db ' ( n1 n2 -- n1+n2 ) '
     ; db ' This opcode is agnostic about whether the two 16 bit '
     ; db ' numbers are signed or unsigned. The flags opcode can be '
     ; db ' checked to see if there is an overflow or carry. '
     ; dw $-plus.doc
   plush:
     .db LANG         ; domain name prefix 
     .dw atxyx
     .db "+", 1
   plusx:
     pop r16       ; low byte
     pop r17       ; high byte top-of-stack
     add tosl, r16
     adc tosh, r17     ; add with carry
     jmp nextopcode

   minusdoc:
     ; db 'subtract the top element of stack from next top'
     ; db ' ( n1 n2 -- n1-n2 ) '
     ; dw $-minus.doc
   minush:
     .db LANG         ; domain name prefix 
     .dw plusx
     .db "-", 1
   minusx:
     pop r16       ; low byte
     pop r17       ; high byte top-of-stack
     sub r16, tosl
     sbc r17, tosh  ; subtract with carry
     movw tosh:tosl, r17:r16
     jmp nextopcode

   incrdoc:
     ; db ' ( n -- n+1 )
     ; db 'Increment the top element of the data 
     ; db 'stack by one. '
     ; dw $-incr.doc
   incrh:
     .db LANG         ; domain name prefix 
     .dw minusx 
     .db "1+", 2
   incrx:
     adiw tosh:tosl, 1
     jmp nextopcode

   decrdoc:
     ; db ' ( n -- n-1 )
     ; db 'Decrement top element of the data stack by one. '
     ; dw $-decr.doc
   decrh:
     .db LANG         ; domain name prefix 
     .dw incrx 
     .db "1-", 2
   decrx:
     subi tosl, 1
     sbci tosh, 0
     jmp nextopcode

   negdoc:
     ; db ' ( n -- -n )
     ; db 'Negates the top item of the stack'
     ; dw $-neg.doc
   negh:
     .db LANG         ; domain name prefix 
     .dw decrx 
     .db "neg", 3
   negx:
     ; seems dodgy
     neg tosl 
     sbci tosh, 0
     jmp nextopcode

   ; high 16bits (msbyte) is on top of stack in forth (?)
   dplusdoc:
     ; db 'add the 2 double numbers '
     ; db ' ( d D -- d+D ) '
     ; db ' This opcode is agnostic about whether the two 16 bit '
     ; db ' numbers are signed or unsigned. The flags opcode can be '
     ; db ' checked to see if there is an overflow or carry. '
     ; dw $-dplus.doc
   dplush:
     .db LANG         ; domain name prefix 
     .dw negx
     .db "d+", 2
   dplusx:
     ; TOS (dx) has high word of first double
     pop r16 
     pop r17     ; NOS (low word of forth double number)
     pop r18 
     pop r19     ; NNOS (high word of 2nd forth double number)
     pop r20 
     pop r21     ; NNOS (low word of 2nd forth double number)
     add r16, r20
     adc r17, r21
     adc tosl, r18
     adc tosh, r19
     push r17
     push r16    ; save low word of double result on stack (nos)
     jmp nextopcode

   logordoc:
     ; db ' ( n1 n2 -- n1 V n2 )
     ; db 'the logical OR of n1 and n2'
     ; dw $-logor.doc
   logorh:
     .db LANG         ; domain name prefix 
     .dw negx 
     .db "or", 2
   logorx:
     pop r16 
     pop r17 
     or tosl, r16
     or tosh, r17
     jmp nextopcode

   logxordoc:
     ; db ' ( n1 n2 -- n1 V n2 )
     ; db 'the logical or of n1 and n2'
     ; dw $-logxor.doc
   logxorh:
     .db LANG         ; domain name prefix 
     .dw logorx 
     .db "xor", 3
   logxorx:
     pop r16        ; next-on-stack low byte
     pop r17        ; next-on-stack high byte
     eor tosl, r16
     eor tosh, r17
     jmp nextopcode

   loganddoc:
     ; db ' ( n1 n2 -- n1 && n2 )
     ; db 'the logical and of n1 and n2'
     ; dw $-logand.doc
   logandh:
     .db LANG         ; domain name prefix 
     .dw logxorx 
     .db "and", 3
   logandx:
     pop r16 
     pop r17 
     and tosl, r16
     and tosh, r17
     jmp nextopcode

   ; /mod is going to be slow on an 8 bit micro
   divmoddoc:
     ; db '(n1 n2 - remainder quotient) '
     ; db ' divide n1 by n2 and provide remainder and quotient. '
     ; db ' n2 is the top item on the stack '
     ; dw $-divmod.doc
   divmodh:
     .db LANG         ; domain name prefix 
     .dw logandx 
     .db "/mod", 4
   divmodx:
     ; maybe could optimise by doing div dx ??
     ;mov bx, dx  ; divisor is top of stack. tos=dx, 
     ;xor dx, dx  ; set dx := 0
     ;pop ax      ; dividend is next element
     ;div bx      ; does dx:ax / bx remainder->dx; quotient->ax
     ;push dx     ; put remainder on stack
     ;mov dx, ax  ; put quotient on top of stack
     jmp nextopcode

   timestwodoc:
     ; db '(n1 -- n1*2 ) '
     ; db ' double n1. This basically performs a '
     ; db ' left shift on the bits in n1 '
     ; dw $-timestwo.doc
   timestwoh:
     .db LANG         ; domain name prefix 
     .dw divmodx 
     .db "2*", 2
   timestwox:
     lsl tosl
     rol tosh
     jmp nextopcode

   multdoc:
     ; db '(n1 n2 -- n1*n2 ) '
     ; db ' signed multiplication '
     ; dw $-mult.doc
   multh:
     .db LANG         ; domain name prefix 
     .dw timestwox 
     .db "*", 1
   multx:
     ;pop ax       
     ;imul dx       ; do dx:ax := ax*dx 
     ;mov dx, ax    ; result in top of stack, tos=dx   
     jmp nextopcode

   umultdoc:
     ; db '(n1 n2 -- n1*n2 ) '
     ; db ' unsigned multiplication '
     ; dw $-umult.doc
   umulth:
     .db LANG         ; domain name prefix 
     .dw multx 
     .db "u*", 2
   umultx:
     ;pop ax       
     ;mul dx        ; do dx:ax := ax*dx 
     ;mov dx, ax    ; result in top of stack, tos=dx   
     jmp nextopcode

   fcalldoc:
     ; .db 'Call a virtual proceedure on the bytecode stack machine'
     ; .db 'The current code pointer is saved - pushed onto the '
     ; .db 'return stack at pointer Y or rph:rpl (r1:r0) starting at '
     ; .db 'RAMEND-512 and the address '
     ; .db 'of the virtual proc to execute is loaded into the Z register '
     ; .dw $-fcall.doc
   fcallh:
     .db LANG         ; remove these domain name prefixes!
     .dw umultx
     .db "fcall", 5
   fcallx:
     movw zh:zl, pch:pcl    ; get pc program counter 
     lpm r16, z+            ; get low byte of call address 
     lpm r17, z+            ; get high byte of call address 
     ; Is this the problem !!! Return stack is NOT "Y" !!!
     st Y+, zl 
     st Y+, zh              ; save PC program counter on Rstack
     movw pch:pcl, r17:r16  ; load new program counter value 
     jmp nextopcode

   pcalldoc:
     ; db ' ( xt -- ) '
     ; db ' Call a procedure using the top element on '
     ; db ' the data stack as the execution address. This '
     ; db ' allows the implementation of function pointers'
     ; db ' In standard forths this is called "execute". '
     ; dw $-pcall.doc
   pcallh:
     .db LANG         ; domain name prefix 
     .dw fcallx
     .db "pcall", 5    ; or call it exec/ex/execute
   pcallx:
     ; is Y the rstack pointer????? bug??
     st Y+, pcl 
     st Y+, pch             ; save PC program counter on Rstack
     movw pch:pcl, tosh:tosl; load new program counter value 
     pop tosl
     pop tosh
     jmp nextopcode

   exitdoc:
     ; db 'exit a virtual procedure by restoring the program counter '
     ; db 'from the Rstack '
     ; dw $-exit.doc
   exith:
     .db LANG         ; domain name prefix 
     .dw pcallx 
     .db "exit", 4
   exitx:
     ; Is Y really the return stack pointer??????
     ld pch, -Y     ; get call return address from top of Rstack 
     ld pcl, -Y     ; 
     jmp nextopcode

   ljumpdoc:
     ; db 'jumps to a relative virtual instruction.'
     ; db ' The jump is given in the next 2 bytes'
     ; dw $-ljump.doc
   ljumph: 
     .dw exitx        ; link to prev
     .db "ljump", 5   ; reverse count
   ljumpx:
     movw zh:zl, pch:pcl    ; get ip 
     lpm r16, z+            ; get jump
     lpm r17, z+            ; get jump
     subi pcl, 2
     sbci pch, 0      ; adjust offset to JUMP instruction 
     movw pch:pcl, r17:r16    ; load new PC program counter
     jmp nextopcode 

   jumpdoc:
     ; db 'jumps to a relative virtual instruction.'
     ; db ' The relative jump is given in the next byte.'
     ; db ' eg: JUMP, -2, jumps back 2 instructions in the bytecode' 
     ; db ' eg: LIT, '*', EMIT, JUMP, -3, ' 
     ; db '  prints a never-ending list of asterixes '
     ; dw $-jump.doc
   jumph: 
     .dw ljumpx      ; 
     .db "jump", 4   ; reverse count
   jumpx:
     movw zh:zl, pch:pcl    ; get ip 
     lpm r16, z             ; get jump
     ; sign extend
     ldi r17, 127    ; 
     cp r17, r16     ; C = (tosl < 0)
     sbc r17, r17    ; tosh = (tosl < 0) ? -1 : 0
     ; add or subtract jump offset from PC program counter  
     ; can only use sbiw with r24,26,28,30
     subi pcl, 1
     sbci pch, 0      ; adjust offset to JUMP instruction 
     add pcl, r16
     adc pch, r17
     jmp nextopcode 

   jumpzdoc:
     ; db ' ( n -- )
     ; db 'jumps to a relative virtual instruction if top '
     ; db 'stack element is zero. The flag value is removed '
     ; db 'from the stack
     ; db ' The relative jump is given in the next byte.'
     ; db ' eg: JUMPZ, -2, jumps back 2 instructions in the bytecode' 
     ; db ' eg: KEY, DUP, EMIT, LIT, '0', MINUS, JUMPNZ, -6 ' 
     ; db '  allows the user to type until zero is pressed. '
     ; dw $-jump.doc
     ; handle jumps by modifying virtual ip (in this case SI)
   jumpzh: 
     .dw jumpx       ; link to prev
     .db "jumpz", 5   ; reverse count
   jumpzx:
     movw zh:zl, pch:pcl    ; get ip 
     lpm r16, z+            ; get jump
     movw pch:pcl, zh:zl    ; save ip (if no jump)
     cpi tosl, 0     ; check TOS for zero
     brne exitjumpz
     cpi tosh, 0     ; check TOS for zero
     brne exitjumpz
     ; sign extend
     ldi r17, 127    ; 
     cp r17, r16     ; C = (tosl < 0)
     sbc r17, r17    ; tosh = (tosl < 0) ? -1 : 0
     subi pcl, 2
     sbci pch, 0      ; adjust offset to JUMP instruction 
     ; add or subtract jump offset from PC program counter  
     add pcl, r16
     adc pch, r17
   exitjumpz:
     pop tosl
     pop tosh
     jmp nextopcode 

   jumpnzdoc:
     ; db ' ( n -- )
     ; db 'jumps to a relative virtual instruction if top stack element '
     ; db ' is not zero. Top stack item is removed.
     ; db ' The relative jump is given in the next byte.'
     ; db ' eg: JUMPNZ, -2, jumps back 2 instructions in the bytecode' 
     ; db ' eg: KEY, DUP, EMIT, LIT, 'q', MINUS, JUMPNZ, -6 ' 
     ; db '  allows the user to type until "q" is pressed. '
     ; dw $-jumpnz.doc
   jumpnzh: 
     .dw jumpzx       ; link to prev
     .db "jumpnz", 6   ; reverse count
   jumpnzx:
     movw zh:zl, pch:pcl    ; get ip 
     lpm r16, Z+            ; get jump
     movw pch:pcl, zh:zl    ; save ip (if no jump)
     ; logic may not be working! especially for zero "non-jumps"
     cpi tosl, 0     ; check TOS for zero
     brne notzerotosl 
     cpi tosh, 0     ; check TOS for zero
     breq exitjumpnz
   notzerotosl:
     ; sign extend
     ldi r17, 127    ; 
     cp r17, r16     ; C = (tosl < 0)
     sbc r17, r17    ; tosh = (tosl < 0) ? -1 : 0
     subi pcl, 2     ; PC-2 to realign to JUMPNZ
     sbci pch, 0     
     ; add or subtract jump offset from PC program counter  
     add pcl, r16
     adc pch, r17
   exitjumpnz:
     pop tosl
     pop tosh
     jmp nextopcode 

   rloopdoc:
     ; db ' ( R: n -- n-1 ) '
     ; db ' Decrements loop counter on return stack and jumps to '
     ; db ' target if counter > 0 '
     ; db ' like the x86 loop instruction this is a pre-decrement '
     ; db ' so a loop counter of 2 will loop twice. The disadvantage '
     ; db ' is that a loop counter of 0 will loop 2^16 times. '
     ; dw $-rloop.doc
   rlooph: 
     .dw jumpnzx       ; link to prev
     .db "rloop", 5     ; reverse count
   rloopx:
   ;  ; handle loops by modifying virtual ip (in this case SI)
   ;  xor ax, ax      ; set ax := 0
   ;  lodsb           ; al := [si]++ get relative loop target into AL
     ; check return stack for zero, if so continue with next 
     ; instruction (dont jump/loop)
   ;  mov bx, [es:di-2] ; get top return stack item into bx
   ;  dec bx          ; decrement the loop counter on the return stack
   ;  cmp bx, 0       ; if bx != 0 continue
   ;  mov [es:di-2], bx ; update the counter
   ;  je .exit        ; the only difference with jumpz !
   ;  cbw             ; convert signed byte al to signed word ax (neg offset)
   ;  sub si, 2       ; realign si to JUMP instruction, 
   ;  add si, ax      ; adjust the si code pointer by jump offset
   ;.exit:
     jmp nextopcode

   divtwodoc:
     ; db '(n1  - n1/2) '
     ; db ' divide n1 by 2 '
     ; dw $-divtwo.doc
   divtwoh:
     .db LANG         ; domain name prefix 
     .dw logandx 
     .db "/2", 2
   divtwox:
     lsr tosh
     ror tosl
     jmp nextopcode

   ;
   readdoc:
     ; db ' ( 1st-sector n addr -- flag=T/F ) 
     ; db ' reads n sectors from disk starting at sector to memory addr '
     ; db ' returns 0 on failure, 1 on success. But on a microcontroller '
     ; db ' system, this is not really necessary '
     ; dw $-readdoc
   readh: 
     .db DISK         ; domain name prefix 
     .dw rloopx      
     .db "read", 4
   readx:
     ; just ignore stack parameters
     pop tosl
     pop tosh
     pop tosl
     pop tosh
     pop tosl
     pop tosh
     clr tosl
     clr tosh
     jmp nextopcode 

   writedoc:
     ; db ' ( first-sector n addr -- flag=T/F/2 ) 
     ; db ' write sectors to disk. possibly not necessary on a '
     ; db ' microcontroller '
     ; dw $-write.doc
   writeh: 
     .db DISK         ; domain name prefix 
     .dw readx      
     .db "write", 5    
   writex:
     ; just ignore stack parameters
     pop tosl
     pop tosh
     pop tosl
     pop tosh
     pop tosl
     pop tosh
     clr tosl
     clr tosh
     jmp nextopcode 

   ; This opcode and vid.x are obviously not going to be 
   ; available on all hardware. So we need to think about
   ; how to configure plugable opcodes. Eg: what if a device
   ; has a gyroscope. We want opcodes to read from that gyroscope
   fgdoc:
     ; db ' ( n -- ) '
     ; db ' set foreground colour for emit'
     ; dw $-fg.doc
   fgh: 
     .db MON          ; domain name prefix 
     .dw writex      
     .db "fg", 2    
   fgx:
     pop tosl
     pop tosh
     jmp nextopcode

   ;fg.d: db 5   ; colour cyan

   bgdoc:
     ; db ' ( n -- ) '
     ; db ' set the background colour for emit'
     ; dw $-bg.doc
   bgh: 
     .db MON          ; domain name prefix 
     .dw fgx      
     .db "bg", 2    
   bgx:
     pop tosl
     pop tosh
     jmp nextopcode

   ; data element to store current background colour
   ;bg.d: db 5    ; 

   ; how to clear the serial screen ??
   clsdoc:
     ; db ' ( -- ) '
     ; db ' clear screen'
     ; dw $-cls.doc
   clsh: 
     .db MON          ; domain name prefix 
     .dw bgx      
     .db "cls", 3    
   clsx:
     jmp nextopcode

   
   viddoc:
     ; db ' ( n -- ) '
     ; db ' set video mode to n'
     ; db ' on x86 try 13h mode'
     ; dw $-vid.doc
   vidh: 
     .db MON          ; domain name prefix 
     .dw clsx      
     .db "vid", 3    
   vidx:
     pop tosl
     pop tosh
     jmp nextopcode

   ; This opcode may not have much meaning on a microcontroller
   ; with no display. This raises the concern: we need "pluggable"
   ; opcodes which are installable/uninstallable depending on the 
   ; configuration of the hardware.
   pixdoc:
     ; db ' ( x y -- ) '
     ; db ' display one pixel at position xy'
     ; dw $-pix.doc
   pixh: 
     .db MON          ; domain name prefix 
     .dw vidx      
     .db "pix", 3    
   pixx:
     pop tosl
     pop tosh
     pop tosl
     pop tosh
     jmp nextopcode

   glyphdoc:
     ; db ' ( a -- ) '
     ; db ' display a ppm (?) glyph at pixel position xy'
     ; dw $-glyph.doc
   glyphh: 
     .db MON          ; domain name prefix 
     .dw pixx      
     .db "glyph", 5    
   glyphx:
     jmp nextopcode


   ; this may be useful for timing code
   ; on x86 ...
   ; clock updates at 1193180/65536 (about 18.2) ticks per second.
   ; counts per second  18
   ; counts per minute  1092
   ; counts per hour    65543
   ; counts per day     1573040
   ; clock incremented approx every 55ms
   clockdoc:
     ; db ' ( -- D ) '
     ; db ' number of clock ticks since midnight '
     ; dw $-clock.doc
   clockh: 
     .db TIME          ; domain name prefix 
     .dw glyphx      
     .db "clock", 5    
   clockx:
     jmp nextopcode

   ; There are "issues" here. It is not always possible to 
   ; set format for rtc time, so need to check format from status
   ; register B and convert if necessary. Also, should check for
   ; 2 values the same in a loop, so overcome updating problems
   rtcdoc:
     ; db ' ( -- secs mins hours days months years ) '
     ; db 'return 6 values on stack representing real time and date. '
     ; db 'called time&date in standard forths. '
     ; dw $-rtc.doc
   rtch: 
     .db TIME          ; domain name prefix 
     .dw clockx      
     .db "rtc", 3    
   rtcx:
     jmp nextopcode
 
   noopdoc:
     ; db 'Does nothing. For some reason most machines '
     ; db 'include this instruction. Also it is a good '
     ; db 'end marker for the opcodes '
     ; dw $-noop.doc
   nooph: 
     .db LANG          ; domain name prefix 
     .dw rtcx      
     .db "nop", 3    
   noopx:
     jmp nextopcode

   starh:
    .dw noopx
    .db "star", 4
   starx:
    waithere:
     lds r16, UCSR0A       ; usart control, status register
     sbrs r16, UDRE0       ; is UDR empty?
     rjmp waithere         ; if not, then just wait 
     ldi r17, '*'    
     sts UDR0, r17         ; if empty tx next character
     jmp nextopcode 
     
   ; a primitive method of debugging 
   ; prints one digit from the top stack item
   digith:
    .dw starx
    .db ".digit", 6
   digitx:
   repeat:
     lds r16, UCSR0A    ; usart control, status register
     sbrs r16, UDRE0    ; is UDR empty?
     rjmp repeat        ; if not, then just wait 
     movw r17:r16, tosh:tosl      
     ;ldi tosl, 5
     adiw tosl, '0'     ; convert opcode to asci digit
     sts UDR0, tosl     ; if empty tx next character
     movw tosh:tosl, r17:r16
     jmp nextopcode 

   serialx:
     ldi r16,(1<<txen0)|(1<<rxen0)     ; enable transmit receive 
     sts ucsr0b, r16       
     ldi r16,(1<<ucsz01)|(1<<ucsz00)
     sts ucsr0c, r16      
     ldi r16,0x66    ; the baud rate, 9600 on 16 megaherz chip
     sts ubrr0l,r16
     ldi r16,0x00    ; the baud rate, 9600
     sts ubrr0h,r16
     ret

     ; some bytecode to test the system. Each .db must be even numbered
     ; except for that which ends in exit
   proc:

     .db STAR, FCALL, low(proc2<<1), high(proc2<<1)
     .db STAR, LIT, 'P', EMIT, LIT, '1', EMIT, EXIT
   proc2:
     ; if I put a key emit here, something strange happens
     .db STAR, LIT, 'P', EMIT, LIT, '2', EMIT, EXIT
   code:
     ; test fcall and exit (print 'r') nearly working! 
     .db STAR, FCALL, low(proc<<1), high(proc<<1), KEY, EMIT
     ; test ron (print 'r')
     .db STAR, LIT, 'r', RON, ROFF, EMIT, KEY, EMIT 
     ; test depth (print '4')
     .db STAR, LIT, 'a', LIT, 'b', DEPTH
     .db LIT, '0', PLUS, EMIT, KEY, EMIT
     ; test neg (print 'N')
     .db STAR, STAR, LIT, -78, NEGATE, EMIT, KEY, EMIT
     ; test plus (print '+')
     .db STAR, STAR, LIT, 33, LIT, 10, PLUS, EMIT, KEY, EMIT
     ; test minus (print '-')
     .db STAR, STAR, LIT, 56, LIT, 11, MINUS, EMIT, KEY, EMIT
     ; test decr (print '1')
     .db STAR, STAR, LIT, '2', DECR, EMIT, KEY, EMIT
     ; test lit (print '$')
     .db STAR, LIT, '$', EMIT, KEY, EMIT
     ; test lit (print 'B')
     .db STAR, LIT, 66, EMIT, KEY, EMIT
     ; test litw (print 'W')
     .db STAR, STAR, LITW, 'W', 0, EMIT, KEY, EMIT
     ; test divtwo /2 ... print 'B'
     .db STAR, STAR, STAR, LITW, 132, 0, DIVTWO, EMIT, KEY, EMIT
     ; test jumpf
     .db STAR, STAR, KEY, DUP, EMIT, LIT, "q", EQUALS, JUMPF, -7
     .db STAR, KEY, DUP, EMIT, LIT, "q", EQUALS, JUMPF, -7
     .db STAR, KEY, DUP, INCR, EMIT, DECR, EMIT, JUMP, -7, 0
     .db STAR, KEY, DUP, EMIT, JUMPNZ, -4, STAR, 0 
     .db STAR, KEY, EMIT, JUMP, -3, STAR, STAR, 0

   wakeup:
    ; initialise the data stack at the top of ram
    ldi r21, high(ramend)  
    out sph, r21        
    ldi r21, low(ramend)    
    out spl, r21        
    ; using Y (yh:yl) double reg as pointer to return stack
    ; Rstack grows up towards Dstack. Rstack is post-increment and
    ; pre-decrement.
    ; dodgy!!
    ldi yh, high(ramend-512)
    ldi yl, low(ramend-512)
   
    call serialx    ; set up serial interface

    ; r25:r24 holds top of stack
    ; r23:r22 is the PC program counter
    ; put all this in an exec function.
    ; lpm=load program memory, must use Z register
    ldi pch, high(code<<1) 
    ldi pcl, low(code<<1)

   nextopcode:
    ; retrieve and save PC program counter to r23:r22
    movw zh:zl, pch:pcl     ; get ip 
    lpm r20, z+
    movw pch:pcl, zh:zl     ; save ip 

   ; call digitx

   checkzero:
    ; halt if opcode is zero
    cpi r20, 0 
    breq goodbye 

    ldi zh, high(optable<<1) 
    ldi zl, low(optable<<1)
    add zl, r20    ; opcode number (need to double?)
    add zl, r20    ; add op*2 offset to optable: 
    lpm r16, z+    ; get op executable address from optable
    lpm r17, z+
    movw zh:zl, r17:r16  ; load exec address to Z for the indirect jump 
    ijmp

    ; Each opcode in turn "calls" (actually "jumps") to "nextopcode 
    ; So "goodbye" is only reached once the 0 opcode is encountered.

    ; The commented line below is unnecessary to make the "inner interpreter
    ; loop".
    ; jmp nextopcode

   goodbye:

    lds r16, UCSR0A     ; usart control, status register
    sbrs r16, UDRE0     ; is UDR empty?
    rjmp goodbye        ; if not, then just wait 
    ldi r20, '#'
    sts UDR0, r20       ; if empty tx character

   here: rjmp here