&& Developing an Operating System 

This is a booklet about the process of developing an operating system.
First a 512 byte bootloader is created which performs a simple function
such as printing a message or getting a key from the keyboard. Then
that bootloader is used to load and execute a larger os 'kernel'.
Really, the snippets here just contain information about writing
simple 'bootable' programs for x86 computers. The source code is
in x86 assembler (nasm syntax) and was inspired by the MikeOS code.

  == main bios interrupt functions
  .. int 13h, disk access (via sectors, cylinders etc)
  ..

GETTING HELP

  wiki.osdev.org is a good site
    
THE BOOT PROCESS

  The computer powers on and starts executing the bios. The
  bios then looks for a bootable sector (512 bytes) on a
  suitable 'boot medium'. That could be an old floppy disk (in days
  gone by) or now a USB memory stick, CD, or hard-drive. The
  boot medium is supposed to contain a 'boot signature', which is
  the 2 bytes 55h and AAh at the very end of the boot sector
  (offsets 510 and 511). This is to ensure that the computer doesn't
  attempt to boot something which is not supposed to be booted.
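
  A sketch of about the smallest possible boot sector (my own
  example, not taken from the MikeOS code): it does nothing except
  loop, but it is padded to 512 bytes and ends with the signature,
  so a bios should accept it.

  * the smallest 'bootable' program, just a loop and a signature
  -----------------
   BITS 16
   hang: jmp hang          ; do nothing, forever
   times 510-($-$$) db 0   ; pad with zeros up to offset 510
   db 55h, 0AAh            ; the boot signature at offsets 510 and 511
  ,,,

  Note that 'db 55h, 0AAh' and 'dw 0AA55h' produce the same 2 bytes,
  because x86 stores the low byte of a word first.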

  http://board.flatassembler.net/topic.php?p=124387
    interesting information about booting from usb by mike gonta

STEP BY STEP

 This section describes, step by step how to create a bootable
 x86 usb key

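 * install the correct programs
 >> sudo apt-get install qemu-system-x86 nasm dosfstools ...

 On debian the 'dosfstools' package provides 'mkdosfs' and the
 'qemu-system-x86' package provides 'qemu-system-i386'.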

 * make a new floppy image called 'os.flp'
 >> mkdosfs -C os.flp 1440

 The above line will not overwrite an existing file.
 Another way is to copy an existing floppy disk image.

 * compile the assembler source into a flat binary executable
 >> nasm -f bin -o first.bin first.asm
 >> nasm -o first.bin first.asm  ##(the same, since 'bin' is the default)

 The 'bin' format is the default for the nasm assembler.

 * insert the compiled kernel 'first.bin' into the floppy image
 >> dd status=noxfer conv=notrunc if=first.bin of=os.flp

 * boot the operating system in the qemu virtual machine
 >> qemu -fda os.flp
 >> qemu-system-i386 -fda os.flp
 
 * create an iso file, which can be burnt to a cd, from the 'cdiso' folder
 >> mkisofs -o myfirst.iso -b myfirst.flp cdiso/

 Use 'df' or 'dmesg' to find out the device name of a usb key which
 you have inserted eg '/dev/sdc'

 * unmount the usb key
 >> umount /dev/sdc

 WARNING: the following command will delete all previous data on
 the usb memory stick.

 When you execute the command below, the little light on the 
 usb memory stick should flash a few times, indicating that 
 data is being written to the stick.

 * write the new operating system to the boot sector of the usb key
 >> sudo dd if=os.flp of=/dev/sdc
 >> su; dd if=os.flp of=/dev/sdc  ##(on a non debian system)

 Be Very, Very careful where you write the floppy image
 file to. If you write it to your hard-disk (for example /dev/hda) that is
 more or less the end of the data and operating system on that
 hard-disk.

 If the usb memory stick device name ends in a number, don't use the
 number, just the letters of the device name, eg 'sdc1' becomes 'sdc'.
 The number refers to a partition, but the boot sector has to go at
 the very start of the whole device.

 The usb memory stick can now be used to boot the new operating system
 by changing the computer boot order in the bios. Eg press <esc> on an
 asus eee pc or the ibm key on a thinkpad

MEMORY ADDRESSES IN X86 REAL MODE

  Real mode addresses are 20 bits wide but are made up of a 16 bit
  segment address and a 16 bit offset. It's not that complicated: the
  physical address is (segment * 16) + offset. The segment address
  needs to be loaded into ds or another segment register with
  something like
    mov ax, 07C0h
    mov ds, ax
  so the segment 07C0h really starts at the physical address 07C00h.
  Because the result is a 20 bit address, real mode can only address
  1 megabyte of memory, no more. Hence one of the needs for
  protected mode...
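
  As a sketch of the arithmetic (the offset 0010h is just an
  example value), the same physical byte can be reached through
  different segment and offset pairs.

  * 2 ways of reading the byte at physical address 07C10h
  -----------------
    mov ax, 07C0h
    mov ds, ax
    mov al, [0010h]   ; 07C0h * 16 + 0010h = 07C00h + 0010h = 07C10h
    mov ax, 0
    mov ds, ax
    mov al, [7C10h]   ; 0000h * 16 + 7C10h = 07C10h, the same byte
  ,,,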

SIMPLE BOOT PROGRAM 

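  The boot sector from a floppy (or usb) or hard-disk is always
  loaded to the physical memory location 07C00h, which corresponds
  to offset 0 of the real mode 'segment' 07C0h. The code below
  has a problem with register indirect jumps: it never sets the
  code segment, and a bios may start the boot sector with cs:ip
  equal to either 0000:7C00h or 07C0:0000h. Relative jumps and
  calls work either way, but a jump through a register only lands
  in the right place when cs is 07C0h. A far jump such as
  'jmp 07C0h:start' at the very beginning fixes this.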

  * a simple example of a bootable program with stack and a function
  -----------------
 BITS 16

start:
  mov ax, 07C0h     ; Set up 4K stack space after this bootloader
  add ax, 288       ; (4096 + 512) / 16 bytes per paragraph
  mov ss, ax        ; this creates a 4K gap between stack and code 
  mov sp, 4096
  mov ax, 07C0h     ; Set data segment to where we're loaded
  mov ds, ax

  mov si, text_string     ; Put string position into SI
  call print_string       ; Call our string-printing routine

  jmp $                   ; Jump here - infinite loop!

  text_string db 'This is my cool new OS!!!', 0

  print_string:       ; Routine: output string in SI to screen
    mov ah, 0Eh       ; int 10h 'print char' function

  .repeat:
    lodsb             ; Get character from string
    cmp al, 0
    je .done          ; If char is zero, end of string
    int 10h           ; Otherwise, print it
    jmp .repeat

  .done:
    ret

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
 ,,,

GOTCHAS ....

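  Some bioses require a short jump (+/-128 bytes) followed
  by a 'nop' (no operation) instruction at the start of the boot
  sector before they will execute it. The likely reason is that a
  FAT formatted floppy or usb stick starts with exactly these 3
  bytes, followed by a table describing the disk (the 'bios
  parameter block'), and some bioses check for that layout.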

  * start the boot sector like this
  -------
    jmp short start
    nop
    start:
  ,,,

  * ax can't be used as an index register
  -------
    mov bx, [ax]
    ; error !! you can't get some value out of memory by using
    ; an index stored in ax, use bx, bp, si or di instead
    mov ax, [bx]   ; OK this works
  ,,,

BOOTLOADERS 

  https://github.com/cirosantilli/x86-bare-metal-examples
    good examples

  http://stackoverflow.com/questions/22054578/how-to-run-a-program-without-an-operating-system/32483545#32483545
    good knowledgeable stuff about bootloading and io

  The initial bootable program may only be 512 bytes long since
  it must fit into 1 sector of the 'floppy'. This is limiting.
  The answer is to use these 512 bytes to load a bigger program
  into memory and jump to it. The code below shows how.
  
  Sector 2, head 0, cylinder 0 is the sector (512 bytes)
  immediately following the boot sector. It contains the code
  that we want to load and execute.

  After booting, the DL register normally contains the bios number
  of the boot medium. For example for a usb memory stick on my
  asus eee pc DL=128 (80h, the number of the first 'hard disk').
  This number should be saved for use with the read/write
  functions of INT 13h.

  * a simple working bootloader 
  -----------------
   BITS 16

   jmp start
     drive db 0      ; a variable to hold boot drive number
   start:
     mov ax, 07C0h   ; Set data segment to where we're loaded
     mov ds, ax
     mov [drive], dl ; save the boot drive number
     mov ax, 07C0h   ; Set up 4K stack space after this bootloader
     add ax, 288     ; (4096 + 512) / 16 bytes per paragraph
     mov ss, ax      ; with a 4K gap between stack and code
     mov sp, 4096

      ; save the DL register or else dont modify it
      ; it contains the number of the boot medium (hard disk,
      ; usb memory stick etc)
      ; The 'floppy' Drive is NOT necessarily 0!!!

    reset:            ; Reset the floppy drive
      mov ax, 0       ; 
      mov dl, [drive] ; the boot drive number (eg for usb 128)
      int 13h         ;
      jc reset        ; ERROR => reset again
    read:
      mov ax, 1000h       ; ES:BX = 1000:0000
      mov es, ax          ; es:bx determines where data loaded to
      mov bx, 0           ;
      mov ah, 2           ; Load disk data to ES:BX
      mov al, 5           ; Load 5 sectors (only 1 used here)
      mov ch, 0           ; Cylinder=0
      mov cl, 2           ; Sector=2 (sector 1 is the boot sector)
      mov dh, 0           ; Head=0
      mov dl, [drive]     ; 
      int 13h             ; Read!
    jc read             ; ERROR => Try again

    jmp 1000h:0000      ; Jump to the loaded code 

    times 510-($-$$) db 0   ; pad out the boot sector (512 bytes)
    dw 0AA55h               ; end with standard boot signature

    ; this is important for memory offset calculations
    ; or compile next stage separately
    section stage2 vstart=0

    ; the code to be loaded and executed
      mov ah, 0x0A 
      mov al, '!'
      mov cx, 10
      int 10h

    hang: jmp hang
  ,,,
 

  The only difference in the code below is that the loaded
  program is contained in a separate file, which is handy
  for organisational reasons.

  
  * another way of writing the bootloader, almost identical 
  -----------------------------
    ; 3.ASM
    ; Load a program off the disk and jump to it

    ; Tells the compiler that this is offset 0.
    ; It isn't offset 0, but it will be after the jump.
    [ORG 0]

      jmp 07C0h:start     ; Goto segment 07C0

    start:
      push dx   ; save the boot medium drive number
      ; Update the segment registers
      mov ax, cs
      mov ds, ax
      mov es, ax

    reset:            ; Reset the floppy drive
      ; drive number in DL, unmodified since boot 
      mov ax, 0       ;
      int 13h         ;
      jc reset        ; ERROR => reset again
    read:
      mov ax, 1000h       ; ES:BX = 1000:0000
      mov es, ax          ;
      mov bx, 0           ;
      mov ah, 2           ; Load disk data to ES:BX
      mov al, 5           ; Load 5 sectors
      mov ch, 0           ; Cylinder=0
      mov cl, 2           ; Sector=2
      mov dh, 0           ; Head=0
      ; drive number in DL, unmodified since boot 
      int 13h             ; Read!

      jc read             ; ERROR => Try again
      jmp 1000h:0000      ; Jump to the program

    times 510-($-$$) db 0
    dw 0AA55h

  This is a small loadable program.

    ; PROG.ASM
      mov ah, 9
      mov al, '='
      mov bx, 7
      mov cx, 10
      int 10h

    hang: jmp hang

  This program creates a disk image file that contains both
  the bootstrap and the small loadable program.

    ; IMAGE.ASM
    ; Disk image

    %include '3.asm'
    %include 'prog.asm' 
 ,,,

  The code below doesn't modify the DL register, which contains
  the drive number of the boot medium immediately after boot
  (the bios places it there). It would be better and safer
  to save DL for use with the int 13h read/write functions.

  * a boot loader which shows what its up to
  -----------------
   BITS 16
   jmp start
   %include 'prints.asm'
   %include 'printi8.asm'
   m.reset db 'resetting floppy',13,10,0
   m.read db 'reading sector 2 of floppy',13,10,0
   m.dlstate db 'dl is ',0
   start:
      mov ax, 07C0h   ; Set up 4K stack space after this bootloader
      add ax, 288     ; (4096 + 512) / 16 bytes per paragraph
      mov ss, ax
      mov sp, 4096
      mov ax, 07C0h   ; Set data segment to where we're loaded
      mov ds, ax

      mov si, m.dlstate
      call prints
      mov bl, 10
      mov al, dl
      call printi8

    mov cx, 4         ; try to reset drive 4 times
    .reset:            ; Reset the floppy drive
      mov si, m.reset
      call prints
      mov ax, 0      ;
      ;mov dl, 0     ; Drive=0 (=A), no! use the DL value after boot
      int 13h          
      jnc .startread
      loop .reset      ; on error (carry flag) reset again 3 times
    .startread:
      mov cx, 4        ; try to read 4 times
    .read:
      push cx          ; save the retry count, CH and CL are needed below
      mov si, m.read
      call prints
      mov ax, 1000h       ; ES:BX = 1000:0000
      mov es, ax          ; es:bx determines where data loaded to
      mov bx, 0           ;
      mov ah, 2           ; Load disk data to ES:BX
      mov al, 5           ; Load 5 sectors (only 1 used here)
      mov ch, 0           ; Cylinder=0
      mov cl, 2           ; Sector=2 (sector 1 is the boot sector)
      mov dh, 0           ; Head=0
      ;mov dl, 0           ; Drive=0, 'floppy' (or usb memory stick)
      int 13h             ; Read!
      pop cx            ; restore the retry count (pop leaves the flags alone)
      jnc .done
      loop .read        ; on error (carry flag) try again, up to 4 times
      jmp $             ; every attempt failed, so hang here
    .done: 
    jmp 1000h:0000      ; Jump to the loaded code 

    times 510-($-$$) db 0
    dw 0AA55h

    ; the code to be loaded and executed
    jmp start2
    m.loaded db 'loaded data!',13,10,0
    start2:
      mov ah, 0x0A 
      mov al, '!'
      mov cx, 10
      int 10h

    hang: jmp hang

  ,,,


  ; boot1.asm   stand alone program for floppy boot sector
  ; Compiled using            nasm -f bin boot1.asm
  ; Written to floppy with    dd if=boot1 of=/dev/fd0

MULTIBOOT AND GRUB ....

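  Multiboot is a specification from the grub project which
  describes how a bootloader loads a kernel and what state the
  machine is in when the kernel starts. It overcomes the
  limitations of booting from a 512 byte boot sector: the kernel is
  an ordinary file in the file system, and grub can offer a
  choice of several oses on the one computer.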

  This is worth investigating to allow the toy.os to co-exist
  peacefully with other systems.
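
  A minimal sketch of a multiboot (version 1) header, as I understand
  the specification. The 3 values are a magic number, a flags word
  and a checksum which makes the 3 add up to zero. The header has to
  be 4 byte aligned and near the start of the kernel file (within the
  first 8192 bytes). The names 'mb.magic' etc are just made up for
  this example.

  * a multiboot header for the start of a kernel file
  -----------------
    mb.magic equ 0x1BADB002     ; identifies the header to the bootloader
    mb.flags equ 0              ; no optional features requested
    mb.checksum equ -(mb.magic + mb.flags)

    align 4
      dd mb.magic
      dd mb.flags
      dd mb.checksum            ; magic + flags + checksum = 0 (mod 2^32)
  ,,,

  The kernel is then built as a normal (usually elf) file rather than
  as a flat 512 byte sector, and grub starts it in 32 bit protected
  mode, not in real mode.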

REBOOTING ....

   * reboot the computer by jumping to FFFF:0
   ------------------

   ; Boot record is loaded at 0000:7C00 ie CS==0 & IP==7c00
   org 7c00h
   
   lea si,[msg]   ; load message address into SI register:
   mov ah,0eh
 print:  
   mov al,[si]         
   cmp al,0         
   jz done     ; zero byte at end of string
   int 10h     ; write character to screen.    
   inc si         
   jmp print

   done:  
     mov ah,0    ; wait for any key:
     int 16h     ; waits for key press

   ; store magic value at 0040h:0072h to reboot:
   ;       0000h - cold boot.
   ;       1234h - warm boot.
   mov  ax,0040h
   mov  ds,ax
   mov  word[0072h],0000h   ; cold boot.
   jmp  0ffffh:0000h        ; reboot!

   msg  db  'welcome, i have control of the computer.',13,10
        db  'press any key to reboot.',13,10
        db  '(after removing the floppy)',13,10,0

   ,,,

  * reboot the computer, but this may lock up the computer.
  >> int 19h

  * reboot the computer after a user keypress with int 19h, may lock!
  -------------
    mov ah, 0     ; x86 bios wait for keypress function
    int 16h
    mov ah, 0eH   ; echo the key just pressed 
    int 10H
    int 19h       ; reboot the computer

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; PC boot signature
  ,,,

SEGMENTS

  Data in other segments can be accessed with a segment override,
  using the syntax [ss:bx], [es:di] etc. If no segment is specified
  then mov, lodsb etc are relative to ds (the data segment), except
  that addresses formed with bp are relative to ss.

  * move the low byte on top of stack into al  
  ------------
     mov bx, sp
     mov al, [ss:bx]
  ,,,
  
STACK SEGMENT ....

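  The stack segment register SS, together with the stack pointer
  SP, determines where the PUSH and POP instructions read and write.
  The bios leaves SS:SP pointing at a small stack of its own when
  it starts the boot sector, which is why simple programs work
  without setting it up. But we can initialize it explicitly if we
  need a big stack or want to know exactly where the stack is.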


  * access data in the ss stack segment
  ------------
   BITS 16
   [ORG 0]

   cr equ 13   ;  carriage return
   lf equ 10   ;  line feed 

   jmp 07C0h:start     ; Goto segment 07C0
   
   ; print low byte character on top of stack
   ; not including the fn return pointer
   test:
     dw 0
     db 4, 'test'
   test.x:
     mov bx, sp
     mov al, [ss:bx+2]
     mov ah, 0eH         
     int 10H
     ret

   start:

     mov ax, cs    ; make data segment and es same as code segment
     mov ds, ax
     mov es, ax
    
     push '*' 
     call test.x 
     here: jmp here          ; loop forever 
     times 510-($-$$) db 0   ; Pad remainder of MBR boot sector with 0s
     dw 0xAA55               ; The standard MBR boot signature
   ,,,

DATA SEGMENT ....

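  The DS or data segment register needs to be initialized
  before accessing a variable with [var], since the offsets of
  these variables are calculated relative to the value in the
  DS register.

  The following 2 lines are sufficient. The magic number 07C0h
  works because the bios always loads the boot sector at the
  physical address 07C00h, and nasm numbers the labels from 0
  when there is no 'org' directive. 07C0h * 16 + offset is then
  the real address of each variable.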

  * Initialize the data segment register DS
  ----------
    mov ax, 07C0h   
    mov ds, ax    ; load DS with correct value 
  ,,,
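
  Another common arrangement, sketched here, is to tell nasm that
  the code starts at offset 7C00h with 'org' and then to use 0 as
  the data segment. Either way segment * 16 + offset comes out at
  the same physical address.

  * print a character using 'org 7C00h' and DS = 0
  -----------------
   BITS 16
   org 7C00h            ; labels are now numbered from 7C00h
   jmp start
   message db 'hello!'
   start:
     xor ax, ax         ; ax = 0
     mov ds, ax         ; 0000h * 16 + message = real address
     mov ah, 0eh        ; bios teletype function
     mov al, [message]  ; first char of 'message'
     int 10h
   hang: jmp hang
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,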

  * Error!! prints rubbish not 1st and 2nd char of 'message' 
  ------------
   jmp start
   message db 'hello!'
   start:
      mov ah, 0eh
      mov al, [message]    ;! DS hasnt been initialized
      int 10h              ;  will display garbage
      mov al, [message+1]  ;  same...
      int 10h
   hang: jmp hang
   ,,,

  * Correct! print the first two characters of a string
  ------------
   jmp start
   message db 'hello!'
   start:
      mov ax, 07C0h    ; Initialize data segment DS register
      mov ds, ax       ; load DS with correct value 
      mov ah, 0eh      ; bios teletype function
      mov al, [message]   ; first char of 'message' 
      int 10h             ; invoke bios 
      mov al, [message+1] ; 2nd char of 'message'
      int 10h
   hang: jmp hang
   ,,,

  * also works in qemu with no boot signature 
  ------------
     mov ax, 07C0h    ; Initialize data segment DS register
     mov ds, ax       ; load DS with correct value 
     mov ah, 0eh      ; bios teletype function
     mov al, [message]   ; first char of 'message' 
     int 10h             ; invoke bios 
     mov al, [message+1] ; 2nd char of 'message'
     int 10h  

   hang: jmp hang
   message db 'hello!'
   ,,,

EXTENDED READ WRITE FUNCTIONS

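  The ordinary int 13h read and write functions address the disk
  by cylinder, head and sector, which only reaches roughly the first
  8 gigabytes of a disk. For sectors outside of that range use the
  extended functions, which take a linear sector number (LBA) in
  a 'dap' (disk address packet) data structure.

   http://forum.osdev.org/viewtopic.php?f=13&t=27510
     good posts about this topic

   Use INT 13h with AH=42h (read) or AH=43h (write). DS:SI must point
   to the DAP and DL holds the drive number.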
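
  A sketch of an extended read. It assumes that the boot drive
  number is still in DL, that the bios supports the extensions (as
  far as I know most do for hard disks and usb sticks, but often
  not for real floppies) and that some code to run has been written
  to the sector after the boot sector. The label 'dap' and the load
  address 1000:0000 are just choices made for this example.

  * read 1 sector (LBA 1) to 1000:0000 with int 13h, ah=42h
  -----------------
   BITS 16
   jmp start
   align 4
   dap:
     db 10h        ; size of this packet (16 bytes)
     db 0          ; reserved, always 0
     dw 1          ; number of sectors to read
     dw 0000h      ; offset of the buffer to read into
     dw 1000h      ; segment of the buffer
     dq 1          ; LBA of the first sector (0 is the boot sector)
   start:
     mov ax, 07C0h
     mov ds, ax        ; DS:SI must point to the packet
     mov si, dap
     mov ah, 42h       ; extended read
     int 13h           ; drive number in DL, unmodified since boot
     jc hang           ; carry flag set => error, give up
     jmp 1000h:0000    ; jump to the loaded sector
   hang: jmp hang
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,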

MOVING DATA

  == data moving instructions
  .. mov - mov data around
  .. xchg - exchange the contents of 2 registers/memory
  ..

POINTERS AND DATA ....

  We can use the BX register as a pointer to data or as an index.
  SI and DI (and BP, relative to the stack segment) can be used
  in the same way.

  * get data from a pointer
  >> mov ax, [bx]

  This is like 
  int i = *p;   in the C language

MOV ....

  The 'mov' x86 instruction is perhaps the simplest and
  most fundamental instruction.

  In nasm (intel) syntax the destination always comes first, so
  moving a register into memory is written "mov [buffer], ax".

  * when a register is involved, it determines the data size
  >> mov al, [foo]

  * when moving a constant into memory, specify a data size
  >> mov word [buffer], 0x0000 
 
  * mov from register to memory buffer
  --------
    result dw 0x0000   ; assign memory
    sub dx, dx         ; set dx = 0
    add dx, 10
    mov [result], dx   ; store result from register DX
  ,,,

XCHG ....

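  This instruction swaps its two operands without needing a
  temporary register. Exchanging AX with another 16 bit register
  takes only 1 byte of machine code, fewer than a mov, so it is
  desirable in some circumstances. Exchanging with a memory location
  is slow, because the processor locks the bus while it does it.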
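
  A small sketch of xchg in use (the characters are just example
  values).

  * swap 2 registers and print the result
  -----------------
    mov al, 'A'
    mov bl, 'B'
    xchg al, bl     ; now al = 'B' and bl = 'A'
    mov ah, 0eh     ; bios teletype function
    int 10h         ; prints B
  hang: jmp hang
  ,,,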

STRINGS AND TEXT

  A 'string' in this context is just a series of bytes, words
  or double words which exist in contiguous memory locations.
  The bytes may represent characters in some human language, or
  they may not. It's up to you.

  x86 Assembly language has special instructions for dealing with
  strings such as movs, movsb etc. But each instruction only deals
  with one byte, word etc at a time (unless you combine these
  instructions with a 'rep' instruction)
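
  A sketch of 'rep' combined with movsb, which copies CX bytes from
  DS:SI to ES:DI. The labels 'source' and 'dest' are made up for
  this example.

  * copy a string with rep movsb, then print the copy
  -----------------
   BITS 16
   jmp start
   source db 'copy me'
   source.len equ $-source       ; length worked out by the assembler
   dest: times source.len db 0   ; an empty buffer of the same size
   start:
     mov ax, 07C0h
     mov ds, ax          ; movsb reads from ds:si
     mov es, ax          ; and writes to es:di
     mov si, source
     mov di, dest
     mov cx, source.len  ; how many bytes to copy
     cld                 ; go forward through both strings
     rep movsb           ; repeat movsb CX times
     mov si, dest        ; now print the copy
     mov cx, source.len
     mov ah, 0Eh         ; int 10h 'print char' function
   .again:
     lodsb
     int 10h
     loop .again
   hang: jmp hang
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,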

STRING INSTRUCTIONS ....
  
  == summary
  .. cmpsb - compare bytes from 2 strings (at DS:SI and ES:DI)
  .. cmpsw - compare words (2 bytes) from 2 strings (DS:SI and ES:DI)
  .. lodsb - load a byte from a string into AL
  .. lodsw - load 2 bytes from a string into AX
  .. lodsd - load 4 bytes from a string into EAX
  ..


  * load a byte character from a string in AL and update SI
  >> lodsb

STOS ....

  This is the "store a string" instruction and includes
  stosb (store a byte), stosw (store a word) etc. The value in
  AL or AX is written to ES:DI, so ES and DI need to be
  initialized first.

  * initialise an array with -1
  -----------
    jmp start
    array resw 100
  start:
    mov ax, 07C0h    ; segment where the boot sector is loaded
    mov es, ax       ; stosw writes to es:di (es is the 'extra' segment)
    mov cx, 100      ; number of words to store
    mov di, array
    mov ax, -1
    cld      ; clear direction flag, ie go forward not backward
    rep stosw
   here: jmp here
  ,,,

  
  * convert a string to lowercase without changing blank characters
  -----------
    jmp start
  string.a: db 'HeLLo'
  string.b: db '     '
  stringlength: dw 5 
  start:
    mov ax, 07C0h  ; segment where the boot sector is loaded
    mov ds, ax     ; lodsb reads from ds:si
    mov es, ax     ; stosb writes to es:di
    mov cx, [stringlength]   ; the value, not the address
    mov si, string.a 
    mov di, string.b 
    cld      ; clear direction flag, ie go forward not backward
   .again:
     lodsb
     or al, 20h    ; set bit 5: 'A'-'Z' become 'a'-'z', a space stays a space
     stosb
     loop .again
   here: jmp here
  ,,,

PRINTING STRINGS ....

  Normally strings are 'printed' or displayed by loading the
  address of the first byte (or word) of a string into
  the SI register and then using 'lodsb' the load string byte
  instruction to get successive characters into the AL or AX
  register while incrementing the pointer in the SI register
  (lodsb does these things automatically).
  
  Another way to print a string is to write the string directly to 
  video memory (instead of using x86 bios int 0x10 functions).
  
  Note that lodsb does not decrement the CX counter. Only 'loop'
  and the 'rep' prefixes do that.
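
  A sketch of writing to video memory directly. It assumes that the
  bios has left the screen in the usual 80x25 colour text mode, where
  video memory starts at segment 0B800h and each character on the
  screen takes 2 bytes: the character code and then an attribute
  (colour) byte.

  * print a zero terminated string by writing to video memory
  -----------------
   BITS 16
   jmp start
   message db 'written to video memory', 0
   start:
     mov ax, 07C0h
     mov ds, ax        ; lodsb reads from ds:si
     mov ax, 0B800h    ; segment of colour text mode video memory
     mov es, ax        ; stosw writes to es:di
     mov di, 0         ; offset 0 is the top left corner of the screen
     mov si, message
     mov ah, 07h       ; attribute byte: light grey on black
     cld               ; go forward through the string
   .again:
     lodsb             ; next character into AL
     cmp al, 0
     je hang           ; zero byte => end of string
     stosw             ; store character (AL) and attribute (AH)
     jmp .again
   hang: jmp hang
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,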

  * print the first two characters of a string
  ------------
   jmp start
   message db 'hello!'
   start:
      mov ax, 07C0h      ; Set data segment to where we're loaded
      mov ds, ax
      mov ah, 0eh        ; print character function
      mov al, [message]  ; first char of 'message' 
      int 10h
      mov al, [message+1]
      int 10h
   here: jmp here 
  ,,,

  * print the 1st three characters with lodsb 
  ------------
   jmp start
   message db 'hello!'
   start:
      mov ax, 07C0h      ; Set data segment to where we're loaded
      mov ds, ax    
      cld                ; set dir flag to forwards
      lea si, [message]
      mov cx, 3          ; loop count 3
      mov ah, 0eh        ; print character function
    .again:
      lodsb  ; get next char from message 
      int 10h
      loop .again
   here: jmp here 
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,

  The code below is working and demonstrates a number of important
  forth style ideas; A set of functions which are set within a 
  data structure (a linked list) with each function having its own
  header with function name and a link to previous function.
  Also, all parameters for the functions are passed on the stack.

  * forth function style printing 
  ------------
   jmp start
   message dw 9, 'AbcXyz{*}'
   ; a linked list of functions (forth style)

   ; -- dup just duplicates the top item on the stack
   dup: dw 0         ; 1st word has a zero link 
        db 3, 'dup'  ; strings are 'counted' 
   dup.x:
      pop bx      ; juggle fn return address
      pop ax      ; get param to duplicate
      push ax
      push ax
      push bx     ; restore fn return address
      ret
   
   ; print takes its arguments on the stack (buffer address, char count)
   print:
      dw dup        ; link to previous dictionary entry 
      db 5, 'print'  
   print.x:
      cld             ; set dir flag to forwards
      pop bx          ; juggle return address for call
      pop cx          ; how many chars to print
      pop ax          ; address of buffer to print
      push bx         ; restore return function call
      mov si, ax      ; SI = buffer address (plain mov, lea needs a memory operand)
      mov ah, 0eh     ; bios print character function
    .again:
      lodsb  ; get next char from message 
      int 10h
      loop .again
      ret
   start:
      mov ax, 07C0h      ; Set data segment to where we're loaded
      mov ds, ax     
      ;mov sp, ?         ; what about the stack pointer?
      push message+2     ; address of string buffer (1st word is count)
      push 9             ; how many characters to print
      call print.x
   here: jmp here 
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,

  * print a zero terminated string with address in the SI register 
  -----------------
   BITS 16

   jmp start
   message db 'A function to print',13,10,0
   start:
     mov ax, 07C0h    ; Set up 4K stack space after this bootloader
     add ax, 288      ; (4096 + 512) / 16 bytes per paragraph
     mov ss, ax
     mov sp, 4096
     mov ax, 07C0h    ; Set data segment to where we're loaded
     mov ds, ax

    mov si, message     ; Put string position into SI
    call prints          ; Call our string-printing routine

    hang: jmp hang           ; Jump here - infinite loop!

  ;# prints
  ;   output zero terminated string in SI to screen
  prints:      
    mov ah, 0Eh       ; int 10h 'print char' function

  .again:
    lodsb             ; Get next character from string
    cmp al, 0         ; Char == 0 ? 
    je .done          ; If char is zero, end of string
    int 10h           ; Otherwise, print it
    jmp .again

  .done:
    ret

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
 ,,,

  
VARIABLES OF STRING BUFFERS ....

  Another approach to strings is for a string to push
  the address of its buffer onto the stack. This can then
  be manipulated and displayed by other functions....

  * a string buffer which places its address on the stack 
  -----------------
   BITS 16

   jmp start
   message.s db 'A function to print',13,10,0 
   message: 
     pop bx           ; get return address off stack
     push message.s   ; put the address of the message on the stack
     push bx          ; restore return address
     ret
   puts:
     pop bx    ; the 'puts' procedure return address
     pop si    ; the address of zero ended string to print 
     push bx   ; restore the return address to the stack
     mov ah, 0eH     ; bios teletype
   .again:
     lodsb             ; Get character from string
     cmp al, 0
     je .done          ; If char is zero, end of string
     int 10h           ; Otherwise, print it
     jmp .again
   .done:
     ret

   start:
     mov ax, 07C0h    ; Set data segment to where we're loaded
     mov ds, ax
     call message     ; Put string address onto stack
     call puts        ; print the message on the stack 
     call message     ; Put string address onto stack
     call puts        ; print the message on the stack 

    hang: jmp hang    ; Jump here - an infinite loop

   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
 ,,,

  


ZERO TERMINATED STRINGS ....

  A zero terminated string simply has a byte containing 0 (zero)
  at the end of the characters stored in memory. This is
  the system used by the C language.
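
  The drawback is that the length is not stored anywhere and has to
  be found by searching for the zero. A sketch of one way to do that
  with 'repne scasb' (the string 'hello' is just an example, and the
  printing only works for lengths from 0 to 9).

  * find and print the length of a zero terminated string
  -----------------
   BITS 16
   jmp start
   message db 'hello', 0
   start:
     mov ax, 07C0h
     mov ds, ax
     mov es, ax        ; scasb reads from es:di
     mov di, message
     mov al, 0         ; the value to look for
     mov cx, 0FFFFh    ; search up to 65535 bytes
     cld               ; search forward
     repne scasb       ; stops just after the zero byte
     not cx            ; cx = bytes scanned, including the zero
     dec cx            ; cx = length of the string (5 here)
     mov al, cl
     add al, '0'       ; convert single digit to ascii
     mov ah, 0eh       ; bios teletype function
     int 10h           ; prints 5
   hang: jmp hang
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,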

COUNTED STRINGS ....

  One method of storing a string is to include the count of 
  the number of characters in a string next to the string where
  it is stored in memory. This system is used in the old 
  'forth' language and in modern languages where strings are 
  stored as objects.

  * store the length (in byte characters) after the string
  --------
    message db 'abcdefghijklmnop'
    count dw $-message
  ,,,

  * print a counted string with lodsb 
  -----------------
   BITS 16
   jmp start
   message db 'Counted String'
   count dw 14 
   start:
     mov ax, 07C0h    ; set the data segment 
     mov ds, ax 
    mov si, message   ; Put string position into SI
    mov cx, [count]   ; how many chars to print
    mov ah, 0Eh       ; int 10h 'print char' function
  .again:
    lodsb           ; Get character from string into AL
    int 10h         ;  
    loop .again     ; loop while CX > 0

    hang: jmp hang           ; Jump here - infinite loop!

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
 ,,,

  The code below is very similar but only uses 1 byte for the 
  count (thus limiting the string length to 255 characters) and
  also has the count preceding the string. This type of 
  counted string is what is used in the Forth language.

  * print a preceding counted string with lodsb 
  -----------------
   BITS 16
   jmp start
   message db 16,'Counted String!!'
   start:
     mov ax, 07C0h    ; set the data segment 
     mov ds, ax 

     cld               ; move forward through message
     sub cx, cx        ; set CX = 0
     mov si, message   ; Put start of string position into SI
     lodsb             ; get [SI] into AL; increment SI
     mov cl, al        ; cl now contains the count
     mov ah, 0Eh       ; bios int 10h 'print char' function
  .again:
     lodsb           ; Get character from string into AL
     int 10h         ; invoke bios 
     loop .again     ; loop while CX > 0

   here: jmp here           ; Jump here - infinite loop!

   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
 ,,,

  * print a string with an automatic preceding count
  -----------------
   BITS 16
   jmp start
   count dw message.end - message
   message db 'abcdefghijklmnopqrstuvwxyz'
   message.end:

   start:
     mov ax, 07C0h    ; set the data segment 
     mov ds, ax 
     mov si, message   ; Put string position into SI
     mov cx, [count]   ; how many chars to print
     mov ah, 0Eh       ; int 10h 'print char' function
  .again:
    lodsb           ; Get character from string into AL
    int 10h         ;  
    loop .again     ; loop while CX > 0

    hang: jmp hang           ; Jump here - infinite loop!

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
 ,,,


  The code below prints a counted string, where the count
  is calculated by the assembler and is located in the 2 bytes
  just before the string itself in memory. This system allows
  the use of only one label (instead of one for the message and
  one for the count)

  * a counted string with only one label 
  -----------------
   BITS 16
   jmp start
   message dw message.end-$-2
           db 'abcdefghijklmnopqrstuvwxyz'
   message.end:

   start:
     mov ax, 07C0h    ; set the data segment 
     mov ds, ax 

    mov si, message+2   ; Put string position into SI (after count)
    mov cx, [message]   ; how many chars to print (message length)
    mov ah, 0Eh       ; int 10h 'print char' function
  .again:
    lodsb           ; Get character from string into AL
    int 10h         ; x86 bios interrupt, do it! 
    loop .again     ; loop while CX > 0

    hang: jmp hang           ; Jump here - infinite loop!

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
 ,,,


CMPS COMPARE STRING ....

  This includes cmpsb, cmpsw and cmpsd. These instructions
  compare [ds:si] with [es:di] and set the flags in the same way
  as 'cmp' (the zero flag is set if the two values are equal).
  They also advance the 2 pointers by one byte or word etc.
  This means the cmps instructions can be used in a loop, or
  with a repeat prefix, to compare an entire string.

  The instructions can be used with repe, repne etc.
  The std (set direction flag) and cld (clear direction flag)
  instructions determine which way the ds:si and es:di pointers
  advance after the compare instruction.

  * initialise ds and es registers and si and di
  ------
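    jmp start
    first  db 'x'    ; ('aaa' is an x86 instruction, so not a good label)
    second db 'y'
  start:
    mov ax, 07C0h    ; Set data segment to where we're loaded
    mov ds, ax
    mov es, ax       ; ! must initialise for cmpsb
    lea si, [first]  ; DS:SI -> first string
    lea di, [second] ; ES:DI -> second string
    cld              ; compare forwards
    cmpsb            ; compare [ds:si] with [es:di], zero flag set if equal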
  ,,,

  We can also initialise ds:si and es:di with lds and les, which load
  an offset and a segment together from a 4 byte far pointer stored
  in memory.
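
  A sketch of comparing 2 whole strings with 'repe cmpsb', which
  repeats the comparison while the bytes are equal and CX is not
  zero. The strings and the fixed length of 5 are just example
  values.

  * compare 2 strings and print Y if they are equal, N if not
  -----------------
   BITS 16
   jmp start
   first  db 'hello'
   second db 'help!'
   start:
     mov ax, 07C0h
     mov ds, ax
     mov es, ax        ; ! must initialise for cmpsb
     mov si, first     ; DS:SI -> first string
     mov di, second    ; ES:DI -> second string
     mov cx, 5         ; compare at most 5 bytes
     cld               ; compare forwards
     repe cmpsb        ; repeat while the bytes are equal
     mov al, 'Y'       ; mov does not change the flags
     je .print         ; zero flag still set => all 5 bytes matched
     mov al, 'N'
   .print:
     mov ah, 0eh       ; bios teletype function
     int 10h           ; prints N, the strings differ at the 4th byte
   hang: jmp hang
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,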

SCAS SCAN STRING ....

  The scan string instructions are used to locate a particular
  'character' (or value) within a string. They compare AL, AX or
  EAX with the value at ES:DI (not the DS:SI pair).

  This instruction has the variants scasb, scasw, scasd
  
  * scan a string for a particular character
  -----------
   BITS 16

   jmp start
   message db 'abcdefghijklmnop'
   count dw $-message
   start:
     mov ax, 07C0h    ; Set up 4K stack space after this bootloader
     add ax, 288      ; (4096 + 512) / 16 bytes per paragraph
     mov ss, ax
     mov sp, 4096
     mov ax, 07C0h 
     mov ds, ax       ; data segment = code segment 
     mov es, ax       ; extra segment = code segment

     mov di, message  ; ES:DI -> start of the string to scan
     cld              ; search forward (set direction flag = 0) 
     mov al, 'p'      ; the character to scan for 
     mov cx, [count]  ; search within string length 
     repne scasb
     je .found
   .notfound:
     mov al, 'N'
     mov ah, 0eH   ; bios 'teletype' function
     int 10H       ; bios output interrupt 
     jmp hang 
   .found:
     dec di        ; If found, DI points 1 byte further, as with 'cmps'
     mov al, [di]  ; get the character found into al
     mov ah, 0eH   ; teletype AL bios function
     int 10H
     mov al, 'Y'
     int 10H

    hang: jmp hang ; loop forever 
    times 510-($-$$) db 0  ; padding
    dw 0AA55h              ; boot signature
   ,,,

  
  The code below should be modified to skip all leading 
  white space. This could be done with 'repe scasb' with al=' '
  
  * find the length of the 1st word of a sentence
  -----------
   BITS 16

   jmp start
     message db 'tree one 2 three'
     count dw $-message
   start:
     mov ax, 07C0h 
     mov ds, ax       ; data segment = code segment 
     mov es, ax       ; extra segment = code segment (scasb uses es:di)

     mov di, message  ; or ... les di, [message]
     cld              ; search forward (set direction flag = 0) 
     mov al, ' '      ; scan for next space 
     mov cx, [count]  ; search within string length 
     repne scasb
     je .found
   .notfound:
     mov al, 'N'
     mov ah, 0eH   ; bios 'teletype' function
     int 10H       ; bios output interrupt 
     jmp hang 
   .found:
     dec di            ; DI points 1 byte further, as with 'cmps'
     mov ax, [count]   ; how many characters scanned
     sub ax, cx        ; or do DI - message
     dec ax            ; cx is 1 too small
     add ax, '0'       ; convert count digit to ascii
     mov ah, 0eH   ; teletype AL bios function
     int 10H
     mov al, 'Y'
     int 10H

    hang: jmp hang ; loop forever 
    times 510-($-$$) db 0  ; padding
    dw 0AA55h              ; boot signature
   ,,,
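
  As a rough sketch of the whitespace idea mentioned above (an
  assumption about how it could be done, not tested on real hardware),
  'repe scasb' with al=' ' keeps scanning while the characters are
  spaces. The message text and the label '.allspaces' are made up
  for the example.

  * skip leading spaces and print the first non-space character
  -----------
   BITS 16

   jmp start
     message db '   tree one'
     count dw $-message
   start:
     mov ax, 07C0h 
     mov ds, ax       ; data segment = code segment 
     mov es, ax       ; scasb reads from es:di

     mov di, message
     cld              ; scan forward (direction flag = 0)
     mov al, ' '      ; the character to skip
     mov cx, [count]  ; never scan past the end of the string
     repe scasb       ; repeat while [es:di] == ' ' and cx > 0
     je .allspaces    ; zero flag still set: only spaces were seen
     dec di           ; DI is 1 byte past the first non-space
     mov al, [di]     ; the first character of the first word
     mov ah, 0eH      ; bios 'teletype' function
     int 10H
   .allspaces:
    hang: jmp hang ; loop forever
    times 510-($-$$) db 0  ; padding
    dw 0AA55h              ; boot signature
   ,,,

  After this step DI points at the first word, so the 'repne scasb'
  search for the next space can start from there with the remaining CX.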


NULL TERMINATED STRINGS ....

  The 'null' or zero terminated string is a series of (usually)
  ascii characters (traditionally bytes) with the last byte
  being the value zero. This is the standard C programming language
  string representation and therefore is pretty common. The 
  advantage is that string manipulation functions dont have to 
  know how long the strings are before doing something with them.

  * define a string with a unix newline (line feed only)
  >> prompt db "ENTER OPERAND:", 10, 0

  * define a string with a dos newline (carriage return, line feed)
  >> prompt db "ENTER OPERAND:", 13, 10, 0

  * define a null terminated string
  >> message db 'This is my cool new OS!!!', 0
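
  Since the length is not stored anywhere, it has to be found by
  looking for the zero byte. One possible sketch uses 'repne scasb'
  with al=0. The message and labels here are only examples, and the
  single digit print only makes sense for lengths below 10.

  * find the length of a null terminated string
  -----------
   BITS 16

   jmp start
     message db 'hello', 0
   start:
     mov ax, 07C0h 
     mov ds, ax
     mov es, ax       ; scasb reads from es:di
     mov di, message
     xor al, al       ; scan for the zero byte
     mov cx, 0FFFFh   ; effectively 'no limit' on the scan
     cld              ; scan forward
     repne scasb      ; stops just after the zero byte
     not cx           ; cx := bytes scanned, including the zero
     dec cx           ; dont count the zero byte itself
     mov al, cl
     add al, '0'      ; convert the digit to ascii
     mov ah, 0eH      ; bios 'teletype' function
     int 10H          ; prints 5 for 'hello'
    hang: jmp hang
    times 510-($-$$) db 0  ; padding
    dw 0AA55h              ; boot signature
   ,,,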

COMPARING STRINGS ....
  
  Below we use 'dw' for the count of the two words because
  the count is loaded into the CX loop register (and so
  has to be a word, not a byte)
  
  The code below could be better written with 'cmpsb'
  ie "compare string byte". 

  Use 'std' set direction flag to scan through a string backwards

  * eg with cmpsb
  ------
    mov si, string.a     ; ds:si -> first string (ds must be set up)
    mov di, string.b     ; es:di -> second string (es must be set up)
    mov cx, length       ; number of bytes to compare (an 'equ' constant)
    cld   ; clear direction flag, compare forwards
    repe cmpsb
    je .same
    ja .above   ; if string.a is greater than string.b
  ,,, 

  Use 'lea si, [label]' or 'mov si, label' to initialise the si and di
  registers with the string offsets. The es register must be 
  initialised as well, since 'cmpsb' compares ds:si with es:di

  * compare with cmpsb 
  ---------
    jmp start
      word.a db 'an elephantaa',0
      length dw $-word.a
      word.b db 'an elephanTaa',0
    start:
      mov ax, 07C0h    ; Set data segment to where we're loaded
      mov ds, ax
      mov es, ax       ; ! must initialise for cmpsb
      mov cx, [length] ; value stored at 'length' (includes the null)
      lea si, [word.a]
      lea di, [word.b] 
      cld            ; search forwards (clear direction flag)
      repe cmpsb
      ; below is the more verbose version of repe cmpsb
      ;.again:
      ; cmpsb 
      ; loope .again

      dec si         ; point back to the first differing letter
      dec di
      mov al, [si]   ; get character into al register
      mov ah, 0eH    ; print al
      int 10H
    here: jmp here
    times 510-($-$$) db 0
    dw 0AA55h
  ,,,
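
  The flags left by 'repe cmpsb' also tell us which string sorts
  first, because cmpsb compares [ds:si] against [es:di] as unsigned
  bytes, like 'cmp'. The following is only a sketch under that
  assumption; the two words and the '.print' label are invented
  for the example.

  * print '<', '=' or '>' depending on how two strings compare
  ---------
    jmp start
      word.a db 'apple'
      word.b db 'apply'
      length equ $-word.b
    start:
      mov ax, 07C0h    ; Set data segment to where we're loaded
      mov ds, ax
      mov es, ax       ; ! must initialise for cmpsb
      mov si, word.a
      mov di, word.b
      mov cx, length   ; both words have the same length here
      cld              ; compare forwards
      repe cmpsb       ; stop at the first differing byte
      mov al, '='      ; mov does not change the flags
      je .print        ; all bytes were equal
      mov al, '>'
      ja .print        ; word.a is greater than word.b
      mov al, '<'      ; otherwise word.a is smaller
    .print:
      mov ah, 0eH      ; print al
      int 10H          ; prints '<' since 'e' comes before 'y'
    here: jmp here
    times 510-($-$$) db 0
    dw 0AA55h
  ,,,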


  A forth style compare string function. The function is in a 
  linked list dictionary. Parameters are passed on the stack.
  (addr of 1st counted string, addr of 2nd counted string -> 0 if equal, 1 if not)

  * compare 2 counted strings, push 0 on stack if equal 
  ---------
    jmp start
    ; below seems better if indirect calls needed, eg call bx, call [bx]
    ; jmp 07C0h:start         ; Goto segment 07C0

      word.a dw 8, 'abcABCaB'
      word.b dw 8, 'abcABCaB'

    ; print T if the top element of stack is 0
    ; print F if top element is <> 0
    truefalse:
      dw 0
      dw 9, 'truefalse'
    truefalse.x:
      pop dx      ; juggle return fn ip
      pop ax      ; get parameter
      push dx     ; restore return fn ip
      cmp ax, 0
      je .true
      mov ah, 0Eh     ; int 10h 'print char' function
      mov al, 'F'
      int 10h         ; x86 bios interrupt, do it! 
      ret
    .true:
      mov ah, 0Eh     ; int 10h 'print char' function
      mov al, 'T'
      int 10h         ; x86 bios interrupt, do it! 
      ret

    compare:
      dw 0
      dw 7, 'compare'
    compare.x:
      pop dx     ; juggle fn return ip
      pop ax     ; 1st buffer address
      pop bx     ; 2nd buffer address
      push dx    ; restore fn ip
      sub cx, cx ; set cx:=0, not necessary here
      mov cx, [bx]    ; how many characters to compare
      add cx, 2       ; we also have to compare the count bytes
      mov si, ax 
      mov di, bx
      cld            ; search forwards (clear direction flag)
      repe cmpsb
      je .same
      ; ja .above    ;
      pop dx         ; juggle return ip
      push 1         ; return result on stack
      push dx
      ret
    .same: 
      pop dx         ; juggle return ip
      push 0         ; return result on stack 
      push dx
      ret

    start:
      mov ax, 07C0h    ; Set data segment to where we're loaded
      mov ds, ax
      mov es, ax       ; ! must initialise for cmpsb
      push word.a
      push word.b
      call compare.x
      call truefalse.x

    here: jmp here
    times 510-($-$$) db 0
    dw 0AA55h
  ,,,


  The following code modifies the 'compare' function to traverse
  a linked list dictionary searching for an entry. If the entry is found 
  the function returns a pointer to the execution token of the word
  on the stack. If it is not found it returns 0 on top of the stack.
  This is a fairly standard forth word, and can be used to implement
  a very simple interactive command interpreter. The function 
  receives on the stack a pointer to a counted string with the name
  to search for and also a pointer to the top of the dictionary
  (last term)

  (stack: searchterm, where to start search -> 0 or *function)

  This is working... now we need to actually execute the found
  word. So we will write one that just prints a star

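  !!! Gotcha, we have to use jmp 07C0h:start if we want indirect
  function calls (eg mov bx, fn; call bx) to work. A near 'call label'
  is relative to IP, so it works whatever CS is, but 'call bx' jumps
  to the absolute offset in bx within the current code segment. The
  bios usually starts the boot sector at 0000:7C00, so with [ORG 0]
  the offsets only match once the far jump has set CS to 07C0.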

  * find an entry in a linked list dictionary, forth-style 
  ---------
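   ; these are the nasm defaults for 'bin' output, so they can be omitted
   BITS 16
   [ORG 0]

    ; !! a plain 'jmp start' is not enough here: CS must be 07C0 for
    ; the indirect 'call bx' in exec.x to land on the right offset

    jmp 07C0h:start         ; Goto segment 07C0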

      searchterm dw 4, 'star'
      last dw find
      buffer dw 0, '                  '

    ; just print a newline
    crlf:
      dw 0
      dw 4, 'crlf'
    crlf.x:
      mov ah, 0eh  ; bios type char function 
      mov al, 13   ; cr lf
      int 10h
      mov al, 10
      int 10h
      ret

    ; just to test
    hash:
      dw crlf
      dw 4, 'hash'
    hash.x:
      mov ah, 0Eh     ; just print a hash with bios
      mov al, '#'
      int 10h         ; x86 bios interrupt
      ret
    ; another fn for testing
    star:
      dw hash
      dw 4, 'star'
    star.x:
      mov ah, 0Eh     ; just print a star with bios
      mov al, '*'
      int 10h         ; x86 bios interrupt
      ret

    ; get rid of top stack element
    drop:
      dw star
      dw 4, 'drop'
    drop.x:
      pop ax
      pop bx
      push ax
      ret

    ; just print some message about word found or else '?'
    found:
      dw drop 
      dw 5, 'found'
    found.x:
      pop dx      ; juggle return fn ip
      pop bx      ; get parameter (0 or pointer to function)
      push dx     ; restore return fn ip
      cmp bx, 0
      jne .foundit
      mov ah, 0Eh     ; int 10h 'print char' function
      mov al, '?'
      int 10h         ; x86 bios interrupt, do it! 
      ret
    .foundit:
      mov ah, 0Eh     ; int 10h 'print char' function
      mov al, '!'
      int 10h         ; x86 bios interrupt, do it! 
      add bx, 2       ; point bx to the function header count
      mov cx, [bx]    ; get the function name count 
      mov al, cl      ; print the count (for feedback)
      add al, '0'     ; convert to ascii
      int 10h         ; x86 bios interrupt, do it! 
      mov al, [bx+2]  ; first letter of func name
      int 10h         ; x86 bios interrupt, do it! 
      ret

    ; should rewrite this, use si for current word point and 
    ; di for search term pointer. Use byte word counts, not 2 bytes
    ; 
    find:
      dw found 
      dw 4, 'find'
    find.x:
      pop dx     ; juggle fn return ip
      ; no, just pop into si and di, easier
      pop bx     ; where to start searching (eg last entry in dict)
      pop ax     ; counted string buffer to search for 
      push dx    ; restore fn ip
    .again:
      sub cx, cx ; set cx:=0, not necessary here
      mov si, bx      ; pointer to current function header
      add si, 2       ; the counted string is 2 bytes after header
      mov cx, [si]    ; the count of this dictionary entry's name
      add cx, 2       ; we also have to compare the count bytes
      mov di, ax      ; the search term pointer
      cld            ; search forwards (clear direction flag)
      repe cmpsb     ; compare all characters for equality
      je .found
      mov bx, [bx]    ; get the pointer to the next function (or 0)
      cmp bx, 0       ; if start of dict, then link is 0
      je .notfound    ; no more words to search, so exit
      push ax         ; save ax, the search term pointer
      mov ah, 0Eh     ; print a dot on each unsuccessful search
      mov al, '.'     ; for debugging
      ;int 10h        ; x86 bios interrupt
      pop ax          ; restore the search term pointer      
      jmp .again 
    .notfound:
      pop dx
      push 0         ; not found so return 0
      push dx
      ret
    .found: 
      pop dx         ; juggle return ip
      push bx        ; return result on stack 
      push dx
      ret
    
    ; execute a function given a pointer to its header on the stack
    exec:
      dw find
      dw 4, 'exec'
    exec.x:
      pop ax
      pop bx     ; get pointer to function
      push ax    ; preserve fn return pointer
      add bx, 2  ; point to name count
      mov cx, [bx]  ; get the count
      add bx, 2     ; skip over count
      inc cx        ; 'dw' pads the name to an even number of bytes,
      and cx, 0FFFEh ; so round the count up to the next even number
      add bx, cx    ; advance the pointer to the function

      ; !! not call [bx], that would treat bx as a pointer to a jumptable
      ; !!! call bx may change the stack (probably will) so we need 
      ; !!! to preserve the call return ip 
      pop word [returnexec]      ; save return ip
      call bx       ; call the fn pointed to by bx
      push word [returnexec]     ; restore fn return ip
      ret
    ; a dodgy solution, but any register may be changed by the called fn
    returnexec dw 0


   ; get a line of input from the user 
   ; stack parameters: buffer address, max characters
   ; returns: buffer address. This would be called 'accept' in 
   ; a traditional forth system. We can leave the buffer address
   ; on the stack, which can be used by the calling procedure.
   line:  
     dw exec       ; link to next fn in dictionary
     dw 4, 'line'  ; forth-style function header 
   line.x:
     pop bx       ; juggle return pointer
     pop cx       ; how many chars maximum to get 
     ; use pop di instead of pop ax, since stosb comes later
     pop ax       ; where to copy chars
     push bx      ; restore return pointer
     mov bx, ax   ; save buffer address, unnecessary, leave on stack
     add ax, 2    ; first word of buffer is char count
     mov di, ax   ; where stosb will put characters 
                  ; could use 'lea di, [bx]' as well
     sub dx, dx   ; simple char counter set dx:=0
   .again:
     mov ah, 0    ; wait for keypress bios function
     int 16h
     cmp al, 13   ; was the key press an 'enter' 
     je .exit     ; exit if enter pressed
     mov ah, 0eh  ; echo the character
     int 10h
     stosb        ; put the char into the buffer
     inc dx       ; increment char counter
     loop .again
   .exit:
     ; pop di, then mov [di], dx
     mov [bx], dx ; store char count in buffer
     pop ax       ; preserve fn return ip
     push bx      ; return buffer address on stack
                  ; Not Necessary !!! just leave it there
     push ax
     ret

   ; just the forth count word
   ; stack: counted string address -> address of first char, char count
   count:
     dw line        ; link to previous dictionary entry
     dw 5, 'count'  ; 2 byte count, like the other headers
   count.x:
     pop dx         ; juggle return ip
     pop bx         ; buffer address
     mov ax, [bx]   ; get the 2 byte count into ax
     add bx, 2      ; step over the count to the first character
     push bx         ; new buffer address
     push ax         ; char count
     push dx         ; restore fn return ip
     ret

   ; type takes its arguments on the stack (buffer address, char count)
   ; modify this to display char count and chars. So it gets
   ; a pointer to a counted string and displays both
   type:
      dw count       ; link to previous dictionary entry 
      dw 4, 'type'  
   type.x:
      cld             ; make lodsb step forward through chars
      pop bx          ; juggle return address for call
      pop cx          ; how many chars to print
      pop si          ; address of buffer to print
      push bx         ; restore return function call
      mov ah, 0eh     ; bios print character function
    .again:
      lodsb           ; get next char from message in al
      int 10h         ; print char with bios
      loop .again
      ret
      
    start:
      mov ax, cs    ; initialize the data segment register DS
      mov ds, ax
      mov es, ax       ; ! must initialise for cmpsb
      ;push searchterm  ; what word to search for

     .again:
      push buffer       ; where to store input from 'line'
      push 10           ; max number of chars
      call line.x
      ;push buffer+2
      ;push 6
      ;call type.x      ; change how this type works (counted string)
      ; the search term (buffer address) was left on the stack by line.x
      push type         ; top of dictionary
      call find.x       ; find the name entered by user
      ;call found.x
      call crlf.x
      pop ax            ; result of find: 0 or a pointer to the header
      cmp ax, 0         ; if not found the result is 0
      je .again         ; dont execute if zero, put this in exec
      push ax           ; put the pointer back for exec
      call exec.x       ; execute it!
      call crlf.x
      jmp .again

    here: jmp here
    times 510-($-$$) db 0
    dw 0AA55h
  ,,,


  The code below is unnecessarily long. It should use cmpsb etc

  * compare 2 counted strings for equality
  ----------
   BITS 16
   jmp start
   word.a dw 18 
          db 'five hundred and 2'
   word.b dw 18 
          db 'five hundred and 1'

   start:
     mov ax, 07C0h    ; set the data segment 
     mov ds, ax 

    mov ah, 0Eh       ; int 10h 'print char' function
    mov si, word.a    ; Put address of 1st byte of word.a into SI 
    mov di, word.b    ; same, but for word.b in to DI 
    mov ax, [si]      ; get word.a count into AX 
    mov bx, [di]      ; get word.b count into BX
    cmp ax, bx        ; see if 2 words have the same count
    jne different    ; print message and terminate
    mov cx, ax       ; put the loop count into cx
    add si, 2        ; skip the 2 byte count: first char of word.a
    add di, 2        ; skip the 2 byte count: first char of word.b
  .again:
    lodsb            ; Get character from word.a into AL
    mov bl, [di]     ; get next char from word.b into BL
    inc di
    cmp al, bl       ; see if the character is the same 
    jne different    ; print message and terminate
    loop .again      ; loop while CX > 0

  same:             ; the words must be the same
    mov ah, 0Eh     ; int 10h 'print char' function
    mov al, 'S'
    int 10h         ; x86 bios interrupt, do it! 

  hang: jmp hang           ; Jump here - infinite loop!

  different:
    mov ah, 0Eh     ; int 10h 'print char' function
    mov al, 'D'
    int 10h         ; x86 bios interrupt, do it! 
    jmp hang

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
 ,,,

  Todo!
   * enter a string and check if it is in a dictionary
   -------------
     receive characters, count them, store them in a buffer
     then compare them to words in a dictionary 
   ,,,

COPYING STRINGS ....

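  'mov si, label' and 'lea si, [label]' both put the offset of label
  in si; lea can also compute addresses such as [bx+di+4]. 'lds' and
  'les' are different: they load a far pointer (offset and segment)
  from memory into ds:reg or es:reg.

  * copy a number of bytes from one location (ds:si) to another (es:di)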
  -----------------
    mov cx,(number of bytes to move)
    lea di,(destination address)
    lea si,(source address)
    cld   ; clear direction flag, copy forwards
    rep movsb
  ,,,
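
  To make the difference between 'mov', 'lea' and 'lds' concrete,
  here is a small sketch. The 'farptr' variable is an assumption made
  for the example: it is a far pointer (offset word, then segment
  word) built by hand so that lds has something to load.

  * load the same address with mov, lea and lds
  -----------
   BITS 16
   [ORG 0]
     jmp 07C0h:start         ; Goto segment 07C0
   message db 'A'
   farptr  dw message, 07C0h ; offset first, then segment
   start:
     mov ax, cs
     mov ds, ax
     mov si, message     ; the offset of 'message' as an immediate value
     lea si, [message]   ; the same offset, computed as an address
     mov bx, message
     lea si, [bx+0]      ; lea can do address arithmetic: si := bx + 0
     lds si, [farptr]    ; loads SI and DS together from 4 bytes of memory
     lodsb               ; al := [ds:si], the letter 'A'
     mov ah, 0eH         ; bios teletype function
     int 10H
   hang: jmp hang
   times 510-($-$$) db 0
   dw 0AA55h
  ,,,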

  * a complete copy example
  ---------
    jmp start
      source db 'an elephant',0   ; ('word' is a reserved word in nasm)
      length EQU $-source
      buffer times 80 db 0
    start:
      mov ax, 07C0h           ; Set data segment to where we're loaded
      mov ds, ax
      mov es, ax       ; ! movsb copies from ds:si to es:di
      mov cx, length   ; this length includes the null termination
      mov si, source
      mov di, buffer
      cld         ; copy forwards (clear direction flag)
      rep movsb
    here: jmp here
    times 510-($-$$) db 0
    dw 0AA55h
  ,,,

CONVERTING TO AND FROM STRINGS ....

  * convert from a digit to an ascii character by adding '0' or 48
  ------
   mov al, 9 
   add al, '0' 
  ,,,

  * convert a digit to hexadecimal using xlatb
  -----------
    ; Tell the assembler that this code starts at offset 0.
    ; That is true within segment 07C0, which CS holds after the jump.
    [ORG 0]
            jmp 07C0h:start         ; Goto segment 07C0

    hextable db "0123456789ABCDEF"
    start:
      ; Update the segment registers
      mov ax, cs
      mov ds, ax
      mov es, ax
      mov al, 15
      mov bx, hextable    ; translation table (xlatb uses ds:bx)
      xlatb               ; replace al with hex digit
      mov ah, 0eH         ; print al
      int 10H
    hang: jmp hang
    times 510-($-$$) db 0
    dw 0AA55h
  ,,,

  The code below may be the most concise possible way to 
  print a number in assembly language.

  * print a 2 byte number in hexadecimal 
  -----------
    ; Tell the assembler that this code starts at offset 0.
    ; That is true within segment 07C0, which CS holds after the jump.
    [ORG 0]
       jmp 07C0h:start         ; Goto segment 07C0
    hextable db "0123456789ABCDEF"
    start:
      mov ax, cs     ; cs is 07C0 after the far jump
      mov ds, ax     ; point data segment -> code segment
      mov ah, 0x0E   ; bios teletype function 
      mov bx, hextable   ; translation table
      mov dx, 0xFEDC     ; the number to print
      mov cx, 4
      .again:
        rol dx, 4
        mov al, dl
        and al, 0x0F
        xlatb              ; replace al with hex digit
        int 10H
        loop .again
    hang: jmp hang
    times 510-($-$$) db 0
    dw 0AA55h
  ,,,

  The code below is just a variation on the code above
  where the stack is used to pass the number to the function
  or procedure. Hopefully this will allow us to reuse this
  procedure in other code.

  We have to juggle the stack to get the parameter off without
  hurting the return address

  * print a 2 byte number but use the stack to pass number 
  -----------
    [ORG 0]
       jmp 07C0h:start     ; Goto code segment 07C0
    start:
      mov ax, cs   ; cs is 07C0 after the far jump
      mov ds, ax   ; point data segment -> code segment

      ; doesnt seem necessary to initialize the stack
      ;add ax, 288  ; (4096 + 512) / 16 bytes per paragraph
      ;mov ss, ax   ; initialise stack pointers
      ;mov sp, 4096 

      push 0xABCD   ; the number to print
      call hexprint       ; print the number 
      here: jmp here      ; and the rest is silence

    hexprint.data:
      hextable db "0123456789ABCDEF"
    hexprint:
      pop bx   ; get off the return call address
      pop dx   ; retrieve the number to print
      push bx  ; save the return call address
      mov ah, 0x0E      ; x86 bios print char function
      mov bx, hextable   ; translation table
      mov cx, 4
    .again:
        rol dx, 4
        mov al, dl
        and al, 0x0F
        xlatb              ; replace al with hex digit
        int 10H
        loop .again
      ret
    times 510-($-$$) db 0  ; pad to 512 bytes total
    dw 0AA55h              ; standard x86 bootloader signature
  ,,,

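  Conversion to decimal display: divide by the base (10) and
  push the remainder on the stack. Divide the quotient again
  by the base, push the remainder, and so on until the quotient
  is 0. Then pop each digit off the stack, convert it to ascii
  using xlatb and display it. The stack reverses the digits, which
  come out of the division least significant first.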

  The code below seems to be working. To adapt for 16 bit 
  unsigned ints we need to use DX AX as dividend and BX 
  as base or divisor

  * convert an unsigned 8 bit number to ascii in any base 2 to 16
  -----------
    [ORG 0]
      jmp 07C0h:start         ; Goto segment 07C0

    number db 255 
    base db 16 

    start:
      ; Update the segment registers
      mov ax, cs
      mov ds, ax
      mov es, ax
      ;mov ax, 07C0h     ; Set up 4K stack space after this bootloader
      add ax, 288       ; (4096 + 512) / 16 bytes per paragraph
      mov ss, ax
      mov sp, 4096
       
    mov al, [number]  ; number/base
    mov bl, [base]    ; bl is the divisor
    call printi8

    hang: jmp hang
    
  ;proc printi8 
  ; expects the 8 bit number to display in AL and 
  ; the base in BL register
  hextable db "0123456789ABCDEF"
  printi8:
    push cx
    push bx
    push ax
    sub cx, cx        ; set counter = 0
    .again:
      sub ah, ah          ; ah = 0, ax is the dividend
      div bl              ; does ax/bl. remainder --> ah
      push ax             ; save remainder:quotient on the stack 
      inc cx              ; increment the digit counter
      cmp al, 0           ; if the quotient != 0 do the next digit 
      jne .again          ; loop while quotient > 0

    .print:
      pop ax              ; get digit from the stack
      mov al, ah          ; convert digit to ascii
      mov bx, hextable    ; translation table (xlatb uses ds:bx)
      xlatb               ; replace al with hex digit from table
      mov ah, 0eH         ; print digit in al
      int 10H
      loop .print         ; using cx the digit counter to loop 
     pop ax
     pop bx
     pop cx
   ret

    times 510-($-$$) db 0
    dw 0AA55h
  ,,,
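
  The 16 bit adaptation described above might look like the sketch
  below: DX:AX is the dividend and BX the base. This is an untested
  illustration, with the number and base hard coded as immediate
  values rather than passed to a procedure.

  * print an unsigned 16 bit number in any base 2 to 16
  -----------
    [ORG 0]
      jmp 07C0h:start         ; Goto segment 07C0

    hextable db "0123456789ABCDEF"
    start:
      mov ax, cs
      mov ds, ax
      mov ax, 65535     ; the number to print
      mov bx, 10        ; the base (divisor)
      sub cx, cx        ; digit counter = 0
    .again:
      sub dx, dx        ; dx:ax is the 32 bit dividend
      div bx            ; quotient --> ax, remainder --> dx
      push dx           ; save the digit on the stack
      inc cx            ; increment the digit counter
      cmp ax, 0
      jne .again        ; loop while quotient > 0
      mov bx, hextable  ; translation table for xlatb
    .print:
      pop ax            ; digit value in al (ah is 0)
      xlatb             ; replace al with the ascii digit
      mov ah, 0eH       ; print digit in al
      int 10H
      loop .print       ; cx holds the number of digits
    hang: jmp hang
    times 510-($-$$) db 0
    dw 0AA55h
  ,,,

  Clearing DX before each 'div bx' matters: if DX is left over from
  the previous remainder the quotient can overflow AX and the cpu
  raises a divide error.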

LOGICAL AND BIT OPERATIONS

  and, or, not, xor, test

  Use the OR instruction to turn on 1 or more bits of a register
  Use the AND instruction to turn off 1 or more bits of a register 

OR INSTRUCTION ....

  * turn on the high bit of the bl register
  -----------
    mov bl, color
    or bl, 10000000b
  ,,,

  * cut and paste bits with 'or'
  -------
    and AL, 55H  ; keep bits 0,2,4,6 of AL (clear the odd bits)
    and BL, 0AAH ; keep bits 1,3,5,7 of BL (clear the even bits)
    or AL, BL    ; paste the registers together
  ,,,

XOR INSTRUCTION ....

  Toggles one or more bits. etc

  * initialize the AX register to zero
  >> xor AX, AX

  * toggle the lowest bit of the AX register
  >> xor AX, 1

  In the code below the value 0A6h is the encryption 'key' (any 
  key may be used, or chosen by the user). The data can be
  decrypted by running the same code again with the same key.

  * encrypt data with xor
  ---
  input db 'unencrypted'
  output db '           '
    ...
    cld    
    lea si, [input]
    lea di, [output]
    lodsb   ; read a data byte (or character) into AL
    xor AL, 0A6H
    stosb   ; write data byte from AL to output buffer
  ,,,
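
  The fragment above handles a single byte. A possible complete
  version, sketched here with invented labels ('key', 'length',
  '.encrypt', '.decrypt'), loops over the whole buffer and then
  decrypts it again with the same xor to show the round trip.

  * encrypt a whole string with xor, then decrypt and print it
  ---
   BITS 16
   [ORG 0]
     jmp 07C0h:start         ; Goto segment 07C0
   key equ 0A6h
   input db 'unencrypted'
   length equ $-input
   output: times length db 0
   start:
     mov ax, cs
     mov ds, ax       ; lodsb reads from ds:si
     mov es, ax       ; stosb writes to es:di
     cld
     mov si, input
     mov di, output
     mov cx, length
   .encrypt:
     lodsb            ; read a data byte into AL
     xor al, key
     stosb            ; write the encrypted byte to the output buffer
     loop .encrypt
     mov si, output   ; now read the encrypted bytes back
     mov cx, length
     mov ah, 0eH      ; bios teletype function
   .decrypt:
     lodsb
     xor al, key      ; the same xor restores the original byte
     int 10H          ; prints the original text
     loop .decrypt
   hang: jmp hang
   times 510-($-$$) db 0
   dw 0AA55h
  ,,,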

AND INSTRUCTION ....

TEST INSTRUCTION ....

  The test instruction can be used to test the value of 
  one or more bits of a register. This is similar to the AND
  instruction but the register value is not changed.

  The TEST instruction sets the Zero flag if none of the tested
  bits are set (that is, if the AND of the two operands is zero).
  So use jnz or jne to jump when at least one tested bit is set
  and jz or je to jump when all the tested bits are clear.

  TEST sets the flag register in an identical way to the 
  AND instruction. So, if the result of the AND instruction
  would be zero, then TEST will set the zero flag to 1.
  This can be a bit confusing!! Use jz or jnz with test

  * check if AL == 0
  >> test al, al 

  * jump if AL is odd 
  -------
    test AL, 1
    jnz .odd     ; bit 0 is set, so AL is odd
  ,,,

  * jump if the least OR most significant bit of AL is set
  -------
    test AL, 10000001b
    jnz .exit    ; at least one of the two bits is set
  ,,,

   With TEST alone we can only check whether at least one of
   several bits is set, not whether they are all set.
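
   To check that every one of several bits is set, one option is
   to mask with AND and then compare against the mask. This is a
   sketch of that idea; the name 'wanted' and the test value are
   made up for the example.

   * test if the lowest AND highest bits of bl are both set
   -----------
   wanted equ 0b10000001  ; the bits that must all be set
   jmp start
   start:
     mov bl, 0b10101011   ; value to examine
     mov al, bl           ; work on a copy, 'and' changes its operand
     and al, wanted       ; keep only the bits of interest
     cmp al, wanted       ; equal only if every wanted bit was set
     mov al, 'y'          ; mov does not change the flags
     je .print
     mov al, 'n'
   .print:
     mov ah, 0eH          ; bios teletype function
     int 10H              ; prints 'y' for this value
   hang: jmp hang
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,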

   * test if the most significant bit of bl is set 
   -----------
   jmp start
   start:
     mov bl, 0b10101011   ; pattern to display
     mov dl, 0b00000000   ; variable test pattern
     test bl, 0b10000000
     jz .unset 
   .set:
     mov ah, 0eH   ; bios teletype function
     mov al, 'y'  
     int 10H         ; do it
     jmp end
   .unset:
     mov ah, 0eH   ; bios teletype function
     mov al, 'n'  
     int 10H         ; do it
   end: jmp end
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,

  We can use the TEST instruction to see if a number is
  divisible by a power of 2 (eg 2,4,8,16...). 
  Use 2^n - 1 as the operand to TEST (1,3,7,15...). A number is
  divisible by 2^n exactly when its lowest n bits are all zero,
  in which case TEST sets the zero flag.

  * print a "!" if CX is divisible by 4 
  ---------------------------------
  start:
  mov cx, 9 
  .again:
    mov al, cl 
    add al, '0'   ; convert digit to ascii
    mov ah, 0eH   ; bios teletype function
    int 10H       ; invoke bios
    test cl, 3 
    jnz .here
    mov al, '!' 
    mov ah, 0eH   ; bios teletype function
    int 10H
  .here:
    loop .again
    jmp $             ; keep looping! 
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

  * print a "!" if CX is divisible by 8
  ---------------------------------
  start:
  mov cx, 31 
  .again:
    mov al, 'o'   ; the character to print
    mov ah, 0eH   ; bios teletype function
    int 10H       ; invoke bios
    test cx, 0x0007 
    jne .here
    mov al, '!' 
    mov ah, 0eH   ; bios teletype function
    int 10H
  .here:
    loop .again
    jmp $             ; an infinite loop
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

NOT INSTRUCTION ....

  * invert (flip) every bit of a register
  >> not ESI

SHIFT INSTRUCTIONS ....

  Shift operations are useful for multiplying and (integer)
  dividing by powers of 2. eg 2, 4, 8, 16 etc  This should
  be faster than using the MUL and DIV instructions

  The code below is more easily done with the ROR or ROL
  instructions.
  
  * encrypt data by swapping nibbles with shl and shr
  -----------
    ; al contains the byte to be encrypted
    mov AH, AL
    shl AL, 4   
    shr AH, 4
    or AL, AH
    ; al has encrypted byte
  ,,,

SHL ....

  shift left
  
  The vacated low bits are filled with 0 and the last bit shifted
  out goes into the carry flag.

SHR ....

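  shift right. The vacated high bits are filled with 0 (use 'sar'
  instead to keep the sign bit of a signed number).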
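
  A small sketch of shifts used as multiply and divide by 2. The
  values are arbitrary and chosen so that the results stay single
  digits for printing.

  * multiply and divide by 2 with shl and shr
  -----------
   jmp start
   start:
     mov al, 3
     shl al, 1       ; al := 3 * 2 = 6
     add al, '0'     ; convert digit to ascii
     mov ah, 0eH     ; bios teletype function
     int 10H         ; prints 6
     mov al, 9
     shr al, 1       ; al := 9 / 2 = 4, the lost bit goes to the carry flag
     add al, '0'
     mov ah, 0eH
     int 10H         ; prints 4
   hang: jmp hang
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,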

ROTATE INSTRUCTIONS ....

  == rotate summary
  rol - rotate left
  ror - rotate right
  rcl - rotate left through carry
  rcr - rotate right through carry.
  ,,,

  * encrypt a byte by swapping nibbles (4 bits in a byte)
  ----
    mov CL, 4
    ror AL, CL   ; or rol AL, Cl (no difference)
  ,,,

DISPLAYING BIT PATTERNS ....

  
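   The following are not equivalent. 'test' only sets the flags,
   with the zero flag set when no bit of dl is set in bl. The
   'and' overwrites bl, and the 'cmp' then sets the zero flag
   when every bit of dl was set in bl.
   -------
     ;test bl, dl     ; zero flag set if (bl AND dl) == 0
     and bl, dl     ; bl := bl AND dl
     cmp bl, dl     ; zero flag set if (bl AND dl) == dl
   ,,,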

   * display a bit pattern
   -----------
   block equ 0xFE   ; ascii code for small block
   alpha equ 224    ; Greek letter alpha 
   jmp start
   start:
     mov ax, 07C0h  ; Initialize data segment DS register
     mov ds, ax     ; load DS with correct value 
     mov cx, 8      ; number of bits to display 
     mov dl, 0b10000000   ; test pattern
     mov bl, 0b10101010   ; pattern to display
   .again:
     mov ah, 0eH     ; bios teletype
     test bl, dl     ; see if bit is set
     jnz .fill
     mov al, '0'     ; print zero
     int 10H         ; do it
     jmp .ll
   .fill:
     mov al, block   ; char to print 
     int 10H         ; do it
   .ll:
     ror dl, 1       ; move the bit pattern
     loop .again     ; go again
   here: jmp here        
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature

   ,,,

PROCEDURES

  Procedures are similar to jumps in that the IP instruction pointer
  is modified implicitly by the instruction. The 'call' instruction
  pushes the return address on the stack and 'ret' pops it again;
  together they implement procedures in x86 assembler. 

INDIRECT PROCEDURES ....

  There are some gotchas with this syntax, read carefully!

  * simple indirect function call 
  ----
   jmp start
   hash:
     ; print a hash
     ret
   start:
     mov bx, hash    ; bx holds the offset of the procedure
     call bx         ; calls 'hash'

  ,,,

  * a pointer to a pointer 
  ----
   jmp start
   jumptable dw hash, bang
   hash:
     ; print a hash
     ret
   bang:
     ; print a bang
     ret
   start:
     mov bx, jumptable+2 
     call [bx]
     ; this calls the 'bang' function !!!
  ,,,

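  * call procedure located by a pointer 
  >> call [BX]

  Gotchas!!! If the code segment does not match the ORG of the code
    call bx and call [bx] dont work!!! (call [bx] also reads the
    pointer through DS, so DS must be set up as well)
  So do: jmp 07C0h:start         ; Goto segment 07C0
  CS cannot be loaded with 'mov cs, ax'; only a far jump, far call
  or far return changes it.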
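
  Another arrangement that should work, sketched here as an
  assumption rather than something taken from the examples in this
  booklet, is to assemble with [ORG 7C00h] and keep all segment
  registers at zero. The far jump 'jmp 0:start' forces CS to 0 in
  case the bios started the code as 07C0:0000.

  * indirect calls with [ORG 7C00h] and segments set to zero
  -----------
   BITS 16
   [ORG 7C00h]
     jmp 0:start      ; far jump: CS := 0, IP := offset of start
   table dw hash      ; a one entry jump table
   hash:
     mov ah, 0eH      ; bios teletype function
     mov al, '#'
     int 10H
     ret
   start:
     xor ax, ax
     mov ds, ax       ; DS := 0 so that [table] is read correctly
     mov bx, hash
     call bx          ; pointer held in a register
     call [table]     ; pointer held in memory
   hang: jmp hang
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,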

  The following is a simple command interpreter. The next 
  step is to create a linked list dictionary with a function
  which searches through and executes. Another step is to 
  have some kind of self-referentialism, that is, so that the 
  user can look up what functions are available. 

  This self referentialism can be provided with a 'command name'
  which is just a counted string before the code to be executed.

  * a simple indirect procedure call with jump-table
  ---------
   BITS 16
   [ORG 0]
    alpha equ 224   ; Greek letter alpha 
    beta  equ 225   ; Greek letter beta
    gamma equ 226   ; Greek letter gamma

    jmp 07C0h:start         ; Goto segment 07C0
    jumptable dw aa,bb,cc
    aa:
      mov al, alpha   ; letter to print 
      mov ah, 0eH     ; bios teletype
      int 10H         ; do it
      ret
    bb:
      mov al, beta 
      mov ah, 0eH     ; bios teletype
      int 10H         ; invoke bios 
      ret             ; return from 'call'
    cc:
      mov al, gamma 
      mov ah, 0eH   
      int 10H
      ret
    start:
      mov ax, cs    ; initialize the data segment register DS
      mov ds, ax
    .again:
      mov al, '>'   ; print a prompt
      mov ah, 0eH   ; bios teletype 
      int 10H
      mov ah, 0     ; bios wait for keypress function
      int 16h       ; invoke bios
      cmp al, 'a'   ; check for valid command (a-c)
      jb .again     ; just print prompt if invalid
      cmp al, 'c'
      ja .again
      sub al, 'a'   ; convert letter to a index into jump table
      sub bx, bx    ; set bx:=0
      mov bl, al    ; ax cant be used in effective addresses
      shl bl, 1     ; do bl:=bl*2 (since its a double-byte array)
      
      ; this below also would work
      ;mov bx, jumptable+2
      ;call [bx]
      ; and this too, its a gotcha
      ;mov bx, bb 
      ;call bx

      call [jumptable+bx]   ; jump-table is word (2 byte) cells
      
      jmp .again            ; loop forever 
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

PARAMETERS FOR PROCEDURES ....

  We can pass parameters to procedures in x86 assembly in various ways.
  One is on the stack, another is in registers, or even in memory.
  If we pass parameters on the stack then we can 'juggle' the stack,
  since inside the procedure, the top item is the return address for 
  the procedure. Or we can copy the stack pointer SP into the BP
  register and read the value in place, relative to BP. (In 16 bit 
  code SP itself can't be used in an effective address.)

  * basic stack juggling to pass a parameter
  -------
    mov ax, 0xABCD
    push ax
    call puts
    jmp $     ; stop here, otherwise execution falls through into 'puts'
    puts:
      pop bx    ; the 'puts' procedure return address
      pop ax    ; the parameter we want to use
      push bx   ; restore the return address to the stack
      ; ... code to do something with parameter
      ret
  ,,,

  * return a parameter via the stack 
  -------
    call puts
    pop ax    ; the parameter that was returned
    ...
    puts:
      ; ... code 
      pop bx    ; the return address 
      push ax   ; the parameter we want to return
      push bx   ; restore the return address to the stack
      ret
  ,,,
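
  Instead of juggling, the parameter can be left where it is and read
  through BP. The listing below is only a sketch of that idea: the
  name 'putc2' is made up for this example, and like the other listings
  it assumes the bios has left a usable stack.

  * read a stack parameter through BP (a sketch)
  ---------
   BITS 16
   [ORG 0]
    jmp 07C0h:start         ; Goto segment 07C0
    putc2:
      push bp          ; save the caller's BP
      mov bp, sp       ; BP can be used in an address, SP can't
      mov ax, [bp+4]   ; [bp]=old BP, [bp+2]=return address, [bp+4]=parameter
      mov ah, 0eH      ; bios teletype
      int 10H          ; print the character that was passed in
      pop bp           ; restore the caller's BP
      ret 2            ; return and drop the 2 byte parameter
    start:
      mov ax, cs    ; initialize the data segment register DS
      mov ds, ax
      mov ax, 'A'   ; the parameter
      push ax       ; pass it on the stack
      call putc2
      jmp $         ; loop forever
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

  The 'ret 2' form removes the parameter after returning, so the
  caller doesn't have to clean up the stack.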

  We can modify the code below to include a header in the procedure
  eg
    puts:
      db 5, 'print'
    puts.x:
      ... code

  This allows us to use the procedure interactively as well
  as programmatically, for example, we can look up the function
  by its name.

  * print string, using stack to pass address of buffer 
  ---------
   BITS 16
   [ORG 0]
    cr equ 13   ;  carriage return
    lf equ 10   ;  line feed

    jmp 07C0h:start         ; Goto segment 07C0
    buffer db 'String in Buffer',13,10,0
    puts:
      pop bx    ; the 'puts' procedure return address
      pop si    ; the address of zero ended string to print 
      push bx   ; restore the return address to the stack
      mov ah, 0eH     ; bios teletype
    .again:
      lodsb             ; Get character from string
      cmp al, 0
      je .done          ; If char is zero, end of string
      int 10h           ; Otherwise, print it
      jmp .again
    .done:
      ret

    start:
      mov ax, cs    ; initialize the data segment register DS
      mov ds, ax
    .again:
                      ; "push buffer" also seems valid
      mov ax, buffer  ; address of string
      push ax         ; pass address to function
      call puts       ; print the string
      mov ax, buffer  ; address of string
      push ax         ; pass address to function
      call puts       ; print it a second time
  here:    jmp here         ; loop forever 
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

  Arguments below are passed and returned by juggling the stack.
  Command 'c' waits for a keypress and puts its value on the stack.
  Command 'a' prints the value of the key on the stack. Command 'b'
  just prints a greek beta

  * an example of stack passing and returning arguments/parameters
  ---------
   BITS 16
   [ORG 0]
    cr equ 13   ;  carriage return
    lf equ 10   ;  line feed
    alpha  equ 224   ; Greek letter alpha
    beta   equ 225   ; Greek letter beta

    jmp 07C0h:start         ; Goto segment 07C0
    jumptable dw putc,bb,cc
    putc:
      pop bx    ; the 'putc' procedure return address
      pop ax    ; the parameter we want to use
      push bx   ; restore the return address to the stack
      mov ah, 0eH     ; bios teletype
      int 10H         ; do it
      ret
    bb:
      mov al, beta
      mov ah, 0eH     ; bios teletype
      int 10H         ; invoke bios 
      ret             ; return from 'call'
    cc:
      mov ah, 0     ; bios wait for keypress function
      int 16h       ; invoke bios
      pop bx        ; get return address off stack
      push ax       ; put char in AL onto stack
      push bx       ; restore return address
      ret           ; return from 'call'
    start:
      mov ax, cs    ; initialize the data segment register DS
      mov ds, ax
    .again:
      mov al, '>'   ; print a prompt
      mov ah, 0eH   ; bios teletype 
      int 10H
      mov ah, 0     ; bios wait for keypress function
      int 16h       ; invoke bios
      mov ah, 0eH   ; echo char just typed (command name)
      int 10H
      cmp al, 'a'   ; check for valid command (a-c)
      jb .again     ; just print prompt if invalid
      cmp al, 'c'
      ja .again
      sub al, 'a'   ; convert letter to an index into jump table
      sub bx, bx    ; set bx:=0
      mov bl, al    ; ax cant be used in effective addresses
      shl bl, 1     ; do bl:=bl*2 (since its a double-byte array)
      call [jumptable+bx]   ; jump-table is word (2 byte) cells
      jmp .again            ; loop forever 
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

JUMPS 

  direct, indirect, short long etc, conditional, unconditional

CONDITIONAL JUMPS ....

  jne/jnz, ja/jnbe, je/jz, jae/jnb, jb/jnae, jbe/jna
  je jz - jump if ZF=1

  * signed jumps
  ----
    jg, jge, jl, jle are signed jumps
    js, jump if the sign flag is set (result is negative)
    jns, jump if the sign flag is clear
  ,,,

  * example with lots of jumps
  -------
    mov AX, 10
    mov BX, 9
    cmp AX, BX
    je .equal     ; if ax=bx jump 
    jz .equal     ; same
    jne .unequal  ; if ax!=bx jump
    jnz .unequal  ; same
    ja .above     ; if ax>bx jump (unsigned comparison)
    jae .greaterequal  ; if ax>=bx jump (unsigned comparison)
    jnb ...            ; same

  ,,,
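
  The difference between the unsigned jumps (ja, jb ...) and the signed
  jumps (jg, jl ...) shows up when the top bit of a value is set. The
  listing below is a small sketch of that: the byte 0xFF is 255 when
  treated as unsigned but -1 when treated as signed, so the same
  comparison gives two different answers. It should print 'Al'.

  * the same comparison, unsigned and signed (a sketch)
  ---------
   BITS 16
   [ORG 0]
    jmp 07C0h:start         ; Goto segment 07C0
    start:
      mov ah, 0eH      ; bios teletype
      mov dl, 0xFF     ; 255 if unsigned, -1 if signed
      cmp dl, 1
      ja .above        ; unsigned: 255 > 1, so the jump is taken
      mov al, 'b'      ; not reached
      jmp .show1
    .above:
      mov al, 'A'
    .show1:
      int 10H          ; prints A
      cmp dl, 1
      jg .greater      ; signed: -1 > 1 is false, so no jump
      mov al, 'l'
      jmp .show2
    .greater:
      mov al, 'G'      ; not reached
    .show2:
      int 10H          ; prints l
      jmp $            ; loop forever
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,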

  == summary of jump instructions
  .. jcxz - jump if cx is 0
  .. jecxz - jump if ecx is 0
  .. jc - jump if carry
  .. jnc - jump if no carry
  .. jo - jump if overflow
  .. jno - jump if no overflow
  .. js - jump if negative sign
  .. jns - jump if not negative sign
  .. jp - jump if parity
  .. jpe - jump if even parity
  .. jnp - jump if no parity
  .. jpo - jump if odd parity
  ,,,

INDIRECT JUMPS ....

  Indirect jumps may be used to simulate 'switch' or 'case'
  language syntax from higher level languages.

  See also indirect call statements. The techniques of indirect
  jumps and calls allow a very simple command interpreter to 
  be written (in the style of a forth system).

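  The code below was hanging at first; some of it is now working,
  eg jmp di

  Many hours of frustration later, it seems the problem was two-fold.
  Firstly register and register indirect jumps use the CS code segment
  implicitly (the offset is calculated from the start of the CS
  segment). An ordinary 'jmp label' is relative, so it works whatever
  CS contains, but a register jump needs the absolute offset to be
  right. The technique used in the MikeOs primer doesn't seem
  to set the code segment properly (with [ORG 0] a far jump such as
  'jmp 07C0h:start' makes CS = 07C0). Also there are 2 forms ...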

  * jump to a memory location contained in register di
  >> jmp di

  * and jump to location specified by register pointer
  >> jmp [di]

  The version above can be used with jump-tables for example
  
  * perhaps the simplest register jump
  ---------
   BITS 16
   [ORG 0]
    jmp 07C0h:start  ; Go to (code?) segment 07C0
    nip:
      mov al, '!'    ; print something
      mov ah, 0eH   
      int 10H
      jmp $ 
    start:
      mov ax, cs
      mov ds, ax   ; may not be necessary?
    .again:
      mov bx, nip 
      jmp bx 
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,

  The code below is a good template for perhaps the simplest 
  command interpreter that can be written in an assembly
  program. The program prompts the user for a letter (command)
  and then executes some code based on the letter entered. 
  The code is selected with a jump-table and an indirect
  jump. It may be more sensible to use procedure 'calls' in
  this case rather than jumps.

  * a simple register-indirect jump with jump-table
  ---------
   BITS 16
   [ORG 0]
    jmp 07C0h:start         ; Goto segment 07C0
    jumptable dw aa,bb,cc
    aa:
      mov al, 'A'   ; print A
      mov ah, 0eH   
      int 10H
      jmp start.again 
    bb:
      mov al, 'B'   ; print B
      mov ah, 0eH   
      int 10H
      jmp start.again 
    cc:
      mov al, 'C'   ; print C
      mov ah, 0eH   
      int 10H
      jmp start.again 
    start:
      mov ax, cs
      mov ds, ax
      mov es, ax
    .again:
      mov al, '?'   ; print a prompt
      mov ah, 0eH   
      int 10H
      mov ah, 0     ; wait for keypress function
      int 16h
      cmp al, 'a'
      jb .again 
      cmp al, 'c'
      ja .again
      sub al, 'a'   ; convert letter to an index (0-2)
      sub bx, bx    ; set bx:=0
      mov bl, al    ; ax can't be used in effective addresses
      shl bl, 1     ; do bl:=bl*2
      jmp [jumptable+bx]   ; jump-table is word (2 byte) cells
      jmp .again           ; never reached, each handler jumps back itself
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

  Adding a name in front of each function provides some
  self-reference; the section CODE WITH HEADER has a full listing.

  In the examples below we jump to the third item in
  a word cell jump table

  * jumptable examples, 
  ---------
     mov di, [jumptable+4] 
     call di
     ; is equivalent to
     mov di, jumptable+4 
     call [di]
     ; which is equivalent to
     call [jumptable+4]
   ,,,

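  The code below uses a jump table of double word (4 byte) cells,
  indexed with jmp [table+esi*4] which is good! CS can't be loaded 
  with 'mov cs, ax' (that is not a valid instruction), so a far
  jump is used to set the code segment instead.

  * indirect jump example
  -----
   BITS 16
   [ORG 0]
    jmp 07C0h:start   ; the far jump sets CS to 07C0
    jumptable dd apple
              dd orange
              dd pear
              dd lemon
    start:
      mov ax, cs       ; Set data segment to where we're loaded
      mov ds, ax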
      ; get a digit (0-3) into AX
      sub eax, eax  ; set eax := 0
      mov ah, 0     ; wait for keypress function
      int 16h
      mov ah, 0eH   ; echo the keypress
      int 10H
      cmp al, '0'   ; check if digit is 0,1,2 or 3
      jb start      ;
      cmp al, '3'   ;  
      ja start      ;
      sub ah, ah    ; set ah := 0
      sub al, '0'   ; convert from ascii to a digit 0-3
      mov esi, eax
      jmp [jumptable+ESI*4]   ; indirect jump
    apple:
      mov al, 'A'   ; print A
      mov ah, 0eH   
      int 10H
      jmp start
    orange:
      mov al, 'B'   ; print B
      mov ah, 0eH   
      int 10H
      jmp start
    pear:
      jmp start
    lemon:
      jmp start
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

LOOPS

  * print digits 0-9 in ascending order 
  ---------------------------------
  start:
  mov cx, 10 
  .again:
    mov al, 10 
    sub al, cl    ;
    add al, '0'   ; convert digit to ascii
    mov ah, 0eH   ; bios teletype function
    int 10H       ; invoke bios
    loop .again
    jmp $             ; keep looping! 
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

  * attempt to read from a disk 3 times into a data buffer
  -----------------

    mov cx, 3      ; countdown of read attempts
  read_loop:
    push cx        ; save the counter, CX is needed for the read below
    xor ah, ah     ; set ah to zero - reset drive function
    int 0x13       ; call drive reset 

    mov ax, ds
    mov es, ax          ; es == ds
    mov bx, BlahBlah ; set BX to the address (not the value) of BlahBlah 
    mov dl, DriveNumber
    mov dh, HeadNumber
    mov al, NumSectors 
    mov ch, CylNumLow
    mov cl, CylNumHigh ; set the high part of the cylinder number, bits 6 and 7 
    or cl, Sector  ; set the sector number, bits 0-5 ('and' would clear bits)
    mov ah, 0x2 ; set function 2h

    int 0x13        ; call the interrupt
    pop cx          ; restore the counter (pop doesn't change the flags)
    jnc exit        ; if the carry flag is clear, it worked
    loop read_loop  ; try three times, then give up - error code is in ah

     exit:
     ;;; whatever other code you need

     [segment data]
     BlahBlah resb 512 
  ,,,

  The CX register with the loop command, counts down. So we 
  need some extra logic to make it count up

  * print extended ascii characters in ascending order
  ---------------------------------------------------------
    start:
    mov cx, 0x00FF 
    .again:
      mov al, 0xFF
      sub al, cl     ; the character to print goes in AL
      mov ah, 0eH    ; bios teletype function
      int 10H        ; invoke bios function
      loop .again
      jmp $          ; keep looping! 
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

  * print all ascii characters in descending order
  ---------------------------------------------------------
  start:
  mov cx, 0x00FF 
  .again:
    mov al, cl 
    mov ah, 0eH
    int 10H
    loop .again
    jmp $             ; keep looping! 
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

LOOP INSTRUCTIONS ....
  
  loop, loopz, loope, loopnz, loopne

GOTCHAS FOR LOOPS ....

  If the CX register is 0 when the loop instruction is reached then
  the loop does not stop: CX is decremented first and wraps around
  to 0xFFFF (that is -1), so the loop runs another 65535 times. That is
  a long time. Long enough to cause havoc
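
  One way to guard against this is the jcxz instruction, which jumps
  when CX is 0. The listing below is a sketch of that guard; the
  variable 'count' is made up for the example. With count set to 0 it
  should print only the final '.', with count set to 3 it should
  print '***.'

  * skip a loop when CX is 0 (a sketch)
  ---------
   BITS 16
   [ORG 0]
    jmp 07C0h:start         ; Goto segment 07C0
    count dw 0              ; try changing this to 3
    start:
      mov ax, cs    ; initialize the data segment register DS
      mov ds, ax
      mov cx, [count]
      jcxz .done    ; CX is 0, so skip the loop completely
    .again:
      mov al, '*'
      mov ah, 0eH   ; bios teletype
      int 10H
      loop .again
    .done:
      mov al, '.'   ; printed once, after the stars (if any)
      mov ah, 0eH
      int 10H
      jmp $         ; loop forever
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,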

REP INSTRUCTIONS ....

  The rep prefix is similar to the LOOP instructions except
  that only one instruction is repeated (multiple in the case
  of loop), and that instruction has to be a string instruction
  (movs, stos, lods, cmps, scas, ins, outs). Something like
  'rep inc AX' does not repeat the inc.

  The REP prefix and its cousins are used in conjunction 
  with a string instruction which is repeated while CX is not
  0 (CX is decremented on each repetition of the instruction)

  * rep instructions
  >> rep, repe, repz, repne, repnz

  * store the byte in AL 4 times, starting at ES:DI
  -------
    mov CX, 4
    rep stosb
  ,,,

  * move 8 bytes of data from DS:SI to ES:DI
  -------
    mov CX, 8
    rep movsb
  ,,,

  * move 8 double words (32bits) of data from DS:SI to ES:DI
  -------
    mov CX, 8
    rep movsd
  ,,,
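
  The short snippets above leave out the set-up that rep movsb needs:
  DS:SI and ES:DI must point at the source and destination, and the
  direction flag should be clear so that SI and DI count upwards. The
  listing below is a sketch of a complete copy; the labels 'source'
  and 'target' are made up for the example. It should print 'copied!!'

  * copy 8 bytes with rep movsb and print the copy (a sketch)
  ---------
   BITS 16
   [ORG 0]
    jmp 07C0h:start         ; Goto segment 07C0
    source db 'copied!!'    ; the 8 bytes to copy
    target: times 8 db '.'  ; the 8 byte destination
            db 0            ; zero marks the end for printing
    start:
      mov ax, cs
      mov ds, ax        ; DS:SI is the source
      mov es, ax        ; ES:DI is the destination
      cld               ; direction flag clear: SI and DI count upwards
      mov si, source
      mov di, target
      mov cx, 8         ; number of bytes
      rep movsb         ; copy CX bytes from DS:SI to ES:DI
      mov si, target    ; now print the zero ended copy
      mov ah, 0eH       ; bios teletype
    .again:
      lodsb             ; get character from string
      cmp al, 0
      je .done
      int 10H
      jmp .again
    .done:
      jmp $             ; loop forever
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,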

STACK 

  The stack is a useful thing but you may have to allocate space
  for it. It grows 'down', towards low memory, that is SP moves 
  toward the start of the SS segment.
  The stack contains either word (16 bit) or double word (32 bit)
  data items (but never 8 bit). When SP reaches 0 the stack is
  full (one more push wraps SP around to the top of the segment). 
  Normally we put the stack after the runtime code
  and it grows back towards the code. x86 is little endian
  so bigger memory address means more significant byte.


  Apparently an x86 bios automatically initialises a 512 byte (one 
  sector) stack immediately after the boot code sector, but where 
  SS:SP point at boot is not standardised, so it is not something
  to rely on. If you need more than this, or want to be safe, you 
  have to initialise SS (stack segment register)
  and SP (the stack pointer register) to something sensible
  and useful.

INITIALIZING THE STACK ....

  * set up a stack after a bootloader
  -------------------
     mov ax, 07C0h    ; Set up 4K stack space after this bootloader
     add ax, 288      ; (4096 + 512) / 16 bytes per paragraph
     mov ss, ax
     mov sp, 4096
  ,,,

POP INSTRUCTIONS ....

  pops a 16 or 32 bit data item from the stack (depending on 
  the 'operand size attribute', e.g. BITS 16 in the assembler...)

  * pop straight into the destination index reg
  >> pop di

  * pop the 2 bytes at top of stack and place in CX register 
  >> pop cx     (flags: none)

  * pop 2 bytes into memory pointed to by bx
  >> pop word [bx]

  * pop a saved flags register into the flags register
  >>  popf

  * pop all registers 
  >> popa
  >> popad

PUSH INSTRUCTIONS ....
 
  * push the word in memory pointed to by bx onto the stack
  >> push word [bx]

TWO STACKS ....

  The x86 architecture includes one built in stack (accessed
  with the 'push' and 'pop' instructions). But it would be nice 
  to have another stack. For example to pass parameters to functions
  without having to worry about stack frames etc. This is 
  another forth idea. 


  * create a second stack
  ----------

  ; The push instructions
    sub edx, 4       ; Decrement the stack pointer one position (4 bytes)
    mov dword [edx], eax ; Store the value at the new location

  ; The pop instructions
    ;Popping 3 steps: Getting value, incrementing the stack, 
    ; and returning the value. We will return the value simply by leaving it in eax.

    mov eax, dword [edx] ; Load the value off of the stack
    add edx, 4        ; Increment the stack pointer one position (4 bytes)
    ; Leave the result in eax to return it
  ,,,
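
  The same idea works with 16 bit registers in a boot sector. The
  listing below is a sketch only: the buffer 'dstack' and the choice
  of DI as the second stack pointer are assumptions made for this
  example. It pushes 'A' then 'B' and pops them again, so it should
  print 'BA'.

  * a second stack of word cells, using DI as its pointer (a sketch)
  ---------
   BITS 16
   [ORG 0]
    jmp 07C0h:start         ; Goto segment 07C0
    dstack times 16 dw 0    ; room for 16 word cells
    dstack.top:             ; an empty stack points just past the end
    start:
      mov ax, cs
      mov ds, ax            ; [di] is relative to DS
      mov di, dstack.top    ; DI is the second stack pointer
      mov ax, 'A'
      sub di, 2             ; push AX: make room, then store
      mov [di], ax
      mov ax, 'B'
      sub di, 2             ; push AX
      mov [di], ax
      mov ah, 0eH           ; bios teletype
      mov dx, [di]          ; pop into DX: load, then release
      add di, 2
      mov al, dl
      int 10H               ; prints B
      mov dx, [di]          ; pop into DX
      add di, 2
      mov al, dl
      int 10H               ; prints A
      jmp $                 ; loop forever
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,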

DATA STRUCTURES 

CODE WITH HEADER ....

  Code blocks (procedures) can be given a header to describe
  the following code. This header can be used to provide a 
  description of the code. Compared to a forth style linked list
  dictionary, we see the extra maintenance involved in maintaining
  a 'jump-table' of pointers to the beginning of each function

  * code with some header text 
  ---------
   BITS 16
   [ORG 0]
    cr     equ  13         ; carriage return
    lf     equ  10         ; line feed
    bell   equ   7         ; bell (sort of)

    jmp 07C0h:start         ; Goto segment 07C0
    jumptable dw asc,beep,reboot,colours,help

    asc:
      db 11, 'ascii chars'   ; function name in text 
      mov cx, 0x00FF
    asc.again:
      mov al, cl   ; print ascii char in CL register 
      mov ah, 0eH   
      int 10H
      and al, 0x0F ; using 'AND' with 'CMP' to 
      cmp al, 0x0F ; create a simple modulus test
      jne asc.ll
      mov al, cr  ; print chars 16 to a line
      int 10H
      mov al, lf 
      int 10H
    asc.ll:
      loop asc.again
      nop
      ret
    beep:
      db 4, 'beep'
      mov al, bell   ; beep
      mov ah, 0eH   
      int 10H
      ret
    reboot:
      db 6, 'reboot'
      int 19h      ; a dodgy way to reboot the computer
    colours:
      db 7, 'colours'
      mov al, 'C'   ; print Colours ...
      mov ah, 0eH   
      int 10H
      ret
    help:
    ; prints the name length of each function, last function first.
    ; a length above 9 prints as a non-digit (11 -> ';')
      db 4, 'help'
      mov al, 'H'   ; print help ...
      mov ah, 0eH   
      int 10H
      mov cx, 5     ; loop through 5 functions
    help.again:
      mov di, jumptable
      add di, cx    ; do di:=di+cx*2 (jumptable is word cell)
      add di, cx
      mov si, [di-2]  ; cx counts 5..1 but the table entries are 0..4
      mov al, byte [si]  ; get the text count (si -> start of function)
      mov ah, 0eH
      add al, '0'  ; convert to ascii digit
      int 10H
      loop help.again  
      ret
    start:
      mov ax, cs   ; the code segment is magically correct
      mov ds, ax   ; establish data segment
      mov es, ax   ; do we need ES extended segment?
    .again:
      mov ah, 0eH   
      mov al, cr   ; print a prompt on a newline  
      int 10H
      mov al, lf 
      int 10H
      mov al, '?' 
      int 10H
      mov ah, 0     ; wait for keypress function
      int 16h
      cmp al, 'a'   ; check for valid command (a-e)
      jb .again     ; just print prompt again if invalid
      cmp al, 'e'
      ja .again
      sub al, 'a'   ; convert letter to an index into jump table
      mov bx, jumptable
      add bl, al     ; set bx:=bx+al*2
      add bl, al     ; set the pointer to point to code
      mov si, [bx]   ; si -> start of procedure
      mov bl, byte [si]  ; jump over name of procedure
      ; or we could print the function name here
      inc bl         ; the first code byte
      sub bh, bh     ; set bh := 0
      add si, bx
      call si  
      jmp .again          ; loop back to prompt
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

LINKED LISTS ....

  Linked lists can be implemented easily using the syntax of 
  the assembler itself (nasm style syntax)

  * a linked list in assembler
  -----------
    liststart dw '0' 
    w1 dw liststart
       db 3, 'egg'
    w2 dw w1
       db 5, 'water' 
    w3 dw w2 
       db 4, 'tree'
  ,,,

  * another layout 
  -----------------
    nip  dw 0        ; 1st word has a zero link
        db 3, 'nip'  ; strings are 'counted' 
    egg dw nip       ; link to previous dictionary entry 
        db 3, 'egg'  ; 
    bat dw egg       ; link to previous dictionary entry 
        db 3, 'bat'  ; 
    last dw bat      ; 
  ,,,
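
  A list laid out like this can be walked by following the links
  until the zero link is reached. The listing below is a sketch of
  that walk using the layout just shown; the use of DI as the entry
  pointer is a choice made for this example. It should print the
  names newest first: 'bat egg nip '

  * walk the linked list and print each name (a sketch)
  ---------
   BITS 16
   [ORG 0]
    jmp 07C0h:start         ; Goto segment 07C0
    nip  dw 0        ; 1st word has a zero link
         db 3, 'nip' ; strings are 'counted'
    egg  dw nip      ; link to previous dictionary entry
         db 3, 'egg'
    bat  dw egg      ; link to previous dictionary entry
         db 3, 'bat'
    last dw bat      ; pointer to the newest entry
    start:
      mov ax, cs
      mov ds, ax
      cld                ; lodsb should count upwards
      mov di, [last]     ; di -> newest entry
    .entry:
      cmp di, 0          ; a zero link marks the end of the list
      je .done
      lea si, [di+2]     ; si -> count byte, just after the link
      lodsb              ; al = length of the name
      sub ch, ch
      mov cl, al         ; cx = number of characters to print
    .char:
      lodsb              ; next character of the name
      mov ah, 0eH        ; bios teletype
      int 10H
      loop .char
      mov al, ' '        ; separate the names with a space
      mov ah, 0eH
      int 10H
      mov di, [di]       ; follow the link to the previous entry
      jmp .entry
    .done:
      jmp $              ; loop forever
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

  Comparing the typed name against each counted string, instead of
  printing it, would turn this walk into a dictionary search.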


  Nasm can also handle forward references to labels, so the 
  dictionary could be written in the reverse order.

NUMBERS 

DISPLAYING NUMBERS ....

  Even the task of displaying a number in assembler is 
  a non-trivial task. The basic idea is to divide repeatedly 
  by the base in which one is displaying the number (eg 10 for 
  decimal), and use the remainders of the division. But the 
  remainders come out lowest digit first, so they 
  must be reversed for printing...

  * display one byte number in binary format
   -----------
  [ORG 0]
   jmp 07C0h:start         ; Goto segment 07C0
   dotbin.doc: 
      db 'displays a 1 byte number in binary format', 13, 10
      db 'eg: 3 .bin   displays 00000011 '
      dw $-dotbin.doc
   ; stack: n --
   dotbin:
      dw 0          ; link to previous word or null at top of dict
      db 4, '.bin'  ; counted function name
   dotbin.x:
      pop dx        ; balance return fn pointer
      pop bx               ; bin number to print
      xor bh, bh           ; only 8 bits
      push dx              ; restore return IP
      mov cx, 8            ; number of bits to display 
      mov dl, 0b10000000   ; scan bit pattern
    .again:
      mov ah, 0eH     ; bios teletype
      test bl, dl     ; see if bit is set
      jnz .one
    .zero:
      mov al, '0'     ; print zero
      jmp .print
    .one:
      mov al, '1'     ; char to print 
    .print:
      int 10H         ; x86 bios, ah=0eH type char
      ror dl, 1       ; move the bit pattern 1 to right 
      loop .again     ; go again
      ret

   start:

     mov ax, cs    ; make ds and es the same as cs code segment 
     mov ds, ax    ; data segment
     mov es, ax    
     push 0x0003
     call dotbin.x

   here: jmp here        
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature

  ,,,

  * print a 1 byte number in hexadecimal 
  -----------
    ; Tell the compiler that this is offset 0.
    ; It isn't offset 0, but it will be after the jump.
    [ORG 0]
       jmp 07C0h:start         ; Goto segment 07C0
    hextable db "0123456789ABCDEF"
    start:
      mov ax, cs   ; cs is 07C0 after the far jump
      mov ds, ax   ; point data segment -> code segment
      mov ah, 0x0E       ; bios int 10H teletype function
      mov bx, hextable   ; translation table
      mov dl, 0x1F       ; the number to print
      mov cx, 2          ; number of digits to print
      .again:
        rol dl, 4
        mov al, dl
        and al, 0x0F   ; use only low 4 bits (hex digit)
        xlatb          ; replace al with hex digit
        int 10H        ; invoke bios to print
        loop .again
    hang: jmp hang
    times 510-($-$$) db 0
    dw 0AA55h
  ,,,

  * print a 2 byte number in hexadecimal 
  -----------
    ; Tell the compiler that this is offset 0.
    ; It isn't offset 0, but it will be after the jump.
    [ORG 0]
       jmp 07C0h:start         ; Goto segment 07C0
    hextable db "0123456789ABCDEF"
    start:
      mov ax, cs   ; cs is 07C0 after the far jump
      mov ds, ax   ; point data segment -> code segment
      mov ah, 0x0E  
      mov bx, hextable   ; translation table
      mov dx, 0xABCD     ; the number to print
      mov cx, 4
      .again:
        rol dx, 4
        mov al, dl
        and al, 0x0F
        xlatb              ; replace al with hex digit
        int 10H
        loop .again
    hang: jmp hang
    times 510-($-$$) db 0
    dw 0AA55h
  ,,,
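
  The binary and hex listings above avoid division because each digit
  is a fixed group of bits. For decimal, the repeated division
  described at the start of this section is needed. The listing below
  is a sketch of it; the name 'putdec' is made up for this example.
  The remainders are pushed on the stack and popped again, which
  reverses them. It should print '54321'.

  * print a 2 byte number in decimal (a sketch)
  ---------
   BITS 16
   [ORG 0]
    jmp 07C0h:start         ; Goto segment 07C0
    ; putdec: print the unsigned 16 bit number in AX as decimal
    putdec:
      mov si, 10       ; the base
      sub cx, cx       ; count of digits pushed so far
    .divide:
      sub dx, dx       ; clear the high word of the dividend DX:AX
      div si           ; ax = quotient, dx = remainder (one digit)
      push dx          ; save the digit, lowest digit first
      inc cx
      cmp ax, 0        ; anything left to divide?
      jne .divide
    .print:
      pop ax           ; the highest digit comes off first
      add al, '0'      ; convert digit to ascii
      mov ah, 0eH      ; bios teletype
      int 10H
      loop .print      ; one pop for each digit pushed
      ret
    start:
      mov ax, cs
      mov ds, ax
      mov ax, 54321    ; the number to print
      call putdec
      jmp $            ; loop forever
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,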


CONVERTING NUMBERS FROM ASCII ....


  The code below acts as an interactive decimal to hex converter.
  The user enters a number in decimal and the program displays it
  in hex. The program prints '!' if the number is too big to store
  in 2 bytes.

  * convert positive decimal number entered by user to integer 
  ---------
   BITS 16
   [ORG 0]
    cr equ 13   ;  carriage return
    lf equ 10   ;  line feed

    jmp 07C0h:start     ; Goto segment 07C0
    result dw 0x0000    ; somewhere to store converted number
    newline.h db 'prints a newline',13,10,0
    newline:
      mov ah, 0eH   ; bios teletype function 
      mov al, 13   
      int 10H       ; invoke bios
      mov al, 10   
      int 10H       ; invoke bios
      ret
    hextable db "0123456789ABCDEF"    ; translation table
    puthex.h db 'puthex: displays a 2 byte number in hex format',13,10,0
    puthex:
      pop bx     ; return address
      pop dx     ; the number to print (parameter on stack)
      push bx    ; restore return address
      mov ah, 0x0E ; bios teletype function 
      mov bx, hextable   ; translation table
      mov cx, 4          ; number of digits to print
      .again:
        rol dx, 4      ; rotate left 4 bits (print highest first)
        mov al, dl     ; bits to convert to hex digit
        and al, 0x0F   ; only lower 4 bits relevant
        xlatb          ; replace al with hex digit in translation table
        int 10H        ; invoke bios print function
        loop .again
      ret

    start:
      mov ax, cs    ; initialize the data segment register DS
      mov ds, ax
      mov al, '>'   ; print a prompt
      mov ah, 0eH   ; bios teletype 
      int 10H
    .again:
      mov ah, 0     ; bios wait for keypress function
      int 16h       ; invoke bios
      mov ah, 0eH   ; echo the keypress
      int 10H
      cmp al, '0'   ; check for valid digit (0-9)
      jb .display   ; if ascii value is less than '0' not digit
      cmp al, '9'
      ja .display   ; if ascii value greater than '9' not digit
      sub ah, ah    ; set ah = 0
      sub al, '0'   ; convert ascii to digit value
      push ax       ; store digit on stack
      mov ax, [result]
      mov bx, 10   ; multiply by 10 (for decimal numbers)
      mul bx       ; do AX x BX and store in DX:AX 
      pop bx       ; get digit from stack
      jo .toobig   ; result too big to store in AX 
      add ax, bx
      jc .toobig   ; adding the digit can overflow too (eg 65539)
      mov [result], ax
    jmp .again          ; loop- get more digits
    .display:
      call newline
      mov ax, [result]  ; result to print 
      push ax           ; pass number to function
      call puthex       ; display number as hex
      ; also display ascii character here ....
      mov al, 'H'   ; print H to indicate hex output 
      mov ah, 0eH   ; bios teletype function 
      int 10H       ; invoke bios
      call newline
      mov word [result], 0x0000  ; set result = 0

      mov al, '>'   ; print a prompt
      mov ah, 0eH   ; bios teletype 
      int 10H
    jmp .again              ; loop- get a new number
    .toobig:  
      mov al, '!'   ; print ! if integer is too big for 2 bytes
      mov ah, 0eH   ; bios teletype function 
      int 10H       ; invoke bios
      mov word [result], 0x0000  ; set result = 0
      call newline
    jmp .again
  here:    jmp here         ; loop forever 
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

ARITHMETIC

  Simple mathematical operations require special care in assembly
  language because of the need to check for 'overflow' or 
  'carry' conditions (where the result is too large to fit into
  the target register)

NEGATIVE NUMBERS ....

  The 2s complement format is used for negative numbers and the msb or 
  most significant bit is set to 1 if negative.

  see "printvectors" for a routine to print +/- 2 digit numbers
  
  "cbw" sign-extends AL into AX, that is, it converts a signed byte
  to a signed word. This is important when byte and word values are mixed

  * check if a number is negative
  ----
    cmp eax, 0
    jl isNegative
  ,,,

  * another way to check if negative (msb is set)
  ------
    test eax, 0x80000000
    jne is_signed
  ,,,


  * check that msb is set 
  --------
    test eax, eax
    js signed
  ,,,

  * negate a register
  >> neg ax

  * a longer (and worse) way to negate, leaves -edx in eax
  >> xor eax, eax
  >> sub eax, edx

  * display a negative number
  ---------
    print '-'
    neg ax
    print ax
  ,,,

  jg, jge, jl, jle are signed jumps
  js, jump if the sign flag is set (result is negative)
  jns, jump if the sign flag is clear

GOTCHAS FOR NEGATIVE NUMBERS ....

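  if dl == -99 and dh == 0 then dx will NOT be == to -99
  That is because the sign bit (msb) on dx is not set.
  The solution is to sign extend the byte. cbw does this, but only
  from AL into AX, so move the byte to AL first, use cbw to convert 
  the signed byte to a signed word, and then copy AX to DX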
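
  The steps can be sketched as below. The value -99 is just the
  example from the text above.

  * sign extend a byte in DL into DX (a sketch)
  -------
    mov dl, -99    ; dl = 0x9D
    mov al, dl     ; cbw only works on AL
    cbw            ; ax = 0xFF9D, which is -99 as a word
    mov dx, ax     ; dx = -99
    ; on a 386 or later 'movsx dx, dl' does the same in one step
  ,,,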

ADD INSTRUCTION ....

   It is legal to add directly to the destination index register DI
   which is handy for writing pixels to memory, strings etc

   * increment di by 4
   >> add di, 4
 
   * negative increments are legal and good and handy
   >> add di, -1 (same as 'sub di, 1')

   * add an indexed memory location
   >> add di, [table + bx]

   * we can use add to multiply (here 4 x 5)
   ---------
      sub ax, ax    ; start from zero
      mov cx, 4
    .again:
      add ax, 5 
      loop .again   ; ax is now 20
   ,,,

DIGITS ....

  * check if a number entered is an ascii digit
  -------
  start:
  .again:
    sub ax, ax
    mov ah,0    ; wait for a key press 
    int 16h     ; bios interrupt service 
    cmp al, '0' ;
    jb .notdigit  ; if ascii value is less than '0' not digit
    cmp al, '9'
    ja .notdigit  ; if ascii value greater than '9' not digit
    mov ah, 0eh   ; print the digit if it is one
    int 10h       ; bios print routine
    jmp .again 
  .notdigit:
    mov ah, 0eh   ; print 'N' if its not a digit
    mov al, 'N'  
    int 10h
    jmp .again
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
    
  ,,,

MULTIPLICATION ....

  The MUL instruction is used to multiply unsigned numbers. 
  8 or 16 bit: 11 clock cycles
  32 bit: 10 clock cycles
  (cycle counts like these depend on the processor model)

  * multiply AL x BL and store result in AX 
  >> mul BL     (flags: CF, OF cleared if AH zero, otherwise set)

  * multiply AX x DX and store result in DX:AX
  >> mul DX     (flags: CF, OF cleared if DX zero, otherwise set)

  * multiply EAX x ECX and store result in EDX:EAX
  >> mul ECX    (flags: CF, OF cleared if EDX zero, otherwise set)
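
  The carry flag can be tested straight after a mul to find out
  whether the high half of the result is in use. The listing below is
  a sketch of that test; the values are chosen for the example
  (300 x 300 = 90000, which doesn't fit in 16 bits). It should
  print '!'.

  * check whether a 16 bit multiply overflowed AX (a sketch)
  ---------
   BITS 16
   [ORG 0]
    jmp 07C0h:start         ; Goto segment 07C0
    start:
      mov ax, 300
      mov cx, 300
      mul cx          ; dx:ax = 90000 = 0x00015F90, so dx = 1
      mov al, 'k'     ; mov doesn't change the flags
      jnc .show       ; CF clear: the result fits in AX alone
      mov al, '!'     ; CF set: the high half in DX is needed
    .show:
      mov ah, 0eH     ; bios teletype
      int 10H
      jmp $           ; loop forever
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,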

DIVISION ....
  
  the x86 instruction set has a special 'div' instruction
  for performing division. Another method is to perform
  repeated subtraction.

  AX/[8 bit register] -> quotient in AL, remainder in AH

  DX:AX/[16 bit register] -> quotient in AX, remainder in DX

  EDX:EAX/[32 bit register] -> quotient in EAX, remainder in EDX

  * divide 23/10 and print quotient and remainder 
  ---------------------------------------------------------
  start:
    mov ax, 07C0h    ; Set up 4K stack space after this bootloader
    add ax, 288      ; (4096 + 512) / 16 bytes per paragraph
    mov ss, ax
    mov sp, 4096
    mov ax, 07C0h    ; Set data segment to where we're loaded
    mov ds, ax

    mov ax, 23
    mov bl, 10
    div bl       ; quotient -> al, remainder -> ah
    add al, '0'  ; convert digit in al to ascii 
    call printc  ; print the quotient in AL
    mov al, 'r'
    call printc  ; print a separator character
    mov al, ah 
    add al, '0'  ; convert digit to ascii  
    call printc  ; print the remainder (from AH)
    jmp $        ; keep looping! 
   
  ; routine to output character in AL to screen
  printc:     
     push ax
     mov ah, 0Eh      ; int 10h 'print char' function
     cmp al, 32       ; could modify to check for ascii range  
     int 10h          ; call bios function  
     pop ax
     ret

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

  Note that the dividend is the whole 16 bit AX register when the
  divisor is 8 bits, so AH must hold a sensible value (usually zero)
  before the 'div'.

  The code below only works if the quotient and remainder are
  single digits.

  * divide a number by another and print quotient and remainder 
  ---------------------------------------------------------
  start:
    mov ax, 07C0h    ; Set up 4K stack space after this bootloader
    add ax, 288      ; (4096 + 512) / 16 bytes per paragraph
    mov ss, ax
    mov sp, 4096
    mov ax, 07C0h    ; Set data segment to where we're loaded
    mov ds, ax

    mov AX, [dividend] 
    mov BL, [divisor] 
    div BL
    push AX       ; save AX so we can get the remainder later
    add AL, '0'   ; convert to ascii, but only one digit!
    mov AH, 0Eh   ; print quotient
    int 10h
    mov AL, 'r'   ; print separator character
    mov ah, 0eh
    int 10h
    pop AX
    mov AL, AH 
    add AL, '0'   ; convert to ascii 
    mov ah, 0eh   ; print remainder
    int 10h
    jmp $         ; keep looping! 
   
    dividend dw 79   ; 16 bit dividend, loaded into AX
    divisor db 11    ; 8 bit divisor, loaded into BL

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature

  ,,,

GOTCHAS FOR DIVISION ....

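   * clear AH before an 8 bit divide, or the program can crash in qemu
   -------
     mov bl, 10     ; divide by 10
     xor ah, ah     ; necessary: 'div bl' divides all of AX, not just AL
     div bl         ; do ax/bl, ah=remainder, al=quotient
   ,,,

   If AH holds leftover data then AX/BL can be larger than 255. The
   quotient then does not fit in AL and the cpu raises a divide error
   (interrupt 0), which a boot sector normally has no handler for.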

DIVISION INSTRUCTIONS ....

  == summary of division instructions
  .. div - unsigned division
  .. idiv - signed division
  .. shr - integer division by powers of 2
  ,,,

  div cx - divides the 32 bit value DX:AX by cx and leaves the
           quotient in ax and the remainder in dx. Clear dx first
           (xor dx, dx) if the dividend is only 16 bits.

MODULUS ....

  The modulus operation can be performed by using the 
  'div' instruction, and then taking the remainder, which is
  left in AH for an 8 bit divisor (or DX for a 16 bit divisor).

  If the modulus is a power of 2 (2,4,8,16 ...) we can obtain
  the result by ANDing the number with the modulus minus one,
  which keeps only the low bits (eg: n mod 16 = n AND 0x0F).
  This should be faster than using the DIV instruction.

  If we perform modulus with 'test', or with 'and'/'cmp', then the
  modulus must be a power of 2.
  
  * modulus performed with the 'test' instruction
  -----------------
   jmp start
   start:
      mov cx, 0x00FF
      mov ah, 0x0E   
    .again:
      mov al, cl   ; print ascii char in CL register 
      int 10H
      test al, 0x0F
      jne .asc.ll   ; skip the newline unless AL mod 16 == 0
      mov al, 13  ; print chars 16 to a line
      int 10H
      mov al, 10 
      int 10H
    .asc.ll: 
      loop .again
    jmp $         ; loop forever

    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

  The modulus can also be performed with AND and CMP instructions

  * modulus performed with 'and' and 'cmp'
  -----------------
   jmp start
   start:
      mov cx, 0x00FF
    .again:
      mov al, cl   ; print ascii char in CL register 
      mov ah, 0x0E   
      int 10H
      and al, 0x0F ; al := al mod 16
      cmp al, 0x0F ; newline when the remainder is 15
      ; 'test al, 0x0F' can only compare the remainder with zero
      jne .asc.ll
      mov al, 13  ; print chars 16 to a line
      int 10H
      mov al, 10 
      int 10H
    .asc.ll: 
      loop .again
    jmp $         ; loop forever

    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

 
RANDOM NUMBERS ....

   * how to generate a sort of random number from clock ticks
   ------------
   RANDGEN:         ; generate a rand no using the system time
   RANDSTART:

     mov ah, 00h  ; interrupts to get system time        
     int 1Ah      ; cx:dx now holds number of clock ticks since midnight      
     mov  ax, dx
     xor  dx, dx
     mov  cx, 10    
     div  cx    ; dx := remainder of the division, from 0 to 9

     mov al, dl
     add al, '0'  ; to ascii from '0' to '9'
     mov ah, 0Eh  ; bios teletype function (dos int 21h is not
     int 10h      ; available in a boot sector)
     RET    
  ,,,

  A better approach: multiply the previous value by a big prime,
  then add another big prime, then use 'div' to reduce the result
  to the required number range.

  * a complete random number example 
  -------------
   [ORG 0]
   jmp 07C0h:start        ; start label in segment 07C0

  rand.doc:
    db 'Gives a kind of random number between 0 and 9 generated '
    db 'from the number of clock ticks since midnight. Unfortunately '
    db 'this isnt useful for generating a sequence of random numbers'
    dw $-rand.doc
  rand:
    dw 0           ; link to previous dict word or null
    db 4, 'rand'   ; forth counted name
  rand.x:
    mov ah, 00h  ; interrupts to get system time        
    int 1Ah      ; cx:dx now holds number of clock ticks since midnight      
    mov  ax, dx
    xor  dx, dx
    mov  cx, 10    
    div  cx     ; here dx contains the remainder of the division - from 0 to 9
    pop bx     ; balance return pointer
    push dx    ; push result 0-9 on stack
    push bx    ; restore return pointer
  .exit:
    ret

  start:
    mov ax, cs     ; cs is already 07C0h after the far jump
    mov ds, ax     ; set up data and extra segments
    mov es, ax


    mov cx, 10
  .again:
    push cx
    call rand.x
    pop ax       ; the result 0-9
    mov ah, 0x0e ; print char func
    add al, '0'  ; convert to ascii digit
    int 10h
    pop cx
    loop .again

    jmp $          ; loop forever
    times 510-($-$$) db 0   
    dw 0xAA55              
  ,,,
 
  * generate pseudo random number
  >> RNG = (69069*RNG + 69069) MOD 2^32

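  * one step of an xor shift pseudo random number generator
  -------------
    mov ax, [num]    ; previous value, must not be zero
    mov bx, ax
    shl bx, 3
    xor ax, bx       ; ax is now next pseudo random number 
  ,,,

  A single shift and xor like this repeats after at most 8 values.
  Full xorshift generators chain 3 such steps with different shifts.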
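
  A three step xorshift can be sketched as below. This is not from
  the original notes. The shift amounts 7, 9 and 8 are a commonly
  quoted choice for a 16 bit state and are used here as an
  assumption, as are the routine name 'xorshift16' and the seed
  value 1. The state must never be zero.

  * a 3 step 16 bit xorshift generator printing 10 digits
  -------------
   [ORG 0]
   jmp 07C0h:start        ; start label in segment 07C0

  state dw 1       ; generator state, any non-zero value

  ; xorshift16: advance the state, return the new state in ax
  xorshift16:
    mov ax, [state]
    mov bx, ax
    shl bx, 7
    xor ax, bx       ; ax := ax xor (ax << 7)
    mov bx, ax
    shr bx, 9
    xor ax, bx       ; ax := ax xor (ax >> 9)
    mov bx, ax
    shl bx, 8
    xor ax, bx       ; ax := ax xor (ax << 8)
    mov [state], ax  ; keep the full 16 bit state
    ret

  start:
    mov ax, cs
    mov ds, ax       ; data segment for [state]
    mov cx, 10       ; how many digits to print
  .again:
    call xorshift16
    xor dx, dx
    mov bx, 10
    div bx           ; dx := ax mod 10
    mov al, dl
    add al, '0'      ; convert to ascii digit
    mov ah, 0Eh      ; bios teletype function
    xor bx, bx       ; display page 0
    int 10h
    loop .again

    jmp $            ; loop forever
    times 510-($-$$) db 0   
    dw 0xAA55              
  ,,,

  Only the printed digit is reduced with 'div'. The full 16 bit
  state is kept between calls so that the sequence does not collapse.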

  * try to generate some pseudo random numbers 
  -------------
   [ORG 0]
   jmp 07C0h:start        ; start label in segment 07C0

  ; **
  rgen.doc:
    db 'generate a pseudo random number between 0 and 9 '
    db ' using the xorshift technique'
    dw $-rgen.doc
  rgen:
    dw 0           ; link to previous dict word or null
    db 4, 'rgen'   ; forth counted name
  rgen.x:
    mov ax, [seed] ; previous state

    mov bx, ax
    shl bx, 3
    xor ax, bx     ; ax := ax xor (ax << 3)
    mov [seed], ax ; keep the full 16 bit state for the next call

    xor  dx, dx
    mov  cx, 10    
    div  cx     ; dx := remainder of the division, from 0 to 9
    pop bx     ; juggle return pointer
    push dx    ; push result 0-9 on stack
    push bx    ; restore return pointer
  .exit:
    ret
  seed dw 123  ; generator state, any non-zero start value
  ; *

  start:
    mov ax, cs     ; cs is already 07C0h after the far jump
    mov ds, ax     ; set up data and extra segments
    mov es, ax

    mov cx, 10
  .again:
    push cx
    call rgen.x
    pop ax       ; the result 0-9
    mov ah, 0x0e ; print char func
    add al, '0'  ; convert to ascii digit
    int 10h
    pop cx
    loop .again

    jmp $          ; loop forever
    times 510-($-$$) db 0   
    dw 0xAA55              
  ,,,
 
MATHEMATICS 

SQUARE ROOT ....

  The square root can be approximated using Newton's method.
  Another method is to observe that every square number is
  the sum of consecutive odd numbers, eg:
    1 + 3 + 5 = 3^2

  * time consuming way to find floor of square root 
  --------
    mov ax, num    ; num is a constant, use [num] for a variable
    mov cx, 1      ; cx runs through the odd numbers 1, 3, 5 ...
  .again:
    sub ax, cx
    jb .exit       ; stop when the subtraction borrows (ax < cx)
    add cx, 2
    jmp .again
  .exit:
    shr cx, 1      ; do cx := (cx-1)/2, cx is always odd here
    ; cx is now floor of square root
  ,,,

  But for large numbers this is going to be very slow. Newton's
  method is much faster.
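
  A minimal sketch of Newton's method with integers follows. It is
  not from the original notes. The routine name 'isqrt' and the test
  value 50 are assumptions for the demo. Each step replaces the
  estimate x with (x + n/x)/2, and the loop stops when the estimate
  no longer gets smaller.

  * floor of square root with Newton's method
  --------
  start:
    mov ax, 50       ; the number n
    call isqrt       ; ax := 7 for n = 50
    add al, '0'      ; only works for a single digit result
    mov ah, 0Eh      ; bios teletype function
    xor bx, bx       ; display page 0
    int 10h
    jmp $            ; keep looping!

  ; isqrt: ax = n on entry, ax = floor(sqrt(n)) on exit
  ; uses bx, dx, si
  isqrt:
    cmp ax, 2
    jb .done         ; sqrt(0) = 0 and sqrt(1) = 1
    mov si, ax       ; si := n
    mov bx, ax       ; bx := x, the first estimate is n itself
  .step:
    mov ax, si
    xor dx, dx
    div bx           ; ax := n / x
    add ax, bx       ; ax := x + n/x, may carry out of 16 bits
    rcr ax, 1        ; halve, bringing the carry back in
    cmp ax, bx
    jae .found       ; the estimate stopped shrinking
    mov bx, ax
    jmp .step
  .found:
    mov ax, bx
  .done:
    ret

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

  For n = 50 the estimates are 50, 25, 13, 8, 7, so only a handful
  of divisions are needed.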

POWERS OF TWO ....

  We can get powers of 2 relatively easily and quickly by
  bit shifting left

  * print in decimal powers of 2
  -----------
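  ; A sketch to fill this empty example, not from the original notes.
  ; 'printdec' is a hypothetical helper. It should print the 16
  ; powers of two 1 2 4 8 ... 32768 separated by spaces.
  start:
    mov di, 1        ; current power of 2 (bx is needed for int 10h)
  .next:
    mov ax, di
    call printdec
    mov ax, 0E20h    ; ah := 0Eh teletype, al := space
    int 10h
    shl di, 1        ; next power of 2
    jnc .next        ; carry set: the bit left the 16 bit register
    jmp $            ; loop forever

  ; printdec: print ax as an unsigned decimal number
  printdec:
    xor bx, bx       ; display page 0 for int 10h
    xor cx, cx       ; cx counts the digits
    mov si, 10
  .split:
    xor dx, dx
    div si           ; ax := ax / 10, dx := lowest digit
    push dx          ; digits come out in reverse order
    inc cx
    test ax, ax
    jnz .split
  .emit:
    pop ax
    add al, '0'      ; convert digit to ascii
    mov ah, 0Eh      ; bios teletype function
    int 10h
    loop .emit
    ret

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature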
        
  ,,,

GEOMETRY


   * get absolute value of difference
   ---------
     sub eax, edx
     cdq
     xor eax, edx
     sub eax, edx
   ,,,

  To print pixels in a circle, compare the squared distance from the
  centre with the squared radius. No square root is needed.

  * show squared distance between 2 graphics points (x1,y1) (x2,y2)
  -------------
   [ORG 0]
   jmp 07C0h:start        ; start label in segment 07C0

  dist.doc:
    db 'Gives the squared distance between 2 points using the formula '
    db '  (x1-x2)^2 + (y1-y2)^2 ; take the square root for the distance '
    db ' [stack: x1, y1, x2, y2 :tos -- returns: distance^2 ] '
    dw $-dist.doc
  dist:
    dw 0           ; link to previous dict word or null
    db 4, 'dist' ; forth counted name
  dist.x:

    pop dx         ; save return ip
    pop ax         ; y2 coord (top of stack)
    pop bx         ; x2 coord
    pop cx         ; y1 coord 
    sub ax, cx     ; y2-y1 
    pop cx         ; x1 coord
    sub bx, cx     ; x2-x1 
    push dx        ; restore return ip

    mul ax         ; dx:ax := ax*ax, low word is right even if ax < 0
    mov cx, ax     ; save (y2-y1)^2, because mul clobbers dx
    mov ax, bx
    mul ax         ; ax := (x2-x1)^2
    add ax, cx     ; (x2-x1)^2 + (y2-y1)^2 
    
    pop dx       ; juggle return ip
    push ax      ; save result on stack
    push dx      ; restore return ip
  .exit:
    ret

  start:
    mov ax, cs     ; the code segment is already correct (?!)
    mov ds, ax     ; set up data and extended segments
    mov ax, 0xB800 ; for printing to screen 
    mov es, ax     ; print with stosw

    push 0         ; x1
    push 0         ; y1
    push 3         ; x2
    push 4         ; y2, 3,4,5 triangle
    call dist.x
    
    pop ax    ; result
    
    ; look for dothex for printing here: use direct memory 
    ; printing
    jmp $          ; loop forever
    times 510-($-$$) db 0   
    dw 0xAA55              
  ,,,
 
 
CONFIGURING VIDEO DISPLAY

  The video hardware has several 'display modes', one of which
  is always active. The bios can report and change the mode.

  int 10h Get current video mode AH=0Fh  
  returns: AL = Video Mode, AH = number of character columns, 
    BH = active page

  * display the video mode number 
  -----------------------------------------------------
  jmp start
  start:
    mov ah, 0Fh 
    int 10h
    add al, '0'   ; only gives a digit for modes 0 to 9
    mov ah, 0Eh
    int 10h
    jmp $         ; loop forever

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

  The code below prints '80', the number of character columns
  in the default 80x25 text mode.

  * display the number of character columns 
  -----------------------------------------------------
  jmp start
  start:
    mov ax, 07C0h  ; Set data segment to where we're loaded
    mov ds, ax
    mov ah, 0Fh    ; video mode info function
    int 10h        ; al := video mode, ah := character columns
    mov al, ah     ; printi8 prints number in al register 
    mov bl, 10     ; printi8 uses bl register for base
    call printi8
    jmp $         ; loop forever

  %include 'printi8.asm'
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

VIDEO MODES

  http://brokenthorn.com/Resources/OSDevVid2.html
    good info

  vga interface = low resolution, usually <= 16 colours (except 13h)
  vesa vbe interface = higher resolutions

  Note that mode 13h is the only one in standard vga which
  can display 256 colours.

  * set a video mode
  >> INT 10h, AH=0, AL=<video-mode>
  
  == Standard vga colour modes
  Mode  Resolution   Color depth 
  0h    40x25 Text   16 Color
  1h    40x25 Text   16 Color
  2h    80x25 Text   16 Color 
  3h    80x25 Text   16 Color (default text mode on boot-up)
  4h    320x200      4 Color
  5h    320x200      4 Gray 
  7h    80x25 Text   2 Color 
  Dh    320x200      16 Color
  Eh    640x200      16 Color
  Fh    640x350      2 Color
  10h   640x350      16 Color 
  11h   640x480      2 Color
  12h   640x480      16 Color
  13h   320x200      256 Color  (a common simple graphics mode)
  6Ah   800x600      16 Color (higher resolution) 
  ,,,

  In text mode it is not possible to draw individual pixels. The
  reverse does not hold: the bios can still print text in the
  graphics modes.

   * set the mode to 0 for bigger text 
   --------------------
   start:
     mov ax, 07C0h
     mov ds, ax
   .setmode:
     mov     ah, 0     ; set graphics display mode function.
     mov     al, 0h    ; mode 0h = text 40x25 
     int     10h       ; set it!
   .text:
     mov ah, 0eh
     mov al, 'Q'
     int 10h
    hang: jmp hang

    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

  Set the high bit of AL to not clear screen when changing
  video mode.

  * set video mode to 0 (40x25 text) without clearing screen
  -------
  start:
    mov ah, 0eh    ; teletype 'P'
    mov al, 'P'
    int 10h
    mov ah, 0
    int 16h        ; wait for a key press
  .setmode:
    mov ah, 0
    mov al, 10000000b   ; 0 + high bit set to not clear screen
    int 10h
  .print:
    mov ah, 0eh    ; teletype 'Q'
    mov al, 'Q'    ; the 'Q' gets printed in cursor position 0,0
    int 10h        ; and overwrites whatever was there
  hang: jmp hang

   times 510-($-$$) db 0 
   dw 0xAA55              
  ,,,
  

  Garbage is displayed on the screen with the code below

  * switch to a graphics mode without clearing screen, not useful
  -------
  start:
    mov ah, 0eh    ; teletype 'P'
    mov al, 'P'
    int 10h
    mov ah, 0
    int 16h        ; wait for a key press
  .setmode:
    mov ah, 0
    mov al, 13h        ; graphics mode 320x200
    or al, 10000000b   ; set the high bit to 1 
    int 10h
  .print:
  hang: jmp hang

   times 510-($-$$) db 0 
   dw 0xAA55              
  ,,,
  
GRAPHICS VIDEO MODES ....

  http://wiki.osdev.org/Drawing_In_Protected_Mode
    see this

  mode 12h has resolution 640x480 which is not bad, but
   only 16 colours, whereas
  mode 13h is 320x200, 256 colours

  Text *can* be printed in graphics video modes !!!

  * set video to big text 40x25
  -----------
    vid:
      dw asc
      db 3, 'vid'
    vid.x: 
      mov ah, 0     ; set graphics display mode function.
      mov al, 1h    ; mode 1h = colour text 40x25 
      int 10h       ; set it!
      ret
  ,,,
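
  In mode 13h each pixel is one byte in video memory at segment
  0xA000, and the offset of pixel (x,y) is y*320 + x. The sketch
  below is not from the original notes. The position, length and
  colour of the line are assumptions for the demo.

  * draw a short horizontal line in graphics mode 13h
  -------
  start:
    mov ax, 0013h    ; ah := 0 set mode, al := 13h (320x200, 256 colours)
    int 10h
    mov ax, 0xA000
    mov es, ax       ; es -> start of graphics memory
    mov di, 100*320 + 140   ; offset of pixel x=140, y=100
    mov al, 4        ; colour 4 is red in the default palette
    mov cx, 40       ; line length in pixels
    cld              ; make stosb step forwards
    rep stosb        ; es:di++ := al, cx times
  hang: jmp hang

   times 510-($-$$) db 0 
   dw 0xAA55              
  ,,,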

WRITING OUTPUT TO THE SCREEN

 Write a character at the current cursor position       
  int 10h, ah=0ah  al=character, bh=page number, 
  cx=number of times to print the character

  == int 10h character display functions (value in register ah)
  .. 0eh - teletype, the cursor is advanced after printing
  .. 09h - write character and colour (bl) at cursor, cursor not moved
  .. 0ah - write character at cursor, cursor not moved
  ..

  * print a character using the 'teletype' function (0eh)
  ---------------------------------------------------------
  start:
    mov al, '*'
    mov ah, 0eH
    int 10H
    jmp $             ; keep looping! 

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

  For a long time I assumed that it wasnt possible
  to print text in graphic video modes... not true!! But 
  the BL register must be set to the required colour, otherwise
  the character may be drawn in black, meaning that it
  cannot be seen.

  * print a coloured character in graphic mode 13h
  ---------------------------------------------------------
  start:
    mov ah, 0         ; set video mode (this also clears the screen)
    mov al, 13h      
    int 10H
    mov ah, 0eH
    mov al, '*'
    mov bl, 8      ; the colour of the character
    int 10H
    mov bl, 7      ; the colour of the character
    int 10H
    jmp $             ; keep looping! 
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

  The carriage return (13) goes to the start of the current line.
  The linefeed (10) goes down to a new line. So 13, 10 works as expected

  * print a character with 'teletype' and a newline 
  ---------------------------------------------------------
  start:
    mov al, '*'
    mov ah, 0eH
    int 10H
    mov al, 13 
    mov ah, 0eH
    int 10H
    mov al, 10 
    mov ah, 0eH
    int 10H
    jmp $             ; keep looping! 
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

  * print 30 stars down the screen 
  ---------------------------------------------------------
  start:
  mov cx, 30
  .again:
    mov al, '*'
    mov ah, 0eH
    int 10H
    mov al, 13 
    ;mov ah, 0eH
    int 10H
    mov al, 10 
    ;mov ah, 0eH
    int 10H
    loop .again
    jmp $             ; keep looping! 
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

 * print "!!!" at the current cursor position
 -----------------------------------------------------
  start:
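    mov ah, 0aH         ; write character at cursor function
    mov al, '!'         ; the character to write
    mov bh, 0           ; display page 0
    mov cx, 3           ; how many times to write it
    int 10h             ; the cursor does not move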
    jmp $               ; Jump here - infinite loop!

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
 ,,,,


  * notes for how to clear the screen
  --------------------------

   start:
     mov     ah,06h   ; ah=function number for int10 (06)
     mov     al,00h   ; al=number of lines to scroll (00=clear screen)
     mov     bx,700h  ; bh=color attribute for new lines
     xor     cx,cx    ; ch=upper left hand line number of window (dec)
     ; cl=upper left hand column number of window (dec)
     mov     dx,184fh ; dh=low right hand line number of window (dec)
     ; dl=low right hand column number of window (dec)
     int     10h
    jmp $               ; Jump here - infinite loop!

    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature

  ,,,

OUTPUT TEXT WITHOUT BIOS FUNCTIONS ....

   It is not difficult to print text without using the bios int 10h functions.
   You just write the ascii char to video memory starting at 0xB8000, with the
   colour attribute in the following byte. This layout only applies to the
   colour text modes. Graphics mode 13h, for example, has one byte per pixel
   starting at 0xA0000.

   So 0xB8000 is column 0, row 0 
      0xB8001 is column 0, row 0, colour byte |IRGB IRGB|  (background foreground)
      0xB8002 is column 1, row 0 
      0xB8003 is column 1, row 0, colour byte |IRGB IRGB|  (background foreground)
      etc
      0xB8000+160d is column 0, row 1
      0xB8000+161d is column 0, row 1, colour byte
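
   In an 80 column text mode the addresses above follow the formula
   offset = (row*80 + column)*2. The sketch below is not from the
   original notes. The helper name 'putcell', the position (row 12,
   column 40) and the colour are assumptions for the demo.

  * print a character at a given row and column using video memory
  ------------------
   BITS 16
   [ORG 0]
    jmp 07C0h:start     ; Goto segment 07C0

    ; putcell: write char AL with colour AH at row DH, column DL
    ; assumes es = 0xB800 and an 80 column colour text mode
    putcell:
       push ax            ; save char and colour
       mov al, 80
       mul dh             ; ax := row * 80
       xor dh, dh         ; dx := column
       add ax, dx         ; ax := row*80 + column
       shl ax, 1          ; two bytes per screen cell
       mov di, ax
       pop ax
       mov [es:di], ax    ; low byte = char, high byte = colour
       ret

    start:
       mov ax, 0xB800
       mov es, ax         ; es -> start of video memory
       mov al, '#'        ; the char to print
       mov ah, 0x1F       ; bright white on blue
       mov dh, 12         ; row
       mov dl, 40         ; column
       call putcell

    jmp $                   ; loop forever
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,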

  * print a character to the screen using video memory 
  ------------------
   BITS 16
   [ORG 0]
    jmp 07C0h:start     ; Goto segment 07C0

    char.doc:
        db 'Prints one character to screen using video memory'
        dw $-char.doc
    char:
    char.x:
       mov ax, 0xB800
       mov fs, ax               ; fs -> start of video memory 
       mov [fs:4], byte '#'         ; the char to print
       mov [fs:5], byte 0b00101001  ; blue on green 
       ret
   start:
    mov ax, cs
    mov ds, ax
    mov es, ax 

    mov ah, 0         ; clear the screen
    mov al, 13h      
    ;int 10H

    call char.x

    jmp $                   ; loop forever or hlt ?
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature

  ,,,


  We cannot copy byte character strings directly to video memory because
  each character in video memory is followed by a colour byte

  * print a string by writing to video memory
  ------------------
   BITS 16
   [ORG 0]
    jmp 07C0h:start     ; Goto segment 07C0

    buffer db 37, 'abcdefghijklmnopqrstuvwxyz01234567890'  ; counted string
    string.doc:
       db 'Prints a string to screen using video memory'
       dw $-string.doc
    string:
    string.x:
       push es               ; save es pointer
       mov ax, 0xB800
       mov es, ax            ; es -> start of video memory 
       mov si, buffer+1      ; string to print 
       mov di, 100           ; where on screen to print

       ; rep movsb doesnt work because of colour bytes 
       ; [high byte= colour, low byte = character]
       ; but can do stosb char then stosb colour
       xor cx, cx         ; count is only one byte
       mov cl, [buffer]   ; how many chars to print (counted string)
       cld                ; make lodsb and stosw step forwards 
     .nextchar:
       lodsb                ; al := ds:si++ 
       ; mov ah, 0b00101001   ; colour blue on green  
       mov ah, cl           ; multi colour
       and ah, 0x0F         ; only 16 foreground colours 
       stosw                ; es:di++ := ax, al==char, ah==colour
       loop .nextchar       ; do it cx times

       ;mov [es:4], byte '#'         ; the char to print
       ;mov [es:5], byte 0b00101001  ; blue on green 

       pop es                ; restore es pointer
       ret

   start:
    mov ax, cs
    mov ds, ax
    mov es, ax 

    mov ah, 0         ; clear the screen
    mov al, 13h      
    ;int 10H

    call string.x

    jmp $                   ; loop forever or hlt ?
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature

   ,,,,

  The code below relies on the trick that the ascii char '/' (0x2F) is
  also a colour code (bright white on green). This code gets
  written to the high byte after the actual character to display

  * a trick to print a string directly to memory 
  ------------------
   BITS 16
   [ORG 0]
    jmp 07C0h:start     ; Goto segment 07C0

    buffer db start-$-1, 't/h/i/s/ /a/ /t/e/s/t/'  ; count excludes the count byte

    start:
       mov ax, cs
       mov ds, ax

       mov ax, 0xB800
       mov es, ax         ; es -> start of video memory 
       mov si, buffer+1   ; string to print (skip count)
       mov di, 100        ; where on screen to print (50th char on screen)
       cld                ; make movsb step forwards
       xor cx, cx         ; string count is one byte: ie set ch := 0
       mov cl, [buffer]   ; how many chars to print
       rep movsb          ; copy string to video memory at 0xB8000 
                          ; white on green

    jmp $                   ; loop forever or hlt ?
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature

   ,,,,

  * fill the screen with colours using memory writes 
  ------------------
   BITS 16
   [ORG 0]
    jmp 07C0h:start     ; Goto segment 07C0

    patch.doc:
       db 'Fills the screen with colours'
       dw $-patch.doc
    patch:
    patch.x:
       push es            ; save es pointer
       mov ax, 0xB800
       mov es, ax         ; es -> start of video memory 
       mov di, 0x00       ; start at top left corner 
       cld                ; make stosw step forwards 

       ; rep movsb doesnt work because of colour bytes 
       ; [high byte= colour, low byte = character]
       mov cx, 0x0FFF     ; print lots of background colours
     .nextchar:
       mov ax, cx           ; multi colour
       mov ah, al           ; colour in high byte
       shl ah, 4            ; colour in high nibble
       xor al, al           ; no character to print
       stosw                ; es:di++ := ax 
       loop .nextchar       ; do it cx times
       pop es               ; restore es pointer
       ret

   start:
    mov ax, cs
    mov ds, ax
    mov es, ax 

    call patch.x

    jmp $                   ; loop forever or hlt ?
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature

   ,,,,

   * c example of print to screen without bios int 10h
   ---------------------------------------------------
   // note this example will always write to the top
   // line of the screen
   void write_string( int colour, const char *string )
   {
      volatile char *video = (volatile char*)0xB8000;
      while( *string != 0 ) {
        *video++ = *string++;
        *video++ = colour;
      }
    }
   ,,,,

CURSOR

  Get Cursor position Bios function int 10H
    AH=03h, BH=page; returns DL=Cursor-column, DH=Cursor-row 

  Set Cursor position Bios function int 10H
    AH=02h, BH=page, DL=Cursor-column, DH=Cursor-row 

  * increment the cursor column position
  -------------------------------
  start:
    ; mov bh, 00h  ; assume page 0
    mov ah, 03h  ; bios function: get cursor position into dx  
    int 10h      ; invoke bios
    mov ah, 02h  ; bios function: set cursor position specified in dx
    inc dl       ; increment cursor column by 1
    int 10h      ; invoke bios
    jmp $               ; Jump here - infinite loop!
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,


  * increment the cursor row and column position 10 times
  -------------------------------
  start:
   mov cx, 10    ; the loop counter
  .again:
    push cx
    mov al, '*'
    mov ah, 0eH
    int 10H
    mov bh, 00h  ; bh selects the display page, page 0 is the visible one
    mov ah, 03h  ; bios get cursor row:col into DH:DL
    int 10h      ; invoke bios
    mov ah, 02h  ; set cursor position specified in dx
    inc dl       
    inc dh       
    int 10h
    pop cx
    loop .again 

    jmp $               ; Jump here - infinite loop!
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature

  ,,,

MOVING THE CURSOR ....
 
  * a simple example
  ------------
    mov ah, 0      ; bios read key code function
    int 16H        ; invoke bios
    cmp al, 0      ; extended keys (arrows etc) return AL == 0
    jne .printkey  ; AL != 0 means an ordinary character
    cmp ah, 75       ; left arrow
    je .leftarrow 
    cmp ah, 77       ; right arrow
    je .rightarrow 
  ,,,

  * move the cursor right if right arrow pressed
  -------------------------------
  jmp start
  start:
  .again:
    mov ah, 0      ; bios read key code function
    int 16H        ; invoke bios
    cmp al, 0      ; extended keys (arrows etc) return AL == 0
    jne .again     ; wait for next key if not extended ->
    cmp ah, 77     ; right arrow
    jne .again     ; wait for next key if not -> arrow key
    mov ah, 03h  ; get cursor position into dx  (int 10h, ah=03h)
    int 10h      ; invoke bios 
    mov ah, 02h  ; bios function: set cursor position specified in dx
    inc dl       ; increment column position
    int 10h      ; invoke bios
    jmp .again 
    jmp $        ; program hangs here 
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
 ,,,

  The code below could be greatly simplified by having a move cursor
  procedure.

  * move the cursor if arrow keys pressed
  -------------------------------
  jmp start
  start:
  .again:
    mov ah, 0      ; bios read key code function
    int 16H        ; invoke bios
    cmp al, 0      ; extended keys (arrows etc) return AL == 0
    jne .again     ; wait for next key if not extended char
    cmp ah, 75     ; left arrow
    je .moveleft  ;  
    cmp ah, 77     ; right arrow
    je .moveright  ;  
    cmp ah, 80     ; down arrow
    je .movedown    
    cmp ah, 72     ; up arrow
    je .moveup    
    jmp .again
  .moveleft:
    mov ah, 03h  ; get cursor position into dx  (int 10h, ah=03h)
    int 10h      ; invoke bios 
    mov ah, 02h  ; bios function: set cursor position specified in dx
    dec dl       ; decrement column position
    int 10h      ; invoke bios
    jmp .again 
  .moveright:
    mov ah, 03h  ; get cursor position into dx  (int 10h, ah=03h)
    int 10h      ; invoke bios 
    mov ah, 02h  ; bios function: set cursor position specified in dx
    inc dl       ; increment column position
    int 10h      ; invoke bios
    jmp .again 
  .movedown:
    mov ah, 03h  ; get cursor position into dx  (int 10h, ah=03h)
    int 10h      ; invoke bios 
    mov ah, 02h  ; bios function: set cursor position specified in dx
    inc dh       ; increment row position
    int 10h      ; invoke bios
    jmp .again 
  .moveup:
    mov ah, 03h  ; get cursor position into dx  (int 10h, ah=03h)
    int 10h      ; invoke bios 
    mov ah, 02h  ; bios function: set cursor position specified in dx
    dec dh       ; decrement row position
    int 10h      ; invoke bios
    jmp .again 
    jmp $        ; program hangs here 
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
 ,,,
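
  One way to share the cursor code between the four arrow keys is
  sketched below. It is not from the original notes. The routine
  name 'movecursor' and its calling convention (signed column and
  row changes in CL and CH) are assumptions. Like the code above,
  it does not check the screen edges.

  * move the cursor with arrow keys using one shared procedure
  -------------------------------
  jmp start
  start:
  .again:
    mov ah, 0      ; bios read key code function
    int 16H        ; invoke bios
    cmp al, 0      ; arrow keys return AL == 0
    jne .again     ; wait for next key if not extended char
    mov cx, 0x00FF ; ch := 0 rows, cl := -1 column
    cmp ah, 75     ; left arrow
    je .move
    mov cx, 0x0001 ; one column right
    cmp ah, 77     ; right arrow
    je .move
    mov cx, 0x0100 ; one row down
    cmp ah, 80     ; down arrow
    je .move
    mov cx, 0xFF00 ; one row up
    cmp ah, 72     ; up arrow
    jne .again     ; some other extended key
  .move:
    call movecursor
    jmp .again

  ; movecursor: cl = column change, ch = row change (signed bytes)
  movecursor:
    push cx      ; int 10h, ah=03h returns the cursor shape in cx
    mov bh, 0    ; display page 0
    mov ah, 03h  ; get cursor position into dx
    int 10h      ; invoke bios
    pop cx
    add dl, cl   ; new column
    add dh, ch   ; new row
    mov ah, 02h  ; bios function: set cursor position specified in dx
    int 10h      ; invoke bios
    ret
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
 ,,,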

  Check if this cursor positioning code works with a real bios
  (not just an emulator)!

  * echo chars and print ascii code in hexadecimal at bottom of screen 
  ---------------------------------------------------------
   BITS 16
   [ORG 0]

    sigma equ 228       ; greek capital sigma
    jmp 07C0h:start     ; Goto segment 07C0

    hextable db "0123456789ABCDEF"
    key.al db 0       ; AL left by int 0x16: ascii code, 0 for extended keys
    key.ah db 0       ; AH left by int 0x16: keyboard scan code

  ; a structured document format, the last string in the 
  ; data structure is zero terminated to indicate end of documentation

  keycode.doc:             ; one line summary for function
    db keycode.see-$       ; string count
    db 'Prints the ps/2 scancodes of keys pressed and released'
  keycode.see:                     ; related functions
    db keycode.eg-$                ; string count
    db 'scancode, colourcode', 0   ; zero terminated string
    dw $-keycode.doc
  keycode.eg:
    db keycode-$                   ; string count
    db 'keycode  /displays ascii codes interactively', 0
    dw $-keycode.doc     ; a link to the top of the documentation
  keycode:
    dw 0          ; link to previous
    db 7, 'keycode'   ; forth counted name
  keycode.x:

  .begin:
    mov ah, 0     ; wait for keypress function
    int 16h
    cmp al, 'q'   ; was the key press a 'q' 
    je .exit      ; bail if quit pressed
    mov [key.al], al  ; save AL key code in key buffer
    mov [key.ah], ah  ; save AH scan code in key buffer
    mov ah, 0eH   ; bios teletype char function
    int 10H
    cmp al, 13        ; was the key press a 'enter' 
    jne .status
    mov al, 10    ; if a 'enter' is pressed add a newline
    int 10h
   ; save the cursor position, print something
   ; at the bottom of screen, then restore cursor
   .status:        
    mov ah, 03h   ; bios function DH:DL <- cursor y:x
    int 10h       ; invoke bios
    push dx       ; save cursor Row:Col (DH:DL) on stack
    mov dx, 0x1700 ; Row 23, Column 0
    mov ah, 02h    ; set cursor position specified in dx
    int 10h        ; 

    ; print the key code in hex format
    mov ah, 0x0E       ; x86 bios teletype function 

    mov al, 'A'        ; 
    int 0x10           ; 
    mov al, 'L'        ; 
    int 0x10           ; 
    mov al, '='        ; 
    int 0x10           ; invoke bios function
    mov al, '0'        ; print prefix 0x to indicate hexadecimal number
    int 0x10           ; invoke bios function
    mov al, 'x'        ;
    int 0x10
    mov cx, 2          ; print 2 nibbles ( 1 byte )
    mov bx, hextable   ; pointer to digit translation table
    mov dl, [key.al]   ; get key code from buffer
    .nextbyte:
      rol dl, 4      ; rotate the next nibble into the low 4 bits
      mov al, dl     ; get rotated key code 
      and al, 0x0F   ; the high nibble is printed first
      xlatb          ; replace al with hex digit  al := [bx+al]
                     ; or use stosw for memory printing
                     ; with ah==colour
      int 10h        ; invoke bios 
      loop .nextbyte
    
    ; how about direct memory printing here
    mov al, ' '        ; print a space to separate al and ah results 
    int 0x10           
    mov al, 'A'        ; 
    int 0x10           
    mov al, 'H'        ; 
    int 0x10           ; invoke bios function
    mov al, '='        ; 
    int 0x10           ; invoke bios function
    mov al, '0'        ; print prefix 0x to indicate hexadecimal number
    int 0x10           ; invoke bios function
    mov al, 'x'        ;
    int 0x10

    mov cx, 2          ; print 2 bytes
    mov bx, hextable   ; pointer to digit translation table
    mov dl, [key.ah]   ; get key code from buffer
    .nextbyte.ah:
      rol dl, 4      ; rotate the next nibble into the low 4 bits
      mov al, dl     ; copy, so that dl keeps the whole key code
      and al, 0x0F   ; keep only the low nibble (high digit prints first)
      xlatb          ; replace al with hex digit  al := [bx+al]
      int 10h        ; invoke bios 
      loop .nextbyte.ah
    
    ; Also print the 'colour' of the ascii code. eg 0x7F is 
    ; intense white on grey (fg = low nibble, bg = high nibble)

    mov ah, 0x0E
    mov al, ' '        ; print a space to separate al and ah results 
    int 0x10           

    mov cx, 1         ; print 1 char only 
    mov ah, 0x09      ; bios function colour print 
    mov al, sigma     ; print the 'colour' of current code 
    mov bl, [key.al]  ; color bits: I back(R|G|B) I fore(R|G|B)
    int 0x10           

    ; If a special key is pressed (insert delete etc) then
    ; al will be 0. Escape is not a special key

    pop dx        ; restore text cursor position
    mov ah, 02h   ; set cursor position specified in dx
    int 10h       ; invoke bios
    jmp .begin    ; keep looping! 

  .exit:
    ret

  start:
    mov ax, cs        
    mov es, ax
    mov ds, ax
    call keycode.x
    jmp $

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

  * unrolled code to show a hex byte is only 1 instruction longer than the loop
  --------------------------------------
      mov bx, hextable   ; translation table
      mov ah, 0eH    ; bios teletype function 
      mov al, [key]  ; get key code from buffer
      rol al, 4      ; bring the high nibble into the low 4 bits
      and al, 0x0F   ; higher order digit
      xlatb          ; replace al with hex digit
      int 10h        ; invoke bios 
      mov al, [key]  ; get key code from buffer
      and al, 0x0F   ; lower order digit
      xlatb          ; replace al with hex digit
      int 10h        ; invoke bios 
  ,,,
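
  The comments above ask about 'direct memory printing' with stosw.
  Below is a minimal sketch of that idea, not a tested replacement 
  for the bios calls. It assumes the machine is still in the 80x25
  colour text mode that the bios normally starts in, where the screen
  begins at segment 0xB800 and each cell is a character byte followed
  by an attribute byte. The sample value 0x3C and the label '.digit'
  are made up for the illustration.

  * write the two hex digits of a byte straight into video memory
  ------------------
   BITS 16
   [ORG 0]
    jmp 07C0h:start     ; Goto segment 07C0

    hextable db "0123456789ABCDEF"    ; translation table

   start:
    mov ax, cs
    mov ds, ax          ; xlatb reads the table through ds:bx
    mov ax, 0xB800      ; assumed: colour text mode video segment
    mov es, ax
    xor di, di          ; es:di -> top left character cell
    cld                 ; make stosw move forwards
    mov bx, hextable    ; pointer to digit translation table
    mov dl, 0x3C        ; sample byte to display
    mov ah, 0x07        ; attribute: light grey on black
    mov cx, 2           ; 2 nibbles ( 1 byte )
   .digit:
    rol dl, 4           ; rotate the next nibble into the low 4 bits
    mov al, dl
    and al, 0x0F        ; keep only the low nibble
    xlatb               ; al := [bx+al]
    stosw               ; [es:di] := al (char), ah (colour); di += 2
    loop .digit
    jmp $

   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,

  Because no bios call is made, the text cursor does not move and
  the colour is chosen per character through ah.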

KEYBOARD

PS2 ....

  understand http://wiki.osdev.org/PS2_Keyboard

  ps/2 Scancode tables: TODO: official specs?

  - http://flint.cs.yale.edu/cs422/doc/art-of-asm/pdf/APNDXC.PDF
  - https://en.wikipedia.org/wiki/Scancode

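  One standard interface is the ps/2 keyboard.

  The examples below poll port 0x60 in a busy loop, so nothing else can
  run while they wait; interrupts (IRQ 1) are probably needed to avoid
  that. In scan code set 1 (the set which the 0x81 escape release code
  used below belongs to) a code is 8 bits, and some keys send a 0xE0
  prefix byte first. A key release has the same code as the corresponding
  key press but with the highest bit set (eg test al, 0x80)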
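
  The press/release bit can be tested directly. The following is only
  a sketch under some assumptions: scan code set 1, a keyboard
  controller with its status at port 0x64 (bit 0 set when a byte is
  waiting), and the usual first interrupt controller with its mask at
  port 0x21. IRQ 1 is masked so that the bios keyboard handler does
  not read port 0x60 first. Only the 0xE0 prefix is skipped; other
  prefixes are not handled.

  * print 'P' for each key press and 'R' for each key release
  ------------------
   BITS 16
   [ORG 0]
    jmp 07C0h:start     ; Goto segment 07C0

   start:
    in al, 0x21      ; read the interrupt mask of the first PIC
    or al, 0x02      ; mask IRQ 1 (keyboard) 
    out 0x21, al
    xor bx, bx       ; bios video page 0
   .wait:
    in al, 0x64      ; keyboard controller status
    test al, 0x01    ; bit 0 set => a byte is waiting at port 0x60
    jz .wait
    in al, 0x60      ; read the scan code
    cmp al, 0xE0     ; prefix byte of an extended key?
    je .wait         ; skip it, the real code follows
    mov dl, 'P'      ; assume a key press
    test al, 0x80    ; highest bit set?
    jz .show
    mov dl, 'R'      ; yes, this is a key release
   .show:
    mov al, dl
    mov ah, 0x0E     ; bios teletype function
    int 10h
    jmp .wait

   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,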

  * print hex scan codes every time key is pressed, released
  ------------------
   BITS 16
   [ORG 0]
    jmp 07C0h:start     ; Goto segment 07C0

    hextable db "0123456789ABCDEF"    ; translation table
    scancode.doc:
        db 'Prints the ps/2 scancodes of keys pressed and released'
        dw $-scancode.doc
    scancode:
    scancode.x:
     ; cli   ; necessary ??
     .loop:
       in al, 0x60    ; store ps/2 keyboard scancode to al
       cmp al, cl     ; is this a new code? 
       je .loop       ; 
       cmp al, 0x81   ; [escape] key release 
       je .exit       ; exit when the escape key is released
       mov cl, al

       push cx       ; save current code (cx is the loop counter below)

       ; print the hex code in al
       mov dl, al
       mov ah, 0x0E ; bios teletype function 
       mov bx, hextable   ; translation table
       mov cx, 2          ; number of digits to print
       .again:
         rol dl, 4      ; rotate left 4 bits (print highest first)
         mov al, dl     ; bits to convert to hex digit
         and al, 0x0F   ; only lower 4 bits relevant
         xlatb          ; replace al with hex digit in translation table
         int 10H        ; invoke bios print function
         loop .again

       mov ah, 0x0E     ; 
       mov al, 0x0D     ; carriage return
       int 10H
       mov al, 0x0A     ; line feed
       int 10H

       pop cx           ; restore scan code
       jmp .loop

     .exit:
       ret

   start:

    mov ax, cs
    mov ds, ax
    mov es, ax

    call scancode.x

    here:    jmp here       ; loop forever or hlt ?
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature

  ,,,

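  The code below is in GAS (AT&T) syntax. BEGIN, CLEAR, PRINT_HEX and
  PRINT_NEWLINE are macros that have to be defined elsewhere; they are
  not part of gas.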

  * print hex scan codes every time key is pressed, released
  ------------------
    BEGIN
    CLEAR
    /* TODO why CLI makes no difference? We are not using interrupts? */
    /*cli*/
    loop:
    /* Store the scancode to al. */
    in $0x60, %al
    cmp %al, %cl
    jz loop
    mov %al, %cl
    PRINT_HEX <%al>
    PRINT_NEWLINE
    jmp loop
  ,,,

INPUT FROM THE KEYBOARD ....
  
  * define ASCII code constant for the <Escape> key
  >> ESC equ 1bh   ; should we not check al==0 as well

  INT 16h with AH=00h or 10h blocks until a key is pressed, then
  returns the ASCII code in AL and the scan code in AH.
  
  To avoid blocking, first use AH=01h or 11h to query whether a
  keypress is available. That call returns immediately with ZF clear
  if a key is waiting, or ZF set if not. It does not remove the key
  from the keyboard buffer.

  * press any key to exit a loop
  -------------------------
   .again:
    mov ah, 0x01     ; x86 bios check if keypress available
    int 0x16      
    jz .again        ; keep polling until a key is waiting 

  ,,,
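
  The two calls can be combined so that the program never blocks.
  This is a sketch only; the '.idle' branch is a placeholder for 
  whatever else the program wants to do between polls.

  * echo keys without ever blocking on the keyboard
  -------------------------
   BITS 16
   [ORG 0]
    jmp 07C0h:start     ; Goto segment 07C0

   start:
    xor bx, bx       ; bios video page 0
   .poll:
    mov ah, 0x01     ; is a keypress waiting? (does not remove it)
    int 0x16
    jz .idle         ; ZF set => nothing waiting
    mov ah, 0x00     ; fetch the key: al = ascii, ah = scan code
    int 0x16         ; returns at once because a key is waiting
    mov ah, 0x0E     ; echo the key
    int 0x10
    jmp .poll
   .idle:
    ; other work could be done here before polling again
    jmp .poll

   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,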

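  In the code example below, space and backspace appear to work
  as expected on my machine, but [enter] only returns the cursor to 
  the beginning of the line, because it sends a carriage return (13)
  with no line feed. The code therefore prints a line feed (10) 
  after each [enter].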

  * read keys from the keyboard and print them to the screen
  ---------------------------------------------------------
  jmp start
  start:
  mov ax, 07C0h    ; Set data segment to where we're loaded
  mov ds, ax

    mov ah, 0     ; wait for keypress function
    int 16h
    mov ah, 0eH   ; echo the key pressed
    int 10H
    cmp al, 13    ; was the key pressed 'enter' (carriage return) ?
    jne start
    mov al, 10    ; if 'enter' was pressed add a line feed
    int 10h
    jmp start             ; keep looping! 

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,


  Below are forth-style "key" and "emit" functions whose headers 
  form a linked list (a dictionary). Parameters are passed on the stack.

  * read a key press and place value on the stack 
  ------------
   jmp start

   ; get one keystroke from user and place on stack
   key:  
     dw 0         ; 1st word has a zero link 
     db 3, 'key'  ; forth-style function header 
   key.x:
     mov ah, 0    ; wait for keypress bios function
     int 16h
     pop bx       ; juggle function return pointer
     push ax      ; save keypress value on stack
     push bx      ; restore return pointer to stack
     ret
   emit:
      dw key        ; link to previous dictionary entry 
      db 4, 'emit'  
   emit.x:
      pop bx          ; juggle return address for call
      pop ax          ; character to print  (into al)
      push bx         ; restore return function call
      mov ah, 0eh     ; bios print character function
      int 10h
      ret

   start:
      mov ax, 07C0h      ; Set data segment to where we're loaded
      mov ds, ax     
      ;mov sp, ?         ; what about the stack pointer?
   here:
      call key.x
      call emit.x
      jmp here 
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,
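
  The link word in each header is what makes the list walkable. As
  a rough sketch, the code below starts at the newest entry and 
  follows the links until it reaches the zero link, printing each
  name on the way. The alias 'last' and the labels '.entry' and 
  '.char' are invented for this example, and the function bodies are
  left out to keep it short.

  * print the names in the dictionary, newest first
  ------------
   BITS 16
   [ORG 0]
    jmp 07C0h:start     ; Goto segment 07C0

   key:  
     dw 0         ; 1st word has a zero link 
     db 3, 'key'  ; forth-style function header 
   emit:
     dw key       ; link to previous dictionary entry 
     db 4, 'emit'  
   last equ emit  ; hypothetical: newest entry in the list

   start:
     mov ax, cs
     mov ds, ax
     cld              ; lodsb/lodsw move forwards
     xor bx, bx       ; bios video page 0
     mov si, last
   .entry:
     lodsw            ; ax := link to previous entry
     mov dx, ax       ; keep the link while the name is printed
     lodsb            ; al := length of the name
     xor cx, cx
     mov cl, al       ; cx := number of characters
     mov ah, 0x0E     ; bios teletype function
   .char:
     lodsb
     int 10h
     loop .char
     mov al, ' '      ; separate the names
     int 10h
     mov si, dx       ; follow the link
     test si, si      ; zero link marks the end of the list
     jnz .entry
     jmp $

   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,

  A forth 'find' would use the same walk but compare each name
  with the word being looked up instead of printing it.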

  * read keys and freeze if escape is pressed 
  ---------------------------------------------------------
  ESCP equ 1bh
  start:
    mov ah, 0
    int 16H
    cmp al, ESCP 
    je .done              ; If escape pressed, freeze! 
    mov ah, 0eH           ; print the character
    int 10H
    jmp start             ; keep looping! 
  .done:
    jmp $

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
 ,,,

  If a 'special' non-printing key is pressed, then register AL
  is zero and register AH contains the scan code of the key.

  * print '<' and '>' if arrow keys are pressed 
  ---------------------------------------------------------
  jmp start
  start:
  .again:
    mov ah, 0
    int 16H
    cmp al, 0 
    jne .printkey
    cmp ah, 75       ; left arrow
    je .leftarrow 
    cmp ah, 77       ; right arrow
    je .rightarrow 
    cmp ah, 82
    je .insertkey  ; insert key?
    cmp ah, 83
    je .deletekey  ; delete key?
    jmp .again       ; read more keys
 
  .leftarrow:
    mov al, '<'
    jmp .printkey 
  .rightarrow:
    mov al, '>'
    jmp .printkey 
  .insertkey:
    mov al, 'I'
    jmp .printkey 
  .deletekey:
    mov al, 'D'
    jmp .printkey 
  .printkey:
    mov ah, 0eH     ; print the key pressed 
    int 10H
    jmp .again 
  .done:
    jmp $

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
 ,,,

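  The following clears the screen by setting video mode 03h (80x25
  colour text); setting a video mode also clears the screen. An earlier
  version set mode 13h (320x200 graphics), which cleared the screen
  but appeared to stop typing, probably because the teletype function
  takes its colour from BL in graphics modes.

  * read keys and clear screen if escape is pressed 
  ---------------------------------------------------------
  ESC equ 1bh
  start:
    mov ah, 0     ; bios read char function
    int 16H       ; invoke bios
    cmp al, ESC   ; was key 'escape' ? 
    je .clear       ; If escape pressed, cls! 
    mov ah, 0eH     ; print the character
    int 10H
    jmp start       ; keep looping! 

  .clear:
    mov ah, 0       ; bios set video mode function
    mov al, 03h     ; mode 03h: 80x25 colour text
    int 10H
    jmp start 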

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
 ,,,


EDITING TEXT ....

  I think the idea is simple. One maintains a memory buffer of 
  text which is synchronised to what the user sees on the screen.
  When the user changes the text, the buffer gets updated. This
  buffer is then written periodically to storage.

  * a simple buffer editor with no save
  ------------
   BITS 16
   [ORG 0]
    cr equ 13   ;  carriage return
    lf equ 10   ;  line feed

    jmp 07C0h:start     ; Goto segment 07C0

    ; a small buffer made blank 
    buffer times 128 db ' ' ; we only have 512 bytes here

    edit.doc db 'Allows the user to edit a buffer of text'
               dw $-edit.doc
    edit:
      dw 0       ; top of dictionary
      db 4, 'edit'
    edit.x:
      pop bx     ; return address
      pop di     ; address of the buffer and insert pointer
      push bx    ; restore return address

      mov ah, 0x0E ; bios teletype function 
      mov al, '*'  ; an edit prompt
      int 10H        ; invoke bios print function

      mov ah, 0      ; bios read key code function
      int 16H        ; invoke bios
      ;cmp al, 0      ; is extended char (AL != 0)  ?
      ;jne .again     ; wait for next key if not ->
      ;cmp ah, 77     ; right arrow
      cmp ax, 4D00h   ; right arrow: AH = 77 (scan code), AL = 0
                      ; (the result is unused while the jne is disabled)
      ;jne .again     ; wait for next key if not -> arrow key
      mov ah, 03h     ; get cursor position into dx  (int 10h, ah=03h)
      int 10h         ; invoke bios 
      mov ah, 02h     ; bios function: set cursor position specified in dx
      inc dl          ; increment column position
      int 10h      ; invoke bios

      ret

    start:
      mov ax, cs    ; initialize the data segment register DS
      mov ds, ax
      mov es, ax
      push buffer
      call edit.x

  here:    jmp here         ; loop forever 
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,
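
  To make the 'buffer synchronised with the screen' idea concrete,
  here is a small sketch which goes one step further than the stub
  above. Every printable key is stored in the buffer and echoed, and
  backspace removes the last character from both. The names 'buflen',
  '.next' and '.back' and the 64 byte size are choices made for this
  example only. Arrow keys and saving are not handled.

  * keep a buffer in step with what is typed on the screen
  ------------
   BITS 16
   [ORG 0]
    jmp 07C0h:start     ; Goto segment 07C0

    buflen equ 64                ; hypothetical buffer size
    buffer times buflen db ' '   ; a small buffer made blank

   start:
    mov ax, cs
    mov ds, ax
    mov es, ax            ; stosb writes through es:di
    cld
    xor bx, bx            ; bios video page 0
    mov di, buffer        ; di is the insert pointer
   .next:
    mov ah, 0             ; bios read key code function
    int 16H               ; al := ascii code (0 for special keys)
    cmp al, 8             ; backspace ?
    je .back
    cmp al, ' '           ; ignore control and special keys
    jb .next
    cmp di, buffer+buflen ; is the buffer full ?
    jae .next
    stosb                 ; [es:di] := al, di += 1
    mov ah, 0x0E          ; echo the character
    int 10H
    jmp .next
   .back:
    cmp di, buffer        ; nothing left to delete ?
    je .next
    dec di
    mov byte [di], ' '    ; blank the character in the buffer
    mov ah, 0x0E
    mov al, 8             ; move the cursor left
    int 10H
    mov al, ' '           ; blank the character on the screen
    int 10H
    mov al, 8             ; and move left again
    int 10H
    jmp .next

   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,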

BYTECODE SYSTEM 512 BYTES

  Bytecode is an important goal for creating a forth which 
  can run on a virtual machine on any architecture. While execution
  speed will be slower than other types of systems, it is useful
  for rapidly porting to a new architecture, since all that 
  initially needs to be coded is the virtual machine

  The system below seems very powerful and very minimal. It is 
  a basic virtual machine implemented in as little as 100 bytes.
  To extend it, we need a way to compile code and a virtual
  procedure call for all high level functions. We also have to
  actually design the virtual machine.
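
  One detail of the machine below is that jump offsets are counted
  in bytes from the jump opcode itself, which is tedious to work out
  by hand. As a sketch of one way around this, nasm's '$' (the address
  of the start of the current source line) can compute the offset, as
  long as the JUMP opcode is the first byte on its 'db' line. The 
  three-opcode machine here is cut down for the example and its 
  opcode numbers differ from those of the full system.

  * let nasm compute a relative bytecode jump 
  ----------------------
   BITS 16
   [ORG 0]
    jmp 07C0h:start     ; Goto segment 07C0

    KEY equ 1
    EMIT equ KEY+1
    JUMP equ EMIT+1
   table:
     dw 0, key.x, emit.x, jump.x

   key.x:
     mov ah, 0    ; wait for keypress bios function
     int 16h      ; al := ascii code
     pop bx       ; juggle function return pointer
     mov ah, 0
     push ax      ; key code onto the data stack
     push bx
     ret
   emit.x:
     pop dx       ; juggle return pointer
     pop ax       ; char in al
     push dx
     mov ah, 0x0E ; bios teletype function
     xor bx, bx   ; bios video page 0
     int 10h
     ret
   jump.x:
     lodsb        ; al := signed offset, si now past the operand
     cbw          ; sign extend al into ax
     sub si, 2    ; realign si to the JUMP opcode
     add si, ax   ; apply the offset
     ret

   echo:
     db KEY
     db EMIT
     db JUMP, echo-$    ; assembles to -2, back to KEY

   start:
     mov ax, cs
     mov ds, ax
     add ax, 288      ; (4096 + 512) / 16 bytes per paragraph
     mov ss, ax       ; a 4K stack here
     mov sp, 4096
     cld
     mov si, echo     ; si is the virtual instruction pointer
   .next:
     xor ax, ax
     lodsb            ; al := next opcode
     mov bx, ax
     shl bx, 1        ; word sized table entries
     call [table+bx]
     jmp .next

   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,

  If an opcode is later inserted between 'echo' and the jump, the
  offset is recalculated when the file is assembled.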

  * a working forth style bytecode example within 512 bytes
  ----------------------
  [ORG 0]

   jmp 07C0h:start         ; Goto segment 07C0
 
   ; consider reverse counts. This allows us to decompile
   ; byte code by getting a list of pointers to 
   ; code and then looking up name
   ; eg 
   ;   db 'minus', 5
   ;   dw exec    ; link to previous

   ; aliases for each bytecode, these aliases need to be in the
   ; same order as the pointer table below
   ; The nasm code below gives values of 1,2,3,4,5 etc to each 
   ; bytecode alias. A new opcode can be inserted without having
   ; to update all the following opcodes.

   DUP equ 1
   DROP equ DUP+1
   SWAP equ DROP+1
   STORE equ SWAP+1
   FETCHPLUS equ STORE+1
   COUNT equ FETCHPLUS     ; count is an alias for @+ 
   ISZERO equ FETCHPLUS+1
   LIT equ ISZERO+1
   LITW equ LIT+1
   EMIT equ LITW+1
   KEY equ EMIT+1
   PLUS equ KEY+1
   MINUS equ PLUS+1
   INCR equ MINUS+1
   DECR equ INCR+1 
   DIVMOD equ DECR+1
   FCALL equ DIVMOD+1 
   EXIT equ FCALL+1 
   JUMP equ EXIT+1 
   JUMPZ equ JUMP+1 
   ; JUMPNZ equ JUMPZ+1 

   ; a table of code pointers. The pointers have the same offset in
   ; table as value of the opcode
   table:
     dw 0, dup.x, drop.x, swap.x, store.x, fetchplus.x
     dw iszero.x, lit.x, litw.x, emit.x, key.x
     dw plus.x, minus.x, incr.x, decr.x
     dw divmod.x, fcall.x, exit.x
     dw jump.x, jumpz.x

   ; this is the function which executes the byte codes
   ; takes a pointer to the code. Jumps are relative to the first
   ; byte of the jump instruction

   ; another interesting point... we may need to use
   ; exec within the interpreter to run code. So that means
   ; si needs to be saved on the stack, no. 

   ; No, I think we should use fcall to run byte code. This
   ; is the correct way. 

   exec:
     dw 0 
     db 'exec', 4
   exec.x:
     ; save the return ip for 'exec' since the code
     ; below "call [table+bx]" changes the stack and 
     ; registers. Before this technique the same byte
     ; code would get executed over and over again because
     ; exec was not returning properly
     pop word [returnexec]     ; save return ip
     pop si      ; get pointer to code
   .nextopcode:
     xor ax, ax      ; set ax := 0
     lodsb           ; al := [si]++
     cmp al, 0       ; zero marks end of code
     je .exit

   .opcode:
     mov bx, ax      ; get opcode (1-6 etc) into bx
     shl bx, 1       ; double bx because its a word pointer 
     call [table+bx] ; use opcode as offset into code pointer table
     jmp .nextopcode
   .exit:
     push word [returnexec]     ; restore fn return ip
     ret

   returnexec dw 0


   plus.doc:
     ; db 'add the top 2 elements of the stack.'
     ; db ' ( n1 n2 -- n1+n2 ) '
     ; db ' This opcode is agnostic about whether the two 16 bit '
     ; db ' numbers are signed or unsigned. What should happen in '
     ; db ' the case of an overflow ? '
     ; db '  eg: LIT, 4, LIT, '0', PLUS, EMIT '
     ; db '   displays the digit "4" '
     ; dw $-plus.doc
   plus:
     dw exec.x
     db '+', 1
   plus.x:
     pop dx      ; juggle return pointer
     pop bx
     pop ax
     add ax, bx
     push ax
     push dx     ; restore return pointer
     ret

   minus.doc:
     ; db 'subtract the top element of stack from next top'
     ; db ' ( n1 n2 -- n1-n2 ) '
     ; dw $-minus.doc
   minus:
     dw plus.x
     db '-', 1
   minus.x:
     pop dx      ; juggle return pointer
     pop bx
     pop ax
     sub ax, bx
     push ax
     push dx     ; restore return pointer
     ret

   incr.doc:
     ; db 'Increment the top element of the data stack by one. '
     ; dw $-incr.doc
   incr:
     dw minus.x 
     db '1+', 2
   incr.x:
     pop dx      ; juggle return pointer
     pop ax
     inc ax
     push ax
     push dx     ; restore return pointer
     ret

   decr.doc:
     ; db 'Decrement the top element of the data stack by one. '
     ; dw $-decr.doc
   decr:
     dw incr.x 
     db '1-', 2
   decr.x:
     pop dx      ; juggle return pointer
     pop ax
     dec ax
     push ax
     push dx     ; restore return pointer
     ret

   divmod.doc:
     ; db ' ( n1 n2 -- remainder quotient ) '
     ; db ' divide n1 by n2 and provide remainder and quotient. '
     ; db ' n2 is the top item on the stack '
     ; dw $-divmod.doc
   divmod:
     dw decr.x 
     db '/mod', 4
   divmod.x:
     pop cx      ; juggle return pointer
     xor dx, dx  ; set dx := 0
     pop bx      ; divisor is top element on stack
     pop ax      ; dividend is next element
     div bx      ; does dx:ax / bx remainder->dx; quotient->ax
     push dx     ; put remainder on stack
     push ax     ; put quotient on top of stack
     push cx     ; restore return pointer
     ret

   fcall.doc:
     ; db 'Call a virtual procedure on the bytecode stack machine'
     ; db 'The current code pointer (in the SI register) is saved - pushed '
     ; db 'onto the return stack and the address of the virtual proc '
     ; db 'to execute is loaded into SI. '
     ; dw $-fcall.doc
   fcall:
     dw divmod.x
     db 'fcall', 5
   fcall.x:
     ; probably have to save si, somewhere
     ; xor ax, ax      ; set ax := 0
     lodsw           ; ax := [si]++ get virtual call jump target into AX
     ; for nested calls try, but es needs to initialised to some space
     ; the technique below used es:di as a software stack pointer
     ; the value of si is saved in es:di and then di is incremented
     ; This is also the implementation of the return stack 
     ; mov [es:di], si
     ; add di, 2
     mov di, si      ; save si to di (but no nested calls)
     mov si, ax      ; adjust the si code pointer 
     ret

   ; need to implement a return stack... maybe in DI destination
   ; index register. This will allow nested calls to procedures.
   exit.doc:
     ; db 'exit a virtual procedure by restoring si code pointer'
     ; dw $-exit.doc
   exit:
     dw fcall.x 
     db 'exit', 4
   exit.x:
     ; for nested calls try getting si from the return stack
     ;sub di, 2
     ;mov si, [es:di]
     mov si, di      ; restore si from di (but no nested calls)
     ret

   dup.doc:
     ; db 'Duplicates the top item on the stack.'
     ; dw $-dup.doc
   dup: 
     dw exit.x       ; link to previous word 
     db 'dup', 3     ; strings are 'counted' 
   dup.x:
     pop dx      ; juggle fn return address
     pop ax      ; get param to duplicate
     push ax
     push ax
     push dx     ; restore fn return address
     ret
 
   drop.doc:
     ; db 'removes the top item on the stack.'
     ; dw $-drop.doc
   drop: 
     dw dup.x       ; link to previous word 
     db 'drop', 4     ; strings are 'counted' 
   drop.x:
     pop dx      ; juggle fn return address
     pop ax      ; remove top element of stack
     push dx     ; restore fn return address
     ret
 
   swap.doc:
     ; db 'swaps the top 2 items on the stack.'
     ; dw $-swap.doc
   swap: 
     dw drop.x         ; link to previous word 
     db 'swap', 4
   swap.x:
     pop dx      ; juggle fn return address
     pop ax      ; get top stack item 
     pop bx      ; get next stack item
     push ax     ; put them back on in reverse order
     push bx
     push dx     ; restore fn return address
     ret
 
   store.doc:
     ; db '( n adr -- ) store the byte value n at address adr.'
     ; db ' eg: 10 myvar ! '
     ; db '    puts the value 10 at the address specified by "myvar" '
     ; db ' The address is the top value on the stack. '
     ; dw $-store.doc
   store: 
     dw swap.x         ; link to previous word 
     db '!', 1
   store.x:
     pop dx         ; juggle fn return address
     pop bx         ; pointer to address
     pop ax         ; value to store at address 
     mov [bx], al   ; only the low value byte is stored
     push dx        ; restore fn return address
     ret
 
   fetchplus.doc:
     ; db '( adr -- adr+1 n ) '
     ; db ' Replace the top element of the stack with the value '
     ; db ' of the byte at the given memory address and increment the '
     ; db ' address . This is exactly the same as "count"'
     ; dw $-fetchplus.doc
   fetchplus: 
     dw store.x         ; link to previous word 
     db '@+', 2
   fetchplus.x:
     pop dx      ; juggle fn return address
     pop bx
     xor ax, ax  ; set ax := 0
     mov al, byte [bx]
     inc bx      ; increment address by 1
     push bx     ; save address on stack
     push ax     ; save value on top of stack
     push dx     ; restore fn return address
     ret
 
   iszero.doc:
     ; db ' ( flag -- not-flag ) '
   iszero:
     dw 0           ; link to previous word 
     db '0=', 2
   iszero.x:
     pop dx
     pop ax
     cmp ax, 0
     je .zero
     push 0        ; zero is false
     jmp .exit
   .zero:
     push -1       ; non zero is true in forth
   .exit:
     push dx
     ret

   lit.doc:
     ; db 'Pushes an 8 bit literal value onto the stack'
     ; db 'The literal value is encoded in the next byte '
     ; db 'after this instruction. This is similar to the '
     ; db 'forth "char" word. '
     ; dw $-lit.doc
   lit: 
     dw  0         ; link to previous word 
     db 'lit', 3     
   lit.x:
     pop dx         ; juggle fn return address
     xor ax, ax     ; set ax := 0
     lodsb          ; al := [si]++ get literal char into AL 
     ; cbw          ; convert signed byte al to signed word ax (neg offset)
     push ax        ; put literal value on stack
     push dx        ; restore fn return address
     ret
 
   litw.doc:
     ; db 'Pushes an 16 bit literal value onto the stack'
     ; dw $-litw.doc
   litw: 
     dw lit.x     ; link to previous word 
     db 'litw', 4     
   litw.x:
     pop dx         ; juggle fn return address
     lodsw          ; ax := [si]++ get literal char into AX
     push ax        ; put literal value on stack
     push dx        ; restore fn return address
     ret
 
   emit.doc:
     ; db 'removes and displays top item on stack as an ascii character.'
     ; db 'I suppose the character is in the low byte of the stack item...'
     ; dw $-emit.doc
   emit:
     dw litw.x 
     db 'emit', 4
   emit.x:
     pop bx         ; juggle return pointer
     pop ax         ; char in al
     push bx
     mov ah, 0x0E   ; bios teletype function 
     int 10h        ; x86 bios 
     ret

   key.doc:
     ; db 'Get one keystroke from user and place on stack'
     ; db 'The key is represented as an ascii code in the low byte '
     ; db 'of the stack item.' 
     ; dw $-key.doc
   key: 
     dw emit.x    ; link to prev
     db 'key', 3  ; reverse counted string
   key.x:
     mov ah, 0    ; wait for keypress bios function
     int 16h      ; al := ascii code and ah := scan code
     pop bx       ; juggle function return pointer
     mov ah, 0    ; set ah = 0 (discard the scan code)
     push ax      ; save ascii code onto stack, high byte zero
     push bx      ; restore return pointer to stack
     ret

   jump.doc:
     ; db ' ( -- ) stack is unchanged.'
     ; db ' Jumps to a relative virtual instruction.'
     ; db ' The relative jump is given in the next byte.'
     ; db ' eg: JUMP, -2, jumps back 2 bytes in the bytecode' 
     ; db ' eg: LIT, '*', EMIT, JUMP, -3, ' 
     ; db '  prints a never-ending list of asterisks '
     ; dw $-jump.doc
   jump: 
     dw key.x       ; link to prev
     db 'jump', 4   ; reverse count
   jump.x:
     ; jumps can be handled in the exec routine
     ; handle jumps by modifying virtual ip (in this case SI)
     xor ax, ax      ; set ax := 0
     lodsb           ; al := [si]++ get relative jump target into AL
     cbw             ; convert signed byte al to signed word ax (neg offset)
     sub si, 2       ; realign si to JUMP instruction, 
     add si, ax      ; adjust the si code pointer by jump offset
                     ; do we need to decrement si ?? yes, more logical
     ret

   jumpz.doc:
     ; db ' ( n -- ) '
     ; db 'jumps to a relative virtual instruction if top '
     ; db 'stack element is zero. The flag value is removed '
     ; db 'from the stack. '
     ; db ' The relative jump is given in the next byte.'
     ; db ' eg: JUMPZ, -2, jumps back 2 bytes in the bytecode' 
     ; db ' eg: KEY, DUP, EMIT, LIT, '0', MINUS, JUMPNZ, -6 ' 
     ; db '  allows the user to type until zero is pressed. '
     ; db '  (JUMPNZ is not implemented in this 512 byte system) '
     ; dw $-jumpz.doc
     ; handle jumps by modifying virtual ip (in this case SI)
   jumpz: 
     dw jump.x       ; link to prev
     db 'jumpz', 5  ; reverse count
   jumpz.x:
     pop dx          ; juggle return pointer
     xor ax, ax      ; set ax := 0
     lodsb           ; al := [si]++ get relative jump target into AL
     ; check stack for zero, if not continue with next instruction 
     pop bx          ; get top stack item into bx
     cmp bx, 0       ; if bx != 0 continue
     jne .exit
     cbw             ; convert signed byte al to signed word ax (neg offset)
     sub si, 2       ; realign si to JUMP instruction, 
     add si, ax      ; adjust the si code pointer by jump offset
   .exit:
     push dx         ; restore call return
     ret

   ; *******************************
   ; end of byte codes, 512 byte system
   ; *******************************

   udot.doc:
     ; db ' ( n -- ) '
     ; db ' display top stack element as unsigned decimal number. '
   udot:
   udot.p:
     ; using 11 as a marker to know how many digits to print, but silly
     db LIT, 11, SWAP    ; 11 n
     db LIT, 10         ; 11 n 10
     db DIVMOD          ; 11 rem quotient
     db DUP, JUMPZ, 4
     db JUMP, -6
                       ; 11 rem rem rem ... 0
     db DROP           ; 11 rem rem ... 
     db LIT, '0', PLUS, EMIT   ; 11 rem ...  print remainder
     db DUP, LIT, 11, MINUS, JUMPZ, 4 
     db JUMP, -10
     db DROP
     db LIT, ' ', EMIT
     db EXIT 

   type.doc:
     ; db ' ( adr n -- ) '
     ; db ' Prints out n number of characters starting at address adr. '
   type:
     dw 0 
     db 'type', 4
   type.p:         
                   ; adr n
     db SWAP       ; n adr 
     db FETCHPLUS  ; n adr+1 a
     db EMIT       ; n adr+1
     db SWAP       ; adr+1 n
     db DECR       ; adr+1 n-1
     db DUP        ; adr+1 n-1 n-1
     db ISZERO
     db JUMPZ, -7 ; adr+1 n-1
     db EXIT      ; note: adr+n and 0 are still on the stack here

   accept.doc:
     ; db ' ( buffer -- )
     ; db ' receive a line of input from the terminal '
     ; db ' and store it as a counted string in the buffer. '
     ; db ' This should be rewritten to discard excess chars.'
     ; also need to handle backspaces to backtrack over
     ; buffer
   accept:
     dw type.p 
     db 'accept', 6
   accept.p:      
                       ; ( adr -- )
     db DUP, DUP       ; a a a
     db INCR, DUP      ; a a a+1 a+1 
     db KEY            ; a a a+1 a+1 'x' 
     db DUP            ; a a a+1 a+1 'x' 'x'
     db EMIT, DUP      ; a a a+1 a+1 'x' 'x'
     db LIT, 13, MINUS, JUMPZ, 6  ;  a a a+1 a+1 'x'
     ; detect backspace 
     ; db DUP, LIT, 8, MINUS, JUMPNZ, +??
     ; handle backspace
     ; LIT, space, EMIT, LIT, backspace, EMIT 
     ; DROP, DROP, DECR   
     ; JUMP, -??   ; get next key
     db SWAP           ; a a a+1 'x' a+1
     db STORE          ; a a a+1  /put char in buffer
     db JUMP, -13      ; not newline so get another char
     db LIT, 10, EMIT     ; print newline if enter pressed
                          ; a a a+n a+n 'x'
     db DROP, DROP, DECR  ; a a a+n-1 
     db SWAP, MINUS       ; a n-1
     db SWAP, STORE       ; [a] := n-1
     db EXIT    ; all virtual procedures end with 'exit'

   pad: db 12, 'oon leaf'
   buff: db 0, ' '


   wow db KEY, EMIT, EXIT
   ; testing 512 one sector byte code
   code:
     ; displays ascii values of keys
     db KEY, DUP
     db FCALL
     dw udot.p        ; a+1 
     db EMIT, LIT, ' ', EMIT 
     db JUMP, -9
     db 0

   start:

      ; mov ax, cs      ; cs is already correct (?!) 
      mov ax, 07C0h    ; Set data segment to where we're loaded
      mov ds, ax       ; data segment  
      mov es, ax       ; es needed for stosb 

      add ax, 288      ; (4096 + 512) / 16 bytes per paragraph
      mov ss, ax       ; a 4K stack here
      mov sp, 4096     ; set up the stack pointer

      push code 
      call exec.x

   here:  jmp here 

   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,
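
  The commented-out lines in fcall.x and exit.x above point towards
  a return stack addressed through es:di, which would allow nested
  virtual procedures. What follows is a sketch of that idea on a 
  machine cut down to four opcodes (so the opcode numbers differ
  from the full system). The 'rstack' area, its size of 8 entries
  and the test procedures 'inner' and 'outer' are invented for the
  example. There is no check for return stack overflow.

  * nested virtual procedure calls with a return stack in es:di
  ----------------------
   BITS 16
   [ORG 0]
    jmp 07C0h:start         ; Goto segment 07C0

    LIT equ 1
    EMIT equ LIT+1
    FCALL equ EMIT+1
    EXIT equ FCALL+1
   table:
     dw 0, lit.x, emit.x, fcall.x, exit.x

   rstack times 8 dw 0    ; hypothetical: room for 8 nested calls
   returnexec dw 0

   exec.x:
     pop word [returnexec]   ; save return ip
     pop si                  ; get pointer to code
   .nextopcode:
     xor ax, ax
     lodsb                   ; al := [si]++
     cmp al, 0               ; zero marks end of code
     je .exit
     mov bx, ax
     shl bx, 1               ; word sized table entries
     call [table+bx]
     jmp .nextopcode
   .exit:
     push word [returnexec]  ; restore fn return ip
     ret

   lit.x:
     pop dx         ; juggle fn return address
     xor ax, ax
     lodsb          ; al := literal byte
     push ax
     push dx
     ret
   emit.x:
     pop dx         ; juggle return pointer
     pop ax         ; char in al
     push dx
     mov ah, 0x0E   ; bios teletype function
     xor bx, bx     ; bios video page 0
     int 10h
     ret
   fcall.x:
     lodsw            ; ax := call target, si now past the operand
     mov [es:di], si  ; push the code pointer on the return stack
     add di, 2
     mov si, ax       ; continue in the called procedure
     ret
   exit.x:
     sub di, 2        ; pop the saved code pointer
     mov si, [es:di]
     ret

   inner: db LIT, 'b', EMIT, EXIT
   outer: db LIT, 'a', EMIT
          db FCALL
          dw inner            ; a call inside a call
          db LIT, 'c', EMIT, EXIT
   code:  db FCALL
          dw outer
          db LIT, '!', EMIT, 0

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax       ; the return stack lives in this segment
     add ax, 288      ; (4096 + 512) / 16 bytes per paragraph
     mov ss, ax       ; a 4K stack here
     mov sp, 4096
     cld
     mov di, rstack   ; di is the return stack pointer
     push code 
     call exec.x
   here:  jmp here 

   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,

  If the nesting works, this prints the characters a, b, c and !
  in that order. The cost is that di (and es) can no longer be 
  used freely by other opcodes.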


BYTECODE SYSTEM MULTIPLE SECTORS

  * bytecode system with multiple sectors
  ------------------------------
  ; history
  ;  10 june 2017 
  ;     made a return stack with es:di and made fcall.x and exit.x
  ;     use the return stack, apparently successfully which allows
  ;     nested procedures.

  BITS 16
  [ORG 0]

   jmp 07C0h:load    ; Goto segment 07C0
     drive db 0      ; a variable to hold boot drive number
   load:
     mov ax, cs     ; the code segment is already correct (?!)
     mov ds, ax     ; set up data and extended segments
     mov es, ax
     mov [drive], dl ; save the boot drive number
     mov ax, 07C0h   ; Set up 4K stack space after this bootloader
     add ax, 288     ; (4096 + 512) / 16 bytes per paragraph
     mov ss, ax      ; with a 4K gap between stack and code
     mov sp, 4096

      ; save the DL register or else dont modify it
      ; it contains the number of the boot medium (hard disk,
      ; usb memory stick etc)
      ; The 'floppy' Drive is NOT necessarily 0!!!

    reset:            ; Reset the virtual floppy drive (usb)
      mov ax, 0       ; 
      mov dl, [drive] ; the boot drive number (eg for usb 128)
      int 13h         ;
      jc reset        ; ERROR => reset again
    read:
      mov ax, 1000h       ; ES:BX = 1000:0000
      mov es, ax          ; es:bx determines where data loaded to
      mov bx, 0           ;
      mov ah, 2           ; Load disk data to ES:BX
      mov al, 4           ; Load 4 sectors ie 512 bytes * 4 == 2K  
      ; try mov cx, 0x0002 ; cylinder 0, sector 2
      mov ch, 0           ; Cylinder=0
      mov cl, 2           ; Sector=2 (sector 1 is the boot sector)
      mov dh, 0           ; Head=0
      mov dl, [drive]     ; 
      int 13h             ; Read!
      jc read             ; ERROR => Try again

    jmp 1000h:0000      ; Jump to the loaded code 

    times 510-($-$$) db 0   ; pad out the boot sector (512 bytes)
    dw 0AA55h               ; end with standard boot signature

 ; this below is the magic line to make the new memory offsets
 ; work. Or compile the 2 files separately
 ; https://forum.nasm.us/index.php?topic=2160.0 

    section stage2 vstart=0

    jmp start

   ; consider reverse counts. This allows us to decompile
   ; byte code by getting a list of pointers to 
   ; code and then looking up name
   ; eg 
   ;   db 'minus', 5
   ;   dw exec    ; link to previous
   ; Another refinement is to hold the top element of the 
   ; stack in ax, which simplifies a lot of stack manipulation.
   ; eg 1+ becomes "inc ax" etc

   ; aliases for each bytecode, these aliases need to be in the
   ; same order as the pointer table below
   ; The nasm code below gives values of 1,2,3,4,5 etc to each 
   ; bytecode alias. A new opcode can be inserted without having
   ; to update all the following opcodes.

   DUP equ 1
   DROP equ DUP+1
   SWAP equ DROP+1
   OVER equ SWAP+1
   RON equ OVER+1
   ROFF equ RON+1
   FETCH equ ROFF+1
   FETCHPLUS equ FETCH+1
   CSTORE equ FETCHPLUS+1
   CSTOREPLUS equ CSTORE+1
   CFETCH equ CSTOREPLUS+1
   CFETCHPLUS equ CFETCH+1
   COUNT equ CFETCHPLUS    ; count is an alias for c@+
   EQUALS equ CFETCHPLUS+1
   NOTEQUALS equ EQUALS+1
   LIT equ NOTEQUALS+1
   LITW equ LIT+1
   EMIT equ LITW+1
   KEY equ EMIT+1
   PLUS equ KEY+1
   MINUS equ PLUS+1
   INCR equ MINUS+1
   DECR equ INCR+1 
   DIVMOD equ DECR+1
   FCALL equ DIVMOD+1 
   EXIT equ FCALL+1 
   JUMP equ EXIT+1 
   JUMPZ equ JUMP+1 
   JUMPF equ JUMPZ     ; jumpf (false) is an alias for jumpz 
   JUMPNZ equ JUMPZ+1 
   JUMPT equ JUMPNZ    ; jump-true alias for jump-not-zero
   RLOOP equ JUMPNZ+1
   DIVTWO equ RLOOP+1
   NOCODE equ DIVTWO+1  ; nocode is an end marker

   ; a table of code pointers. The pointers have the same offset in
   ; table as value of the opcode
   op.table:
     dw 0, dup.x, drop.x, swap.x, over.x
     dw ron.x, roff.x, fetch.x, fetchplus.x
     dw cstore.x, cstoreplus.x
     dw cfetch.x, cfetchplus.x
     dw equals.x, notequals.x, lit.x, litw.x, emit.x, key.x
     dw plus.x, minus.x, incr.x, decr.x
     dw divmod.x, fcall.x, exit.x
     dw jump.x, jumpz.x, jumpnz.x, rloop.x
     dw divtwo.x, 0
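   ; A guard sketch (not in the original listing): the two 'times' lines
   ; below emit no bytes while op.table has exactly NOCODE+1 entries.
   ; If an alias is added without a table entry, or the reverse, one
   ; of the counts goes negative and nasm should refuse to assemble.
   ; They must stay directly after the table because they use '$'.
     times (($-op.table)/2) - (NOCODE+1) db 0   ; negative if table too short
     times (NOCODE+1) - (($-op.table)/2) db 0   ; negative if table too long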


   ; this is the function which executes the byte codes
   ; takes a pointer to the code. Jumps are relative to the first
   ; byte of the jump instruction
   exec:
     dw 0 
     db 'exec', 4
   exec.x:
     ; save the return ip for 'exec' since the code
     ; below call [op.table+bx] changes the stack and 
     ; registers. Before this technique the same byte
     ; code would get executed over and over again because
     ; exec was not returning properly
     pop word [returnexec]     ; save return ip
     pop si      ; get pointer to code
   .nextopcode:
     xor ax, ax      ; set ax := 0
     lodsb           ; al := [si]++
     cmp al, 0       ; zero marks end of code
     je .exit

   .opcode:
     mov bx, ax      ; get opcode (1, 2, 3 etc) into bx
     shl bx, 1       ; double bx because each table entry is a word
     call [op.table+bx] ; use opcode as offset into code pointer table
     jmp .nextopcode
   .exit:
     push word [returnexec]     ; restore fn return ip
     ret

   returnexec dw 0
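
   ; A minimal sketch (not part of the original listing) of how native
   ; code hands a byte code sequence to exec.x. The names 'EXAMPLES',
   ; 'example.hi' and 'example.hi.code' are made up for illustration.
   ; Nothing here is assembled unless nasm is run with -dEXAMPLES.
   ; It assumes ds and the stack are already set up as in 'start'.
   %ifdef EXAMPLES
   example.hi.code:
     db LIT, 'h', EMIT      ; push 'h' and print it
     db LIT, 'i', EMIT      ; push 'i' and print it
     db 0                   ; opcode 0 makes exec.x return
   example.hi:
     push example.hi.code   ; exec.x pops this pointer into si
     call exec.x            ; run the byte codes, then come back here
     ret
   %endif
   ; Because exec.x keeps its return address in the single variable
   ; 'returnexec', this kind of call cannot be nested inside itself.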


   plus.doc:
     ; db 'add the top 2 elements of the stack.'
     ; db ' ( n1 n2 -- n1+n2 ) '
     ; db ' This opcode is agnostic about whether the two 16 bit '
     ; db ' numbers are signed or unsigned. What should happen in '
     ; db ' the case of an overflow ? '
     ; db '  eg: LIT, 4, LIT, '0', PLUS, EMIT '
     ; db '   displays the digit "4" '
     ; dw $-plus.doc
   plus:
     dw exec.x
     db '+', 1
   plus.x:
     pop dx      ; juggle return pointer
     pop bx
     pop ax
     add ax, bx
     push ax
     push dx     ; restore return pointer
     ret

   minus.doc:
     ; db 'subtract the top element of stack from next top'
     ; db ' ( n1 n2 -- n1-n2 ) '
     ; dw $-minus.doc
   minus:
     dw plus.x
     db '-', 1
   minus.x:
     pop dx      ; juggle return pointer
     pop bx
     pop ax
     sub ax, bx
     push ax
     push dx     ; restore return pointer
     ret

   incr.doc:
     ; db 'Increment the top element of the data stack by one. '
     ; dw $-incr.doc
   incr:
     dw minus.x 
     db '1+', 2
   incr.x:
     pop dx      ; juggle return pointer
     pop ax
     inc ax
     push ax
     push dx     ; restore return pointer
     ret

   decr.doc:
     ; db 'Decrement top element of the data stack by one. '
     ; dw $-decr.doc
   decr:
     dw incr.x 
     db '1-', 2
   decr.x:
     pop dx      ; juggle return pointer
     pop ax
     dec ax
     push ax
     push dx     ; restore return pointer
     ret

   divtwo.doc:
     ; db ' ( n1 -- n1/2 ) '
     ; db ' divide n1 by 2 (unsigned, rounds down) '
     ; dw $-divtwo.doc
   divtwo:
     dw decr.x 
     db '/2', 2
   divtwo.x:
     pop dx      ; juggle return pointer
     pop ax      ; dividend is next element
     shr ax, 1   ; do ax := ax/2 (unsigned, rounds down)
     push ax
     push dx     ; restore return pointer
     ret

   divmod.doc:
     ; db ' ( n1 n2 -- remainder quotient ) '
     ; db ' unsigned divide of n1 by n2 giving remainder and quotient. '
     ; db ' n2 is the top item on the stack '
     ; dw $-divmod.doc
   divmod:
     dw divtwo.x 
     db '/mod', 4
   divmod.x:
     pop cx      ; juggle return pointer
     xor dx, dx  ; set dx := 0
     pop bx      ; divisor is top element on stack
     pop ax      ; dividend is next element
     div bx      ; does dx:ax / bx remainder->dx; quotient->ax
     push dx     ; put remainder on stack
     push ax     ; put quotient on top of stack
     push cx     ; restore return pointer
     ret

   fcall.doc:
     ; db 'Call a virtual procedure on the bytecode stack machine. '
     ; db 'The current code pointer (in the SI register) is saved - pushed '
     ; db 'onto the return stack and the address of the virtual proc '
     ; db 'to execute is loaded into SI. '
     ; dw $-fcall.doc
   fcall:
     dw divmod.x
     db 'fcall', 5
   fcall.x:
     lodsw           ; ax := [si], si+2; get virtual call jump target into AX
     ; For nested calls si has to be saved somewhere. The technique
     ; below uses es:di as a software stack pointer (es needs to be
     ; initialised to some free space, see 'start'). The value of si
     ; is saved at [es:di] and then di is incremented by 2.
     ; This is also the implementation of the return stack 
     mov [es:di], si
     add di, 2
     ; mov di, si      ; save si to di (but no nested calls)
     mov si, ax      ; adjust the si code pointer 
     ret

   ; the return stack is implemented with es:di (see fcall above).
   ; This allows nested calls to procedures.
   exit.doc:
     ; db 'exit a virtual procedure by restoring si code pointer'
     ; dw $-exit.doc
   exit:
     dw fcall.x 
     db 'exit', 4
   exit.x:
     ; for nested calls try getting si from the return stack
     sub di, 2
     mov si, [es:di]
     ; mov si, di      ; restore si from di (but no nested calls)
     ret
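
   ; A sketch of nested virtual calls (hypothetical names, only
   ; assembled with -dEXAMPLES). When example.twice.p is itself
   ; entered with FCALL, the return stack at es:di holds two saved
   ; code pointers while example.bang.p runs: the caller's and the
   ; position inside example.twice.p. Each EXIT removes one of them.
   %ifdef EXAMPLES
   example.bang.p:              ; ( -- ) print one '!'
     db LIT, '!', EMIT, EXIT
   example.twice.p:             ; ( -- ) print '!' two times
     db FCALL
     dw example.bang.p
     db FCALL
     dw example.bang.p
     db EXIT
   %endif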

   dup.doc:
     ; db 'Duplicates the top item on the stack.'
     ; dw $-dup.doc
   dup: 
     dw exit.x       ; link to previous word 
     db 'dup', 3     ; strings are 'counted' 
   dup.x:
     pop dx      ; juggle fn return address
     pop ax      ; get param to duplicate
     push ax
     push ax
     push dx     ; restore fn return address
     ret
 
   drop.doc:
     ; db 'removes the top item on the stack.'
     ; dw $-drop.doc
   drop: 
     dw dup.x       ; link to previous word 
     db 'drop', 4     ; strings are 'counted' 
   drop.x:
     pop dx      ; juggle fn return address
     pop ax      ; remove top element of stack
     push dx     ; restore fn return address
     ret
 
   swap.doc:
     ; db 'swaps the top 2 items on the stack.'
     ; dw $-swap.doc
   swap: 
     dw drop.x         ; link to previous word 
     db 'swap', 4
   swap.x:
     pop dx      ; juggle fn return address
     pop ax      ; get top stack item 
     pop bx      ; get next stack item
     push ax     ; put them back on in reverse order
     push bx
     push dx     ; restore fn return address
     ret
 
   over.doc:
     ; db ' ( n1 n2 -- n1 n2 n1 ) '
     ; db ' Puts a copy of the 2nd stack item on top of the stack. '
     ; dw $-over.doc
   over: 
     dw swap.x         ; link to previous word 
     db 'over', 4
   over.x:
     pop dx      ; juggle fn return address
     pop ax      ; get top stack item 
     pop bx      ; get next stack item
     push bx     ; 
     push ax     ; 
     push bx     ; add copy of 2nd item on top of stack
     push dx     ; restore fn return address
     ret
 
   ron.doc:
     ; db '( S: n -- )( R: -- n ) '
     ; db ' put the top item of the data stack onto the return stack.'
     ; dw $-ron.doc
   ron: 
     dw over.x      
     db '>r', 2
   ron.x:
     pop dx         ; juggle fn return address
     pop ax         ; value to move onto the return stack 
     stosw          ; [es:di] := ax, di+2 (assumes direction flag is clear)
     push dx        ; restore fn return address
     ret
 
   roff.doc:
     ; db '( S: -- n)( R: n -- ) '
     ; db ' put the top item of the return stack onto the data stack.'
     ; dw $-roff.doc
   roff: 
     dw ron.x      
     db 'r>', 2
   roff.x:
     pop dx          ; juggle fn return address
     sub di, 2       ; 
     mov ax, [es:di] ; get top item off return stack
     push ax
     push dx         ; restore fn return address
     ret
 
   fetch.doc:
     ; db '( adr -- n ) '
     ; db ' Replace the top element of the stack with the '
     ; db ' value of the 16 bits at the given memory address '
     ; dw $-fetch.doc
   fetch: 
     dw roff.x         ; link to previous word 
     db '@', 1
   fetch.x:
     pop dx      ; juggle fn return address
     pop bx
     mov ax, word [bx]
     push ax     ; save value on top of stack
     push dx     ; restore fn return address
     ret
 
   fetchplus.doc:
     ; db '( adr -- adr+2 n ) '
     ; db ' Replace the top element of the stack with the '
     ; db ' value of the 16 bits at the given memory address '
     ; db ' and increment the address by 2 bytes. '
     ; dw $-fetchplus.doc
   fetchplus: 
     dw fetch.x         ; link to previous word 
     db '@+', 2
   fetchplus.x:
     pop dx      ; juggle fn return address
     pop bx
     mov ax, word [bx]
     add bx, 2   ; increment address by 1 word (2 bytes)
     push bx     ; save address on stack
     push ax     ; save value on top of stack
     push dx     ; restore fn return address
     ret
 
   cstore.doc:
     ; db '( n adr -- ) store the byte value n at address adr.'
     ; db ' eg: 10 myvar c! '
     ; db '    puts the value 10 at the address specified by "myvar" '
     ; db ' The address is the top value on the stack. '
     ; dw $-cstore.doc
   cstore: 
     dw fetchplus.x         ; link to previous word 
     db 'c!', 2
   cstore.x:
     pop dx         ; juggle fn return address
     pop bx         ; pointer to address
     pop ax         ; value to store at address 
     mov [bx], al   ; only the low value byte is stored
     push dx        ; restore fn return address
     ret
 
   cstoreplus.doc:
     ; db '( n adr -- adr+1 ) store the byte value n at address adr.'
     ; db ' And increment the address '
     ; dw $-cstoreplus.doc
   cstoreplus: 
     dw cstore.x         ; link to previous word 
     db 'c!+', 3
   cstoreplus.x:
     pop dx         ; juggle fn return address
     pop bx         ; pointer to address
     pop ax         ; value to store at address 
     mov [bx], al   ; only the low value byte is stored
     inc bx         ; advance address and put on stack
     push bx       
     push dx        ; restore fn return address
     ret
 
   cfetch.doc:
     ; db '( adr -- n ) Replace the top element of the stack with the value '
     ; db ' of the byte at the given memory address.'
     ; db ' eg: myvar c@ . '
     ; db '  displays the value at the address given by "myvar" '
     ; dw $-cfetch.doc
   cfetch: 
     dw cstoreplus.x         ; link to previous word 
     db 'c@', 2
   cfetch.x:
     pop dx      ; juggle fn return address
     pop bx
     xor ax, ax  ; set ax := 0
     mov al, byte [bx]
     push ax
     push dx     ; restore fn return address
     ret
 
   cfetchplus.doc:
     ; db '( adr -- adr+1 n ) '
     ; db ' Replace the top element of the stack with the value '
     ; db ' of the byte at the given memory address and increment the '
     ; db ' address . This is exactly the same as "count"'
     ; dw $-cfetchplus.doc
   cfetchplus: 
     dw cfetch.x         ; link to previous word 
     db 'c@+', 3
   cfetchplus.x:
     pop dx      ; juggle fn return address
     pop bx
     xor ax, ax  ; set ax := 0
     mov al, byte [bx]
     inc bx      ; increment address by 1
     push bx     ; save address on stack
     push ax     ; save value on top of stack
     push dx     ; restore fn return address
     ret
 
   equals.doc:
     ; db ' ( n1 n2 -- flag ) '
     ; db 'Puts -1 (true) on the stack if n1==n2 '
     ; db 'otherwise puts zero (false) on the stack. '
     ; dw $-equals.doc
   equals: 
     dw cfetchplus.x     ; link to previous word 
     db '=', 1     
   equals.x:
     pop dx         ; juggle fn return address
     pop ax         ; top stack item 
     pop bx         ; 2nd stack item
     cmp ax, bx
     je .true
   .false:
     push 0
     jmp .exit
   .true:
     push -1
   .exit:
     push dx        ; restore fn return address
     ret
 
   notequals.doc:
     ; db ' ( n1 n2 -- flag ) '
     ; db 'Puts 0 (false) on the stack if n1==n2 '
     ; db 'otherwise puts -1 (true) on the data stack'
     ; dw $-notequals.doc
   notequals: 
     dw equals.x     ; link to previous word 
     db '<>', 2     
   notequals.x:
     pop dx         ; juggle fn return address
     pop ax         ; top stack item 
     pop bx         ; 2nd stack item
     cmp ax, bx
     jne .true
   .false:
     push 0
     jmp .exit
   .true:
     push -1
   .exit:
     push dx        ; restore fn return address
     ret
 
   ; lit reads a 1 byte literal and sign-extends it to 16 bits with
   ; cbw, so literals above 127 come out negative (use litw for those).
   ; maybe call this "char"
   lit.doc:
     ; db 'Pushes an 8 bit literal value onto the stack'
     ; dw $-lit.doc
   lit: 
     dw notequals.x     ; link to previous word 
     db 'lit', 3     
   lit.x:
     pop dx         ; juggle fn return address
     xor ax, ax     ; set ax := 0
     lodsb          ; al := [si]++ get literal char into AL 
     cbw            ; convert signed byte to signed word ax
     push ax        ; put literal value on stack
     push dx        ; restore fn return address
     ret
 
   litw.doc:
     ; db 'Pushes a 16 bit literal value onto the stack'
     ; dw $-litw.doc
   litw: 
     dw lit.x     ; link to previous word 
     db 'litw', 4     
   litw.x:
     pop dx         ; juggle fn return address
     lodsw          ; ax := [si], si+2; get literal word into AX
     push ax        ; put literal value on stack
     push dx        ; restore fn return address
     ret
 
   emit.doc:
     ; db 'removes and displays top item on stack as an ascii character.'
     ; db 'I suppose the character is in the low byte of the stack item...'
     ; dw $-emit.doc
   emit:
     dw litw.x 
     db 'emit', 4
   emit.x:
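     pop dx         ; juggle return pointer (dx, because bx is a bios input)
     pop ax         ; char in al
     push dx
     mov ah, 0x0E   ; bios teletype function 
     mov bh, 0      ; display page 0 (bh is the page number for int 10h/0Eh)
     int 10h        ; x86 bios 
     ret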

   key.doc:
     ; db 'Get one keystroke from user and place on stack'
     ; db 'The key is represented as an ascii code in the low byte '
     ; db 'of the stack item.' 
     ; dw $-key.doc
   key: 
     dw emit.x    ; link to prev
     db 'key', 3  ; reverse counted string
   key.x:
     mov ah, 0    ; wait for keypress bios function
     int 16h      ; al := ascii code and ah := scan code
     pop bx       ; juggle function return pointer
     mov ah, 0    ; set ah = 0 (discard the scan code)
     push ax      ; save ascii code onto stack, high byte zero
     push bx      ; restore return pointer to stack
     ret

   jump.doc:
     ; db 'jumps to a relative virtual instruction.'
     ; db ' The relative jump is given in the next byte.'
     ; db ' eg: JUMP, -2, jumps back 2 bytes in the bytecode' 
     ; db ' eg: LIT, '*', EMIT, JUMP, -3, ' 
     ; db '  prints a never-ending list of asterisks '
     ; dw $-jump.doc
   jump: 
     dw key.x       ; link to prev
     db 'jump', 4   ; reverse count
   jump.x:
     ; jumps can be handled in the exec routine
     ; handle jumps by modifying virtual ip (in this case SI)
     xor ax, ax      ; set ax := 0
     lodsb           ; al := [si]++ get relative jump target into AL
     cbw             ; convert signed byte al to signed word ax (neg offset)
     sub si, 2       ; realign si to the JUMP opcode byte 
     add si, ax      ; adjust the si code pointer by jump offset
                     ; (so offsets are relative to the JUMP opcode itself)
     ret

   jumpz.doc:
     ; db ' ( n -- ) '
     ; db 'jumps to a relative virtual instruction if top '
     ; db 'stack element is zero. The flag value is removed '
     ; db 'from the stack. '
     ; db ' The relative jump is given in the next byte.'
     ; db ' eg: JUMPZ, -2, jumps back 2 bytes in the bytecode' 
     ; db ' eg: KEY, DUP, EMIT, LIT, '0', MINUS, JUMPNZ, -6 ' 
     ; db '  allows the user to type until zero is pressed. '
     ; dw $-jumpz.doc
     ; handle jumps by modifying virtual ip (in this case SI)
   jumpz: 
     dw jump.x       ; link to prev
     db 'jumpz', 5  ; reverse count
   jumpz.x:
     pop dx          ; juggle return pointer
     xor ax, ax      ; set ax := 0
     lodsb           ; al := [si]++ get relative jump target into AL
     ; check stack for zero, if not continue with next instruction 
     pop bx          ; get top stack item into bx
     cmp bx, 0       ; if bx != 0 continue
     jne .exit
     cbw             ; convert signed byte al to signed word ax (neg offset)
     sub si, 2       ; realign si to JUMPZ instruction, 
     add si, ax      ; adjust the si code pointer by jump offset
   .exit:
     push dx         ; restore call return
     ret

   ; should jumps take top stack element off ? yes
   jumpnz.doc:
     ; db 'jumps to a relative virtual instruction if top stack element '
     ; db ' is not zero. '
     ; db ' The relative jump is given in the next byte.'
     ; db ' eg: JUMPNZ, -2, jumps back 2 bytes in the bytecode' 
     ; db ' eg: KEY, DUP, EMIT, LIT, 'q', MINUS, JUMPNZ, -6 ' 
     ; db '  allows the user to type until "q" is pressed. '
     ; dw $-jumpnz.doc
   jumpnz: 
     dw jumpz.x       ; link to prev
     db 'jumpnz', 6  ; reverse count
   jumpnz.x:

     ; handle jumps by modifying virtual ip (in this case SI)
     pop dx          ; juggle return pointer
     xor ax, ax      ; set ax := 0
     lodsb           ; al := [si]++ get relative jump target into AL
     ; check stack for zero, if so continue with next 
     ; instruction (dont jump)
     pop bx          ; get top stack item into bx
     cmp bx, 0       ; if bx == 0 dont jump
     je .exit        ; the only difference with jumpz !
     cbw             ; convert signed byte al to signed word ax (neg offset)
     sub si, 2       ; realign si to JUMPNZ instruction, 
     add si, ax      ; adjust the si code pointer by jump offset
   .exit:
     push dx         ; restore call return
     ret

   rloop.doc:
     ; db ' ( R: n -- n-1 ) '
     ; db ' Decrements loop counter on return stack and jumps to '
     ; db ' target if counter > 0 '
     ; db ' like the x86 loop instruction this is a pre-decrement '
     ; db ' so a loop counter of 2 will loop twice. The disadvantage '
     ; db ' is that a loop counter of 0 will loop 2^16 times. '
     ; dw $-rloop.doc
   rloop: 
     dw jumpnz.x       ; link to prev
     db 'loop', 4      ; reverse count
   rloop.x:
     ; handle loops by modifying virtual ip (in this case SI)
     pop dx          ; juggle return pointer
     xor ax, ax      ; set ax := 0
     lodsb           ; al := [si]++ get relative loop target into AL
     ; decrement the counter on top of the return stack. If it has
     ; reached zero continue with next instruction (dont jump/loop)
     mov bx, [es:di-2] ; get top return stack item into bx
     dec bx          ; decrement the loop counter
     cmp bx, 0       ; (the mov below does not change the flags)
     mov [es:di-2], bx ; update the counter on the return stack
     je .exit        ; counter is zero, so the loop is finished
     cbw             ; convert signed byte al to signed word ax (neg offset)
     sub si, 2       ; realign si to RLOOP instruction, 
     add si, ax      ; adjust the si code pointer by jump offset
   .exit:
     push dx         ; restore call return
     ret


   ; *******************************
   ; end of byte codes
   ; *******************************
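
   ; A sketch of letting nasm work out the relative jump byte, instead
   ; of counting bytes by hand. 'vjump' and 'example.stars.p' are
   ; hypothetical names and are only assembled with -dEXAMPLES.
   ; In nasm '$' is the address of the start of the current line, which
   ; here is the jump opcode byte, so (target - $) follows the same
   ; convention as jump.x: relative to the first byte of the jump.
   ; The target has to be within -128..127 bytes of the jump.
   %ifdef EXAMPLES
   %macro vjump 2               ; vjump OPCODE, target.label
     db %1, (%2 - $)
   %endmacro
   example.stars.p:             ; ( n -- ) print n stars, for n > 0
   .again:
     db LIT, '*', EMIT, DECR, DUP
     vjump JUMPNZ, .again       ; same bytes as: db JUMPNZ, -5
     db DROP, EXIT              ; drop the finished counter
   %endif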

   ntimes.doc:
     ; db ' ( n -- ) '
     ; db ' print a star n times. '
   ntimes:
     dw rloop.x
     db 'ntimes', 6
   ntimes.p:
     db LIT, '*', EMIT, DECR, DUP, JUMPNZ, -5
     db DROP         ; discard the finished counter (zero)
     db FCALL
     dw nl.p
     db FCALL
     dw nl.p
     db EXIT

   dotstack.doc:
     ; db ' ( -- ) '
     ; db ' display the items on the data stack without '
     ; db ' altering it. The top (or most recent) item '
     ; db ' is printed rightmost '
   dotstack:
     dw ntimes.p     ; link to previous word
     db '.S', 2
   dotstack.p:
     db EXIT

   opcode.doc:
     ; db ' ( adr -- n ) '
     ; db ' Given the address of an execution token '
     ; db ' or procedure on the stack provides the numeric '
     ; db ' opcode for that procedure or else 0 (-1?) for '
     ; db ' an address which does not correspond to a '
     ; db ' bytecode.'
     ; db ' This is used to compile text to bytecode '
     ; db ' if not an opcode, then compile FCALL etc '
   opcode:
     dw dotstack.p
     db 'opcode', 6
   opcode.p:
     ; to do handle not found case
                     ; adr
     db RON          ; () R: adr
     db LITW         ; 
     dw op.table     ; table.adr
     db DUP          ; t.adr t.adr 
     db FETCH, DUP   ; t.adr ex.adr ex.adr
     db ROFF, DUP, RON  ; t.adr ex.adr ex.adr adr  
     db EQUALS       ; t.adr ex.adr flag
     db JUMPT, 7     ; t.adr ex.adr
     ; eg LIT, -1, EQUALS   ; -1 end marker in table
     ; JUMPT, not.found (push 0 or -1)
     db DROP     ; remove later
     db INCR, INCR   ; t.adr+2
     db JUMP, -12    ; t.adr+2
     db DROP         ; t.adr
     db ROFF, DROP   ; t.adr  / clear adr from rstack so EXIT finds si
     db LITW
     dw op.table     ; t.adr op.table 
     db MINUS        ; t.adr-op.table
     db DIVTWO       ; n/2
     db FCALL
     dw udot.p
     db EXIT

   dotcode.doc:
     ; db ' ( opcode - ) '
     ; db ' given a valid opcode on the stack, print '
     ; db ' the textual version of the opcode. '
     ; dw $-dotcode.doc
   dotcode:
     dw opcode.p 
     db '.code', 5
   dotcode.p:
                     ; op
     db DUP, PLUS    ; op*2 
     db LITW
     dw op.table     ; op*2 op.table
     db PLUS         ; op*2+op.table
     db FETCH        ; [op*2+op.table] / get execution adr
     db DECR, DUP, CFETCH ; adr n  / get the count
     db DUP          ; adr n n
     db RON, MINUS   ; adr-n
     db ROFF         ; adr-n n
     db FCALL
     dw type.p
     db LIT, ' ', EMIT
     db EXIT

   listcodes.doc:
     ; db ' ( - ) '
     ; db ' list all valid opcodes for the bytecode machine'
     ; dw $-listcodes.doc
   listcodes:
     dw dotcode.p 
     db 'listcodes', 9
   listcodes.p:
     db LIT, NOCODE, DECR, RON   ; set up loop counter
     db ROFF, DUP, RON  ; I
     db FCALL
     dw dotcode.p
     db RLOOP, -6
     db ROFF, DROP      ; clear counter from rstack
     db EXIT

   name.doc:
     ; db ' ( xt - ) '
     ; db ' given a valid execution token for a bytecode or'
     ; db ' procedure on the stack print the name '
     ; dw $-name.doc
   name:
     dw listcodes.p 
     db 'name', 4
   name.p:
                     ; adr 
     db DECR, DUP, CFETCH ; adr n  / get the count
     db DUP          ; adr n n
     db RON, MINUS   ; adr-n
     db ROFF         ; adr-n n
     db FCALL
     dw type.p
     db LIT, ' ', EMIT
     db EXIT

   asci.doc:
     ; db 'shows the ascii chars'
     ; dw $-asci.doc
   asci:
     dw name.p 
     db 'asci', 4
   asci.p:
     ; in ascending order
     db LIT, 1, INCR, DUP, EMIT
     db DUP, LITW      ; litw, because lit would sign-extend 252 to -4
     dw 252
     db MINUS, JUMPNZ, -8, EXIT
     ; in descending order
     ; db LIT, 255, DECR, DUP, EMIT
     ; db DUP, JUMPNZ, -4, EXIT

   keycode.doc:
     ; db 'shows the ascii value of each key which is pressed'
     ; dw $-keycode.doc
   keycode:
     dw asci.p 
     db 'keycode', 7
   keycode.p:
     db LIT, 20
     db FCALL
     dw ntimes.p
     db KEY, DUP
     db FCALL
     dw udot.p        ; print the key code as a decimal number
     ; check for 27 escape to quit
     db EMIT, LIT, 13, EMIT, LIT, 10, EMIT
     db JUMP, -12
     db EXIT


   nl.doc:
     ; db 'send a newline to the terminal. '
     ; dw $-nl.doc
   nl:
     dw keycode.p
     db 'nl', 2
   nl.p:
     db LIT, 10, EMIT, LIT, 13, EMIT, EXIT

   udot.doc:
     ; db ' ( n -- ) '
     ; db ' display top stack element as unsigned decimal number. '
     ; dw $-udot.doc
   udot:
     dw nl.p 
     db 'u.', 2
   udot.p:
     ; 11 is pushed as an end marker below the digits so we know when to
     ; stop printing; a remainder of division by 10 can never be 11
     db LIT, 11, SWAP    ; 11 n
     db LIT, 10         ; 11 n 10
     db DIVMOD          ; 11 rem quotient
     db DUP, JUMPNZ, -4
                       ; 11 rem rem rem ... 0
     db DROP           ; 11 rem rem ... 
     db LIT, '0', PLUS, EMIT   ; 11 rem ...  print remainder
     db DUP, LIT, 11, EQUALS, JUMPF, -8
     db DROP
     db LIT, ' ', EMIT
     db EXIT 
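
   ; A worked example for udot.p (a sketch, hypothetical label, only
   ; assembled with -dEXAMPLES). For n = 123 the first loop leaves
   ;   11 3 2 1 0
   ; on the stack: the marker, then the remainders with the least
   ; significant digit deepest, then the final quotient 0. DROP
   ; removes the 0 and the second loop prints '1', '2', '3' and
   ; stops when it finds the marker 11.
   %ifdef EXAMPLES
   example.udot.code:
     db LITW
     dw 123
     db FCALL
     dw udot.p        ; prints "123 "
     db 0             ; end of code for exec.x
   %endif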

   tonumber.doc:
     ; db ' ( adr -- n ) '
     ; db ' Given the address of a counted string on '
     ; db ' the stack, attempt to convert the string to '
     ; db ' a number, adding and multiplying by successive '
     ; db ' digits. '
     ; dw $-tonumber.doc
   tonumber:
     dw udot.p 
     db '>number', 7
   tonumber.p:         
     db EXIT 

   pword.doc:
     ; db ' ( a1 n -- a2 ) '
     ; db ' Parse one word from text found at address a1 '
     ; db ' and leave a counted string in buffer at a2 '
     ; db ' n is the maximum number of chars to parse '
     ; dw $-pword.doc
     ; cant call this "word" because nasm doesnt like that
   pword:
     dw tonumber.p 
     db 'word', 4
   pword.p:         
                     ; adr n
     ; a nearly working parse word procedure...
     db RON        ; put length on return stack for looping
     db LITW
     dw nextw.n    ; from to 
     db INCR       ; from to+1  /skip count byte in to buffer
     db SWAP       ; t+1 f
     db CFETCHPLUS ; t+1 f+1 n
     db DUP        ; t+1 f+1 n n
     db LIT, ' ', EQUALS    ; t+1 f+1 n flag
     db JUMPF, 4  ; t+1 f+1 n
     db RLOOP, -9  
     ; check for zero loop counter here
     ; change these emits to stores ...!
     ; something like, need to write storeplus
     ; RON, SWAP, ROFF, SWAP, CSTOREPLUS, SWAP
     db EMIT       ; t+1 f+1
     db ROFF, DECR, RON   ; dec counter
     db CFETCHPLUS ; t+1 f+2 n
     db DUP, LIT, ' ', EQUALS    ; t+1 f+1 n flag
     db JUMPT, 5 
     db EMIT
     db RLOOP, -8  
     ; push loop count on stack
     db ROFF, DROP
     db 0
     db EXIT 

   ; just testing if pword is working
   test.pword.p:
     db EXIT

   toin.doc:
     ; db ' ( -- adr n ) '
     ; db ' put on stack current parse position in input stream'
     ; db ' and number of characters remaining in stream. '
     ; db ' This is used with pword.x to parse each word from the '
     ; db ' stream.'
     ; dw $-toin.doc
   toin:
     dw pword.p 
     db '>in', 3
   toin.p:         
     db EXIT 
   toin.n dw 0
      
   find.doc:
     ; db ' ( a1 -- xt ) '
     ; db ' given a counted string, return execution '
     ; db ' token for word '
     ; dw $-find.doc
   find:
     dw toin.p       ; link to previous word
     db 'find', 4
   find.p:         
                     ; a
     db COUNT        ; a+1 n
     db LITW
     dw accept.p     ; a+1 n A
     db DECR, DUP    ; a+1 n A-1 A-1
     db CFETCH       ; .. A-1 N  / get the count
     db DUP          ; .. A-1 N N
     db RON, MINUS   ; .. adr-N       r: N
     db ROFF         ; .. adr-N N
                     ; a n A N
     db FCALL
     ; compare is returning count byte address
     dw compare.p    ; a n A flag
     db JUMPT, 17     ; a n A
     db LIT, '?', EMIT
     db DECR, DECR   ; a n A-2  / pointer to prev word
     db FETCH        ; a n [adr] 
     db DUP          ; a n [adr] [adr]
     db JUMPNZ, -19  ; a n [adr]
     db DROP, DROP, DROP   ; 
     db LIT, 0       ; return zero if not found
     db EXIT
     db DUP, FCALL
     dw udot.p
     db FCALL
     dw type.p
     db EXIT 

   compare.doc:
     ; db ' ( a A n -- flag) '
     ; db ' given 2 pointers to strings a and A '
     ; db ' compare the 2 strings for n bytes '
     ; db ' and put -1 on stack as flag if the strings are '
     ; db ' the same or flag=0 on stack if the strings '
     ; db ' are different. '
     ; dw $-compare.doc
   compare:
     dw find.p 
     db 'compare', 7
   compare.p:
                       ; a A n
     db RON            ; a A   r: n /n loop counter
     ; db DUP            ; a A A 
     db CFETCHPLUS     ; a A+1 [A]  
     db DUP, EMIT      ; debug
     db SWAP           ; a [A] A+1
     db RON, RON       ; a            r: n A+1 [A]
     db CFETCHPLUS     ; a+1 [a]      r: n A+1 [A] 
     db DUP, EMIT      ; debug
     db ROFF           ; a+1 [a] [A]  r: n A+1
     db EQUALS         ; a+1 flag     r: n A+1
     db JUMPT, 10      ; a+1          r: n A+1 
     db ROFF, ROFF     ; a+1 A+1 n
     db DROP, DROP, DROP    ; clear stacks
     db LIT, 0         ; flag=0 (false)
     db EXIT
     db ROFF           ; a+1 A+1      r: n 
     db LIT, '?', EMIT  ; debug
     db RLOOP, -25     ; a+1 A+1      r: n-1
     db ROFF           ; a+n A+n 0 
     db DROP, DROP, DROP    ; clear stacks
     db LIT, -1
     db EXIT

   test.compare.doc:
     ; db ' just tests the compare.doc proc '
   test.compare:
     dw compare.p 
     db 'test.compare', 12 
   test.compare.p:
     db LITW
     dw pad
     db FCALL
     dw accept.p
     db LITW
     dw pad
     db FCALL
     dw type.p
     db EXIT

   list.doc:
     ; db ' ( adr --  ) '
     ; db ' list all words by name given execution address '
     ; db ' of the last word. '
     ; dw $-list.doc
   list:
     dw test.compare.p    ; link to previous word
     db 'list', 4
   list.p:         
                          ; adr
     ; db LIT, 25, RON   ; loop counter
     db DECR, DUP, CFETCH ; adr n  / get the name count
     db DUP          ; adr n n
     db RON, MINUS   ; adr-n
     db ROFF         ; adr-n n 
     db SWAP, DUP    ; n adr-n adr-n
     db RON, SWAP    ; adr-n n   / save adr-n to rstack
     db FCALL
     dw type.p
     db LIT, ' ', EMIT
     db ROFF        ; adr-n 
     db DECR, DECR  ; point to next pointer
     db FETCH       ; [adr-n-2]
     db DUP         ; *p *p
     ;db ISZERO      ; *p flag
     db JUMPNZ, -22    ; *p
     db LIT, '#', EMIT
     ;db ROFF, DROP  ; clear counter
     db EXIT 

   type.doc:
     ; db ' ( adr n -- ) '
     ; db ' Prints out n number of characters starting '
     ; db ' at address adr. '
     ; dw $-type.doc
   type:
     dw list.p 
     db 'type', 4
   type.p:         
                   ; adr n
     db SWAP, DUP  ; n adr adr
     db CFETCH     ; n adr a
     db EMIT       ; n adr
     db INCR, SWAP ; adr+1 n
     db DECR       ; adr+1 n-1
     db DUP        ; adr+1 n-1 n-1
     db JUMPNZ, -8 ; adr+1 n-1
     db DROP, DROP ; drop adr+n and the zero counter
     db EXIT

     ; an unused alternative using rloop (never reached)
     db RON        ; adr       r: n
     db CFETCHPLUS ; adr+1 c   r: n
     db EMIT       ; adr+1     r: n
     db RLOOP, -2  ; adr+1     r: n-1
     db EXIT

   accept.doc:
     ; db ' ( buffer -- ) '
     ; db ' receive a line of input from the terminal '
     ; db ' and store it as a counted string in the buffer. '
     ; db ' This should be rewritten to discard excess chars.'
     ; dw $-accept.doc
     ; also need to handle backspaces to backtrack over
     ; buffer
   accept:
     dw type.p 
     db 'accept', 6
   accept.p:      
                       ; ( adr -- )
     db DUP, DUP       ; a a a
     db INCR, DUP      ; a a a+1 a+1 
     db KEY            ; a a a+1 a+1 'x' 
     db DUP            ; a a a+1 a+1 'x' 'x'
     db EMIT, DUP      ; a a a+1 a+1 'x' 'x'
     db LIT, 13, MINUS, JUMPZ, 6  ;  a a a+1 a+1 'x'
     ; detect backspace 
     ; db DUP, LIT, ??, MINUS, JUMPNZ, +??
     ; handle backspace
     ; LIT, space, EMIT, LIT, backspace, EMIT 
     ; DROP, DROP, DECR   
     ; JUMP, -??   ; get next key
     db SWAP           ; a a a+1 'x' a+1
     db CSTORE         ; a a a+1  /put char in buffer
     db JUMP, -13      ; not newline so get another char
     db LIT, 10, EMIT     ; print newline if enter pressed
                          ; a a a+n a+n 'x'
     db DROP, DROP, DECR  ; a a a+n-1 
     db SWAP, MINUS       ; a n-1
     db SWAP, CSTORE      ; [a] := n-1
     db EXIT    ; all virtual procedures end with 'exit'
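
   ; A small sketch of accept and type working together (hypothetical
   ; names, only assembled with -dEXAMPLES). accept leaves a counted
   ; string in the buffer, COUNT turns its address into ( adr+1 n )
   ; and type prints the n characters.
   ; Two caveats which follow from the code above: accept does not
   ; check the buffer size, so more than 31 characters will overrun
   ; example.buf, and type decrements before it tests the count, so
   ; an empty line (n = 0) is not handled.
   %ifdef EXAMPLES
   example.buf: times 32 db 0     ; room for a count byte and 31 chars
   example.echo.code:
     db LITW
     dw example.buf
     db FCALL
     dw accept.p                  ; read a line into example.buf
     db LITW
     dw example.buf
     db COUNT                     ; ( adr -- adr+1 n )
     db FCALL
     dw type.p                    ; print the characters just read
     db 0                         ; end of code for exec.x
   %endif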

   nextw.doc:
     ; db 'a buffer to hold a parsed word '
   nextw:
   nextw.p:
   nextw.n db 4, 'type'

   pad: db 15, 'one two three 4'
   test: dw 123, 56123, 12345
   
   w1: db 5, 'treeZ'
   w2: db 5, 'treez'

   wow db KEY, EMIT, EXIT
   ; testing multisector stack machine byte code
   code:
     
     db LITW
     dw w1
     db INCR
     db LITW
     dw w2
     db INCR
     db LIT, 5 

     db FCALL
     dw compare.p
     db FCALL
     dw udot.p
     db LIT, ' ', EMIT
     db 0

     db LITW
     dw accept.p 
     db FCALL
     dw udot.p
     db LIT, ' ', EMIT
     db LITW
     dw nextw.n 
     db FCALL
     dw find.p
     db 0

     db LITW
     dw accept.p
     db FCALL
     dw name.p
     db LITW
     dw accept.p
     db FCALL
     dw list.p
     db 0
 
     db LITW
     dw nextw.n 
     db FCALL
     dw find.p
     db 0

     ; parse word starts here
     db LITW
     dw pad
     db COUNT
     db FCALL
     dw pword.p
     db LITW
     dw nextw.n
     db COUNT, FCALL
     dw type.p 
     db 0

     db LITW
     dw ron.x 
     db DUP
     db FCALL 
     dw udot.p 
     db FCALL
     dw opcode.p
     db 0

     db LITW
     dw test
     db FETCHPLUS
     db FCALL
     dw udot.p
     db FETCHPLUS
     db FCALL
     dw udot.p
     db FETCHPLUS
     db FCALL
     dw udot.p
     db 0

     db LIT, 5, RON
     db LITW
     dw 65333 
     db ROFF, DUP, RON    ; n 5
     db PLUS              ; n+5
     db FCALL
     dw udot.p
     db RLOOP, -10 
     db 0

   start:
      mov ax, cs      ; cs is already correct (?!) 
      mov ds, ax       ; data segment  
      ; point es:di directly after the code and data segment
      ; i.e. after the 4 sectors (4 * 512 bytes) which contain code and
      ; data. We will use es:di as the return stack pointer. When 
      ; a value is pushed on the return stack, value is written to
      ; [es:di] and di is incremented by 2
      add ax, 128      ; 128 * 16 = 2048
      mov es, ax       ; using es:di as return stack pointer 
      mov di, 0

      ; the calculation is as follows
      ; we have loaded 4 sectors = 4 * 512 bytes = 2048 bytes
      ; we want a data stack of size 4K (which is big) = 4096 bytes
      ; also we want a return stack of size 4K for hefty recursive
      ; functions, although these huge sizes are not necessary.
      ; the x86 hardware stack grows down (push decrements sp)
      ; divide by 16 because that is how segment addressing works
      ; That is: if we multiply the number in ss or es or ds by 16
      ; we get an absolute memory address

      add ax, 640      ; 640 = (4096 + 4096 + 2048) / 16 bytes per paragraph
                       ; note: ax already holds cs+128, so ss = cs+768
      mov ss, ax       ; a 4K stack here
      mov sp, 4096     ; set up the stack pointer

      push code 
      call exec.x

   here:  jmp here 

    ; actually we are loading 4 sectors (4 * 512 bytes == 2048 bytes)
    ; so it should be 2048 not 1024 but I want to keep track of 
    ; how big the machine is getting
    ; times 1024-($-$$) db 0   ; Pad remainder of sectors with 0s
    times 2048-($-$$) db 0   ; Pad remainder of sectors with 0s

    ; dont need boot signature, because this is not the boot sector
    ; dw 0xAA55               ; The standard PC boot signature
  ,,,

BYTE CODE FORTH STYLE SYSTEM

  A byte code system using opcodes and table offsets, as
  shown above. Looping might actually be easier to implement
  with byte code, but a return stack is needed for >r and r>.
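
  As a rough illustration of the looping remark, here is a minimal
  sketch of a byte code loop written as a stand-alone boot sector. The
  opcode numbers, the 'optable' jump table and the rule that a jump
  offset is relative to the byte after it are assumptions made for this
  example only, not the layout of the system above. The loop counter
  simply stays on the data stack, so this sketch needs no return stack.

```nasm
; byte code loop sketch: prints '*' three times, then halts
; (opcode values and table layout are assumptions for this example)
bits 16
org 0x7C00
    jmp 0x0000:start        ; force cs = 0 so offsets match the org

HALT   equ 0                ; stop the virtual machine
LIT    equ 1                ; push the next byte as a number
EMIT   equ 2                ; pop and print as a character
DUP    equ 3                ; duplicate top item
DECR   equ 4                ; decrement top item
JUMPNZ equ 5                ; pop; if non-zero add signed offset to ip

start:
    xor ax, ax
    mov ds, ax
    mov ss, ax
    mov sp, 0x7C00          ; data stack grows down below the boot sector
    cld                     ; lodsb steps forwards
    mov si, program         ; si is the virtual instruction pointer
next:
    lodsb                   ; al := opcode, si++
    xor bx, bx
    mov bl, al
    shl bx, 1               ; table entries are 2 bytes wide
    jmp [optable + bx]      ; dispatch through the table

op_halt:
    hlt
    jmp op_halt
op_lit:
    lodsb                   ; operand byte follows the opcode
    xor ah, ah
    push ax
    jmp next
op_emit:
    pop ax                  ; char in al
    mov ah, 0x0E            ; bios teletype function
    mov bx, 0x0007          ; page 0, light grey
    int 0x10
    jmp next
op_dup:
    pop ax
    push ax
    push ax
    jmp next
op_decr:
    pop ax
    dec ax
    push ax
    jmp next
op_jumpnz:
    lodsb                   ; signed offset, si now points after it
    cbw                     ; sign extend al into ax
    pop bx                  ; value to test
    test bx, bx
    jz next                 ; zero: fall through to next opcode
    add si, ax              ; non-zero: take the jump
    jmp next

optable: dw op_halt, op_lit, op_emit, op_dup, op_decr, op_jumpnz

program:
    db LIT, 3                   ; n
  .loop:
    db LIT, '*', EMIT           ; n
    db DECR, DUP                ; n-1 n-1
    db JUMPNZ, .loop - .after   ; n-1   (offset is -7 here)
  .after:
    db HALT

    times 510-($-$$) db 0
    dw 0xAA55               ; boot signature
```

  This should assemble with 'nasm -f bin' like the other examples in
  this booklet. Letting nasm compute the offset as '.loop - .after'
  avoids counting bytes by hand each time the loop body changes.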

COMPILING FORTH STYLE SYSTEM

  This is the same as the core system below, but instead
  of the interp: word there is compile: where the entered 
  text is compiled to a temporary buffer and then executed
  with exec.

CORE FORTH STYLE SYSTEM READ ONLY

  This system does not compile new words, it just executes
  words in an interpreter. Without any compiling system or 
  the use of byte code, it seems difficult to use looping 
  or conditionals, since there are no instruction numbers to
  jump to...
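
  To make the idea of "just executing words" concrete, here is a small
  sketch of the header layout used by the listing below (link, counted
  name, then machine code) together with a loop that walks the links and
  calls each word in turn. The two words 'star' and 'dash' and the fixed
  load address are invented for this example.

```nasm
; dictionary walk sketch: calls every word, newest first
; (the words 'star' and 'dash' are hypothetical examples)
bits 16
org 0x7C00
    jmp 0x0000:start        ; skip over the dictionary, force cs = 0

star:
    dw 0                    ; link: zero marks the first (oldest) word
    db 4, 'star'            ; counted name
star.x:
    mov ax, 0x0E00 + '*'    ; ah = bios teletype function, al = char
    mov bx, 0x0007          ; page 0, light grey
    int 0x10
    ret

dash:
    dw star                 ; link to previous word
    db 4, 'dash'
dash.x:
    mov ax, 0x0E00 + '-'
    mov bx, 0x0007
    int 0x10
    ret

start:
    xor ax, ax
    mov ds, ax
    mov ss, ax
    mov sp, 0x7C00
    mov bx, dash            ; header of the newest word
  .next:
    push bx                 ; the called word may change bx
    xor cx, cx
    mov cl, [bx+2]          ; length of the name
    add bx, 3               ; skip the link (2 bytes) and the count byte
    add bx, cx              ; bx := address of the word's code
    call bx
    pop bx
    mov bx, [bx]            ; follow the link to the previous word
    test bx, bx
    jnz .next               ; a zero link ends the dictionary
  .halt:
    hlt
    jmp .halt

    times 510-($-$$) db 0
    dw 0xAA55               ; boot signature
```

  The code address is always header + 3 + name length, which is the same
  arithmetic that 'exec' performs in the core system.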

  This section is designed to contain a core bootloading system
  with reliable debugged word-functions. The core words will be:
 
    inbuffer - push address of input buffer on the stack
    wbuffer - push the word buffer on the stack
    base - push pointer to current base on stack
    hex - make the base variable 16
    bin - make the base 2 
    decimal - make the base 10
    atparse - push onto stack pointer to position in inbuffer and char count
    accept - receive typed input
    nword - get next space delimited word from input buffer
    dup - duplicate top item on stack
    drop - drop top item on stack
    store - ! stores char at pointer
    fetch - @ fetches char from pointer 
    fetchplus - @+ fetches 1 char from data memory, advances pointer
    plus - + add top 2 items on stack
    key - get one char from keyboard
    emit - pop and display top stack item as ascii
    dothex - display top stack item as hex (the item stays on the stack)
    count - forth count word
    type - print counted string 
    last - gives pointer to last word in dictionary.
    find - search through the dictionary for word matching wbuffer 
    exec - execute the word-function token found by 'find'
    num - try to convert wbuffer text to a signed number -32K < n < +32K
    dump - print in hex and ascii n bytes starting at p*
    list - show what words are available
    .sx - show the stack in hex and ascii
    interp - provide a read/parse/execute/ loop (repl)
    flags - pointer to flags such as "not a number" etc, not used
    cursor - keep track of next print position (not implemented)
    cursorb - an alternative cursor  (not implemented)

  Flags is not at all a traditional forth idea, but all microcontrollers
  have a flags register, so why shouldn't forthish, which is a type
  of virtual machine?

  The emphasis will be on writing the minimal amount of code in the 
  simplest way possible. Idea: if nword just returned a pointer and char
  count, then wbuffer would not be necessary. Find could then operate
  with the pointer and count, but nword would still have to update the
  atparse structure.
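
  A sketch of that idea, as a stand-alone boot sector: 'nextword' scans
  the text in place and returns the address and length of the next
  word, so nothing is copied. The names 'nextword', 'parsepos' and
  'parsecount' are invented for this example (they play the role of
  atparse.d and atparse.count), and the input text is fixed instead of
  being typed in.

```nasm
; in-place word parsing sketch: prints each word of 'text' on its own line
; (nextword, parsepos and parsecount are hypothetical names)
bits 16
org 0x7C00
    jmp 0x0000:start        ; force cs = 0 so offsets match the org

start:
    xor ax, ax
    mov ds, ax
    mov ss, ax
    mov sp, 0x7C00
    cld                     ; lodsb steps forwards
  .again:
    call nextword           ; si := start of word, cx := length
    jcxz .done              ; length 0 means no more words
    mov ah, 0x0E            ; bios teletype function
    mov bx, 0x0007          ; page 0, light grey
  .print:
    lodsb
    int 0x10
    loop .print
    mov al, 13              ; carriage return
    int 0x10
    mov al, 10              ; line feed
    int 0x10
    jmp .again
  .done:
    hlt
    jmp .done

; nextword: find the next space delimited word without copying it
; out: si = address of word, cx = char count (0 if input is exhausted)
; changes: bx
nextword:
    mov si, [parsepos]
    xor cx, cx
    mov cl, [parsecount]    ; chars still unparsed
  .skip:
    jcxz .finish            ; only spaces (or nothing) left
    cmp byte [si], ' '
    jne .first
    inc si                  ; skip a leading space
    dec cx
    jmp .skip
  .first:
    mov bx, si              ; bx := first char of the word
  .scan:
    jcxz .end               ; ran out of input: word ends here
    cmp byte [si], ' '
    je .end                 ; a space ends the word
    inc si
    dec cx
    jmp .scan
  .end:
    mov [parsepos], si      ; remember where parsing stopped
    mov [parsecount], cl
    mov cx, si
    sub cx, bx              ; cx := length of the word
    mov si, bx              ; si := start of the word
    ret
  .finish:
    mov [parsepos], si
    mov [parsecount], cl    ; cx is already 0
    ret

text:       db 'one two three 4'
textlen     equ $ - text
parsepos:   dw text         ; next position to parse
parsecount: db textlen      ; chars remaining

    times 510-($-$$) db 0
    dw 0xAA55               ; boot signature
```

  With this interface 'find' would compare against the pointer and count
  directly, at the cost of giving up the count byte that the current
  'repe cmpsb' comparison relies on.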

  This core system can be used by build.pl and other words as a framework for
  interactively testing other functions.  A script can replace all .doc fields
  with "dw 0" for minimal code size.

  * a bootloading forth-like core system 
  ----------

    ; replace this with bootcode
    ; eg: sed '/bootload/r bootload.asm' 
    ; [bootload]

    ; first deal with buffers and variables needed

    
    ; some colours (in BL reg) for int 10h, ah=0x0E 
    ; these colours only work when video mode is graphical
    ; such as 0x12 or 0x13

    BLUE equ 1 
    GREEN equ 2
    AQUA equ 3
    RED equ 4
    PURPLE equ 5
    BROWN equ 6
    WHITE equ 7
    DGREY equ 8
    LBLUE equ 9
    ; ... colours up to 0xF (foreground and background) 

    inbuffer.doc:
      db 'Push pointer to input buffer (a counted string)'
      db 'The "inbuffer" is the user input buffer filled with the '
      db '"accept" function. The 1st byte of the buffer is the '
      db 'character count. The buffer is not zero terminated'
      dw $-inbuffer.doc
    inbuffer:
      dw 0           
      db 8, 'inbuffer'
    inbuffer.x:
      pop ax           ; preserve fn ip return
      push inbuffer.d
      push ax
      ret
    inbuffer.d times 65 db 0      ; buffer where user input goes

    wbuffer.doc:
      db 'Pointer to parsed word buffer.'
      db 'The wbuffer contains one word or number (+/-nnnn) with no'
      db 'leading or trailing spaces that has been parsed from the'
      db 'inbuffer with the "nword" function. '
      dw $-wbuffer.doc
    wbuffer:
      dw inbuffer
      db 7, 'wbuffer'
    wbuffer.x:
      pop ax           ; preserve fn ip return
      push wbuffer.d
      push ax
      ret
    wbuffer.d times 64 db 0    ; counted string word buffer

    ; just print a hash for testing
    hash:
      dw wbuffer
      db 4, 'hash'
    hash.x:
      mov ah, 0Eh     ; just print a hash with bios
      mov al, '#'
      int 10h         ; x86 bios interrupt
      ret

    base.doc:
      db 'Pointer to base variable.'
      db 'The base variable may influence how the "num" and dot words work.'
      dw $-base.doc
    base:
      dw hash 
      db 4, 'base'
    base.x:
      pop ax           ; preserve fn ip return
      push base.d
      push ax
      ret
    base.d db 10       ; the initial base is decimal 

    dw 0
    hex:
      dw base
      db 3, 'hex'
    hex.x:
      mov [base.d], byte 16
      ret

    decimal.doc:
      db 'make "base" variable decimal.'
      db 'The base variable influences how numbers are parsed and '
      db 'displayed.'
      dw $-decimal.doc
    decimal:
      dw hex
      db 7, 'decimal'
    decimal.x:
      mov [base.d], byte 10
      ret

    bin.doc:
      db 'make base binary'
      dw $-bin.doc
    bin:
      dw decimal
      db 3, 'bin'
    bin.x:
      mov [base.d], byte 2
      ret

    atparse.doc:
      db 'Push onto stack parse position in "inbuffer" and remaining count.'
      db 'atparse helps nword to keep track of where in the input buffer'
      db 'it is currently parsing.'
      dw $-atparse.doc
    atparse:
      dw bin 
      db 7, 'atparse'
    atparse.x:
      pop ax           ; preserve fn ip return
      push word [atparse.d]
      xor bh, bh
      mov bl, byte [atparse.count]
      push bx          ; push byte count onto stack
      push ax
      ret
    atparse.d dw inbuffer.d+1   ; pointer to inbuffer, skip count 
    atparse.count db 0          ; count zero

    flags.doc:
      db 'Not used currently. push flag register (just nan)'
      dw $-flags.doc
    flags:
      dw atparse 
      db 5, 'flags'
    flags.x:
      pop ax           ; preserve fn ip return
      push word [flags.d]
      push ax
      ret
    flags.d dw 0
    
    ; some place holders for compilation

    ; ---------------
    ; basic forth words

    dup.doc:
      db 'Duplicates the top stack item'
      db 'eg: 12 dup   stack now has 12 12'
      dw $-dup.doc
    dup: 
      dw flags       ; link to previous word 
      db 3, 'dup'    ; strings are 'counted' 
    dup.x:
      pop bx      ; juggle fn return address
      pop ax      ; get param to duplicate
      push ax
      push ax
      push bx     ; restore fn return address
      ret

    drop.doc:
      db 'removes top stack item'
      dw $-drop.doc
    drop: 
      dw dup       ; link to previous word 
      db 4, 'drop'    ; strings are 'counted' 
    drop.x:
      pop bx      ; juggle fn return address
      pop ax      ; discard top stack item 
      push bx     ; restore fn return address
      ret

    store.doc:
      db 'stores a byte at a memory address'
      db 'The low byte of the stack item is stored at given address'
      db 'eg: 12 date !   puts 12 in the variable date'
      dw $-store.doc
    store:
      dw drop
      db 1, '!'
    store.x:
      pop dx
      pop di    ; where to store
      pop ax    ; what to store
      stosb     ; [di] := al
      push dx
      ret

    fetch.doc:
      db 'fetches a byte (char) at given memory address.'
      db 'The top stack item is replaced with the value stored '
      db 'at that memory address. This is one of the most fundamental '
      db 'forth words, and can be used to implement lots of others.'
      db 'it is the equivalent of peek in basic, or pointer dereferencing'
      db 'in c.'
      db 'eg: wbuffer @ .  displays count of wbuffer'
      dw $-fetch.doc
    fetch:
      dw store
      db 1, '@'
    fetch.x:
      pop dx         ; juggle
      pop si         ; address from which to fetch
      xor ax, ax     ; set ax = 0
      mov al, [si]   ; 
      push ax        ; leave char
      push dx
      ret

    fetchplus.doc:
      db 'fetches a byte and advances memory address.'
      db 'leaves addr, char on stack. char is top item'
      db 'this is like lodsb in x86 or ld X+ in avr assembler'
      db 'eg: wbuffer @+'
      dw $-fetchplus.doc
    fetchplus:
      dw fetch 
      db 2, '@+'
    fetchplus.x:
      pop dx         ; juggle
      pop si         ; address from which to fetch
      xor ax, ax     ; set ax = 0
      lodsb          ; get value into al
      push si        ; save incremented address on stack
      push ax        ; leave char
      push dx
      ret

    plus.doc:
      db 'add top 2 stack items'
      dw $-plus.doc
    plus:
      dw fetchplus
      db 1, '+'
    plus.x:
      pop dx         ; juggle
      pop ax
      pop bx
      add ax, bx
      push ax
      push dx
      ret

    key.doc:
      db 'get one keystroke from user and place on stack'
      dw $-key.doc
    key:  
      dw plus
      db 3, 'key'  ; forth-style function header 
    key.x:
      mov ah, 0    ; wait for keypress bios function
      int 16h
      pop bx       ; juggle function return pointer
      push ax      ; save keypress value on stack
      push bx      ; restore return pointer to stack
      ret

    emit.doc:
      db 'removes and displays top item on stack as an ascii character.'
      db 'I suppose the character is in the low byte of the stack item...'
      dw $-emit.doc
    emit:
      dw key
      db 4, 'emit'
    emit.x:
      pop bx         ; juggle return pointer
      pop ax         ; char in al
      push bx
      mov ah, 0x0E   ; bios teletype function 
      int 10h        ; x86 bios 
      ret

    dothex.doc:
      db 'displays the top item on the stack in 4 digit hex format.'
      db 'This function does not take the item off the stack.'
      dw $-dothex.doc
    dothex:
      dw emit  
      db 4, '.hex'
    dothex.x:
      pop bx     ; return address
      pop dx     ; the number to print (top item on stack)
      push dx    ; restore item to stack
      push bx    ; restore return address
      mov ah, 0x0E ; bios teletype function 
      mov bx, hextable   ; translation table
      mov cx, 4          ; number of digits to print
      .again:
        rol dx, 4      ; rotate left 4 bits (print highest first)
        mov al, dl     ; bits to convert to hex digit
        and al, 0x0F   ; only lower 4 bits relevant
        xlatb          ; replace al with hex digit in translation table
        int 10H        ; invoke bios print function
        loop .again
      mov al, 'H'      ; print an H to indicate hex number
      mov ah, 0eH      ; echo the char (just for debugging)
      int 10H
      ret

   hextable db "0123456789ABCDEF"    ; translation table

    dothexbyte.doc:
      db 'prints the low byte of top stack item in 2 digit hex'
      db 'the stack item is not removed'
      dw $-dothexbyte.doc
    dothexbyte:
      dw dothex       ; link
      db 6, '.xbyte'
    dothexbyte.x:
      pop bx     ; fn return address
      pop dx     ; the number to print (parameter on stack)
      push dx    ; restore top stack item
      push bx    ; restore return address

      mov ah, 0x0E ; bios teletype function 
      mov bx, hextable   ; translation table
      mov cx, 2          ; number of digits to print
      .again:
        rol dl, 4      ; rotate left 4 bits (print highest first)
        mov al, dl     ; bits to convert to hex digit
        and al, 0x0F   ; only lower 4 bits relevant
        xlatb          ; replace al with hex digit in translation table
        int 10H        ; invoke bios print function
        loop .again

      ret

    ; ------------
    ; help words

    dump.doc:
      db 'prints contents of memory in hex and ascii'
      db 'eg: inbuffer 20 dump  shows 20 bytes of the input buffer'
      dw $-dump.doc
    dump:
      dw dothexbyte
      db 4, 'dump' 
    dump.x:
      pop dx      ; juggle return fn
      pop cx      ; how many chars to print 
      pop si      ; where to start printing
      push dx     ; restore
      push si     ; save si and cx
      push cx
    .nextbyte:
      xor ax, ax
      lodsb
      mov dl, al
      mov ah, 0x0E ; bios teletype function 
      mov bx, hextable   ; translation table
      push cx            ; save cx again
      mov cx, 2          ; number of digits to print
      .again:
        rol dl, 4      ; rotate left 4 bits (print highest first)
        mov al, dl     ; bits to convert to hex digit
        and al, 0x0F   ; only lower 4 bits relevant
        xlatb          ; replace al with hex digit in bx translation table
        int 10H        ; invoke bios print function
        loop .again
      pop cx           ; restore counter
      mov ah, 0eh  ; print char func
      mov al, ' '  ; space
      int 10h
      loop .nextbyte 

      pop cx   ; restore counter, how many chars
      pop si   ; restore pointer to start of memory 
      mov ah, 0eh  ; print char func
      mov al, 13   ; carriage return
      int 10h
      mov al, 10   ; new line
      int 10h
    .nextchar:
      lodsb        ; get [si] into al
      int 10h      ; print char in al
      mov al, ' '  ; space
      int 10h
      mov al, ' '  ; space
      int 10h
      loop .nextchar
      ret

    bstack.d dw 0      ; pointer to bottom of stack, init at start

    ; print the memory address: then print each item of stack
    ; starting at bottom of stack, 2bytes then space. Underneath
    ; print ascii of hex values. Print top of stack indicator
    ; such as <<
    ; also print return address in brackets
    ; eg
    ;    0F45: A454 3333 2222 1111 <<
    ;        : a 6  e r  ...

    dotsx.doc:
      db 'Prints out stack in hex/ascii with memory addresses'
      dw $-dotsx.doc
    dotsx:
      dw dump
      db 3, '.sx'
    dotsx.x:
      mov dx, sp       ; top of stack address
      mov ah, 0x0E     ; bios teletype function 

      mov cx, 4          ; number of digits to print
      .again:
        rol dx, 4      ; rotate left 4 bits (print highest first)
        mov al, dl     ; bits to convert to hex digit
        and al, 0x0F   ; only lower 4 bits relevant
        mov bx, hextable ; translation table
        xlatb          ; replace al with hex char in bx translation table
        mov bl, BLUE   ; print in blue
        int 10H        ; invoke bios print function
        loop .again

      mov bl, GREEN    ; print in green
      mov ah, 0x0E     ; type char function 
      mov al, ':'      ; colon after the stack address 
      int 10H
      mov al, ' '      ; separate 
      int 10H
      mov al, '>'      ; greater than indicates top of stack 
      int 10H
      mov al, '>'      ; 
      int 10H
      mov al, ' '      ; separate 
      int 10H
      
      ; stack grows down, not up
      cld               ; lodsw forwards
      mov si, sp        ;

    .nextitem:
      mov dx, [ss:si]   ; get next stack item 
      mov ah, 0x0E     ; type char function
      mov cx, 4        ; number of digits to print
      .nextnibble:
        rol dx, 4      ; rotate left 4 bits (print highest first)
        mov al, dl     ; bits to convert to hex digit
        and al, 0x0F   ; only lower 4 bits relevant
        mov bx, hextable ; translation table
        xlatb          ; replace al with hex digit in translation table
        int 10H        ; invoke bios print function
        loop .nextnibble
      mov al, ' '      ; separate 
      int 10H
      add si, 2        ; 
      cmp si, [bstack.d]  ; check if last item 
      jne .nextitem

    .exit:
      ret

    ; This assumes the dict has at least one word
    list.doc:
      db 'List all function words in the dictionary. List traverses the '
      db 'linked list dictionary and prints the name of each function '
      db 'word found in the function header. This '
      db 'leaves nothing on the stack. It relies on a lastword data '
      db 'item that contains a pointer to the last word in the dictionary '
      db '... needs paging etc '
      db ' eg: list '
      dw $-list.doc
    list:
      dw dotsx       ; link 
      db 4, 'list'
    list.x:
      mov bx, last
    .nextword:
      mov si, bx      ; pointer to current function header
      add si, 2       ; the counted string is 2 bytes after header
      xor ax, ax      ; ax := 0 
      lodsb           ; al := [si]++
      mov cx, ax      ; load the string count into cx for looping
      cmp cx, 0       ; if nothing to print exit
      je .exit
      mov ah, 0eh     ; bios print character function
    .nextchar:
      lodsb        ; get next char from message into al
      int 10h         ; x86 bios interrupt
      loop .nextchar  ; decr cx loop counter 
      mov al, 32      ; space char
      int 10h
      mov bx, [bx]    ; get the pointer to the next function (or 0)
      cmp bx, 0       ; if start of dict, then link is 0
      jne .nextword 
    .exit:
      ret

    ; --------- 
    ; interp words

    ; cant call this 'word' because of nasm syntax
    ; there is still a bug here when the last word in
    ; inbuffer is a single character...
    nword.doc:
      db 'Get next word from inbuffer using "atparse" position'
      db 'The word is copied to the wbuffer with no leading or '
      db 'trailing spaces.'
      dw $-nword.doc
    nword: 
      dw list
      db 5, 'nword'
    nword.x:
      xor cx, cx              ; set counter = 0
      mov cl, [atparse.count] ; remaining chars in inbuffer
      mov si, [atparse.d]     ; current pos in inbuffer
      lea di, [wbuffer.d+1]   ; copy chars into word buffer, skip count
      xor dx, dx              ; dx := 0 (not actually used as a counter)
      cld
      cmp cl, 0         ; no more chars in inbuffer so exit 
      je .exit          ; 
    .spaces:
      lodsb             ; get char into al (si++)
      cmp al, ' '       ; skip leading spaces   
      loope .spaces     ; loop while cx>0 and char is space

      cmp cl, 0         ; no more chars in inbuffer so exit
      ja .nextchar      ; if final word in inbuffer is single char
      cmp al, ' '       ; make sure we store it
      je .exit          ; last char is space, so dont store
      stosb             ; store non-space char 
      jmp .exit         ; no more chars so exit

    .nextchar:
      stosb
      lodsb         ; get next char into al (si++) 
      cmp al, ' '   ; if space then word is finished
      loopne .nextchar
      
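      cmp cl, 0     ; inbuffer exhausted?
      jne .exit     ; no: we stopped on a space, the word is complete
      cmp al, ' '   ; yes: the last char read is not stored yet
      je .exit      ; but a trailing space is not part of the word
      stosb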
    .exit:
      xor ax, ax
      mov ax, di              ; how many chars in wbuffer
      sub ax, wbuffer.d+1
      mov [wbuffer.d], al     ; write count to 1st byte of wbuffer
      mov [atparse.count], cl ; update remaining chars
      mov [atparse.d], si     ; update parse pointer
      ret
   
    ; the forth count word
    ; stack: addr -- addr+1, char count
    dw 0
    count:
      dw nword
      db 5, 'count'
    count.x:
      pop dx         ; preserve return fn pointer
      pop si         ; buffer address
      xor ax, ax     ; ax := 0
      lodsb          ; get count into al, increment si
      push si        ; new buffer address
      push ax        ; char count
      push dx        ; restore fn return ip
      ret

    ; stack: buffer address, char count <<  
    dw 0           ; no doc
    type:
       dw count        ; link to previous dictionary entry 
       db 4, 'type'  
    type.x:
       cld             ; make lodsb step forwards
       pop bx          ; juggle return address for call
       pop cx          ; how many chars to print
       pop si          ; address of buffer to print
       push bx         ; restore return function call
       cmp cx, 0       ; if nothing to print exit
       je .exit
       mov ah, 0eh     ; bios print character function
     .again:
       lodsb        ; get next char from message into al
       int 10h      ; x86 bios interrupt
       loop .again  ; decr cx loop counter 
     .exit:
       ret

    num.doc:
      db 'Tries to convert a counted string to a (signed) integer.'
      db 'If successful, put the number on the stack. If not successful '
      db 'set the not-a-number flag in the num.error register'
      dw $-num.doc
    num:
      dw type
      db 3, 'num'
    num.x:
      ; check for valid first char +/-[0-9] 
      ; if '-' set negative flag (register?)
      ; check for valid digits

      pop bx    ; juggle fn return pointer
      pop si    ; get pointer to counted buffer
      push bx   ; restore fn pointer
     
      mov di, si    ; save counted string pointer to get sign later
      xor dx, dx    ; accumulator
      xor bx, bx    ; multiplier 
      xor cx, cx    ; char counter for wbuffer

      cld         ; make lodsb step forwards
      lodsb       ; get count into al,  al <- [si], si++
      mov cl, al  ;
      cmp cx, 0   ; no chars in buffer so its an error 
      je .error
      lodsb         ; 1st char into al, al <- [si], si++
      ; dec cx      ; no, shouldnt, decrement char counter
      cmp al, '-'   ; is 1st char negative sign?
      jne .notnegative
      cmp cx, 1     ; count is 1, so just a - sign, error
      je .error
      lodsb          ; get next char (digit) from wbuffer
      dec cx         ; decrement char counter
      jmp .notpositive 
    .notnegative:
      cmp al, '+'
      jne .notpositive
      cmp cx, 1      ; count is 1, so wbuffer only has '+' in it, error
      je .error
      lodsb          ; get next char (digit) from wbuffer
      dec cx         ; decrement char counter
    .notpositive:
      ; actually a valid digit is dependant on the base!!!!
      ; we should use base 1 < n < 17
    .nextdigit:
      cmp al, '0'     ; if char is less than '0' then not digit
      jb .notdigit    ; unsigned jump to error 
      cmp al, '9'     ; if char is greater than '9' then not digit
      ja .notdigit    ; unsigned jump to error 
      sub al, '0'     ; convert to digit
      xor ah, ah      ; set ah = 0, so ax := al

      push ax          ; save digit 0-9 on stack
      mov ax, dx       ; get intermediate result into ax
      xor bh, bh       ; set bh := 0
      mov bl, [base.d] ; multiply by base (eg 2, 10, 16 - 1 < n < 256) 
      mul bx           ; do dx:ax := ax*bx 
     
      pop bx         ; get last digit 0-9 from stack
      jo .toobig     ; overflow... result too big to store in AX 
      add ax, bx     ; add digit to result
      mov dx, ax     ; store intermediate result in dx

      lodsb          ; next char into al, al <- [si]
      loop .nextdigit        ; keep going while more digits/chars (cx > 0)

    .exit:
      mov [num.error], word 0  ; set is-a-number flag 
      mov ax, dx 
      mov bl, [di+1]    ; is first char - ?
      cmp bl, '-'       ; if so, negate the result
      jne .continue
      neg ax
    .continue:
      pop bx     ; juggle fn return
      push ax    ; leave result on stack
      push bx    ; restore fn return
      ret

    .toobig:  
      mov [num.error], word 1   ; set number-too-big flag 
      ret

    .notdigit:  
      mov [num.error], word 2   ; set not-a-number flag 
      ret

    .error:
      mov [num.error], word 3   ; set other not-a-number flag 
      ret

    num.error dw 0

    ; !! allow arrow keys to move cursor and insert ??
    accept.doc:
      db 'get max 64 chars from keyboard and put in inbuffer. '
      db 'Enter terminates input and stores count in inbuffer'
      db 'This version allows user to edit with the backspace key'
      db 'It achieves this by echoing backspace, space, backspace and'
      db 'updating the inbuffer'
      dw $-accept.doc
    accept:
      dw num          ; link 1st word has a zero link 
      db 6, 'accept'  ; forth-style function header 
    accept.x:
      mov di, inbuffer.d    ; where to store line
      inc di       ; skip count byte to store 1st char
      xor dl, dl   ; char counter := zero
      cld          ; make stosb go forwards
    .nextkey:
      cmp dl, 64   ; only accept max 64 chars
      je .exit     ; 
      mov ah, 0    ; wait for keypress bios function
      int 16h
      cmp al, 13   ; was the key press an 'enter'?
      je .exit     ; exit if enter pressed
      cmp al, 8    ; was the key press a backspace
      je .backspace  ; do something sensible
      mov ah, 0eh    ; echo the character
      int 10h
      stosb        ; put the char into the buffer
      inc dl       ; increment char counter
      jmp .nextkey
    .backspace:    ; allow user to edit input with backspace
      cmp dl, 0    ; if at start of buffer do nothing
      je .nextkey  ;
      dec dl       ; decrement char count
      dec di       ; one char back in buffer
      mov [di], byte 0  ; erase char 
      mov ah, 0eh  ; print char func
      mov al, 0x08 ; ASCII for Backspace
      int 10h
      mov al, 0x20 ; ASCII for Space
      int 10h
      mov al, 0x08 ; ASCII for Backspace
      int 10h
      jmp .nextkey
    .exit:
      mov [inbuffer.d], dl     ; store char count in inbuffer
      mov [atparse.count], dl  ; reset parse position to beginning
      mov [atparse.d], word inbuffer.d+1
      mov ah, 0eh  ; bios teletype function
      mov al, 13   ; a new line
      int 10h
      mov al, 10
      int 10h

      ret

    ; execute a function given a pointer to its header on the stack
    ; if pointer is zero, then this should pop the 0 and exit, no??
    exec.doc:
      db 'execute a word given a pointer on the stack'
      db ' eg: lastword exec  '
      dw $-exec.doc
    exec:
      dw accept           ; link to prev
      db 4, 'exec'
    exec.x:
      pop ax
      pop bx     ; get pointer to function
      push ax    ; preserve fn return pointer
      cmp bx, 0  ; a zero pointer should not be executed
      je .exit
      add bx, 2  ; point to name count
      xor cx, cx
      mov cl, [bx]  ; get the count
      inc bx        ; skip over count
      add bx, cx    ; advance the pointer to the function (a 16 bit add,
                    ; so a carry out of bl is not lost)

      ; !! not call [bx] thats a pointer to jumptable
      ; !!! call bx may change the stack (probably will) so we need 
      ; !!! to preserve the call return ip 
      ; instead of this below, we should have a return stack
      ; so that function words can be nested
      pop word [execreturn]      ; save return ip
      call bx       ; call the fn pointed to by bx
      push word [execreturn]     ; restore fn return ip
    .exit:
      ret
    ; a dodgy solution, but any register might get overwritten
    execreturn dw 0
    
    ; find and return a pointer to the found word, or else 0, on the 
    ; stack
    ; stack: search term, start pointer -- function header pointer

    find.doc:
      db 'Search dictionary for word in wbuffer and return pointer.'
      db ' eg: in lastword find '
      dw $-find.doc
    find:
      dw exec         ; link to prev
      db 4, 'find'
    find.x:
      pop dx     ; juggle fn return ip
      pop bx     ; where to start searching (eg last entry in dict)
      pop ax     ; counted string buffer to search for 
      push dx    ; restore fn ip
    .again:
      xor cx, cx      ; set cx:=0
      mov si, bx      ; pointer to current function header
      add si, 2       ; the counted string is 2 bytes after header
      mov cl, [si]    ; the count of the search term
      inc cl          ; we also have to compare the count bytes
      mov di, ax      ; the search term pointer
      cld            ; search forwards (clear direction flag)
      repe cmpsb     ; compare all characters for equality
      je .found
      mov bx, [bx]    ; get the pointer to the next function (or 0)
      cmp bx, 0       ; if start of dict, then link is 0
      je .notfound    ; no more words to search, so exit
      jmp .again 
    .notfound:
      pop dx
      push 0         ; not found so return 0
      push dx
      ret
    .found: 
      pop dx         ; juggle return ip
      push bx        ; return pointer to found word on stack 
      push dx
      ret

    ; -----------------------------

    interp.doc:
      db 'The main interpreting loop for a forthish system.'
      dw $-interp.doc
    interp:
      dw find            ; link to prev
      db 6, 'interp'
    interp.x:

    .nextline: 
      mov bl, 9    ; colour pale blue
      mov ah, 0eh  ; print char func
      mov al, 13 
      int 10h
      mov al, 10  
      int 10h
      mov al, '>'  ; print a prompt 
      int 10h
      mov bl, 2    ; colour green
      call accept.x      ; put max 64 chars in input buffer (inbuffer.d)

      ;call atparse.x     ; should push pointer & count on stack
      ;call type.x        ; just to debug

    .nextword:
      mov cl, [atparse.count]  
      cmp cl, 0          ; if no more chars in inbuffer, get new line
      je .nextline 

      call nword.x       ; parse next word into wbuffer
      mov al, [wbuffer.d]; if wbuffer count is zero, no word parsed 
      cmp al, 0          ; wbuffer count zero 
      je .nextline       ; no more words, so get next line of input 

      push wbuffer.d     ; what buffer to look in 
      push last          ; where to start searching
      call find.x        ; try to find word in dict 
      pop ax             ; ax is pointer to executable for [word] 
      push ax            ; restore pointer to fn for exec
      cmp ax, 0          ; 0 means word was not found
      jne .found

      pop ax             ; get rid of zero pointer (word not found)
      push wbuffer.d     ; word buffer to convert to number
      call num.x         ; try to convert to a number and push on stack
      mov ax, [num.error] ; check the "not-a-number" flag
      cmp ax, 0          ; if num.error==0 then number parsed ok
      je .nextword       ; its a number, already on stack
      
      call werror.x      ; print error message if word not found
      jmp .nextword      ; for debugging loop through all words 
      ;jmp .nextline     ; unknown word so just get a new line

    .found:
      call exec.x        ; watch out for stack mangling with exec 
      ; print some good message in green like
      ; [ ok ]      
      jmp .nextword

    .exit:            ; never get here because interp goes forever !
      ret

    werror.doc:
      db 'prints an error message when word not found/not number.'
      dw $-werror.doc
    werror:
      dw interp           
      db 6, 'werror'
    werror.x:
      ; some error indicator
      mov bl, 7    ; colour white
      mov ah, 0eh  ; x86 bios echo char fn
      mov al, '['  ; a delimiter char for debug
      int 10h
      push wbuffer.d   ; when word is neither function nor number 
      call count.x     ; print it out with ? and stop parsing line 
      mov bl, 4        ; colour
      call type.x
      mov ah, 0eh  ; x86 bios echo char fn
      mov al, ']'  ; a delimiter char for debug
      int 10h
      mov al, ' '  ; a space after the delimiter
      int 10h
      mov al, '?'  ; '??' flags the word as unknown
      int 10h
      mov al, '?'
      int 10h
      mov al, ' '  ; trailing space before the next output
      int 10h
      ret

    ; 
    colour.doc:
      db 'show colours'
      dw $-colour.doc
    colour:
      dw werror
      db 6, 'colour'
    colour.x:
      mov cx, 9
    .next:
      mov ah, 0x0E
      mov bl, cl
      mov al, cl
      add al, '0'
      int 10h
      loop .next
      ret

    ; make last the last word for convenience
    last.doc:
      db 'Push pointer to header of last dictionary word '
      dw $-last.doc
    last:
      dw colour 
      db 4, 'last'
    last.x:
      pop ax           ; preserve fn ip return
      push word [last.d]
      push ax
      ret
    last.d dw last

   ; ---------------
   ; start of main program

    start:

     mov ax, cs    ; make data segment and es same as code segment
     mov ds, ax
     mov es, ax
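     mov [bstack.d], sp ; remember where the stack started

     mov ah, 0     ; bios 'set video mode' function
     mov al, 12h   ; mode 12h: 640x480 graphics, 16 colours
     int 10H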

     ; try to make cursor visible
     mov ah, 1
     mov ch, 1
     mov cl, 2
     int 10H

     call interp.x
     here: jmp here          ; loop forever 

  ,,,


IDEAS ABOUT A FORTHISH SYSTEM

  The forth language introduced some revolutionary ideas that
  never led to any kind of revolution. Namely: place code units
  (functions / words / objects) within a data structure which includes
  the word's name. This provides what is today called 'reflection':
  the ability of code to 'know' something about itself. Since the code
  sits within a data structure, a program can walk that structure and
  analyse the code. Code speaking about itself is the realm of AI, even
  if those ambitions are not helpful.

  Basic forth functions: receive a word from the keyboard and look
  up the word in a dictionary (find). If the word is found, execute the
  code associated with it (exec). If the word is not found, try to
  convert the input to a number (>number) and push it on the stack. All
  functions receive their parameters on the 'stack' (which is either
  the system stack, or else a software stack).

  * an example of a forth word data structure (from 'itsy-forth')
  -----------------
        ; header
        dw link_to_previous_word
        db 3, 'nip'  ; strings are 'counted' in forth (3 chars in nip)
 xt_nip dw docolon   ; xt= execution token, forth jargon
        ; body
        dw xt_swap   ; pointers to other forth 'words'
        dw xt_drop   ; remove last item on stack
        dw xt_exit   ; leave this word and return to the caller
  ,,,


 * example of forth dictionary entry with assembly code
 -----------
        dw link_to_previous_word
        db 1, '+'
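xt_plus dw mc_plus   ; code field points at the machine code below
mc_plus pop ax       ; second item comes off the stack
        add bx,ax    ; bx appears to hold the top of stack in itsy-forth
        jmp next     ; back to the inner interpreter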
 ,,,

TOKEN THREADED FORTH ....

  In this style of forth each primitive is represented by a number
  which acts as a virtual opcode. This type of forth creates a proper
  virtual machine but is reputed to be the slowest threading style.
  There is probably a table mapping opcodes to function pointers, eg:
     1  AAF1
     2  AAFF
     3  AB1C    etc
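
  The table idea above can be sketched as a tiny boot sector. This
  is only an illustration: the names optable, program, op.a, op.b and
  op.halt are invented here, and a real token threaded forth would
  also need a way to define new tokens. Assembled with nasm it
  should print ABBA and then stop.

  * a token threaded dispatch loop (a sketch)
  ------------------
   BITS 16
   [ORG 0]
   jmp 07C0h:start      ; cs = 07C0h, so label offsets start at 0

   ; the opcode table: token n selects the nth pointer
   optable:
     dw op.halt         ; token 0
     dw op.a            ; token 1
     dw op.b            ; token 2

   ; a 'program' made of one byte tokens
   program db 1, 2, 2, 1, 0

   op.a:
     mov ax, 0e41h      ; ah = 0eh (teletype fn), al = 'A'
     int 10h
     ret
   op.b:
     mov ax, 0e42h      ; ah = 0eh, al = 'B'
     int 10h
     ret
   op.halt:
     jmp op.halt        ; token 0 stops the virtual machine

   start:
     mov ax, cs
     mov ds, ax
     cld
     mov si, program    ; si is the virtual instruction pointer
   .next:
     lodsb              ; al := next token, si++
     xor ah, ah
     mov bx, ax
     shl bx, 1          ; 2 bytes per table entry (bh stays 0 = page 0)
     call word [optable+bx]  ; run the primitive for this token
     jmp .next

   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,

  The extra table lookup on every token is the usual explanation
  for why this style is considered slow.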

  We could think about universal naming of forth words. That
  is a prefix to each word stored in a table. 

SUBROUTINE THREADED FORTHS ....

  Each word is compiled as a sequence of native 'call fn'
  instructions, so no inner interpreter is needed.
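
  A minimal sketch of the idea, with invented word names (star and
  stars3). The forth definition ': stars3 star star star ;' simply
  becomes three native calls and a ret. It should print three stars.

  * a subroutine threaded word (a sketch)
  ------------------
   BITS 16
   [ORG 0]
   jmp 07C0h:start

   star.x:              ; primitive: print one '*'
     mov ax, 0e2ah      ; ah = 0eh (teletype fn), al = '*'
     xor bx, bx         ; display page 0
     int 10h
     ret

   ; forth:  : stars3 star star star ;
   stars3.x:
     call star.x
     call star.x
     call star.x
     ret

   start:
     call stars3.x
   here: jmp here          ; loop forever
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,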

INDIRECT THREADED ....

  An inner interpreter is used to call a series of function 
  pointers within a forth word. This is considered slower than
  subroutine threaded on modern machines.
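
  The inner interpreter can be sketched as below. This is an
  illustration under assumptions, not the code of any particular
  forth: si is used as the forth instruction pointer, bp as a
  return stack pointer, and the names NEXT, docolon, xt_star,
  xt_stars2, xt_exit, xt_halt and thread are invented for the
  example. It should print four stars.

  * an indirect threaded inner interpreter (a sketch)
  ------------------
   BITS 16
   [ORG 0]
   jmp 07C0h:start

   %macro NEXT 0          ; the inner interpreter
     lodsw                ; ax := next execution token, si += 2
     mov bx, ax
     jmp word [bx]        ; jump through that word's code field
   %endmacro

   ; a primitive: the code field points at machine code
   xt_star dw mc_star
   mc_star:
     mov ax, 0e2ah        ; ah = 0eh (teletype fn), al = '*'
     xor bx, bx           ; display page 0
     int 10h
     NEXT

   xt_halt dw mc_halt
   mc_halt:
     jmp mc_halt          ; stop here forever

   ; a colon word:  : stars2 star star ;
   xt_stars2 dw docolon   ; code field
     dw xt_star           ; body: a list of execution tokens
     dw xt_star
     dw xt_exit

   docolon:               ; enter a colon word (bx = its code field)
     sub bp, 2
     mov [bp], si         ; save caller's position on return stack
     lea si, [bx+2]       ; the body starts after the code field
     NEXT

   xt_exit dw mc_exit
   mc_exit:               ; leave a colon word
     mov si, [bp]         ; restore caller's position
     add bp, 2
     NEXT

   thread dw xt_stars2, xt_stars2, xt_halt   ; top level 'program'

   start:
     mov ax, 07C0h
     add ax, 288          ; (4096 + 512) / 16 bytes per paragraph
     mov ss, ax
     mov sp, 4096         ; data stack grows down from here
     mov bp, 2048         ; return stack grows down from the middle
     mov ax, cs
     mov ds, ax
     cld
     mov si, thread
     NEXT

   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,

  Each step costs two memory reads (the token, then the code field)
  before any real work is done.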

EXERCISES TOWARD A FORTH LIKE SYSTEM ....

  We can write small programs which perform forth-like 
  functions to demonstrate different techniques for creating
  forth-ish systems.

  The following code is similar to the forth 'accept' word. 
  It gets a certain number of characters and copies them to 
  a buffer. In forth, the line is then parsed into counted
  words and executed, one word at a time.

  The entry code below does not yet handle 'backspaces'. It should,
  to allow the user to edit the text entered.

  * get some text from the keyboard and copy to a counted buffer
  ------------------
   org 0                    ; offsets are relative to segment 07C0h (ds below)
     jmp start
     SIZE equ 9 
     buffer times SIZE+1 db 0   ; 1 byte for the count + 9 for chars
   start:

     mov ax, 07C0h          ; Set data segment to where we're loaded
     mov ds, ax
     mov es, ax     ; es is needed for stosb
     cld            ; go forwards, not backwards
   .keys:
     mov cx, SIZE   ; maximum chars in buffer
     lea di, [buffer+1]
   .again:
     mov ah,0      ; wait for any key
     int 16h       ; bios keyboard functions
     cmp al, 13    ; was the key press an 'enter' 
     je .count
     stosb         ; copy the char to the buffer
     mov ah, 0eh   ; echo the key pressed
     int 10h
     loop .again   ; loop while CX > 0
   .count:
     mov bx, SIZE  ; calculate and store char count in [buffer] 
     sub bx, cx
     mov [buffer], bl 
   .type:                ; print count and 1st character 
     call newline
     mov ah, 0eh         ; bios teletype function
     mov al, [buffer]    ; print char count (one digit) 
     add al, '0'         ; convert digit to ascii
     int 10h
     mov al, [buffer+1]  ; print 1st char of buffer
     int 10h
     call newline
     jmp .keys            ; keep looping! 

  newline:
     mov ah, 0eh
     mov al, 13          ; print to a newline 
     int 10h
     mov al, 10    
     int 10h
     ret

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,


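  Below is a difficult gotcha! With both 'org 7c00h' and
  'mov ax, 07c0h' (for ds) the code does not work: the org already
  adds 7c00h to every label, and the segment 07c0h adds 7c00h
  again, so data is read from the wrong address. Use either org 0
  with ds = 07c0h (as below) or org 7c00h with ds = 0.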

  * get one letter from keyboard and look up in a dictionary 
  ------------------
   ;org 7c00h
     jmp start
     buffer db ' '     ; single character buffer 

   ; the dictionary, a linked list.
   aa dw 0      ; zero link means top of dictionary
     db 1,'a'  ; count + name
     mov ah, 0eh
     mov al, 'A' 
     int 10h
     ret        ; each entry needs its own ret
   bb dw aa   ; link to previous entry in dictionary
     db 1,'b' 
     mov ah, 0eh
     mov al, 'B' 
     int 10h
     ret
   cc dw bb   ; link to previous entry in dictionary
     db 1,'c' 
     mov ah, 0eh
     mov al, 'C' 
     int 10h
     ret
   last dw cc   ; link to last dictionary entry

   start:
     mov ax, 07C0h  ; segment where the bios loaded us
     add ax, 288      ; (4096 + 512) / 16 bytes per paragraph
     mov ss, ax       ; the stack segment sits after the boot sector
     mov sp, 4096     ; stack grows down from the top of that segment
     mov ax, 07C0h  ; Set data segment to where we're loaded
     mov ds, ax
     mov es, ax     ; es is needed for stosb
     cld            ; go forwards, not backwards
   .again:
     mov ah,0      ; wait for any key
     int 16h       ; bios keyboard functions
     mov [buffer], al  ; copy the char to the buffer
     mov ah, 0eh   ; echo the key pressed
     int 10h
   .search:                ; for now, just show the last entry's name
     ;call newline
     mov bx, [last]        ; bx -> header of last dictionary entry
     lea si, [bx]
     mov al, [si+3]        ; skip link (2 bytes) and count (1 byte)
     mov ah, 0eh   ; echo the last character in dict 
     int 10h
     ;lea bx, newline 
     ;call bx

     jmp .again            ; keep looping! 

  newline:
     mov ah, 0eh
     mov al, 13          ; print to a newline 
     int 10h
     mov al, 10    
     int 10h
     ret

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

  The assembly coder will notice that it is easier to compare 2 
  strings if those strings are 'counted', that is, the number
  of characters they contain is stored (preferably) in front
  of the string text.
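
  One way to take advantage of this, sketched below with invented
  names (cequal, show, one, two, three): treat the count byte as
  part of the data and compare count + 1 bytes with 'repe cmpsb'.
  If the lengths differ, the very first byte already fails to match.
  The sketch should print Y (dup = dup) and then N (dup vs dump).

  * compare two counted strings (a sketch)
  ------------------
   BITS 16
   [ORG 0]
   jmp 07C0h:start

   one   db 3, 'dup'
   two   db 3, 'dup'
   three db 4, 'dump'

   ; in:  si -> first counted string, di -> second (ds = es)
   ; out: zero flag set if the strings are equal
   cequal:
     cld
     xor cx, cx
     mov cl, [si]       ; length of the first string
     inc cx             ; also compare the count byte itself
     repe cmpsb         ; stops at first difference, or when cx = 0
     ret

   show:                ; print Y if zero flag is set, else N
     mov al, 'N'        ; mov does not change the flags
     jne .print
     mov al, 'Y'
   .print:
     mov ah, 0eh
     xor bx, bx
     int 10h
     ret

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax         ; cmpsb reads ds:si and es:di
     mov si, one
     mov di, two
     call cequal
     call show
     mov si, one
     mov di, three
     call cequal
     call show
   here: jmp here          ; loop forever
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,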

  * look up a word in a linked-list dictionary and report if found 
  ------------------
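   ; A minimal sketch only, not a canonical implementation. The names
   ; lookup and target, and the sample entries, are invented for this
   ; example. Each entry is a link (2 bytes) followed by a counted
   ; name. The search starts at 'last' and follows links until the
   ; name matches or the link is 0. It should print Y for 'dup'.
   BITS 16
   [ORG 0]
   jmp 07C0h:start

   aa dw 0             ; zero link means top of dictionary
      db 1, 'a'
   bb dw aa
      db 3, 'dup'
   cc dw bb
      db 4, 'dump'
   last dw cc          ; pointer to the newest entry

   target db 3, 'dup'  ; the name to search for

   ; in:  di -> counted name to look for (ds = es)
   ; out: bx -> matching entry, or 0 if not found
   lookup:
     cld
     mov bx, [last]
   .again:
     cmp bx, 0
     je .done            ; ran off the start of the list
     lea si, [bx+2]      ; si -> counted name of this entry
     push di             ; cmpsb moves di, so keep a copy
     xor cx, cx
     mov cl, [si]
     inc cx              ; compare the count byte and the chars
     repe cmpsb
     pop di              ; pop leaves the flags alone
     je .done            ; names match, bx -> entry
     mov bx, [bx]        ; follow link to the previous entry
     jmp .again
   .done:
     ret

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     mov di, target
     call lookup
     mov al, 'Y'
     cmp bx, 0
     jne .print
     mov al, 'N'
   .print:
     mov ah, 0eh
     xor bx, bx
     int 10h
   here: jmp here          ; loop forever
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature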
  ,,,

  * write functions which take and leave parameters from some stack
  --------
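   ; A minimal sketch only: 'plus' (an invented name) takes two
   ; numbers from the system stack and leaves their sum there. The
   ; return address sits on top of the parameters, so it is popped
   ; first and pushed back last. It should print 7.
   BITS 16
   [ORG 0]
   jmp 07C0h:start

   ; stack: n1 n2 -- n1+n2
   plus.x:
     pop dx         ; juggle the return ip out of the way
     pop ax         ; n2
     pop bx         ; n1
     add ax, bx
     push ax        ; leave the sum on the stack
     push dx        ; restore the return ip
     ret

   start:
     push 3
     push 4
     call plus.x
     pop ax         ; ax = 7
     add al, '0'    ; convert the single digit to ascii
     mov ah, 0eh
     xor bx, bx
     int 10h
   here: jmp here          ; loop forever
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature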
  ,,,
 
FORTH WORDS 


  Below follows some implementations of common forth words.
  However no attempt is made to adhere to any kind of 'standard'
  forth.

  I am using a convention of ; ** then ; * to indicate canonical
  implementations which can be collected together to form a 
  running system.

CRLF ....

  * print a newline
  ---------------
   ; **
   ; just print a newline
   dw 0           ; no doc
   crlf:
     dw 0         ; link
     db 4, 'crlf'
   crlf.x:
     mov ah, 0eh  ; bios type char function 
     mov al, 13   ; cr lf
     int 10h
     mov al, 10
     int 10h
     ret
   ; *
  ,,,
  
INBUFFER ....


  Just pushes the address of the input buffer onto the stack.

  * address of input buffer
  ---------------
    ; **
    ; puts on stack the current parse position in the input buffer
    dw 0           ; no doc
    inbuffer:
      dw input             ; link
      db 8, 'inbuffer'
    inbuffer.x:
      pop dx           ; juggle return pointer
      push inbuffer.n  ; current position
      push dx
      ret
    ; data field 
    inbuffer.n db 0, '                                  ' 
    ; *
  ,,,

DUP ....

  One of the more fundamental forth words, just duplicates the 
  top item on the stack.

  * a dup implementation
  -----------------------
   ; **
   dw 0           ; no doc
   dup: 
     dw 0           ; link to previous word 
     db 3, 'dup'    ; strings are 'counted' 
   dup.x:
     pop bx      ; juggle fn return address
     pop ax      ; get param to duplicate
     push ax
     push ax
     push bx     ; restore fn return address
     ret
   ; *
  ,,,

DUMP ....

   Dump displays memory from a given pointer for n bytes.
   It should print the hex value of each byte as well as the
   ascii value if possible (the code below only prints the value
   in the current base). This is an important function for
   debugging.

   The layout could be

   Address    Memory Values ....
   0x1000: FA 12 34 
            a  ^  b 
   0x1010: 23 45 56 ...  
            n  y  m

   * display the contents of memory
   ----------------

   BITS 16
   [ORG 0]

   jmp 07C0h:start     ; Goto segment 07C0

   base:
     dw 0            ; top of dictionary
     db 4, 'base'    ; forth style counted name
   base.x:
     pop dx          ; juggle return pointer for word
     push base.n     ; push address of base on stack
     push dx
     ret
   base.n dw 16

   hextable db "0123456789ABCDEF"
   dotbyte.doc:
      db 'displays a 1 byte number in current base', 13, 10
      db 'eg: 23 .byte '
      dw $-dotbyte.doc
   dotbyte:
      dw base
      db 5, '.byte'  
   dotbyte.x:
     pop dx         ; juggle the return function pointer
     pop ax         ; byte value in al to print
     push dx        ; restore the return ip
     mov bx, [base.n]   ; eg decimal, hex, any 1 < n < 17 ok 
                        ; we cannot display any base > 16 at the moment
     xor bh, bh         ; max base is 256 currently (8 bits)
     xor cx, cx     ; set counter = 0
     .again:
       xor ah, ah          ; ah = 0, ax is the dividend
       div bl              ; does ax/bl. remainder --> ah
       push ax             ; save remainder:quotient on the stack 
       inc cx              ; increment the digit counter
       cmp al, 0           ; if the quotient != 0 do the next digit 
       jne .again          ; loop while quotient > 0
     .print:
       pop ax            ; get remainder:quotient pair from the stack
       mov al, ah        ; the digit is the remainder (in ah)
       mov bx, hextable  ; translation table
       xlatb             ; replace al with hex digit from table
       mov ah, 0eH       ; print digit in al
       int 10H
       loop .print       ; using cx the digit counter to loop 
       ret

   dump.doc:
      db 'Displays the contents of memory', 13, 10
      db 'eg: 1000 20 dump   /displays 20 bytes of values from 0x1000 '
      dw $-dump.doc
   dump:
      dw dotbyte
      db 4, 'dump'  
   dump.x:
     pop dx        ; juggle return fn
     pop cx        ; how many bytes to display
     pop si        ; set si to memory pointer
     push dx       ; restore fn return

     cld               ; make lodsb go forwards 
     .nextbyte:
       xor ah, ah      ; only the low byte (al) is printed
       lodsb           ; al := [ds:si++]
       push cx         ; save byte counter (dotbyte modifies it) 
       push ax         ; byte to display
       call dotbyte.x  ; display value of byte in current numeric base
       mov ah, 0x0E    ; type char fn
       mov al, ' '     ; a space between each byte
       int 0x10
       pop cx
       loop .nextbyte  ; do next memory byte 
     .exit:
       ret

    start:

     mov ax, cs    ; make data segment and es same as code segment
     mov ds, ax
     mov es, ax
     ;mov [base.n], word 16
     push 0      ; offset 0 from data segment
     push 20     ; show 20 bytes
     call dump.x
     here: jmp here          ; loop forever 
     times 510-($-$$) db 0   ; Pad remainder of MBR boot sector with 0s
     dw 0xAA55               ; The standard MBR boot signature

   ,,,

DOT ....

   The word '.' in forth type systems just displays the top number
   on the stack in the current base and removes that number from the 
   stack


   * implement dot for 2 byte number on stack
   ----------------

   BITS 16
   [ORG 0]

   cr equ 13   ;  carriage return
   lf equ 10   ;  line feed 

   jmp 07C0h:start     ; Goto segment 07C0

   base.doc:
      db 'Puts the address of variable base on stack', 13, 10
      db 'Base determines the current numerical base for conversions', 13, 10
      db 'eg: 16 base !  /makes the base hexadecimal  '
      dw $-base.doc
   base:
     dw 0            ; top of dictionary
     db 4, 'base'    ; forth style counted name
   base.x:
     pop dx          ; juggle return pointer for word
     push base.n     ; push address of base on stack
     push dx
     ret
   base.n dw 16

   ; There was a bug with this taking one too many values off
   ; the stack (ie the fn return pointer) and crashing the code
   ; but seems to be working now.

   hextable db "0123456789ABCDEF"
   dot.doc:
      db 'displays a 2 byte number on stack in current base', 13, 10
      db 'eg: 32 hex .   /displays 20 (32 in hexadecimal) '
      dw $-dot.doc
   dot:
      dw base
      db 1, '.'  
   dot.x:
     pop dx         ; juggle the return function pointer
     pop ax         ; 2 byte value in ax to print
     push dx        ; restore the return ip
     mov bx, [base.n]   ; eg decimal, hex, any 1 < n < 17 ok 
                        ; we cannot display any base > 16 at the moment
     xor bh, bh         ; base
     xor cx, cx     ; set counter = 0
     .again:
       xor dx, dx          ; dividend is ax
       div bx              ; does dx:ax/bx. remainder --> dx, quotient -> ax
       push dx             ; save remainder (ie digit) on the stack 
       inc cx              ; increment the digit counter
       cmp ax, 0           ; if the quotient != 0 do the next digit 
       jne .again          ; loop while quotient > 0
     .print:
       pop ax            ; get digit from the stack (digit in AL)
       mov bx, hextable  ; translation table
       xlatb             ; replace al with hex digit from table
       mov ah, 0eH       ; print digit in al
       int 10H
       loop .print       ; using cx the digit counter to loop 
       ret

    start:

     mov ax, cs    ; make data segment and es same as code segment
     mov ds, ax
     mov es, ax
    
     push 0xFF12 
     call dot.x
     push 0x1234 
     call dot.x
     mov [base.n], word 10
     push 12345 
     call dot.x

     here: jmp here          ; loop forever 
     times 510-($-$$) db 0   ; Pad remainder of MBR boot sector with 0s
     dw 0xAA55               ; The standard MBR boot signature
   ,,,


   Below is a one byte version of dot '.'

   * implement dot for one byte number on stack
   ----------------

   BITS 16
   [ORG 0]

   cr equ 13   ;  carriage return
   lf equ 10   ;  line feed

   jmp 07C0h:start     ; Goto segment 07C0

   ; base is a standard forth word. Pushes var base (eg 16, 10)
   base.doc:
      db 'Puts the address of variable base on stack', 13, 10
      db 'eg: 16 base !  /makes the base hexadecimal  '
      dw $-base.doc
   base:
     dw 0            ; top of dictionary
     db 4, 'base'    ; forth style counted name
   base.x:
     pop dx          ; juggle return pointer for word
     push base.n     ; push address of base on stack
     push dx
     ret
   base.n dw 16 

   ; There was a bug with this taking one too many values off
   ; the stack (ie the fn return pointer) and crashing the code
   ; but seems to be working now.
   ; at the moment this is only 8 bit division. Use dx:ax for 16 bit
   ; division, with xor dx, dx; remainder->dx; quotient->ax  

   hextable db "0123456789ABCDEF"
   dotbyte.doc:
      db 'displays a 1 byte number in current base', 13, 10
      db 'eg: 23 .byte '
      dw $-dotbyte.doc
   dotbyte:
      dw base
      db 5, '.byte'  
   dotbyte.x:
     ; expects the 8 bit number to display on the stack (it is
     ; popped into AL). The base is read from base.n into BL
     pop dx         ; juggle the return function pointer
     pop ax         ; byte value in al to print
     push dx        ; restore the return ip
     mov bx, [base.n]   ; eg decimal, hex, any 1 < n < 17 ok 
                        ; we cannot display any base > 16 at the moment
     xor bh, bh         ; max base is 256 currently (8 bits)
     xor cx, cx     ; set counter = 0
     .again:
       xor ah, ah          ; ah = 0, ax is the dividend
       div bl              ; does ax/bl. remainder --> ah
       push ax             ; save remainder:quotient on the stack 
       inc cx              ; increment the digit counter
       cmp al, 0           ; if the quotient != 0 do the next digit 
       jne .again          ; loop while quotient > 0
     .print:
       pop ax            ; get remainder:quotient pair from the stack
       mov al, ah        ; the digit is the remainder (in ah)
       mov bx, hextable  ; translation table
       xlatb             ; replace al with hex digit from table
       mov ah, 0eH       ; print digit in al
       int 10H
       loop .print       ; using cx the digit counter to loop 
       ;push dx          ; dodgy fix if one too many off stack 
       ret
    start:

     mov ax, cs    ; make data segment and es same as code segment
     mov ds, ax
     mov es, ax
    
     push 32 
     call dotbyte.x
     push 255 
     call dotbyte.x
     mov [base.n], word 10
     push 255 
     call dotbyte.x

     here: jmp here          ; loop forever 
     times 510-($-$$) db 0   ; Pad remainder of MBR boot sector with 0s
     dw 0xAA55               ; The standard MBR boot signature
   ,,,

DOTSTACK ....

  Show stack; this is usually called .s in old forths.
  Perhaps we can use the bp (base pointer) register to find
  out where the bottom of the stack is.

  Need to debug this. It may be tricky. The stack grows down.
  Also, on entry sp points to the function's return address first,
  so we need to jump over it.

  x86 stack diagram

  ss:0              sp (tos)      bos = bottom of stack

   ??   ??  ??  ??  0xFF12  0x1234 0x2222

   sp pointer to byte with value 12 in last item on stack

   sp always points to low order byte of last item on stack
   ss:0 points to stack limit. stack grows to lower memory 
   (ie small memory addresses).  

   working, just need to resolve the bx conflict below and 
   loop through each item on the stack

   * display the current contents of the stack 
   ----------------

   BITS 16
   [ORG 0]

   cr equ 13   ;  carriage return
   lf equ 10   ;  line feed 

   jmp 07C0h:start     ; Goto segment 07C0

   ; **
   base.doc:
      db 'Puts the address of variable base on stack', 13, 10
      db 'Base determines the current numerical base for conversions', 13, 10
      db 'eg: 16 base !  /makes the base hexadecimal  '
      dw $-base.doc
   base:
     dw 0            ; link top of dictionary
     db 4, 'base'    ; forth style counted name
   base.x:
     pop dx          ; juggle return pointer for word
     push base.n     ; push address of base on stack
     push dx
     ret
   base.n dw 16

   ; the bottom of the stack ie ss. This needs to be initialised when the 
   ; program starts. But the stack will also contain return pointers 
   ; for words ..
   bos:
   bos.n: dw 0
   hextable db "0123456789ABCDEF"   ; needed by xlatb in .print below
   dotstack.doc:
      db 'displays the contents of the stack without modifying.', 13, 10
      db 'in the current numeric base ', 13, 10
      db 'eg: .s   /displays all items on stack '
      dw $-dotstack.doc
   dotstack:
      dw base         ; link
      db 2, '.s'  
   dotstack.x:
     mov bx, [base.n] ; eg decimal, hex, any 1 < n < 17 ok 
                      ; we cannot display any base > 16 at the moment
     xor bh, bh       ; base
     cld              ; make string instructions go forwards 

     mov si, sp     ; set si to top of stack (the fn return pointer)
     .nextitem:
       add si, 2    ; increment the stack pointer (and avoid fn return pointer)
       cmp si, [bos.n] ; check if is the last element of the stack
       je .exit        ; 

       mov ax, [ss:si]  ; get top item on stack into ax
       xor cx, cx       ; set counter = 0
     .again:
       xor dx, dx         ; dividend is ax
       mov bx, [base.n]   ; eg decimal, hex, any 1 < n < 17 ok 
                        ; we cannot display any base > 16 at the moment
       xor bh, bh       ; base
       div bx           ; does dx:ax/bx. remainder --> dx, quotient -> ax
       push dx          ; save remainder (ie digit) on the stack 
       inc cx           ; increment the digit counter
       cmp ax, 0        ; if the quotient != 0 do the next digit 
       jne .again       ; loop while quotient > 0
     .print:
       pop ax            ; get digit from the stack (digit in AL)
       mov bx, hextable  ; translation table
       xlatb             ; replace al with hex digit from table
       mov ah, 0eH       ; print digit in al
       int 10H
       loop .print       ; using cx the digit counter to loop 
       mov al, ' '       ; print a space between each value
       int 10H

       jmp .nextitem     ; do next stack item
     .exit:
       ret
    ; * 

    start:

     mov ax, cs    ; make data segment and es same as code segment
     mov ds, ax
     mov es, ax
     mov [bos.n], sp ; 
    
     mov [base.n], word 16 
     push 0xFFFF 
     push 0xAA23 
     push 0x1234 
     call dotstack.x

     here: jmp here          ; loop forever 
     times 510-($-$$) db 0   ; Pad remainder of MBR boot sector with 0s
     dw 0xAA55               ; The standard MBR boot signature
   ,,,


DOTSTACK REVISITED ....

  An interesting thing about forth is that we can write 
  forth code even without a compiler for it!

  So, in forth the implementation of dotstack could be
  : .s tos dup bos == if exit then @word . +2 dup bos == if exit then
     @word . +2 dup bos == if exit then ....
     
    So we need a loop to implement this. Once we have these primitives
    in assembler we can call each function in turn.

  A forth equivalent of x86 lodsw
  : @word+ dup @word swap 2 + swap ;

  where tos puts the address of the top of stack (sp register in x86)
  on the stack. bos puts pointer to bottom of stack on the stack.
  @word gets 2 bytes from pointer on the top of the stack.

LOAD ....

  Loads a block of 1024 bytes and interprets it.
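
  Only the loading half is sketched here, under assumptions: the
  block is taken to be the 2 sectors that follow the boot sector
  (cylinder 0, head 0, sector 2), the bios is assumed to pass the
  boot drive in dl, and the names bootdrive and block are invented.
  It prints '!' on a read error, otherwise the first byte of the
  block (which may not be a printable character).

  * read a 1024 byte block with bios int 13h (a sketch)
  ------------------
   BITS 16
   [ORG 0]
   jmp 07C0h:start

   bootdrive db 0

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     mov [bootdrive], dl   ; drive number handed over by the bios

     mov ah, 02h           ; int 13h fn 02h: read sectors (chs)
     mov al, 2             ; 2 sectors x 512 bytes = 1024 byte block
     mov ch, 0             ; cylinder 0
     mov cl, 2             ; sector 2 (sector 1 is this boot sector)
     mov dh, 0             ; head 0
     mov dl, [bootdrive]
     mov bx, block         ; es:bx is where the data goes
     int 13h               ; carry flag set on error
     mov al, '!'           ; mov does not change the carry flag
     jc .show
     mov al, [block]       ; first byte of the loaded block
   .show:
     mov ah, 0eh
     xor bx, bx
     int 10h
   here: jmp here          ; loop forever

   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
   block:                  ; offset 512: memory just after the boot sector
  ,,,

  Interpreting the block would then mean pointing the parser at
  'block' instead of the keyboard input buffer.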

TONUM BASE HEX ....

  >num: a word that converts an ascii buffer into a number  
  .hex: prints a number in hexadecimal. This is mainly for testing

  * a forthstyle >number function to convert ascii number onto stack
  ---------
   BITS 16
   [ORG 0]
    cr equ 13   ;  carriage return
    lf equ 10   ;  line feed

    jmp 07C0h:start     ; Goto segment 07C0

    ; the ascii number to convert
    buffer db 3, '102'    ; counted: 3 chars

    ; example doc field with reverse count field
    ; the count field for the word doc is 2 bytes because it
    ; may contain a lot of text
    dothex.doc db 'displays a 2 byte number in hex format'
               dw $-dothex.doc
    dothex:
      dw 0       ; top of dictionary
      db 4, '.hex'
    dothex.x:
      pop bx     ; return address
      pop dx     ; the number to print (parameter on stack)
      push bx    ; restore return address
      mov ah, 0x0E ; bios teletype function 
      mov bx, hextable   ; translation table
      mov cx, 4          ; number of digits to print
      .again:
        rol dx, 4      ; rotate left 4 bits (print highest first)
        mov al, dl     ; bits to convert to hex digit
        and al, 0x0F   ; only lower 4 bits relevant
        xlatb          ; replace al with hex digit in translation table
        int 10H        ; invoke bios print function
        loop .again
      mov al, 'H'      ; print an H to indicate hex number
      mov ah, 0eH      ; echo the char (just for debugging)
      int 10H
      ret
   hextable db "0123456789ABCDEF"    ; translation table

   ; base is a standard forth word. Pushes current base (eg 16, 10)
   base:
     dw dothex
     db 4, 'base'
   base.x:
     pop dx
     push word [base.n]
     push dx
     ret
   base.n dw 10

   ; just set the base to 16 (hexadecimal)
   ; forth def..  : hex 16 base !
   hex:
     dw base
     db 3, 'hex'
   hex.x:
     mov word [base.n], 16
     ret

   ; just set the base to 10 (decimal numbers)
   ; forth def..  : base10 10 base !
   base10:
     dw hex 
     db 6, 'base10'
   base10.x:
     mov word [base.n], 10
     ret

   ; need to rethink these parameters
   ; 
   ; this would be called >number in many forths
   ; parameters, a buffer address and how many chars to convert
   ; leaves a pointer to first char unconverted
   ; (stack addr, chars - ptr char, n)
   ; this routine just assumes that the number is base 10 
   ; which it shouldn't
   num:
     dw base10         ; link to previous dict entry
     db 3, 'num'       ; counted name of function
   num.x:
     pop dx           ; juggle fn return ip
     pop cx           ; maximum chars to convert
     pop si           ; buffer address
     push dx          ; restore fn return ip
     cld              ; make lodsb step forward through chars
     ;push 0           ; initial result
   .again:
     lodsb            ; get next char into al 
     ;mov ah, 0eH     ; echo the char (just for debugging)
     ;int 10H
     cmp al, '0'      ; check for valid digit (0-9)
     jb .exit         ; if ascii value is less than '0' not digit
     cmp al, '9'
     ja .exit         ; if ascii value greater than '9' not digit
     sub ah, ah    ; set ah = 0
     sub al, '0'   ; convert digit from ascii
     push ax       ; store digit on stack
     mov ax, [num.result]
     mov bx, [base.n]   ; multiply the running total by the current base
     ;mov bx, 16   ; (or hard-code a base while debugging)
     mul bx             ; do AX x BX and store in DX:AX 

     jo .toobig         ; result too big to store in AX 
     pop bx       ; get digit from stack
     add ax, bx
     mov [num.result], ax
     loop .again
     jmp .exit

   .toobig:  
     pop bx        ; discard the digit saved on the stack
     mov al, '!'   ; print ! if integer is too big for 2 bytes
     mov ah, 0eH   ; bios teletype function 
     int 10H       ; invoke bios
     mov word [num.result], 0x0000  ; set result := 0

   .exit:
     pop dx        ; return ip
     push si
     push word [num.result] 
     push dx
     ret
    num.result dw 0x0000    ; store intermediate results of conversion

    start:
      mov ax, cs    ; initialize the data segment register DS
      mov ds, ax
      mov es, ax
      push buffer+1
      xor ax, ax
      mov al, [buffer]
      push ax 
      call num.x
      call dothex.x

  here:    jmp here         ; loop forever 
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

WORD TOIN ....

  This is a mess. It should use scasb with es:di to skip the leading
  spaces of words, eg: mov di, buff; mov al, ' '; repe scasb
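
  A sketch of that idea follows; the names skipspaces, text and
  textlen are invented for the example. Note that scasb overshoots
  by one character when it stops, so di has to be stepped back.
  The sketch should print f, the first non-space character.

  * skip leading spaces with repe scasb (a sketch)
  ------------------
   BITS 16
   [ORG 0]
   jmp 07C0h:start

   text db '   forth'
   textlen equ $-text

   ; in:  di -> text, cx = number of chars (es = data segment)
   ; out: di -> first non-space char, cx = chars left from di on
   ;      cx = 0 if there was nothing but spaces
   skipspaces:
     jcxz .done       ; nothing to scan
     cld
     mov al, ' '
     repe scasb       ; advance while [es:di] = ' '
     je .done         ; reached the end: only spaces, cx = 0
     dec di           ; step back onto the char that stopped the scan
     inc cx           ; and count it as still unread
   .done:
     ret

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax       ; scasb reads es:di
     mov di, text
     mov cx, textlen
     call skipspaces
     mov al, [di]     ; first char of the word
     mov ah, 0eh
     xor bx, bx
     int 10h
   here: jmp here          ; loop forever
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,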

  In this implementation of nextword, the input buffer must
  be zero terminated.

  >IN - this version of '>in' pushes onto the stack the 
  current parse position in the input text buffer.

  Word: skip over leading spaces and copy all characters to a 
   counted buffer. It is named nextword in the code below because
   'word' is a reserved word in nasm

  stack: target buffer, parse position in input -- target buffer, new parse position
  
  Code seems to be working.

  * an implementation of >in and word
  ----------------
  [ORG 0]
   jmp 07C0h:start         ; Goto segment 07C0
   
   pad db 0
       times 64 db ' '

    ; puts the address of the input buffer on the stack
    input:
      dw 0
      db 5, 'input'
    input.x:
      pop dx
      push input.buffer
      push dx
      ret
    input.buffer db 17, 'c bigg.   and. is  ', 0   ; input buffer zero terminated
                 times 64 db 0 

    ; I think it may be necessary to have a zero terminated
    ; input buffer to simplify this code.

    ; puts on stack the current parse position in the input buffer
    in:
      dw input 
      db 2, 'in'
    in.x:
      pop dx           ; juggle return pointer
      push in.pointer  ; current position
      push dx
      ret
    ; parameter fields for in, this should be initialized
    ; when the input buffer is filled
    in.pointer dw 0
    
    ; nextword will probably only work with 0 terminated strings.
    
    ; get next word from input buffer 
    ; could write a function with a delimiter parameter
    ; stack: target buffer, parse position in input -- target buffer, new parse
    nextword:
      dw in
      db 8, 'nextword'
    nextword.x:
      pop dx
      pop si     ; where to start parsing in input buffer
      pop di     ; counted buffer to write into
      push dx    ; restore fn return ip
      cld           ; make lodsb step forwards
      mov cx, 64    ; maximum target buffer
      mov bx, di    ; save counted buffer address
      inc di        ; skip count byte
      xor dx, dx    ; use dl as a char counter
    .spaces:
      ; !! now use repe scasb with es:di
      lodsb         ; get 1st char into al
      cmp al, 0     ; input buffer is zero terminated
      je .exit
      cmp al, ' '   ; skip all leading spaces
      loope .spaces 
      stosb         ; put char in al into [di]
      inc dx        ; increment char counter
    .again:
      ; !! now use repne movsb
      lodsb         ; get next char into al 
      cmp al, 0     ; input buffer is zero terminated
      je .exit
      cmp al, ' '   ; stop if space encountered
      je .exit
      stosb         ; put char in al into [di]
      inc dx          ; increment char counter
      mov ah, 0Eh     ; just for debugging type char
      ;int 10h         ; 
      loop .again

    .exit:
      mov [bx], dl   ; store char count in 1st byte of buffer
      pop dx         ; juggle return fn ip
      push bx        ; target buffer address
      push si        ; new parse position in input buffer
      push dx        ; restore
      ret

   ; the forth count word
   ; stack: addr -- addr+1, char count
   count:
     dw nextword 
     db 5, 'count'
   count.x:
     pop dx         ; preserve return fn pointer
     pop bx         ; buffer address
     xor ax, ax     ; ax := 0
     mov al, [bx]   ; get count into al
     inc bx
     push bx        ; new buffer address
     push ax        ; char count
     push dx        ; restore fn return ip
     ret

   ; stack: buffer address, char count -- 
   type:
      dw count        ; link to previous dictionary entry 
      db 4, 'type'  
   type.x:
      cld             ; make lodsb step forwards
      pop bx          ; juggle return address for call
      pop cx          ; how many chars to print
      pop si          ; address of buffer to print
      push bx         ; restore return function call
      cmp cx, 0       ; if nothing to print exit
      je .exit
      mov ah, 0eh     ; bios print character function
    .again:
      lodsb        ; get next char from message into al
      int 10h      ; x86 bios interrupt
      loop .again  ; decr cx loop counter 
    .exit:
      ret
      
   start:
      mov ax, cs       ; cs is already correct (?!) 
      mov ds, ax       ; data segment  
      mov es, ax       ; es needed for stosb 
      ;mov sp, ?       ; what about the stack pointer?

      push pad               ; where to write the word
      push input.buffer+2    ; start of input buffer
      call nextword.x
      ; here we need to update >in with the top item on the stack
      ; which is the new parse position
      pop word [in.pointer]   ; now 'pad' is left on stack
      call count.x
      call type.x
      push pad               ; do again 
      push word [in.pointer]  
      call nextword.x
      pop word [in.pointer]   ; now 'pad' is left on stack
      call count.x
      call type.x

    here:    jmp here         ; loop forever 
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature

  ,,,

EXEC ....

  The exec example needs to be recoded, and the return addresses
  tested, since the stack is becoming mangled.

  One option is to use BP to point to the return ip.
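
  One way to try the BP idea is sketched below. It has not been run
  here and is only meant as a starting point. A few bytes at the top
  of the machine stack are set aside as a tiny 'return stack' and BP
  indexes them, so exec parks the return ip of its caller there
  instead of in a fixed variable. The 32 byte size is an arbitrary
  choice, and the sketch assumes that every word which is called
  preserves BP. Some bioses are reported to destroy BP when int 10h
  scrolls the screen, so hash.x saves BP around the bios call.

  * exec with bp indexing a small return stack (a sketch)
  ----------------------
  [ORG 0]

   jmp 07C0h:start         ; Goto segment 07C0

    ; just print a hash for testing
    hash:
      dw 0
      db 4, 'hash'
    hash.x:
      push bp         ; assumption: int 10h may not preserve bp
      mov ah, 0Eh     ; bios print character function
      mov al, '#'
      xor bx, bx      ; video page 0
      int 10h
      pop bp
      ret

    ; stack: pointer to word header --
    exec:
      dw hash         ; link to prev
      db 4, 'exec'
    exec.x:
      pop ax          ; return ip of the caller of exec
      pop bx          ; pointer to the header of the word to run
      cmp bx, 0       ; a zero pointer is not executed
      je .skip
      sub bp, 2       ; push the return ip on the return stack
      mov [bp], ax    ; [bp] addresses the stack segment ss
      xor cx, cx
      mov cl, [bx+2]  ; the name count is 2 bytes after the link
      add bx, 3       ; skip the link and the count byte
      add bx, cx      ; bx now points to the code of the word
      call bx         ; the word sees the stack as the caller left it
      mov ax, [bp]    ; pop the return ip off the return stack
      add bp, 2
    .skip:
      push ax         ; put the return ip back for ret
      ret

   start:
      mov ax, cs
      mov ds, ax       ; data segment
      mov bp, sp       ; bp := top of a 32 byte return stack
      sub sp, 32       ; the data stack starts below it
      push hash        ; header of the word to execute
      call exec.x      ; should print #

   here:  jmp here

   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,

  Because nothing is kept in a fixed variable, a word run by exec
  could itself call exec, up to the depth that the 32 bytes allow.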

FIND EXEC ....
  
  Find:
    find a word by name in a dictionary and return a pointer
    to its header. This should probably use a 'last' word
    to get the last dictionary entry, rather than starting with
    a pointer on the stack, as in this implementation.
  Exec:
    given a pointer to the header of a word, execute that word.
    If the pointer is zero exit ??

  * find a word by name in a forth style dictionary 
  ----------------------
  [ORG 0]

   jmp 07C0h:start         ; Goto segment 07C0

    buffer db 4, 'hash' 

    ; just print a hash for testing
    hash:
      dw 0
      db 4, 'hash'
    hash.x:
      mov ah, 0Eh     ; just print a hash with bios
      mov al, '#'
      int 10h         ; x86 bios interrupt
      ret

    ; duplicate top item on stack 
    dup:
      dw hash
      db 3, 'dup'
    dup.x:
      mov ah, 0Eh     
      mov al, '*'
      int 10h         ; x86 bios interrupt
      pop ax        ; preserve fn call return ip
      pop dx
      push dx       ; duplicate top stack item
      push dx
      push ax       ; restore return ip
      ret

    ; **
    ; execute a function given a pointer to its header on the stack
    ; if pointer is zero, then this should pop the 0 and exit, no??
    exec.doc:
      db 'execute a word given a pointer on the stack'
      db ' eg: lastword exec  '
      dw $-exec.doc
    exec:
      dw dup           ; link to prev
      db 4, 'exec'
    exec.x:
      pop ax
      pop bx     ; get pointer to function
      push ax    ; preserve fn return pointer
      cmp bx, 0  ; a zero pointer should not be executed
      je .exit
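      add bx, 2     ; point to name count
      xor cx, cx    ; clear ch so that cx holds just the count
      mov cl, [bx]  ; get the count
      inc bx        ; skip over count
      add bx, cx    ; advance the pointer to the function. This must be
                    ; a 16 bit add: 'add bl, cl' loses the carry into bh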

      ; !! not call [bx] thats a pointer to jumptable
      ; !!! call bx may change the stack (probably will) so we need 
      ; !!! to preserve the call return ip 
      pop word [execreturn]      ; save return ip
      call bx       ; call the fn pointed to by bx
      push word [execreturn]     ; restore fn return ip
    .exit:
      ret
    ; a dodgy solution, but any register might get overwritten
    execreturn dw 0
    
    ; find a word and return a pointer to its header on the stack,
    ; or else return 0 if the word was not found
    ; stack: search term, start pointer -- function header pointer

    find.doc:
      db 'Search dictionary for a word and return pointer.'
      db ' eg: in lastword find '
      dw $-find.doc
    find:
      dw exec         ; link to prev
      db 4, 'find'
    find.x:
      pop dx     ; juggle fn return ip
      pop bx     ; where to start searching (eg last entry in dict)
      pop ax     ; counted string buffer to search for 
      push dx    ; restore fn ip
    .again:
      xor cx, cx      ; set cx:=0
      mov si, bx      ; pointer to current function header
      add si, 2       ; the counted string is 2 bytes after header
      mov cl, [si]    ; the count of the dictionary word name
      inc cx          ; we also have to compare the count bytes
      mov di, ax      ; the search term pointer
      cld            ; search forwards (clear direction flag)
      repe cmpsb     ; compare all characters for equality
      je .found
      mov bx, [bx]    ; get the pointer to the next function (or 0)
      cmp bx, 0       ; if start of dict, then link is 0
      je .notfound    ; no more words to search, so exit
      ;push ax         ; save ax, the search term pointer
      ;mov ah, 0Eh     ; print a dot on each unsuccessful search
      ;mov al, '.'     ; for debugging
      ;int 10h        ; x86 bios interrupt
      ;pop ax          ; restore the search term pointer      
      jmp .again 
    .notfound:
      pop dx
      push 0         ; not found so return 0
      push dx
      ret
    .found: 
      pop dx         ; juggle return ip
      push bx        ; return result on stack 
      push dx
      ret
    ; *

   start:
      mov ax, cs       ; cs is already correct (?!) 
      mov ds, ax       ; data segment  
      mov es, ax       ; es needed for stosb 
      ;mov sp, ?       ; what about the stack pointer?

      ;call star.x
      ;call hash.x
      push buffer    ; word to search for
      push find
      call find.x
      call exec.x

   here:  jmp here 

   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,

DOCS ....

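  This is a function which traverses the dictionary and prints the 
  name of each word/function and its docs, if any. Forth systems did 
  not usually keep docs in the word header because of memory constraints,
  but now it seems ok to do it, and it makes the system more 
  pleasant to use. Here the doc text is stored just before the word
  header and is followed by a 2 byte length, so 'docs' finds the text
  by stepping backwards from the header.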

  * list words in the dictionary and their docs 
  ----------------------
  [ORG 0]

   jmp 07C0h:start         ; Goto segment 07C0

    dummy.doc:
      db 'The first word in the dict '
      dw $-dummy.doc
    dummy:
      dw 0
      db 5, 'dummy'
    dummy.x:
      ret

    ; this word has no document, so the doc offset is 0
    ; this means we can compile words without docs for space reasons
    dw 0
    nodoc:
      dw dummy
      db 6, 'no.doc'
    nodoc.x:
      ret

    more.doc:
      db 'About more ...'
      dw $-more.doc
    more:
      dw nodoc 
      db 4, 'more'

    moree.doc:
      db 'About moree ...'
      dw $-moree.doc
    moree:
      dw more 
      db 5, 'moree'

    stuff.doc:
      db 'This is not a real word, just testing "docs"! '
      dw $-stuff.doc
    stuff:
      dw moree
      db 5, 'stuff'
    stuff.x:
      ret

    lastword dw docs

    ; This assumes the dict has at least one word
    ; **
    docs.doc:
      db 'List all words in the dict and their docs. '
      db 'Has simple paging. Needs colours. Maybe just print up to'
      db '1st full stop since this is the summary.'
      db ' eg: docs '
      dw $-docs.doc
    docs:
      dw stuff        ; link
      db 4, 'docs'
    docs.x:
      mov bx, [lastword]
      cld             ; make lodsb step forwards
      xor dx, dx      ; use dx as a function counter
    .nextword:

      mov si, bx      ; pointer to current function header
      add si, 2       ; the counted string is 2 bytes after header
      xor ax, ax      ; ax := 0 
      lodsb           ; al := [si]++
      mov cx, ax      ; load the string count into cx for looping
      cmp cx, 0       ; if nothing to print exit
      je .exit
      mov ah, 0eh     ; bios print character function
    .nextchar:
      lodsb        ; get next char from message into al
      int 10h         ; x86 bios interrupt
      loop .nextchar  ; decr cx loop counter 
      mov al, 13      ; newline
      int 10h
      mov al, 10     
      int 10h
      mov cx, [bx-2]   ; get doc char count into loop counter
      cmp cx, 0        ; no document for this word ?
      je .continue     ; go to next word if no document here
      mov si, bx       ; pointer to the word header
      sub si, 2        ; step back over the 2 byte doc length
      sub si, cx       ; step back over the doc text to its 1st char
      mov al, 32       ; print a space char before document
      int 10h
    .docnextchar:
      lodsb              ; get next char from message into al
      int 10h            ; x86 bios interrupt
      loop .docnextchar  ; decr cx loop counter 
      mov al, 13      ; newline
      int 10h
      mov al, 10     
      int 10h
    
    .continue:
      inc dx          ; increment counter
      test dx, 0x07   ; pause after 8 function words
      jnz .nopage     ; 
      mov al, '>'     ; 
      int 10h
      mov al, '>'     ; 
      int 10h
      mov ah, 0    ; wait for keypress bios function
      int 16h
    .nopage:
      mov bx, [bx]    ; get the pointer to the next function (or 0)
      cmp bx, 0       ; if start of dict, then link is 0
      jne .nextword 
    .exit:
      ret
    ; *

   start:
      mov ax, cs       ; cs is already correct (?!) 
      mov ds, ax       ; data segment  
      mov es, ax       ; es needed for stosb 

      call docs.x

   here:  jmp here 

   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,

LIST ....

  Lists all words in the dictionary by name. 'list' is not the
  standard name for this; standard forth calls it 'words'.

  The version below is much more succinct than the version which uses
  count, type etc. 

  * list all words in the dictionary, the canonical version
  ----------------------
  [ORG 0]

   jmp 07C0h:start         ; Goto segment 07C0

    dummy:
      dw 0
      db 11, 'top.of.dict'
    dummy.x:
      ret

    stuff:
      dw dummy
      db 5, 'stuff'
    stuff.x:
      ret

    lastword dw list
    ; This assumes the dict has at least one word
    ; **
    list.doc:
      db 'List all function words in the dictionary. List traverses the '
      db 'linked list dictionary and prints the name of each function '
      db 'word found in the function header. This '
      db 'leaves nothing on the stack. It relies on a lastword data '
      db 'item that contains a pointer to the last word in the dictionary '
      db '... needs paging etc '
      db ' eg: list '
      dw $-list.doc
    list:
      dw stuff       ; link 
      db 4, 'list'
    list.x:
      mov bx, [lastword]
    .nextword:
      mov si, bx      ; pointer to current function header
      add si, 2       ; the counted string is 2 bytes after header
      xor ax, ax      ; ax := 0 
      cld             ; make lodsb step forwards
      lodsb           ; al := [si]++
      mov cx, ax      ; load the string count into cx for looping
      cmp cx, 0       ; if nothing to print exit
      je .exit
      mov ah, 0eh     ; bios print character function
    .nextchar:
      lodsb        ; get next char from message into al
      int 10h         ; x86 bios interrupt
      loop .nextchar  ; decr cx loop counter 
      mov al, 32      ; space char
      int 10h
                      ; bx is not saved on the stack in this version:
                      ; nothing in the loop changes it, and a pushed bx
                      ; would be left behind by the 'je .exit' above
      mov bx, [bx]    ; get the pointer to the next function (or 0)
      cmp bx, 0       ; if start of dict, then link is 0
      jne .nextword 
    .exit:
      ret
    ; *

   start:
      mov ax, cs       ; cs is already correct (?!) 
      mov ds, ax       ; data segment  
      mov es, ax       ; es needed for stosb 

      call list.x

   here:  jmp here 

   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,

  * list all words in the dictionary, old version
  ----------------------
  [ORG 0]

   jmp 07C0h:start         ; Goto segment 07C0

    ; just print one space
    space:
      dw 0
      db 5, 'space'
    space.x:
      mov ah, 0eh  ; bios type char function 
      mov al, 32   ; space character
      int 10h
      ret

    ; the forth count word
    ; stack: addr -- addr+1, char count
    count:
      dw space
      db 5, 'count'
    count.x:
      pop dx         ; preserve return fn pointer
      pop bx         ; buffer address
      xor ax, ax     ; ax := 0
      mov al, [bx]   ; get count into al
      inc bx
      push bx        ; new buffer address
      push ax        ; char count
      push dx        ; restore fn return ip
      ret

    ; stack: buffer address, char count -- 
    type:
       dw count        ; link to previous dictionary entry 
       db 4, 'type'  
    type.x:
       cld             ; make lodsb step forwards
       pop bx          ; juggle return address for call
       pop cx          ; how many chars to print
       pop si          ; address of buffer to print
       push bx         ; restore return function call
       cmp cx, 0       ; if nothing to print exit
       je .exit
       mov ah, 0eh     ; bios print character function
     .again:
       lodsb        ; get next char from message into al
       int 10h      ; x86 bios interrupt
       loop .again  ; decr cx loop counter 
     .exit:
       ret
       
    ; just pushes a pointer to last word in dict onto the stack
    last:
      dw type
      db 4, 'last'
    last.x:
      pop ax           ; preserve fn ip return
      push word [lastword]
      push ax
      ret
    lastword dw listbefore

    ; This assumes the dict has at least one word

   list.doc:
     db 'List all function words in the dictionary. This probably '
     db 'leaves nothing on the stack. This version uses count type etc'
     db ' eg: list '
     dw $-list.doc
    list:
      dw last
      db 4, 'list'
    list.x:
      call last.x  ; get pointer to last word
      pop bx       ; where to start searching (eg last entry in dict)
    .again:
      mov si, bx      ; pointer to current function header
      add si, 2       ; the counted string is 2 bytes after header
      push bx         ; save bx (since count/type will mangle it)
      push si         ; the name pointer to print with count/type
      call count.x    ; count the function name
      call type.x     ; display the name
      call space.x    ; print one space
      pop bx          ; restore function header pointer
      mov bx, [bx]    ; get the pointer to the next function (or 0)
      cmp bx, 0       ; if start of dict, then link is 0
      jne .again 

      pop dx
      ;push 0          ; could return a word count here ??
      push dx
      ret

    ; listbefore searches through the dictionary starting at a 
    ; given word and lists that word and all words before it. 
    ; stack: last word in dict --
    listbefore:
      dw list
      db 10, 'listbefore'
    listbefore.x:
      pop dx
      pop bx      ; where to start searching (eg last entry in dict)
      push dx
    .again:
      mov si, bx      ; pointer to current function header
      add si, 2       ; the counted string is 2 bytes after header
      push bx         ; save bx (since count/type will mangle it)
      push si         ; the name pointer to print with count/type
      call count.x    ; count the function name
      call type.x     ; display the name
      call space.x    ; print one space
      pop bx          ; restore function header pointer
      mov bx, [bx]    ; get the pointer to the next function (or 0)
      cmp bx, 0       ; if start of dict, then link is 0
      jne .again 
      ret

   start:
      mov ax, cs       ; cs is already correct (?!) 
      mov ds, ax       ; data segment  
      mov es, ax       ; es needed for stosb 

      call list.x

   here:  jmp here 

   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,

ACCEPT TYPE COUNT....

  Below is a simple implementation of the standard forth words
  accept, count, and type 

  * read a line of input and store counted string in a buffer
  ------------
  [ORG 0]

   jmp 07C0h:start         ; Goto segment 07C0
   
   buffer db 0, '                                              '

   ; just print a newline
   crlf:
     dw 0
     db 4, 'crlf'
   crlf.x:
     mov ah, 0eh  ; bios type char function 
     mov al, 13   ; cr lf
     int 10h
     mov al, 10
     int 10h
     ret

   ; **
   accept.doc:
     db 'get a specified number of input characters and place them '
     db 'in the given buffer. This function should probably allow for '
     db 'line-editing (at least backspace) and is terminated by an <enter> '
     db 'keypress. '
     db ' eg: input 20 accept  '
     db '  accepts 20 typed characters from user and puts them at the '
     db '  address specified by "input". Leaves the address of the buffer '
     db '  on the stack. '
     dw $-accept.doc

   accept:  
     dw crlf         ; link to prev word (only the 1st word has a zero link)
     db 6, 'accept'  ; forth-style function header 
   accept.x:
     pop bx       ; juggle return pointer
     pop cx       ; how many chars maximum to get 
     pop di       ; where to copy chars
     push di      ; save buffer address on stack 
     push bx      ; restore return pointer

     inc di       ; skip count byte to store 1st char
     xor dl, dl   ; char counter := zero
     cld          ; make stosb go forwards
   .again:
     mov ah, 0    ; wait for keypress bios function
     int 16h
     cmp al, 13   ; was the key press an 'enter'?
     je .exit     ; exit if enter pressed
     mov ah, 0eh  ; echo the character
     int 10h
     stosb        ; put the char into the buffer
     inc dl       ; increment char counter
     loop .again
   .exit:
     pop ax       ; return pointer
     pop bx       ; buffer address
     mov [bx], dl ; store char count in buffer

     push bx      ; restore buffer addr
     push ax      ; restore fn return pointer
     ret
   
   ; the forth count word
   ; stack: addr -- addr+1, char count
   ; !! we can improve this version of count.. eg pop si, lodsb etc
   count:
     dw accept      ; link to prev word
     db 5, 'count'
   count.x:
     pop dx         ; preserve return fn pointer
     pop bx         ; buffer address
     xor ax, ax     ; ax := 0
     mov al, [bx]   ; get count into al
     inc bx
     push bx        ; new buffer address
     push ax        ; char count
     push dx        ; restore fn return ip
     ret
   ; *

   ; **
   ; stack: buffer address, char count -- 
   dw 0           ; no doc
   type:
      dw count        ; link to previous dictionary entry 
      db 4, 'type'  
   type.x:
      cld             ; make lodsb step forwards
      pop bx          ; juggle return address for call
      pop cx          ; how many chars to print
      pop si          ; address of buffer to print
      push bx         ; restore return function call
      cmp cx, 0       ; if nothing to print exit
      je .exit
      mov ah, 0eh     ; bios print character function
    .again:
      lodsb        ; get next char from message into al
      int 10h      ; x86 bios interrupt
      loop .again  ; decr cx loop counter 
    .exit:
      ret
   ; *

   start:
      mov ax, cs       ; cs is already correct (?!) 
      mov ds, ax       ; data segment  
      mov es, ax       ; es needed for stosb 
      ;mov sp, ?       ; what about the stack pointer?

   here:
      push buffer    ; where to store chars (1st byte is count)
      push 8         ; maximum number of chars to accept and store
      call accept.x  ; stack: addr
      call crlf.x
      call count.x   ; st: addr, char count
      call type.x
      call crlf.x
      jmp here 

   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,


INTERPRET ....

  This is my name for the forth word which is usually called 'quit'.
  It is the "read evaluate print loop" which allows a user or
  programmer to use the system interactively. 

  This word performs the basic functions: 
    get a line of input from keyboard/input into buffer
    copy one word at a time from input buffer to a word buffer
    see if word is a defined function, if so execute
      and print results
    if not, see if word is a number, if so push on stack
    loop again. 

  Even more powerfully, the words which are read can be "compiled" into
  a temporary buffer and then executed. This means that the semantic
  difference between compiling and interpreting is removed. 

  * a skeleton implementation of interpret
  ----------
    ; **

    interp.doc:
      db 'an interpreting loop for a forth style system.'
      db ' eg  '
      dw $-interp.doc
    interp:
      dw 0            ; link to prev
      db 6, 'interp'
    interp.x:
      ;pop ax
      ;pop bx     ; get pointer to function
      ;push ax    ; preserve fn return pointer

    .again: 
      call inbuffer.x    ; where to store chars (1st byte is count)
      push 40            ; maximum number of chars to accept and store
      call accept.x      ; stack: addr
      call crlf.x
      ;call dup.x
      ;call count.x   ; st: addr, char count
      ;call type.x
      push word [lastword]     ; but find can get the top of dict from last
      call find.x
      pop ax
      cmp ax, 0      ; if find returns 0, then word was not found 
      je .again 
      push ax        ; restore the word function pointer
      call exec.x    ; the bug in exec was fixed (stack mangling)
      call crlf.x
      jmp .again 
    .exit:
      ret
      ; *
  ,,,
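
  The skeleton above just loops again when find returns 0, so the
  'see if word is a number' step is not there yet. A minimal sketch
  of that step follows. It only accepts unsigned decimal digits and
  ignores overflow. The name 'number' and the flag convention (1 for
  a number, 0 otherwise) are made up for this example.

  * convert a counted string of digits to a number (a sketch)
  ----------------------
  [ORG 0]

   jmp 07C0h:start         ; Goto segment 07C0

    numstr db 2, '42'      ; a counted string to convert

    ; stack: counted string address -- value, flag 
    number:
      dw 0
      db 6, 'number'
    number.x:
      pop dx          ; juggle return ip
      pop si          ; counted string to convert
      push dx         ; restore return ip
      cld             ; make lodsb step forwards
      xor bx, bx      ; bx accumulates the value
      xor cx, cx
      lodsb           ; al := count byte
      mov cl, al      ; cx := number of chars to convert
      jcxz .bad       ; an empty string is not a number
    .digit:
      lodsb           ; get next char into al
      sub al, '0'     ; '0'..'9' becomes 0..9
      cmp al, 9
      ja .bad         ; anything else is not a digit
      xor ah, ah      ; ax := digit
      xchg ax, bx     ; ax := value so far, bx := digit
      mov dx, 10
      mul dx          ; ax := ax * 10 (dx is overwritten)
      add ax, bx      ; add in the new digit
      mov bx, ax      ; keep the new value in bx
      loop .digit
      mov ax, 1       ; flag: the string was a number
      jmp .done
    .bad:
      xor ax, ax      ; flag: not a number
    .done:
      pop dx          ; juggle return ip
      push bx         ; value (meaningless if the flag is 0)
      push ax         ; flag
      push dx
      ret

   start:
      mov ax, cs
      mov ds, ax       ; data segment, needed for lodsb
      push numstr
      call number.x
      pop ax           ; flag
      pop bx           ; value
      cmp ax, 0
      je here          ; not a number, so print nothing
      mov al, bl       ; use the value as a character code
      xor bx, bx       ; video page 0
      mov ah, 0Eh      ; bios print character function
      int 10h          ; 42 is the ascii code for *

   here:  jmp here

   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,

  In an interpreter the flag would be popped and tested, and on
  success the value would simply be left on the stack.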

VAR ....
  
  Compiles a new variable, which is just a named word that pushes
  a pointer to its storage onto the stack.
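
  As an illustration, this is roughly what 'var' would have to lay
  down for a variable, written out by hand here. The variable name
  'x' and the label 'x.n' for its storage cell are invented for the
  example.

  * a hand-written variable word (a sketch)
  ----------------------
  [ORG 0]

   jmp 07C0h:start         ; Goto segment 07C0

    ; a variable called 'x': the word pushes the address of x.n
    ; stack: -- address
    x:
      dw 0            ; link (0 since this is the only word here)
      db 1, 'x'
    x.x:
      pop dx          ; juggle return ip
      push x.n        ; push the address of the storage cell
      push dx
      ret
    x.n dw 'A'        ; the storage cell, initial value 65

   start:
      mov ax, cs
      mov ds, ax       ; data segment
      call x.x         ; stack: address of the cell
      pop bx
      mov ax, [bx]     ; fetch the value stored in the variable
      xor bx, bx       ; video page 0
      mov ah, 0Eh      ; bios print character function
      int 10h          ; should print A

   here:  jmp here

   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,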

CONSTANT ....

  Compiles a new constant, which is a word that just pushes a value
  onto the stack.

  eg: 
    : constant here last ! (inc here) word (copy counted word to memory)
    (pop stack to get constant value)
    compile 'pop dx, push constant, push dx, ret'
    ie compile return fn juggling code
    update here
    update last
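
  The 'compile' step in the outline above could look something like
  the following sketch, which writes the machine code for 'pop dx,
  push constant, push dx, ret' at the address held in a 'here'
  variable. It leaves out the header (link and name), and the names
  'constbody', 'here.n' and 'freespace' are invented for the example.
  The opcode bytes are 5Ah (pop dx), 68h (push imm16), 52h (push dx)
  and 0C3h (ret).

  * compile the body of a constant at 'here' (a sketch)
  ----------------------
  [ORG 0]

   jmp 07C0h:start         ; Goto segment 07C0

    here.n dw freespace    ; next free byte to compile into

    ; stack: value --
    constbody:
      dw 0
      db 9, 'constbody'
    constbody.x:
      pop dx            ; juggle return ip
      pop ax            ; the value of the constant
      push dx           ; restore return ip
      cld               ; make stosb step forwards
      mov di, [here.n]  ; es:di := where to compile
      mov bx, ax        ; keep the value safe in bx
      mov al, 5Ah       ; opcode for pop dx
      stosb
      mov al, 68h       ; opcode for push imm16
      stosb
      mov ax, bx
      stosw             ; the 16 bit value to push
      mov al, 52h       ; opcode for push dx
      stosb
      mov al, 0C3h      ; opcode for ret
      stosb
      mov [here.n], di  ; update here
      ret

   start:
      mov ax, cs
      mov ds, ax       ; data segment
      mov es, ax       ; es needed for stosb
      push '*'         ; the value of the new constant
      call constbody.x ; compiles 6 bytes at freespace
      call freespace   ; run the compiled code: pushes '*'
      pop ax
      xor bx, bx       ; video page 0
      mov ah, 0Eh      ; bios print character function
      int 10h          ; should print *

   here:  jmp here

   freespace:              ; the compiled code lands in the padding
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,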

COLON ....

  The : colon word is the normal compiler of a forth like system.

  Colon reads the next word from the input buffer and creates 
  a function header with a link to the previous dictionary entry
  (eg by using a 'last' variable). Colon compiles into the space
  pointed to by the 'here' variable.
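
  The header part of colon could be sketched as below: copy the
  link and the counted name to 'here', then make the new entry the
  last one. This is only the header, with no compiling of a body, and
  the names 'header', 'here.n', 'newname' and 'freespace' are
  invented for the example. The counted name is assumed to be
  already in a buffer, as nextword would leave it.

  * create a dictionary header at 'here' (a sketch)
  ----------------------
  [ORG 0]

   jmp 07C0h:start         ; Goto segment 07C0

    newname db 3, 'new'    ; counted name of the word to create

    first:
      dw 0
      db 5, 'first'
    first.x:
      ret

    lastword dw first      ; last entry in the dictionary
    here.n   dw freespace  ; next free byte to compile into

    ; stack: counted name address --
    header:
      dw first
      db 6, 'header'
    header.x:
      pop dx             ; juggle return ip
      pop si             ; counted name to copy
      push dx            ; restore return ip
      cld                ; make stosw/movsb step forwards
      mov di, [here.n]   ; es:di := where the header goes
      mov bx, di         ; remember the start of the new header
      mov ax, [lastword]
      stosw              ; link to the previous entry
      xor cx, cx
      mov cl, [si]       ; length of the name
      inc cx             ; also copy the count byte
      rep movsb          ; copy the counted name
      mov [lastword], bx ; the new word is now the last one
      mov [here.n], di   ; update here
      ret

   start:
      mov ax, cs
      mov ds, ax       ; data segment
      mov es, ax       ; es needed for stosw and movsb
      push newname
      call header.x
      mov si, [lastword]  ; print the name of the newest entry
      add si, 2           ; skip the link
      xor cx, cx
      lodsb               ; al := name count
      mov cl, al
      xor bx, bx          ; video page 0
      mov ah, 0Eh         ; bios print character function
    .next:
      lodsb
      int 10h
      loop .next          ; should print new

   here:  jmp here

   freespace:              ; the new header lands in the padding
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,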

FORTH LIKE SYSTEMS

MINIMAL BOOTLOADING FORTH STYLE SYSTEM ....

  The following is an attempt to write a minimal compiling forth
  like system which will run 'standalone' (without any operating system)
  on an x86 computer. The aims are to keep the size and complexity of
  the code to a minimum while having a colon : compiler.

  This system takes ideas from forth but is not forth. So far it doesnt
  have a return stack, only interprets one word at a time, doesnt
  compile anything etc...

  The following code provides a general template for how to boot load
  a forth like system which is greater than 512 bytes in length (ie
  greater than one boot sector). A complete forth should be no bigger 
  than 8K, so about 16 sectors.
  
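  The second stage is loaded to offset 0 of segment 1000h, but in the
  assembled file it comes straight after the 512 byte boot sector, so
  nasm would number its labels from 512 onwards. To make the memory
  offsets work we have to put the following between the 2 stages of
  the bootloader:
  
     "section stage2 vstart=0"  

  Another solution is to compile the 2 stages of the bootloader
  separately, so that the assembler nasm can work out the correct
  memory addresses for labels etc.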

  See answer at
  https://forum.nasm.us/index.php?topic=2160.0

  Another issue to consider, is whether we are in danger of 
  loading our new code on top of the existing stack, which would 
  not be nice. 
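
  With the figures used in the listing in this section (stack segment
  07C0h+288 with sp at 4096, and 5 sectors loaded to 1000h:0000) the
  arithmetic works out as follows: the stack occupies linear addresses
  08E00h to 09E00h and the loaded code 10000h to 10A00h, so the two do
  not overlap. The fragment below is a sketch of how nasm could make
  the same check at assembly time. The names are invented for the
  example.

  * check that the stack and the loaded code do not overlap (a sketch)
  ----------------------
  ; linear address = segment * 16 + offset

  ; where ss points: (07C0h + 288) * 16 = 08E00h
  %assign STACK_BASE (07C0h + 288) * 16
  ; the initial ss:sp: 08E00h + 4096 = 09E00h
  %assign STACK_TOP  STACK_BASE + 4096
  ; the target of the int 13h read: 1000h * 16 = 10000h
  %assign LOAD_BASE  1000h * 16
  ; the end of 5 sectors: 10000h + 2560 = 10A00h
  %assign LOAD_END   LOAD_BASE + 5 * 512

  %if (LOAD_BASE < STACK_TOP) && (LOAD_END > STACK_BASE)
    %error "stage 2 would be loaded on top of the stack"
  %endif
  ,,,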

  Working!! Most needed words are now 

  * a very minimal bootloader with some forth ideas. 
  -----------------
  
  BITS 16
  [ORG 0]

   jmp 07C0h:load    ; Goto segment 07C0
     drive db 0      ; a variable to hold boot drive number
   load:
     mov ax, cs     ; the code segment is already correct (?!)
     mov ds, ax     ; set up data and extended segments
     mov es, ax
     mov [drive], dl ; save the boot drive number
     mov ax, 07C0h   ; Set up 4K stack space after this bootloader
     add ax, 288     ; (4096 + 512) / 16 bytes per paragraph
     mov ss, ax      ; with a 4K gap between stack and code
     mov sp, 4096

      ; save the DL register or else dont modify it
      ; it contains the number of the boot medium (hard disk,
      ; usb memory stick etc)
      ; The 'floppy' Drive is NOT necessarily 0!!!

    reset:            ; Reset the boot drive
      mov ax, 0       ; ah = 0 is the bios 'reset disk system' function
      mov dl, [drive] ; the boot drive number (eg for usb 128)
      int 13h         ;
      jc reset        ; ERROR => reset again
    read:
      mov ax, 1000h       ; ES:BX = 1000:0000
      mov es, ax          ; es:bx determines where data loaded to
      mov bx, 0           ;
      mov ah, 2           ; Load disk data to ES:BX
      mov al, 5           ; Load 5 sectors (only 1 used here)
      mov ch, 0           ; Cylinder=0
      mov cl, 2           ; Sector=2 (sector 1 is the boot sector)
      mov dh, 0           ; Head=0
      mov dl, [drive]     ; 
      int 13h             ; Read!
      jc read             ; ERROR => Try again

    jmp 1000h:0000      ; Jump to the loaded code 

    times 510-($-$$) db 0   ; pad out the boot sector (512 bytes)
    dw 0AA55h               ; end with standard boot signature

    ;Or just put a jmp start and jump over all the forth 
    ;definitions
    
    ; this below is the magic line to make the new memory offsets
    ; work. Or compile the 2 files separately
    ; Good answer from
    ; https://forum.nasm.us/index.php?topic=2160.0 

    section stage2 vstart=0

    ; the code to be loaded and executed
      ; cs is ok because of far jump
      ; is ds and es ok ? no, but stack seems ok
     mov ax, cs     ; the code segment is already correct (?!)
     mov ds, ax     ; set up data and extended segments
     mov es, ax

   ; the interpreting loop
   here:
      push buffer    ; where to store chars (1st byte is count)
      push 40        ; maximum number of chars to accept and store
      call accept.x  ; stack: addr
      call crlf.x
      ;call dup.x
      ;call count.x   ; st: addr, char count
      ;call type.x
      push find       ; but find can get the top of dict from last
      call find.x
      pop ax
      cmp ax, 0      ; if find returns 0, then word was not found 
      je here
      push ax        ; restore the word function pointer
      call exec.x    ; the bug in exec was fixed (stack mangling)
      call crlf.x
      jmp here 

    hang: jmp hang

    ; start of forth style words.

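    ; clear the screen by setting the video mode again. Mode 03h is
    ; the standard 80x25 colour text mode. Mode 13h, which was used
    ; here before, is a 320x200 graphics mode and that is why this
    ; was doing something funny to the video mode.
    cls:
      dw 0         ; top of dictionary
      db 3, 'cls'
    cls.x:
      mov ah, 0    ; bios set video mode function
      mov al, 03h  ; 80x25 text mode
      int 10H
      ret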

    asc:
      dw cls
      db 3, 'asc'
    asc.x:
      mov cx, 255 
    .again:
      mov ah, 0eh
      mov al, cl
      int 10H
      mov al, ' ' 
      int 10H
      loop .again
      ret

    ; get one keystroke from user and place on stack
    key:  
      dw asc  
      db 3, 'key'  ; forth-style function header 
    key.x:
      mov ah, 0    ; wait for keypress bios function
      int 16h
      pop bx       ; juggle function return pointer
      push ax      ; save keypress value on stack
      push bx      ; restore return pointer to stack
      ret

    emit:
      dw key        ; link to previous dictionary entry 
      db 4, 'emit'  
    emit.x:
      pop bx          ; juggle return address for call
      pop ax          ; character to print  (into al)
      push bx         ; restore return function call
      mov ah, 0eh     ; bios print character function
      int 10h
      ret

    ; just pushes a pointer to last word in dict onto the stack
    last:
      dw emit 
      db 4, 'last'
    last.x:
      pop ax           ; preserve fn ip return
      push word [lastword]
      push ax
      ret
    lastword dw find

    ; puts the address of the input buffer on the stack 
    in:
      dw last
      db 2, 'in'
    in.x:
      pop ax           ; preserve fn ip return
      push word buffer
      push ax
      ret
    buffer db 0
       times 80 db ' '  

    ; just print one space
    space:
      dw in 
      db 5, 'space'
    space.x:
      mov ah, 0eh  ; bios type char function 
      mov al, 32   ; space character
      int 10h
      ret

    ; -- dup just duplicates the top item on the stack
    dup: dw space       ; link to previous word 
         db 3, 'dup'    ; strings are 'counted' 
    dup.x:
       pop bx      ; juggle fn return address
       pop ax      ; get param to duplicate
       push ax
       push ax
       push bx     ; restore fn return address
       ret
    
    base.doc:
       db 'Puts the address of variable base on stack', 13, 10
       db 'Base determines the current numerical base for conversions', 13, 10
       db 'eg: 16 base !  /makes the base hexadecimal  '
       dw $-base.doc
    base:
      dw dup          ; previous word 
      db 4, 'base'    ; forth style counted name
    base.x:
      pop dx          ; juggle return pointer for word
      push base.n     ; push address of base on stack
      push dx
      ret
    base.n dw 16

   ; There was a bug with this taking one too many values off
   ; the stack (ie the fn return pointer) and crashing the code
   ; but seems to be working now.
   ; This uses 16 bit division: dx is cleared and then dx:ax is
   ; divided by bx; remainder->dx; quotient->ax  
   ; 
   ; this should be called udot since the number is treated as unsigned

   hextable db "0123456789ABCDEF"
   dot.doc:
      db 'displays a 2 byte number on stack in current base', 13, 10
      db 'eg: 32 hex .   /displays 20 (32 in hexadecimal) '
      dw $-dot.doc
   dot:
      dw base
      db 1, '.'  
   dot.x:
     ; takes the 16 bit number to display from the stack into AX and 
     ; the base from base.n into the BL register
     pop dx         ; juggle the return function pointer
     pop ax         ; 2 byte value in ax to print
     push dx        ; restore the return ip
     mov bx, [base.n]   ; eg decimal, hex, any 1 < n < 17 ok 
                        ; we cannot display any base > 16 at the moment
     xor bh, bh         ; base
     xor cx, cx     ; set counter = 0
     .again:
       xor dx, dx          ; dividend is ax
       div bx              ; does dx:ax/bx. remainder --> dx, quotient -> ax
       push dx             ; save remainder (ie digit) on the stack 
       inc cx              ; increment the digit counter
       cmp ax, 0           ; if the quotient != 0 do the next digit 
       jne .again          ; loop while quotient > 0
     .print:
       pop ax            ; get digit from the stack (digit in AL)
       mov bx, hextable  ; translation table
       xlatb             ; replace al with hex digit from table
       mov ah, 0eH       ; print digit in al
       int 10H
       loop .print       ; using cx the digit counter to loop 
       ret

   ; just print a newline
   crlf:
     dw dot 
     db 4, 'crlf'
   crlf.x:
     mov ah, 0eh  ; bios type char function 
     mov al, 13   ; cr lf
     int 10h
     mov al, 10
     int 10h
     ret

   ; get a line of input from the user 
   ; stack: buffer address, max characters -- buffer addr
   accept:  
     dw crlf         ; link to previous word 
     db 6, 'accept'  ; forth-style function header 
   accept.x:
     pop bx       ; juggle return pointer
     pop cx       ; how many chars maximum to get 
     pop di       ; where to copy chars
     push di      ; save buffer address on stack 
     push bx      ; restore return pointer
     inc di       ; skip count byte to store 1st char
     xor dl, dl   ; char counter := zero
     cld          ; make stosb go forwards
   .again:
     mov ah, 0    ; wait for keypress bios function
     int 16h
     cmp al, 13   ; was the key press an 'enter'?
     je .exit     ; exit if enter pressed
     mov ah, 0eh  ; echo the character
     int 10h
     stosb        ; put the char into the buffer
     inc dl       ; increment char counter
     loop .again
   .exit:
     pop ax       ; return pointer
     pop bx       ; buffer address
     mov [bx], dl ; store char count in buffer
     push bx      ; restore buffer addr
     push ax      ; restore fn return pointer
     ret

   ; the forth count word
   ; stack: addr -- addr+1, char count
   count:
     dw accept 
     db 5, 'count'
   count.x:
     pop dx         ; preserve return fn pointer
     pop bx         ; buffer address
     xor ax, ax     ; ax := 0
     mov al, [bx]   ; get count into al
     inc bx
     push bx        ; new buffer address
     push ax        ; char count
     push dx        ; restore fn return ip
     ret

   ; stack: buffer address, char count -- 
   type:
      dw count        ; link to previous dictionary entry 
      db 4, 'type'  
   type.x:
      cld             ; make lodsb step forwards
      pop bx          ; juggle return address for call
      pop cx          ; how many chars to print
      pop si          ; address of buffer to print
      push bx         ; restore return function call
      cmp cx, 0       ; if nothing to print exit
      je .exit
      mov ah, 0eh     ; bios print character function
    .again:
      lodsb        ; get next char from message into al
      int 10h      ; x86 bios interrupt
      loop .again  ; decr cx loop counter 
    .exit:
      ret
      

    ; execute a function given a pointer to its header on the stack
    exec:
      dw type
      db 4, 'exec'
    exec.x:
      pop ax
      pop bx     ; get pointer to function
      push ax    ; preserve fn return pointer
      add bx, 2  ; point to name count
      mov cl, [bx]  ; get the count
      inc bx        ; skip over count
      xor ch, ch    ; cx := name length as a 16 bit value
      add bx, cx    ; advance the pointer to the function (16 bit add,
                    ; so a carry out of bl is not lost)

      ; !! not call [bx] thats a pointer to jumptable
      ; !!! but call bx may change the stack and mangle 
      ;     any register so we need to be careful here.
      ; the solution below seems dodgy but anyway

      pop word [returnexec]      ; save return ip
      call bx       ; call the fn pointed to by bx
      push word [returnexec]     ; restore fn return ip
      ret
    ; a dodgy solution, but any register may be mangled by call bx
    returnexec dw 0

    ; list walks through the dictionary starting at the last word
    ; and lists all words found. This assumes the dict has at
    ; least one word
    ; stack: --
    list:
      dw exec
      db 4, 'list'
    list.x:
      call last.x     ; get pointer to the last word in dict
      pop bx 
    .again:
      mov si, bx      ; pointer to current function header
      add si, 2       ; the counted string is 2 bytes after header
      push bx         ; save bx (since count/type will mangle
      push si         ; the name pointer to print with count/type
      call count.x    ; count the function name
      call type.x     ; display the name
      call space.x    ; print one space
      pop bx          ; restore function header pointer
      mov bx, [bx]    ; get the pointer to the next function (or 0)
      cmp bx, 0       ; if start of dict, then link is 0
      jne .again 

      pop dx
      ;push 0          ; could return a word count here ??
      push dx
      ret

    ; find searches through the dictionary starting at a given
    ; point and return a pointer to the found word or else 0 on the 
    ; stack
    ; stack: search term, start pointer -- function header pointer
    find.doc:
      db 'Search dictionary for a word and return pointer.'
      db ' eg: in find '
      dw $-find.doc
    find:
      dw list
      db 4, 'find'
    find.x:
      pop dx     ; juggle fn return ip
      pop bx     ; where to start searching (eg last entry in dict)
      pop ax     ; counted string buffer to search for 
      push dx    ; restore fn ip
    .again:
      xor cx, cx      ; set cx:=0
      mov si, bx      ; pointer to current function header
      add si, 2       ; the counted string is 2 bytes after header
      mov cl, [si]    ; the count (length) of this dictionary name
      inc cl          ; we also have to compare the count bytes
      mov di, ax      ; the search term pointer
      cld            ; search forwards (clear direction flag)
      repe cmpsb     ; compare all characters for equality
      je .found
      mov bx, [bx]    ; get the pointer to the next function (or 0)
      cmp bx, 0       ; if start of dict, then link is 0
      je .notfound    ; no more words to search, so exit
      ;push ax         ; save ax, the search term pointer
      ;mov ah, 0Eh     ; print a dot on each unsuccessful search
      ;mov al, '.'     ; for debugging
      ;int 10h        ; x86 bios interrupt
      ;pop ax          ; restore the search term pointer      
      jmp .again 
    .notfound:
      pop dx
      push 0         ; not found so return 0
      push dx
      ret
    .found: 
      pop dx         ; juggle return ip
      push bx        ; return result on stack 
      push dx
      ret

  ,,,

OS FORTH STYLE BOOTLOADING SYSTEM ....
  
  Provides more common forth words and moves towards a more
  capable forth system.
  Forth features are: a linked list dictionary, an interpreting loop
    that accepts commands, all parameters passed on the stack.
  Unforth features: no return stack (>R R> etc), no compiling (yet)

  We have to use the following between the 2 stages of the bootloader
  to make the memory offsets work...!!!
  
     "section stage2 vstart=0"  

  Another solution is to
  compile the 2 stages of the bootloader separately so that the 
  assembler nasm can work out what are the correct memory addresses
  for labels etc.

  See answer at
  https://forum.nasm.us/index.php?topic=2160.0

  Another issue to consider is whether we are in danger of 
  loading our new code on top of the existing stack, which would 
  corrupt the return addresses stored there. 
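
  As a rough check, the addresses used by the bootloader in this
  section can be compared at assembly time. This is only a sketch:
  the names stack_seg, stack_top and load_addr are made up for the
  example, and the numbers are copied from the listing in this
  section (ss = 07C0h + 288, sp = 4096, code loaded at 1000:0000).

  * check that the loaded code does not land on the stack
  -----------------
   ; a linear address is segment * 16 + offset
   ; stack_seg is the value that the bootloader puts in ss
   %assign stack_seg 07C0h + 288
   ; the stack grows downwards from ss:sp, which is 9E00h here
   %assign stack_top stack_seg * 16 + 4096
   ; the second stage is read to 1000:0000, which is 10000h
   %assign load_addr 1000h * 16
   ; stop the assembly if the code would start below the stack top
   %if load_addr < stack_top
     %error loaded code would overwrite the stack
   %endif
  ,,,

  With these numbers the top of the stack is at 9E00h and the code
  is loaded from 10000h upwards, so the two do not overlap. The check
  only matters if the stack or the load address is moved later.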

  Working!! The words most needed now are:
    >number - convert an ascii number and put on stack
    word - get the next word from the input buffer
    create - make a dictionary header
    : - compile a dictionary definition

  Also we can just make fun words, like colourful ascii output
  etc.

  * an extensible bootloading forth-like system for x86 realmode 
  -----------------
  BITS 16
  [ORG 0]

   jmp 07C0h:load    ; Goto segment 07C0
     drive db 0      ; a variable to hold boot drive number
   load:
     mov ax, cs     ; the code segment is already correct (?!)
     mov ds, ax     ; set up data and extended segments
     mov es, ax
     mov [drive], dl ; save the boot drive number
     mov ax, 07C0h   ; Set up 4K stack space after this bootloader
     add ax, 288     ; (4096 + 512) / 16 bytes per paragraph
     mov ss, ax      ; with a 4K gap between stack and code
     mov sp, 4096

      ; save the DL register or else dont modify it
      ; it contains the number of the boot medium (hard disk,
      ; usb memory stick etc)
      ; The 'floppy' Drive is NOT necessarily 0!!!

    reset:            ; Reset the floppy drive
      mov ax, 0       ; 
      mov dl, [drive] ; the boot drive number (eg for usb 128)
      int 13h         ;
      jc reset        ; ERROR => reset again
    read:
      mov ax, 1000h       ; ES:BX = 1000:0000
      mov es, ax          ; es:bx determines where data loaded to
      mov bx, 0           ;
      mov ah, 2           ; Load disk data to ES:BX
      mov al, 5           ; Load 5 sectors (only 1 used here)
      ; try mov cx, 0x0002 ; cylinder 0, sector 2
      mov ch, 0           ; Cylinder=0
      mov cl, 2           ; Sector=2 (sector 1 is the boot sector)
      mov dh, 0           ; Head=0
      mov dl, [drive]     ; 
      int 13h             ; Read!
    jc read             ; ERROR => Try again

    jmp 1000h:0000      ; Jump to the loaded code 

    times 510-($-$$) db 0   ; pad out the boot sector (512 bytes)
    dw 0AA55h               ; end with standard boot signature

    ;Or just put a jmp start and jump over all the forth 
    ;definitions
    
    ; this below is the magic line to make the new memory offsets
    ; work. Or compile the 2 files separately
    ; Good answer from
    ; https://forum.nasm.us/index.php?topic=2160.0 

    section stage2 vstart=0

    ; the code to be loaded and executed
      ; cs is ok because of far jump
      ; is ds and es ok ? no, but stack seems ok
     mov ax, cs     ; the code segment is already correct (?!)
     mov ds, ax     ; set up data and extended segments
     mov es, ax
     mov [bos.n], sp   ;set up bottom of stack

   ; the interpreting loop
   here:
      push in.buffer    ; where to store chars (1st byte is count)
      push 40        ; maximum number of chars to accept and store
      call accept.x  ; stack: addr
      call crlf.x
      pop bx
      push bx
      mov al, [bx]   ; check if any characters entered
      cmp al, 0
      je here
      ;call dup.x
      ;call count.x   ; st: addr, char count
      ;call type.x
      push find       ; but find can get the top of dict from last
      call find.x
      pop ax
      cmp ax, 0      ; if find returns 0, then word was not found 
      je .number
      push ax        ; restore the word function pointer
      call exec.x    ; the bug in exec was fixed (stack mangling)
      call crlf.x
      jmp here 
    .number:         ; try to convert word into a number
      jmp here 

    hang: jmp hang

    ; start of forth style words.

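    ; clear the screen by selecting 80x25 text mode again
    ; (setting a video mode also clears the screen)
    dw 0           ; a link to help document or else zero
    cls:
      dw 0         ; first word in the dictionary, so the link is zero
      db 3, 'cls'
    cls.x:
      mov ah, 0    ; bios set video mode function
      mov al, 03h  ; mode 03h = colour text 80x25
      int 10H
      ret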

    ; just put 50 on the stack for testing
    ; since I dont have >number yet !!!
    dw 0           ; a link to help document 
    fifty:
      dw cls 
      db 5, 'fifty'
    fifty.x:
      pop dx
      push 50
      push dx
      ret

    ; just set the base to 16 (hexadecimal)
    ; forth def..  : hex 16 base ! ;
    hex:
      dw fifty
      db 3, 'hex'
    hex.x:
      mov word [base.n], 16
      ret

    ; just set the base to 10 (decimal numbers)
    ; forth def..  : base10 10 base ! ;
    base10:
      dw hex 
      db 6, 'base10'
    base10.x:
      mov word [base.n], 10
      ret

    ; need to rethink these parameters
    ; 
    ; this would be called >number in many forths
    ; parameters, a buffer address and how many chars to convert
    ; leaves a pointer to first char unconverted
    ; (stack: addr, chars - ptr char, n)
    ; this routine just assumes that the number is base 10 
    ; which it shouldn't

    ; stack: buffer address, max chars - ptr char, result
    num:
      dw base10             ; link to previous dict entry
      db 3, 'num'       ; counted name of function
    num.x:
      pop dx           ; juggle fn return ip
      pop cx           ; maximum chars to convert
      pop si           ; buffer address
      push dx          ; restore fn return ip
      cld              ; make lodsb step forward through chars
      mov word [num.result], 0  ; start each conversion from zero
    .again:
      lodsb            ; get next char into al 
      ;mov ah, 0eH     ; echo the char (just for debugging)
      ;int 10H
      cmp al, '0'      ; check for a valid decimal digit
      jb .exit         ; if ascii value is less than '0' not digit
      cmp al, '9'
      ja .exit         ; if ascii value greater than '9' not digit
      sub ah, ah    ; set ah = 0
      sub al, '0'   ; convert digit from ascii
      push ax       ; store digit on stack
      mov ax, [num.result]
      mov bx, [base.n]   ; multiply by 10 (for decimal numbers)
      ;mov bx, 16   ; multiply by 10 (for decimal numbers)
      mul bx             ; do AX x BX and store in DX:AX 
      jo .toobig         ; result too big to store in AX 
      pop bx       ; get digit from stack
      add ax, bx
      mov [num.result], ax
      loop .again
      jmp .exit
    .toobig:  
      pop bx        ; discard the digit saved on the stack
      mov al, '!'   ; print ! if integer is too big for 2 bytes
      mov ah, 0eH   ; bios teletype function 
      int 10H       ; invoke bios
      mov word [num.result], 0x0000  ; set result := 0
    .exit:
      pop dx        ; return ip
      push si
      push word [num.result] 
      push dx
      ret
     num.result dw 0x0000    ; store intermediate results of conversion

    dw 0           ; doc link or zero 
    asc:
      dw num
      db 3, 'asc'
    asc.x:
      mov cx, 255 
    .again:
      mov ah, 0eh
      mov al, cl
      int 10H
      mov al, ' ' 
      int 10H
      loop .again
      ret

    ; set video to big text 40x25
    vid:
      dw asc
      db 3, 'vid'
    vid.x: 
      mov ah, 0     ; set video mode function.
      mov al, 1h    ; mode 1h = colour text 40x25 
      int 10h       ; set it!
      ret

    ; get one keystroke from user and place on stack
    key:  
      dw vid  
      db 3, 'key'  ; forth-style function header 
    key.x:
      mov ah, 0    ; wait for keypress bios function
      int 16h
      pop bx       ; juggle function return pointer
      push ax      ; save keypress value on stack
      push bx      ; restore return pointer to stack
      ret

    emit:
      dw key        ; link to previous dictionary entry 
      db 4, 'emit'  
    emit.x:
      pop bx          ; juggle return address for call
      pop ax          ; character to print  (into al)
      push bx         ; restore return function call
      mov ah, 0eh     ; bios print character function
      int 10h
      ret

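    ; type text in rainbow colours (last character first)
    ; (stack: counted string addr - )
    rainbow:
      dw emit
      db 7, 'rainbow'  ; fn counted name
    rainbow.x:
      pop ax              ; balance return ip
      pop si
      push ax
      xor bx, bx          ; bh := 0 (display page), bl gets the colour below
      xor cx, cx          ; set cx:=0 
      mov cl, [si]        ; the character count, used by loop and colours
      jcxz .exit          ; nothing to print for an empty string
      add si, cx          ; set pointer to last char in string
      std                 ; make lodsb go in reverse
    .again:
      lodsb               ; get next char to al
      mov bl, cl          ; use the CX counter to cycle thru 16 colours
      and bl, 0Fh         ; low 4 bits = foreground, so no background colour
      push cx             ; int 10h fn 09h uses cx as a repeat count
      mov cx, 1           ; write the char once
      mov ah, 09h         ; write char and attribute at the cursor
      int 10h             ; do it with a bios interrupt
      mov ah, 0eh         ; teletype the same char to advance the cursor
      int 10h             ; (in text mode the attribute just set is kept)
      pop cx              ; restore the loop counter
      loop .again
      cld                 ; restore the usual string direction
    .exit:
      ret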

    ; just pushes a pointer to last word in dict onto the stack
    last:
      dw rainbow 
      db 4, 'last'
    last.x:
      pop ax           ; preserve fn ip return
      push word [lastword]
      push ax
      ret
    lastword dw find

    
    in.doc:
      db 'puts the address of the input buffer on the stack '
      dw $-in.doc
    in:
      dw last
      db 2, 'in'
    in.x:
      pop ax           ; preserve fn ip return
      push word in.buffer
      push ax
      ret
    in.buffer db 0
       times 80 db ' '  

    ; puts the current parse position in the input buffer on the stack.
    ; this gets updated by the nextword fn
    toin:
      dw in
      db 3, '>in'
    toin.x:
      pop ax           ; preserve fn ip return
      push word [toin.data]  ; the saved parse position
      push ax
      ret
    toin.data dw in.buffer   ; parse position, starts at the input buffer

    pad.doc db 'puts the address of a scratch area on the stack'
               dw $-pad.doc
    pad:
      dw toin 
      db 3, 'pad'
    pad.x:
      pop ax           ; preserve fn ip return
      push word pad.buffer
      push ax
      ret
    pad.buffer  db 0
       times 80 db ' '  

    ; just print one space
    space:
      dw pad
      db 5, 'space'
    space.x:
      mov ah, 0eh  ; bios type char function 
      mov al, 32   ; space character
      int 10h
      ret

    ; -- dup just duplicates the top item on the stack
    dup: dw space       ; link to previous word 
         db 3, 'dup'    ; strings are 'counted' 
    dup.x:
       pop bx      ; juggle fn return address
       pop ax      ; get param to duplicate
       push ax
       push ax
       push bx     ; restore fn return address
       ret
    
    base.doc:
       db 'Puts the address of variable base on stack', 13, 10
       db 'Base determines the current numerical base for conversions', 13, 10
       db 'eg: 16 base !  /makes the base hexadecimal  '
       dw $-base.doc
    base:
      dw dup          ; 
      db 4, 'base'    ; forth style counted name
    base.x:
      pop dx          ; juggle return pointer for word
      push base.n     ; push address of base on stack
      push dx
      ret
    base.n dw 16

    ; the bottom of the stack, ie the value of sp when the program starts.
    ; This needs to be initialised when the program starts. But the stack
    ; will also contain return pointers for words ..
    bos:
    bos.n: dw 0

    hextable db "0123456789ABCDEF"
    dotstack.doc:
       db 'displays the contents of the stack without modifying', 13, 10
       db 'in the current numeric base ', 13, 10
       db 'eg: .s   /displays all items on stack '
       dw $-dotstack.doc
    dotstack:
       dw base
       db 2, '.s'  
    dotstack.x:
      mov bx, [base.n]   ; eg decimal, hex, any 1 < n < 17 ok 
                         ; we cannot display any base > 16 at the moment
      xor bh, bh       ; base
      mov si, sp         ; set si to top of stack
     .nextitem:
       add si, 2       ; increment the stack pointer (and avoid fn return pointer)
       cmp si, [bos.n] ; check if is the last element of the stack
       je .exit        ; 

       mov ax, [ss:si]  ; get top item on stack into ax
       xor cx, cx       ; set counter = 0
     .again:
       xor dx, dx          ; dividend is ax
       mov bx, [base.n]   ; eg decimal, hex, any 1 < n < 17 ok 
                        ; we cannot display any base > 16 at the moment
       xor bh, bh       ; base
       div bx              ; does dx:ax/bx. remainder --> dx, quotient -> ax
       push dx             ; save remainder (ie digit) on the stack 
       inc cx              ; increment the digit counter
       cmp ax, 0           ; if the quotient != 0 do the next digit 
       jne .again          ; loop while quotient > 0
     .print:
       pop ax            ; get digit from the stack (digit in AL)
       mov bx, hextable  ; translation table
       xlatb             ; replace al with hex digit from table
       mov ah, 0eH       ; print digit in al
       int 10H
       loop .print       ; using cx the digit counter to loop 
       mov al, ' '       ; print a space between each value
       int 10H

       jmp .nextitem     ; do next stack item
     .exit:
       ret


    ; example doc field with reverse count field
    dothex.doc db 'displays a 2 byte number in hex format'
               dw $-dothex.doc
    dothex:
      dw dotstack
      db 4, '.hex'
      ; below an example doc field
      ; db 20, 'displays a 2 byte number in hex format'
    dothex.x:
      pop bx     ; return address
      pop dx     ; the number to print (parameter on stack)
      push bx    ; restore return address
      mov ah, 0x0E ; bios teletype function 
      mov bx, hextable   ; translation table
      mov cx, 4          ; number of digits to print
      .again:
        rol dx, 4      ; rotate left 4 bits (print highest first)
        mov al, dl     ; bits to convert to hex digit
        and al, 0x0F   ; only lower 4 bits relevant
        xlatb          ; replace al with hex digit in translation table
        int 10H        ; invoke bios print function
        loop .again
      mov al, 'H'      ; print an H to indicate hex number
      mov ah, 0eH      ; bios teletype function
      int 10H
      ret
    ;hextable db "0123456789ABCDEF"    ; translation table

    hash:
      dw dothex 
      db 4, 'hash'
    hash.x:
      mov ah, 0Eh     ; just print a hash with bios
      mov al, '#'
      int 10h         ; x86 bios interrupt
      ret
    ; another fn for testing

    star:
      dw hash
      db 4, 'star'
    star.x:
      mov ah, 0Eh     ; just print a star with bios
      mov al, '*'
      int 10h         ; x86 bios interrupt
      ret

   ; just print a newline
   crlf:
     dw star
     db 4, 'crlf'
   crlf.x:
     mov ah, 0eh  ; bios type char function 
     mov al, 13   ; cr lf
     int 10h
     mov al, 10
     int 10h
     ret

   ; get a line of input from the user 
   ; stack: buffer address, max characters -- buffer addr
   accept:  
     dw crlf         ; link to previous dictionary entry 
     db 6, 'accept'  ; forth-style function header 
   accept.x:
     pop bx       ; juggle return pointer
     pop cx       ; how many chars maximum to get 
     pop di       ; where to copy chars
     push di      ; save buffer address on stack 
     push bx      ; restore return pointer
     inc di       ; skip count byte to store 1st char
     xor dl, dl   ; char counter := zero
     cld          ; make stosb go forwards
   .again:
     mov ah, 0    ; wait for keypress bios function
     int 16h
     cmp al, 13   ; was the key press an 'enter'?
     je .exit     ; exit if enter pressed
     mov ah, 0eh  ; echo the character
     int 10h
     stosb        ; put the char into the buffer
     inc dl       ; increment char counter
     loop .again
   .exit:
     pop ax       ; return pointer
     pop bx       ; buffer address
     mov [bx], dl ; store char count in buffer
     push bx      ; restore buffer addr
     push ax      ; restore fn return pointer
     ret

   ; the forth count word
   ; stack: addr -- addr+1, char count
   count:
     dw accept 
     db 5, 'count'
   count.x:
     pop dx         ; preserve return fn pointer
     pop bx         ; buffer address
     xor ax, ax     ; ax := 0
     mov al, [bx]   ; get count into al
     inc bx
     push bx        ; new buffer address (+1)
     push ax        ; char count
     push dx        ; restore fn return ip
     ret

   ; stack: buffer address, char count -- 
   type:
      dw count        ; link to previous dictionary entry 
      db 4, 'type'  
   type.x:
      cld             ; make lodsb step forwards
      pop bx          ; juggle return address for call
      pop cx          ; how many chars to print
      pop si          ; address of buffer to print
      push bx         ; restore return function call
      cmp cx, 0       ; if nothing to print exit
      je .exit
      mov ah, 0eh     ; bios print character function
    .again:
      lodsb        ; get next char from message into al
      int 10h      ; x86 bios interrupt
      loop .again  ; decr cx loop counter 
    .exit:
      ret
      

    ; execute a function given a pointer to its header on the stack
    exec:
      dw type
      db 4, 'exec'
    exec.x:
      pop ax
      pop bx     ; get pointer to function
      push ax    ; preserve fn return pointer
      add bx, 2  ; point to name count
      mov cl, [bx]  ; get the count
      inc bx        ; skip over count
      xor ch, ch    ; cx := name length as a 16 bit value
      add bx, cx    ; advance the pointer to the function (16 bit add,
                    ; so a carry out of bl is not lost)

      ; !! not call [bx] thats a pointer to jumptable
      ; !!! but call bx may change the stack and mangle 
      ;     any register so we need to be careful here.
      ; the solution below seems dodgy but anyway

      pop word [returnexec]      ; save return ip
      call bx       ; call the fn pointed to by bx
      push word [returnexec]     ; restore fn return ip
      ret
    ; a dodgy solution, but any register may be mangled by call bx
    returnexec dw 0

    ; list walks through the dictionary starting at the last word
    ; and lists all words found. This assumes the dict has at
    ; least one word
    ; stack: --
    list:
      dw exec
      db 4, 'list'
    list.x:
      call last.x     ; get pointer to the last word in dict
      pop bx 
    .again:
      mov si, bx      ; pointer to current function header
      add si, 2       ; the counted string is 2 bytes after header
      push bx         ; save bx (since count/type will mangle
      push si         ; the name pointer to print with count/type
      call count.x    ; count the function name
      call type.x     ; display the name
      call space.x    ; print one space
      pop bx          ; restore function header pointer
      mov bx, [bx]    ; get the pointer to the next function (or 0)
      cmp bx, 0       ; if start of dict, then link is 0
      jne .again 

      pop dx
      ;push 0          ; could return a word count here ??
      push dx
      ret

    ; find searches through the dictionary starting at a given
    ; point and return a pointer to the found word or else 0 on the 
    ; stack
    ; stack: search term, start pointer -- function header pointer
    find:
      dw list
      db 4, 'find'
    find.x:
      pop dx     ; juggle fn return ip
      pop bx     ; where to start searching (eg last entry in dict)
      pop ax     ; counted string buffer to search for 
      push dx    ; restore fn ip
    .again:
      xor cx, cx      ; set cx:=0
      mov si, bx      ; pointer to current function header
      add si, 2       ; the counted string is 2 bytes after header
      mov cl, [si]    ; the count (length) of this dictionary name
      inc cl          ; we also have to compare the count bytes
      mov di, ax      ; the search term pointer
      cld            ; search forwards (clear direction flag)
      repe cmpsb     ; compare all characters for equality
      je .found
      mov bx, [bx]    ; get the pointer to the next function (or 0)
      cmp bx, 0       ; if start of dict, then link is 0
      je .notfound    ; no more words to search, so exit
      ;push ax         ; save ax, the search term pointer
      ;mov ah, 0Eh     ; print a dot on each unsuccessful search
      ;mov al, '.'     ; for debugging
      ;int 10h        ; x86 bios interrupt
      ;pop ax          ; restore the search term pointer      
      jmp .again 
    .notfound:
      pop dx
      push 0         ; not found so return 0
      push dx
      ret
    .found: 
      pop dx         ; juggle return ip
      push bx        ; return result on stack 
      push dx
      ret
    

  ,,,

READING TEXT FROM KEYBOARD 

  Below is a forth style program. In traditional forths the word would
  be called "expect", but this version has no line editing. It works,
  except that the char count is not yet stored at the beginning of the
  buffer. The next step is a 'find' function which looks up a word in
  the dict and executes it.

  * read a line of input and store in a buffer
  ------------
   jmp start

   buffer db 0, '             '
   ; just print a newline
   crlf:
     dw 0
     db 4, 'crlf'
   crlf.x:
     mov ah, 0eh  ; bios type char function 
     mov al, 13   ; cr lf
     int 10h
     mov al, 10
     int 10h
     ret

   ; get a line of input from the user, this is called 'accept'
   ; in most forths
   ; stack parameters: buffer address, max characters
   line:  
     dw 0          ; 1st word has a zero link 
     db 4, 'line'  ; forth-style function header 
   line.x:
     pop bx       ; juggle return pointer
     pop cx       ; how many chars maximum to get 
     pop ax       ; where to copy chars
     push bx      ; restore return pointer
     inc ax       ; first byte of buffer is char count
     mov di, ax   ; where stosb will put characters 
     sub dl, dl   ; simple char counter
   .again:
     mov ah, 0    ; wait for keypress bios function
     int 16h
     cmp al, 13   ; was the key press an 'enter' 
     je .exit     ; exit if enter pressed
     mov ah, 0eh  ; echo the character
     int 10h
     stosb        ; put the char into the buffer
     inc dl       ; increment char counter
     loop .again
   .exit:
     ; have to save
     ;mov buffer, dl  ; store char count in buffer
     ret

   ; type takes its arguments on the stack (buffer address, char count)
   type:
      dw line        ; link to previous dictionary entry 
      db 4, 'type'  
   type.x:
      cld             ; set dir flag to forwards
      pop bx          ; juggle return address for call
      pop cx          ; how many chars to print
      pop ax          ; address of buffer to print
      push bx         ; restore return function call
      mov si, ax      ; mov is right here: lea needs a memory operand
      mov ah, 0eh     ; bios print character function
    .again:
      lodsb  ; get next char from message 
      int 10h
      loop .again
      ret
      
   start:
      mov ax, 07C0h      ; Set data segment to where we're loaded
      mov ds, ax     
      mov es, ax         ; es needed for stosb 
      ;mov sp, ?         ; what about the stack pointer?

   here:
      push buffer    ; where to store chars
      push 8         ; maximum number of chars to store
      call line.x
      call crlf.x
      push buffer+1 ; buffer to print (but not count)
      push 8        ; chars to print
      call type.x
      call crlf.x
      jmp here 

   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,

  * read a line of input and count the characters 
  ---------------------------------------------------------
  jmp start
  start:
  mov ax, 07C0h    ; Initialize data segment register 
  mov ds, ax       ; via AX 
  sub cx, cx    ; set cx = 0
  .again:
     mov ah, 0     ; bios read character function
     int 16h       ; invoke bios interrupt
     mov ah, 0eH   ; echo char entered
     int 10H
     inc cx        ; increment the character counter
     cmp al, 13    ; was the key press a 'enter' 
    jne .again     ; loop if not enter pressed
    mov al, 10     ; print a newline
    mov ah, 0eH
    int 10H
    dec cx        ; dont include newline in character count
    mov ax, cx    ;  
    add al, '0'   ; convert count digit to ascii
    mov ah, 0eH   ; x86 bios print char function
    int 10h       ; print the character count (one digit, so 0 to 9 only)
    hang: jmp hang  ; stop here instead of running into the padding

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

  Todo!!

  * read a line of input, and copy counted string to buffer 
  ---------------------------------------------------------
  jmp start
  buffer db '                      '
  start:
  mov ax, 07C0h    ; Set data segment to where we're loaded
  mov ds, ax

  mov di, buffer+2 ; the first word of buffer is for the count

  sub cx, cx    ; set cx = 0
  .again:
    mov ah, 0     ; x86 bios get char from keyboard
    int 16h       ; invoke the bios
    mov ah, 0eH   ; print char
    int 10H       ; invoke bios function

    ; NOTE: I think the code below could be better done with
    ;  stosb, and loopne
    mov [di], al  ; copy AL to buffer
    inc di
    inc cx        ; increment the character counter
    cmp al, 13    ; was the key press a 'enter' 
    jne .again    ; loop if not enter pressed
    mov al, 10    ; print a newline
    mov ah, 0eH
    int 10H
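    dec cx        ; dont include newline in character count
    mov [buffer], cx  ; store the count in the first word of the buffer
    mov ax, cx    ;  
    add al, '0'   ; convert count digit to ascii
    mov ah, 0eH   ; x86 bios print char function
    int 10h       ; print the character count (one digit, so 0 to 9 only)
    hang: jmp hang  ; stop here instead of running into the padding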
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

 
NUMBERS AS USER INPUT ....

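  We would like to be able to read a positive integer
  entered by the user. So we start with a result of zero and,
  for each digit read, multiply the result so far by the base
  and add the digit. For example '123' in base 10 is built up
  as 1, then 1*10+2 = 12, then 12*10+3 = 123.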
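
  The steps above can be sketched as a small boot sector. This is
  only an illustration and not part of the forth system: it assumes
  base 10, keeps the running result in the di register, stops at the
  first key which is not a digit (or when the value would no longer
  fit in 16 bits) and then prints the result back in decimal.

  * read a decimal number from the keyboard and print it back
  ------------
   jmp start
   start:
     mov ax, 07C0h + 288 ; 4K stack after the boot sector, as in the
     mov ss, ax          ; bootloader listings in this booklet
     mov sp, 4096
     xor di, di          ; di := 0, the result so far
     mov bx, 10          ; the base (also leaves bh = 0 for int 10h)
   .nextkey:
     mov ah, 0           ; bios wait for keypress function
     int 16h             ; the ascii code of the key comes back in al
     cmp al, '0'
     jb .done            ; any key which is not a digit ends the number
     cmp al, '9'
     ja .done
     mov cl, al          ; keep the key in cx
     xor ch, ch
     mov ax, di          ; ax := result so far
     mul bx              ; dx:ax := result * base
     jc .done            ; stop if the product needs more than 16 bits
     sub cl, '0'         ; cx := value of the new digit
     add ax, cx          ; result * base + digit
     jc .done            ; stop if the sum does not fit in 16 bits
     mov di, ax          ; save the new result
     mov al, cl
     add al, '0'         ; convert the digit back to ascii
     mov ah, 0eh         ; echo the accepted digit
     int 10h
     jmp .nextkey
   .done:
     mov ax, 0e0dh       ; teletype a carriage return
     int 10h
     mov ax, 0e0ah       ; and a line feed
     int 10h
     mov ax, di          ; the number to print
     xor cx, cx          ; digit counter := 0
   .split:
     xor dx, dx          ; dividend is dx:ax
     div bx              ; quotient -> ax, remainder (one digit) -> dx
     push dx             ; save the digit, lowest digit first
     inc cx
     cmp ax, 0
     jne .split          ; loop while quotient > 0
   .print:
     pop ax              ; digits come off the stack highest first
     add al, '0'         ; convert to ascii
     mov ah, 0eh
     int 10h
     loop .print
   hang: jmp hang        ; keep looping!
   times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
   dw 0xAA55               ; The standard PC boot signature
  ,,,

  A digit is only echoed once it has been added to the result, so
  the number on the first line should match the number printed on
  the second line.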


MOUSE INPUT 

  http://wiki.osdev.org/Mouse_Input
    just enough info

  The good news is that on most PCs the bios makes a modern usb
  mouse behave like a normal PS/2 mouse (often called 'usb legacy
  support'). So you dont have to actually write
  a usb 'stack' (software api) in order to use the mouse. Phew!

  Mouse (and Keyboard) data arrive on port 0x60. The status
  register is read from port 0x64: bit 0 (value 0x01) set means
  data is available, bit 5 (value 0x20) set means the data is from
  the mouse, not the keyboard.
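
  A rough sketch of polling those two ports is below. It prints 'K'
  for each byte that arrives from the keyboard and 'M' for each byte
  from the mouse. It assumes that the mouse has already been switched
  on (a PS/2 mouse sends nothing until it is told to start reporting,
  and that set up is not shown here), so in an emulator it may well
  only ever print 'K'. Interrupts are turned off so that the bios
  keyboard handler does not take the bytes first.

  * show whether each incoming byte is from keyboard or mouse, untested
  -------------
  [org 0]
  jmp 07C0h:start

  start:
    cli               ; stop the bios handlers reading port 0x60 first
    xor bx, bx        ; bh := 0, page 0 for the bios teletype function
  .poll:
    in al, 0x64       ; read the status register
    test al, 0b00000001   ; bit 0 set: a byte is waiting on port 0x60
    jz .poll
    mov dl, al        ; keep the status byte
    in al, 0x60       ; read the data byte (it is thrown away here)
    mov al, 'K'       ; assume keyboard
    test dl, 0b00100000   ; bit 5 set: the byte came from the mouse
    jz .show
    mov al, 'M'
  .show:
    mov ah, 0eh       ; bios teletype function
    int 10h
    jmp .poll
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

  A key press normally sends more than one byte (one when the key
  goes down and at least one when it comes up), so expect several
  'K's per key.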

ASCII CODE

  The ascii code is a way of mapping common western (latin)
  characters to numbers
 
  * some common useful ascii codes
  -----------
    cr      equ  13         ; carriage return
    lf      equ  10         ; line feed
    bell    equ   7         ; bell (sort of)
    spc     equ  32         ; space
    bs      equ   8         ; back space
    del     equ 127         ; 'delete' character
  ,,,

  * print ascii in descending order
  --------------------
  jmp start
  start:
  mov cx, 255 
  .again:
    mov ah, 0eh
    mov al, cl
    int 10H
    mov al, ' ' 
    int 10H
    loop .again
  .exit:
    hang: jmp hang        ; keep looping! 
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

  * print the first 128 ascii in ascending order
  --------------------
  jmp start
  start:
  mov cx, 128 
  .again:
    mov ah, 0eh    ; bios teletype function
    mov al, 128
    sub al, cl
    int 10H        ; call bios function
    mov al, ' '    ; print a space
    int 10H        ; do it
    loop .again    ; loop 128 times
  .exit:
    hang: jmp hang        ; keep looping! 
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

  The code below is designed to use direct memory writes

  * print a coloured ascii table with values in hex
  -------------
    [org 0]
    jmp 07C0h:start

    ; **
    hextable db "0123456789ABCDEF"    ; digit translation table
    dw 0           ; no doc
    asctable:
      dw 0               ; link
      db 8, 'asctable'
    asctable.x:
      mov cx, 0x00FF     ; loop counter: 255 chars, codes 0x00 to 0xFE
      mov bx, hextable   ; pointer to digit translation table
      mov ax, es
      mov fs, ax         ; save es pointer
      mov ax, 0xB800     ; video memory address 
      mov es, ax         ; set up es for stosw 
      mov di, 320        ; start at 3rd line (80 chars/ line, 2 bytes/char)
      cld                ; make stosw move forwards through video memory

    .nextchar:
      mov dx, 0x00FF      ; dl := code of the current char (0xFF - cx)
      sub dx, cx          ; might be better to use incrementing counter

      mov al, dl     ; ascii code to print 
      rol al, 4      ; swap the two nibbles
      and al, 0x0F   ; keep the high nibble (first hex digit)
      xlatb          ; replace al with hex digit  al := [bx+al]
      ; put colour in ah for direct memory write
      mov ah, 0b00000010 ; green on black 
      stosw

      mov al, dl     ; lower nibble of ascii code 
      and al, 0x0F   ; second hex digit
      xlatb          ; replace al with hex digit  al := [bx+al]
      mov ah, 0b00000010 ; green on black 
      stosw

      mov ax, 0x0020   ; a black on black space
      stosw
      cmp dl, 0x0D     ; skip the carriage return character
      je .skip
      mov al, dl       ; print actual asci char
      mov ah, 0b00001111 ; white on black 
      stosw
   .skip:
      mov ax, 0x0020   ; a black on black space
      stosw
      ; newline calculation is a bit trickier with direct memory display
      ;test dl, 0x0F
      ;jmpne .end
   .end:
      loop .nextchar
      mov ax, fs
      mov es, ax       ; restore es pointer 
      ret
    ; *

    start:
      mov ax, cs
      mov ds, ax
      mov es, ax      ; stosw uses es segment reg
      ;call asctable.x
      mov bx, asctable.x   ; check for strange bugs
      call bx
      jmp $

    times 510-($-$$) db 0   ; Pad boot sector with 0s
    dw 0xAA55               ; MBR boot signature

  ,,,

  * print a monochrome ascii table with values in hex
  -------------
    [org 0]
    jmp 07C0h:start

    hextable db "0123456789ABCDEF"    ; digit translation table
    asctablebw:
      dw 0
      db 10, 'asctablebw'
    asctablebw.x:
      mov cx, 0x00FF     ; loop through all asci chars
      mov bx, hextable   ; pointer to digit translation table
    .nextchar:
      mov dx, 0x00FF      ; loop through all asci chars
      sub dx, cx          ; might be better to use incrementing counter

      mov ah, 0x0E   ; x86 int 0x10 type char function
      mov al, dl     ; ascii code to print 
      rol al, 4      ; swap the two nibbles
      and al, 0x0F   ; keep the high nibble (first hex digit)
      xlatb          ; replace al with hex digit  al := [bx+al]
      int 10h        ; invoke bios 

      mov al, dl     ; lower nibble of ascii code 
      and al, 0x0F   ; second hex digit
      xlatb          ; replace al with hex digit  al := [bx+al]
      int 10h        ; invoke bios 

      mov ah, 0x0E     ; separate with a space 
      mov al, ' ' 
      int 10h
      cmp dl, 0x0D     ; dont print a carriage return
      je .skip
      mov al, dl       ; print actual asci char
      int 10h          ; x86 bios interrupt
   .skip:
      mov al, ' ' 
      int 10h

      test cl, 0b00000111   ; modulus 8, 
      jne .page         ; 

      mov ah, 0x0E      ; print a new line every 8 chars
      mov al, 13
      int 10h
      mov al, 10
      int 10h

   .page:
      cmp cl, 0x80      ; pause after the 128th char (at the end of a line)
      jne .end
      mov ah,0         ; wait for any key:
      int 16h     

   .end:       
      loop .nextchar
      ret

    start:
      mov ax, cs
      mov ds, ax
      mov es, ax      ; stosw uses es segment reg
      call asctablebw.x
      jmp $

    times 510-($-$$) db 0   ; Pad boot sector with 0s
    dw 0xAA55               ; MBR boot signature

  ,,,

  * print ascii characters (and some non-ascii) in a table 
  ---------------------------------------------------------
  jmp start
  %include 'printi8.asm'
  start:
  mov ax, 07C0h    ; Set up 4K stack space after this bootloader
  add ax, 288      ; (4096 + 512) / 16 bytes per paragraph
  mov ss, ax
  mov sp, 4096
  mov ax, 07C0h    ; Set data segment to where we're loaded
  mov ds, ax

  mov cx, 0 
  .again:
    mov al, cl
    mov bl, 10 
    call printi8    ; print value of ascii character
    mov ah, 0eH
    mov al, ':'     ; print a separator character
    int 10H
    mov al, cl 
    int 10H
    mov al, ' ' 
    int 10H
    inc cx
    cmp cx, 0xFF
    je .exit
    mov ax, cx
    and ax, 0007h    ; mod 8
    cmp ax, 0
    jne .again
    mov ah, 0eh
    mov al, 13
    int 10h
    mov ah, 0eh
    mov al, 10
    int 10h
    jmp .again
  .exit:
    hang: jmp hang        ; keep looping! 
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

FONTS AND TEXT

  http://wiki.osdev.org/VGA_Fonts
    good info

  A 'standard' x86 bios outputs text in one of its 
  'text video modes'. It uses font 'bitmaps' with 8 columns
  and 16 rows for each character. Within the bitmap a 0 represents
  the background colour, and a 1 represents the foreground colour.
  The first row of the glyph (character), 8 bits or 1 byte, is
  contained in the first byte of the bitmap, the 2nd row in the
  second byte, and so on. So each character takes up 16 bytes.

  In the various graphics video modes, the BIOS character functions
  (such as int 10h, ah=0Eh) still draw text, but only in the
  standard size and at character cell positions. For anything else
  (big text, arbitrary positions, custom glyphs) the programmer
  must provide this functionality, by writing a character 1 pixel
  at a time.

  The first step to writing characters in graphics
  mode is getting the bitmap font data. The bios contains tables
  of information laying out the fonts used to display characters.

  Int 10h service AH=11h, subservice AL=30h

  * get the table of font information, untested
  -------------
    mov ax, 1130h ; (Get font information) 
    mov bh, 06h   ; 8x16 font (vga/mcga) 
    int 10h       ; leave font table pointer in ES:BP 
  ,,,
 
  Next step: access a character and display it pixel by 
  pixel in some graphics mode.

  To access 1 character from the 4K (4096 byte) character
  bitmap multiply the ascii code by 16 (bytes per character) and 
  add the result to BP. ES:BP then points to the first row of
  the glyph for that character.
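
  As a sketch of the 'pixel by pixel' step, the code below looks up
  the glyph for 'A' in the bios font and draws it in video mode 13h,
  1 byte of video memory per pixel. The position (20,20) and the
  colour (14, yellow) are arbitrary choices for this example, and it
  assumes the 8x16 font returned by the bios.

  * draw 1 bios font glyph pixel by pixel in mode 13h, untested
  -------------
  [org 0]
  jmp 07C0h:start

  start:
    mov ax, 0013h     ; ah=0 set video mode, al=13h (320x200, 256 colours)
    int 10h
    mov ax, 1130h     ; (Get font information)
    mov bh, 06h       ; 8x16 font (vga/mcga)
    int 10h           ; leave font table pointer in ES:BP
    mov ax, es
    mov ds, ax        ; ds:si will point to the glyph
    mov si, bp
    add si, 'A'*16    ; skip to the glyph for 'A' (16 bytes per character)
    mov ax, 0A000h
    mov es, ax        ; es:di -> video memory for mode 13h
    mov di, 20*320+20 ; top left of the glyph goes at x=20, y=20
    cld               ; make lodsb and stosb go forwards
    mov cx, 16        ; 16 rows in the glyph
  .nextrow:
    lodsb             ; al := next row of the glyph
    mov ah, al        ; keep the row bits in ah
    push cx           ; save current row number
    mov cx, 8         ; 8 bits (pixels) per row
  .nextbit:
    xor al, al        ; colour 0 (black) for a 0 bit
    shl ah, 1         ; move the leftmost bit into the carry flag
    jnc .plot
    mov al, 14        ; colour 14 (yellow) for a 1 bit
  .plot:
    stosb             ; write 1 pixel and move right
    loop .nextbit
    pop cx            ; restore current row number
    add di, 320-8     ; start of the glyph on the next screen row
    loop .nextrow
    jmp $             ; loop forever
  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
  ,,,

  Writing each pixel more than once (say a 2x2 square per bit) would
  be one way to get bigger text from the same font data.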

  * code to store the full 4K (256 chars x 16 bytes/char) bitmap
  -------------------
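   ;in: es:di=4k buffer
   ;out: buffer filled with font
   push ds
   push es   ; save the buffer segment (the bios call changes es)
   ;ask BIOS to return VGA bitmap fonts in es:bp
   mov ax, 1130h
   mov bh, 6
   int 10h
   ;copy charmap
   push es   ; set the data segment to the 
   pop ds    ; segment of the bios font 
   pop es    ; es := segment of the buffer again
   mov si, bp    ; ds:si -> bios font, es:di -> buffer
   mov cx, 256*16/4
   cld           ; copy forwards
   rep movsd     ; copy 1024 dwords (needs a 386 or later)
   pop ds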
  ,,,

  * display an ascii character by copying a bios font bitmap 
  -------------
   jmp start
   %include 'printi8.asm'
   char db '$'
   start:
    mov ax, 07C0h
    mov ds, ax    ; set the data segment

    mov ax, 1130h ; (Get font information) 
    mov bh, 06h   ; 8x16 font (vga/mcga) 
    int 10h       ; leave font table pointer in ES:BP 
    mov al, [char]
    mov bl, 10
    call printi8
    ; left shift char 4 times (to multiply by 16)
    ; eg
    ;   mov al, [char]   xor ah, ah   shl ax, 4
    ; add this to the bitmap offset
    ;   add bp, ax
    ; now just print out the bytes or draw pixel by pixel
    hang: jmp hang
    times 510-($-$$) db 0   
    dw 0xAA55              
  ,,,
 
  Make a function bigchar which has stack parameters cursorx:y, colour,
  character and returns the cursorx:y of the next character. This will
  allow us to print a big clock with the cmos rtc easily. 
  
  The code below enters a loop and displays ascii characters in big
  blocky format. This code could be simplified by writing directly
  to video memory, instead of using x86 bios interrupts, which are 
  slow and clunky.

  The "show" code has a great deal of redundant rubbish in it

  * get a bios font glyph from memory and push pointer on stack 
  -------------
   [ORG 0]
   jmp 07C0h:start        ; start label in segment 07C0

  ; **

    block equ 0xFE    ; ascii code for small block
    bigblock equ 219  ; ascii code for big block
    alpha equ 224   ; Greek letter alpha 
    beta  equ 225   ; Greek letter beta
    gamma equ 226   ; Greek letter gamma
    char db '*'
    ;char db [gamma]

  ; (stack: char -- segment address, pointer -> bios glyph font address )
  getglyph.doc:
    db 'gets the font map pointer to a bios glyph', 13, 10
    db 'eg: 65 getglyph show'
    db 'stack: char -- seg address, offset to glyph'
    dw $-getglyph.doc
  getglyph:
    dw 0               ; link to previous dict word or null
    db 8, 'getglyph'   ; forth counted name
  getglyph.x:
    mov ax, 1130h ; (Get font information) 
    mov bh, 06h   ; 8x16 font (vga/mcga) 
    int 10h       ; leave font table pointer in ES:BP 
    pop dx        ; balance return fn
    pop ax        ; character -> al
    push dx
    xor ah, ah    ; set ax := al
    shl ax, 4     ; set ax := ax * 16 (16 bytes per character)
    add bp, ax    ; add char offset pointer to font map 
    pop dx       ; balance return IP
    push es      ; segment address of glyph, Yes need to do this
    push bp      ; put pointer to glyph on stack
    push dx 
    ret

  ; (stack: segment address, glyph address - )
  ; the code below modifies the ds segment register (saving it in
  ; fs and restoring it at the end) which doesnt
  ; seem a good idea, and could lead to some tricky bugs
  show.render db ' ', block
  show.doc:
    db 'displays an 8x16 glyph in text mode', 13, 10
    db 'eg: 65 getglyph show'
    db ' [stack: segment address, glyph address -- ] '
    dw $-show.doc
  show:
    dw getglyph    ; link to previous dict word or null
    db 4, 'show'   ; forth counted name
  show.x:
    pop dx        ; balance return ip
    pop si        ; pointer to 1st byte of glyph
    mov ax, ds    ; save ds in fs
    mov fs, ax    ; (it is restored before returning)
    pop ds        ; get segment address in DS register
    push dx
    xor bx, bx          ; bx := 0 bh := 0 so no background colours
    mov cx, 16          ; 16 bytes in the glyph (8 x 16 pixels)
    cld                 ; make lodsb go forwards ("clear direction flag")
  .nextrow:
    lodsb               ; get next byte (glyph row) to al
    mov bl, al          ; save glyph row to bl
    push cx             ; save current row number
    mov cx, 8           ; loop through the 8 bits
  .nextbit:
    test bl, 0b10000000  ; check if bit is set
    jne .fill            ; 
    mov al, ' '          ; space character
    jmp .next
  .fill:
    mov al, block       ; fill character
  .next:
    mov ah, 0eh         ; bios type char function 
    int 10h             ; x86 bios int 
    rol bl, 1
    loop .nextbit

    mov ah, 0eh       ; bios type char function 
    mov al, 13        ; <return> char
    int 10h           ; 
    mov al, 10        ; <line feed> 
    int 10h           ; 
    pop cx            ; restore current row number 
    loop .nextrow
    mov ax, fs    ; restore ds from fs
    mov ds, ax    ; 
    ret

  glyphpress.doc:
    db 'Displays bigs glyphs of key presses'
    dw $-glyphpress.doc
  glyphpress:
    dw show               ; link to previous dict word or null
    db 10, 'glyphpress'   ; forth counted name
  glyphpress.x:

  .loop:
    xor ax, ax     ; ax := 0
    mov ah, 0      ; wait for keypress bios function
    int 16h        ; key value -> reg: al
    cmp al, ' '    ; exit loop if space is pressed
    je .exit
    push ax        ; push the character on the stack (al=char, ah=scan code)
    ; mov ah, 0eh  ; bios type char function 
    ; int 10h      ; 
    call getglyph.x 
    call show.x
    jmp .loop

  .exit:
    ret
  ; *

  start:
    mov ax, cs     ; the code segment is already correct (?!)
    mov ds, ax     ; set up data and extended segments
    mov es, ax

    call glyphpress.x
    jmp $          ; loop forever
    times 510-($-$$) db 0   
    dw 0xAA55              
  ,,,
 

  * a big char 
  -------------
   [ORG 0]
   jmp 07C0h:start        ; start label in segment 07C0

    block equ 0xFE    ; ascii code for small block
    bigblock equ 219  ; ascii code for big block
    alpha equ 224   ; Greek letter alpha 
    beta  equ 225   ; Greek letter beta
    gamma equ 226   ; Greek letter gamma
    ;char db '*'
    char db alpha 
    ;char db [gamma]

  ; (stack: char -- segment address, pointer -> bios glyph font address )
  getglyph.doc:
    db 'gets the font map pointer to a bios glyph', 13, 10
    db 'eg: 65 getglyph show'
    db 'stack: char --> seg address, offset to glyph'
    dw $-getglyph.doc
  getglyph:
    dw 0               ; link to previous dict word or null
    db 8, 'getglyph'   ; forth counted name
  getglyph.x:

    mov ax, 1130h ; (Get font information) 
    mov bh, 06h   ; 8x16 font (vga/mcga) 
    int 10h       ; leave font table pointer in ES:BP 
    pop dx        ; balance return fn
    pop ax        ; character -> al
    push dx
    xor ah, ah    ; set ax := al
    shl ax, 4     ; set ax := ax * 16 (16 bytes per character)
    add bp, ax    ; add char offset pointer to font map 

    pop dx       ; balance return IP
    push es      ; segment address of glyph, Yes need to do this
    push bp      ; put pointer to glyph on stack
    push dx 
    ret

  ; (stack: segment address, glyph address - )
  ; the code below modifies the ds segment register (saving it in
  ; fs and restoring it at the end) which doesnt
  ; seem a good idea, and could lead to some tricky bugs

  ; 
  bigchar.doc:
    db 'displays an 8x16 glyph in text mode', 13, 10
    db 'eg: 65 getglyph show'
    db ' [stack: segment address, glyph address --> ] '
    db 'should be [stack: startx:y, segment address, glyph address -- endx:y ] '
    db 'or ??? [stack: startx:y, colour, char, nextx:y --> ] '
    dw $-bigchar.doc
  bigchar:
    dw getglyph       ; link to previous dict word or null
    db 7, 'bigchar'   ; forth counted name
  bigchar.x:
    pop bx        ; balance return ip
    pop si        ; pointer to 1st byte of glyph
    mov ax, ds    ; save ds in fs
    mov fs, ax    ; 
    pop ds        ; get segment address in DS register
    push bx
    xor bx, bx          ; bx := 0 bh := 0 so no background colours
    mov cx, 16          ; 16 bytes in the glyph
    cld                 ; make lodsb go forwards ("clear direction flag")
  .nextrow:
    lodsb               ; get next byte (glyph row) to al
    mov bl, al          ; save glyph row to bl
    push cx             ; save current row number
    mov cx, 8           ; print the 8 bits
  .nextbit:
    mov dl, bl          ; 
    test dl, 0b10000000  ; check if bit set
    jnz .fill
    mov ah, 0eh         ; bios type char function 
    mov al, ' '         ; fill character
    int 10h             ; 
    jmp .next
  .fill:
    mov ah, 0eh         ; bios type char function 
    mov al, bigblock    ; fill character
    int 10h             ; 
  .next:
    rol bl, 1
    loop .nextbit

    mov ah, 0eh       ; bios type char function 
    mov al, 13        ; <return> char
    int 10h           ; 
    mov al, 10        ; <line feed> 
    int 10h           ; 
    pop cx            ; restore current row number 
    loop .nextrow
    mov ax, fs    ; restore ds from fs
    mov ds, ax    ; 
    ret

  start:
    mov ax, cs     ; the code segment is already correct (?!)
    mov ds, ax     ; set up data and extended segments
    mov es, ax
    xor ax, ax
    push 0x0909     ; initial cursor position, not used yet...
    mov al, [char]  ; push the big character to print on the stack
    push ax
    call getglyph.x 
    call bigchar.x
    jmp $          ; loop forever

    times 510-($-$$) db 0   
    dw 0xAA55              
  ,,,
 
CUSTOM FONTS

  You can modify a glyph for a character in text mode
  by loading the desired bit map into the font memory of the
  video card. Note that the pointer returned by int 10h, ax=1130h
  normally points at the read-only copy of the font in the bios
  rom, so just reversing the direction of the copy in the 4K font
  code above is unlikely to have any effect.

  The simplest way to set the font used for text mode is to ask
  the bios: see Ralf Brown Interrupt list Int 10/AX=1110h.

   The following displays an 8x16 glyph given a pointer to
   its 1st byte on the stack. The code is written as a forth-style
   linked list dictionary. This code can be used to display glyphs
   from bios memory (standard text mode ascii characters).

   Maybe need to have a segment address on the stack as well?
   (since ax=1130h, int 10h returns address in es:bp)

   * display a 8x16 pixel glyph
   -----------------
   [ORG 0]
   jmp 07C0h:start        ; start label in segment 07C0
   
   glyph db 10011001b
         db 00000000b
         db 01111111b
         db 01100011b
         db 01100011b
         db 01100011b
         db 01111111b
         db 01100011b
         db 01100011b
         db 01100011b
         db 01100011b
         db 01100011b
         db 01100011b
         db 00000000b
         db 00000000b
         db 11111111b

  block equ 0xFE    ; ascii code for small block
  ;block equ 219    ; ascii code for big block

  ; (stack: glyph address - )
  show:
    dw 0           ; link to previous dict word or null
    db 4, 'show'   ; forth counted name
  show.x:
    pop dx        ; balance return ip
    pop si        ; pointer to 1st byte of glyph
    push dx
    xor bx, bx          ; bx := 0 bh := 0 so no background colours
    mov cx, 16          ; 16 bytes in the glyph
    cld                 ; make lodsb go forwards ("clear direction flag")
  .nextrow:
    lodsb               ; get next byte (glyph row) to al
    mov bl, al          ; save glyph row to bl
    ;mov ah, 0eh         ; bios type char function 
    ;int 10h             ; do it with a bios interrupt
    push cx             ; save current row number
    mov cx, 8           ; print the 8 bits
  .nextbit:
    mov dl, bl          ; 
    and dl, 0b10000000  ; check if bit set
    cmp dl, 0
    jne .fill
    mov ah, 0eh         ; bios type char function 
    mov al, ' '         ; fill character
    int 10h             ; 
    jmp .next
  .fill:
    mov ah, 0eh         ; bios type char function 
    mov al, block       ; fill character
    int 10h             ; 
  .next:
    rol bl, 1
    loop .nextbit

    mov ah, 0eh       ; bios type char function 
    mov al, 13        ; <return> char
    int 10h           ; 
    mov al, 10        ; <line feed> 
    int 10h           ; 
    pop cx            ; restore current row number 
    loop .nextrow
    ret

  start:
    mov ax, cs     ; the code segment is already correct (?!)
    mov ds, ax     ; set up data and extended segments
    mov es, ax
    push glyph 
    call show.x 
    jmp $          ; loop forever

    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature

   ,,,


   * define a custom 8x16 pixel 'A' glyph in assembly language
   -----------------
   glyph db 00000000b
         db 00000000b
         db 01111111b
         db 01100011b
         db 01100011b
         db 01100011b
         db 01111111b
         db 01100011b
         db 01100011b
         db 01100011b
         db 01100011b
         db 01100011b
         db 01100011b
         db 00000000b
         db 00000000b
         db 00000000b
   ,,,


  * How to redefine vga ega font map
  ---------

   INT 10H 1110H: Load and Activate User-Defined Font

   Compatibility: EGA VGA
   Expects:
       AX    1110H
       BH    height of each character (bytes per character definition)
       BL    font block to load (EGA: 0-3; VGA: 0-7)
       CX    number of characters to redefine
       DX    ASCII code of the first character defined at ES:BP
       ES:BP address of font-definition information
  ,,,


  * make a series of chess glyphs to replace #$%^&* etc 
  ---------------
   [ORG 0]
   jmp 07C0h:start        ; start label in segment 07C0

  ; The chess glyphs to insert in bios font map at #

  pawn:
    db 0b00000000
    db 0b00000000
    db 0b00000000
    db 0b00000000
    db 0b00011000
    db 0b00111100
    db 0b00111100
    db 0b01111110
    db 0b00111100
    db 0b00111100
    db 0b01111110
    db 0b11111111
    db 0b11111111
    db 0b00000000
    db 0b00000000
    db 0b00000000

  knight:
    db 0b00000000
    db 0b00000000
    db 0b00000000
    db 0b01110000
    db 0b11111000
    db 0b11111100
    db 0b11011100
    db 0b11011110
    db 0b00011100
    db 0b00011100
    db 0b00011110
    db 0b00111111
    db 0b01111111
    db 0b00000000
    db 0b00000000
    db 0b00000000

  castle:
    db 0b00000000
    db 0b00000000
    db 0b11011011
    db 0b11111111
    db 0b11111111
    db 0b00111100
    db 0b00111100
    db 0b00111100
    db 0b00111100
    db 0b00111100
    db 0b01111110
    db 0b11111111
    db 0b11111111
    db 0b00000000
    db 0b00000000
    db 0b00000000

  ; (stack: char -- segment address, pointer -> bios glyph font address )
  chessglyph.doc:
    db 'inserts some custom chess glyphs into bios font map', 13, 10
    db 'starting somewhere at #'
    dw $-chessglyph.doc
  chessglyph:
    dw 0                 ; link to previous dict word or null
    db 10, 'chessglyph'  ; forth counted name
  chessglyph.x:
    ; for int 10h redefine font
    ; AX    1110H
    ; BH    height of each character (bytes per character definition)
    ; CX    number of characters to redefine
    ; DX    ASCII code of the first character defined at ES:BP
    ; ES:BP address of font-definition information

    mov bl, 0           ; font block ??
    mov bh, 16          ; height of font (bytes per character)
    mov cx, 3           ; number of chars to redefine
    mov dx, '#'         ; redefine glyphs starting at # 
    mov bp, pawn        ; es:bp points to start font data to use 
    mov ax, 0x1110      ; redefine bios font
    int 10h
    ret

  start:
    mov ax, cs     ; the code segment is already correct (?!)
    mov ds, ax     ; set up data and extended segments
    mov es, ax

    mov ah, 0
    mov al, 0    ; video mode 40x25 text, big text
    int 10h      ; set video mode big text 
    call chessglyph.x
    mov ah, 0x0e
    xor bx, bx       ; page 0 (chessglyph.x left bh = 16)
    mov cx, 3
  .again:
    mov al, '#'-1    ; print the modified glyphs: % then $ then #
    add al, cl 
    int 10h
    mov al, ' '
    int 10h
    loop .again

    jmp $          ; loop forever
    times 510-($-$$) db 0   
    dw 0xAA55              
  ,,,
 

  * set a modified bios glyph for an asci character
  ---------------
   [ORG 0]
   jmp 07C0h:start        ; start label in segment 07C0

  ; The custom glyph to insert in bios font map
  castle:
    db 0b00000000
    db 0b00000000
    db 0b11011011
    db 0b11111111
    db 0b11111111
    db 0b00111100
    db 0b00111100
    db 0b00111100
    db 0b00111100
    db 0b00111100
    db 0b01111110
    db 0b11111111
    db 0b11111111
    db 0b00000000
    db 0b00000000
    db 0b00000000

  ; (stack: char -- segment address, pointer -> bios glyph font address )
  biosglyph.doc:
    db 'sets the # character to a custom glyph', 13, 10
    db 'eg: biosglyph'
    db 'stack: char-to-mod, ptr->8x16data '
    dw $-biosglyph.doc
  biosglyph:
    dw 0                ; link to previous dict word or null
    db 9, 'biosglyph'   ; forth counted name
  biosglyph.x:
    ; for int 10h redefine font
    ; AX    1110H
    ; BH    height of each character (bytes per character definition)
    ; CX    number of characters to redefine
    ; DX    ASCII code of the first character defined at ES:BP
    ; ES:BP address of font-definition information

    mov bl, 0           ; font block ??
    mov bh, 16          ; height of font (bytes per character)
    mov cx, 1           ; only redefine one char
    mov dx, '#'         ; redefine only the '#' glyph
    mov bp, castle      ; es:bp points to font data 
    mov ax, 0x1110      ; redefine bios font
    int 10h
    ret

  start:
    mov ax, cs     ; the code segment is already correct (?!)
    mov ds, ax     ; set up data and extended segments
    mov es, ax

    mov ah, 0
    mov al, 0    ; video mode 40x25 text, big text
    int 10h      ; set video mode big text 
    call biosglyph.x
    mov ah, 0x0e
    xor bx, bx   ; page 0 (biosglyph.x left bh = 16)
    mov cx, 5
  .again:
    mov al, '#'
    int 10h
    mov al, ' '
    int 10h
    loop .again

    jmp $          ; loop forever
    times 510-($-$$) db 0   
    dw 0xAA55              
  ,,,
 
PIXELS AND DRAWING

BITMAPS ....

  * render a bitmap in standard text mode 
  -------------
   [ORG 0]
   jmp 07C0h:start        ; start label in segment 07C0

    block equ 0xFE    ; ascii code for small block
    bigblock equ 219  ; ascii code for big block
    alpha equ 224   ; Greek letter alpha 
    beta  equ 225   ; Greek letter beta
    gamma equ 226   ; Greek letter gamma
    char db '*'
    ;char db [gamma]

  ship:
    db 8, 8           ; width x height bytes
    db 0b00100100     ; glyph data
    db 0b01111110
    db 0b11100111
    db 0b00111100
    db 0b01100110
    db 0b01110010
    db 0b00000010
    db 0b00001110

  ; (stack: pointer to bitmap, y, x - )
  bitmap.glyphpointer dw 0
  bitmap.render db ' ', block
  bitmap.doc:
    db 'Displays a bitmap in text/graphics mode at a given point '
    db 'on the screen.'
    db 'The 1st byte of the bitmap=width, 2nd=height, then data'
    db ' [stack: pointer to bitmap, y, x :tos -- ] '
    dw $-bitmap.doc
  bitmap:
    dw 0           ; link to previous dict word or null
    db 6, 'bitmap' ; forth counted name
  bitmap.x:
    mov ax, 0xB800
    mov es, ax     ; es -> start of video memory (for stosw and video print)
    mov di, 20     ; start printing address

    pop dx         ; balance return ip
    pop bx         ; x coordinate
    pop ax         ; y coordinate
    pop si         ; pointer to 1st byte of glyph
    push dx        ; restore fn pointer
    mov dx, 160    ; 
    mul dx         ; ax * 160 to get y position (80 cols, 2 bytes/char)
    shl bx, 1      ; bx := bx * 2
    add ax, bx     ; but! need to bx * 2 (since 2 bytes/ char)
    mov di, ax     ; screen printing position 

    xor bx, bx     ; bx := 0 bh := 0 so no background colours
    xor ax, ax     ; ax := 0
    xor cx, cx
    mov [bitmap.glyphpointer], si       ; save si in a pointer var 
    mov cl, [si+1]   ; height bytes in the glyph (2nd byte of glyph) 
    cld              ; make lodsb go forwards ("clear direction flag")
    add si, 2        ; skip width+height bytes
  .nextrow:
    lodsb            ; get next byte (glyph row) to al
    mov dl, al       ; save glyph row to dl
    push cx          ; save current row number
    push di          ; save screen print position
    xor cx, cx
    mov bx, [bitmap.glyphpointer]
    mov cl, byte [bx]   ; width bytes in glyph (1st byte)
  .nextbit:
    rol dl, 1        ; print mirror image with ror dl, 1 
    mov bl, dl
    and bx, 0x0001   ; bx := 1 if bit is set, else 0 (this also clears bh)
    mov al, [bitmap.render+bx]  ; get foreground or background character

    ;mov ah, '/'       ; white on green colour
    mov ah, 0b00001110 ; yellow on black
    stosw             ; print to screen es:di++:=ax, al==char, ah==colour
    loop .nextbit     ; next pixel on same row 
    pop di            ; restore screen print pos
    add di, 160       ; down 1 row (80 cols * 2 bytes/char)
    pop cx            ; restore current row number in glyph
    loop .nextrow     ; render the next row of bits in the glyph or sprite
  .exit:
    ret

  start:
    mov ax, cs     ; the code segment is already correct (?!)
    mov ds, ax     ; set up data and extended segments
    mov es, ax
    push ship      ; pointer to glyph
    push 10        ; y pos
    push 10        ; x pos
    call bitmap.x
    jmp $          ; loop forever
    times 510-($-$$) db 0   
    dw 0xAA55              
  ,,,
 

  * display a bit map in video mode 13H 
  -------------
   [ORG 0]
   jmp 07C0h:start        ; start label in segment 07C0

    alpha equ 224   ; Greek letter alpha 
    beta  equ 225   ; Greek letter beta
    gamma equ 226   ; Greek letter gamma

  ship:
    db 8, 8           ; width x height bytes
    db 0b00100100     ; glyph data
    db 0b10000000
    db 0b10100111
    db 0b11001100
    db 0b01000010
    db 0b01000010
    db 0b00000000
    db 0b10101010

  ; (stack: pointer to bitmap, colour, y, x - )
  pixmap.glyphpointer dw 0
  pixmap.render db 0, 0b00001111  ; white on black, overwritten by param
  pixmap.doc:
    db 'Displays a bitmap in graphics mode 13h at a given point '
    db 'on the screen.'
    db 'The 1st byte of the bitmap=width, 2nd=height, then data'
    db ' [stack: pointer to bitmap, colour, y, x :tos -- ] '
    dw $-pixmap.doc
  pixmap:
    dw 0           ; link to previous dict word or null
    db 6, 'pixmap' ; forth counted name
  pixmap.x:
    mov ax, 0xA000 ; address of 1st pixel in display memory (mode 13H)
    mov es, ax     ; es -> start of video memory (for stosb)

    pop dx         ; balance return ip
    pop bx         ; x coordinate
    pop ax         ; y coordinate
    pop cx         ; colour of sprite foreground (in low byte)
    pop si         ; pointer to 1st byte of glyph
    push dx        ; restore fn pointer
    mov [pixmap.render+1], cl  ; save colour  
    mov dx, 320    ; 
    mul dx         ; ax * 320 to get y position (320 cols * 1 byte)
    add ax, bx     ; add x offset to y offset 
    mov di, ax     ; screen printing position 

    xor bx, bx     ; bx := 0 bh := 0 
    xor ax, ax     ; ax := 0
    xor cx, cx
    mov [pixmap.glyphpointer], si       ; save si in a pointer var 
    mov cl, [si+1]   ; height bytes in the glyph (2nd byte of glyph) 
    cld              ; make lodsb go forwards ("clear direction flag")
    add si, 2        ; skip width+height bytes
  .nextrow:
    lodsb            ; get next byte (glyph row) to al
    mov dl, al       ; save glyph row to dl
    push cx          ; save current row number
    push di          ; save screen print position
    xor cx, cx
    mov bx, [pixmap.glyphpointer]
    mov cl, byte [bx]   ; width bytes in glyph (1st byte)
  .nextbit:
    rol dl, 1        ; print mirror image with ror dl, 1 
    mov bl, dl
    and bx, 0x0001   ; bx := 1 if bit is set, else 0 (this also clears bh)
    mov al, [pixmap.render+bx]  ; get foreground or background colour

    stosb             ; pixel to screen es:di++:=al, al==colour
    loop .nextbit     ; next pixel on same row 
    pop di            ; restore screen print pos
    add di, 320       ; down 1 row (320 pixels * 1 byte/pixel)
    pop cx            ; restore current row number in glyph
    loop .nextrow     ; render the next row of bits in the glyph or sprite
  .exit:
    ret

  start:
    mov ax, cs     ; the code segment is already correct (?!)
    mov ds, ax     ; set up data and extended segments
    mov es, ax

    mov ax,  0x0013    ; mode 13h = 320x200 at 8 bits/pixel.
    int 0x10           ; bios int 10h, ah=0, al=video mode
    push ship      ; pointer to glyph
    push 14        ; colour of sprite (yellow)
    push 60        ; y pos
    push 80        ; x pos
    call pixmap.x
    
    jmp $          ; loop forever
    times 510-($-$$) db 0   
    dw 0xAA55              
  ,,,
 
WITH INTERRUPTS ....

   Video mode 13h has the highest number of colours (256) of the
   standard vga modes. The int 10h functions for drawing pixels are
   considered slow. Pixels can only be read and written in graphics
   modes.

   * draw 1 white pixel at (10,10) 
   --------------------
   start:
     mov ax, 07C0h
     mov ds, ax
   .setmode:
     mov     ah, 0       ; set graphics display mode function.
     mov     al, 13h     ; mode 13h = 320x200 at 8 bits/pixel.
     int     10h         ; set it!
   .draw:
     xor     bh, bh      ; display page 0
     mov     cx, 10      ; x-coordinate 
     mov     dx, 10      ; y-coordinate 
     mov     al, 15      ; white
     mov     ah, 0ch     ; put pixel
     int     10h         ; draw pixel
     jmp $
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,
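
   Pixels can be read back as well as written. The sketch below
   assumes that int 10h with ah=0dh returns the colour of the pixel
   at (cx, dx) in al, which is how the function is usually
   documented. It draws a yellow pixel, reads it back, and uses the
   colour that was read to draw a second pixel two columns to the
   right. The label names are only for this example.

   * read a pixel back with int 10h (ah=0dh) and copy its colour
   --------------------
   start:
     mov     ax, 0013h   ; ah=0 (set mode), al=13h (320x200, 256 colours)
     int     10h
   .draw:
     xor     bh, bh      ; display page 0
     mov     cx, 10      ; x-coordinate
     mov     dx, 10      ; y-coordinate
     mov     al, 14      ; yellow
     mov     ah, 0ch     ; put pixel
     int     10h
   .read:
     mov     cx, 10      ; same position again
     mov     dx, 10
     mov     ah, 0dh     ; read pixel, colour comes back in al
     int     10h
   .copy:
     mov     cx, 12      ; two pixels to the right
     mov     dx, 10
     mov     ah, 0ch     ; put pixel using the colour just read
     int     10h
     jmp $
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

   If both pixels show in the same colour then the read worked.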

   * draw a diagonal line with x86 bios interrupts
   --------------------
   [org 0]
   jmp 07C0h:start

   hextable db "0123456789ABCDEF"    ; digit translation table
   glyph equ 'a' 
   line:
      dw 0
      db 4, 'line'
   line.x:
     mov cx, 0x0040    ; draw diagonal line 64 pixels long

   .draw:
     mov     dx, cx      ; y-coordinate in dx (x is already in cx)
     mov     al, 15      ; white
     mov     ah, 0ch     ; put pixel
     int     10h         ; draw pixel
     loop .draw
     ret

   ; sets the video mode from stack
   ; 
   vmode:
     dw line
     db 5, 'vmode'
   vmode.x:
     pop dx              ; juggle return pointer
     pop ax              ; video mode into al
     xor ah, ah          ; set ah := 0 (int 10h set display mode)
     push dx
     int     10h         ; set it!
     ret

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     push 0x0013        ; mode 13h = 320x200 at 8 bits/pixel.
     call vmode.x
     call line.x
     ; push 0x0003        ; default text mode 

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

   * write text in a graphics mode; the colour must be given in bl
   --------------------
   start:
     mov ax, 07C0h
     mov ds, ax
   .setmode:
     mov     ah, 0       ; set graphics display mode function.
     mov     al, 13h     ; mode 13h = 320x200 at 8 bits/pixel.
     int     10h         ; set it!
   .text:
     mov ah, 0eh         ; teletype output
     mov al, 'Q'
     xor bh, bh          ; display page 0
     mov bl, 15          ; foreground colour (white) for graphics modes
     int 10h
   .draw:
     mov     cx, 10      ; column
     mov     dx, 10      ; row
     mov     al, 15      ; white
     mov     ah, 0ch     ; put pixel
     int     10h         ; draw pixel
     jmp $
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,


SPRITES ....

  A sprite is just a small bitmap (a visual glyph) which may or may
  not be animated. Here we will start with simple monochrome bitmaps
  such as those in 'space invaders'.

  The "pix" word below has its pixel position and colour hard-coded.
  It should take the position, and preferably a colour, from the
  stack. Todo: draw a circle etc.


  * write a pixel to video memory in mode 13h
  ---------------
   [org 0]
   jmp 07C0h:start

   pix:
      dw 0
      db 3, 'pix'
   pix.x:
     ; DisplayMode 13h
     ; screen size.x = 0x0140, (320x200 pixels)
     ; screen size.y = 0x00C8  (200 pixels high)
     ; number of colors = 0x0100
     ; address of pixel 0 = A000

      mov ax, 0xA000  ; address of 1st pixel in display memory
      mov es, ax
      mov cx, 0x0140  ; screenSize.x = 320 pixels
      mov ax, 30       ; posY
      mul cx           ; ax *= cx
      add ax, 10       ; ax += posX 
      mov di, ax       ; di = offset of pixel  
      mov dx, 0x000A   ; dl = color of pixel, 256 colours
      mov [es:di], dl  ; write pixel to memory
      ret 

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     mov ax,  0x0013    ; mode 13h = 320x200 at 8 bits/pixel.
     int 0x10           ; bios int 10h, ah=0, al=video mode
     call pix.x

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,


  Perhaps this code should check the position against the
  boundaries of the screen (x less than 320, y less than 200),
  otherwise a bad coordinate wraps around and writes to a pixel
  that it shouldn't.
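
  As a sketch of such a check, the hypothetical word 'plot' below
  takes x, y and a colour from the stack and silently skips any
  point which is off the 320x200 screen. The name 'plot' and the
  parameter order are assumptions made for this example.

  * plot a pixel given on the stack (x y colour), with a bounds check
  ---------------
   [org 0]
   jmp 07C0h:start

   plot.doc:
     db 'Plots one pixel in mode 13h, ignoring off-screen points.'
     db ' stack parameters (TOS: colour, y, x)'
     dw $-plot.doc
   plot:
      dw 0
      db 4, 'plot'
   plot.x:
      pop dx          ; juggle return address
      pop cx          ; colour (in cl)
      pop ax          ; y coordinate
      pop bx          ; x coordinate
      push dx         ; restore return pointer
      cmp bx, 320     ; x must be 0..319
      jae .exit       ; off screen, draw nothing
      cmp ax, 200     ; y must be 0..199
      jae .exit
      mov dx, 0xA000  ; segment of display memory (mode 13h)
      mov es, dx
      mov dx, 320     ; screenSize.x = 320 pixels
      mul dx          ; ax := y * 320 (dx is clobbered)
      add ax, bx      ; ax := y * 320 + x
      mov di, ax      ; di = offset of pixel
      mov [es:di], cl ; write pixel to memory
    .exit:
      ret

   start:
     mov ax, cs
     mov ds, ax
     mov ax,  0x0013    ; mode 13h = 320x200 at 8 bits/pixel.
     int 0x10           ; bios int 10h, ah=0, al=video mode
     push 10            ; x coordinate
     push 30            ; y coordinate
     push 0x000A        ; colour
     call plot.x
     push 400           ; x is off screen, so nothing is drawn
     push 30            ; y coordinate
     push 0x000C        ; colour
     call plot.x

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

  The unsigned jump 'jae' treats a negative coordinate as a very
  large number, so one compare covers both ends of each range.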

  * display a diagonal line at a point given on stack (x y length)
  ---------------
   [org 0]
   jmp 07C0h:start

   diag.doc db 'Display a 45 degree line, 1 pixel wide in video mode 13h'
            dw $-diag.doc
   diag:
      dw 0
      db 4, 'diag'
   diag.x:
     ; DisplayMode 13h (320w x 200h pixels)
     ; number of colors = 0x0100
     ; address of pixel 0 = A000

      mov ax, 0xA000  ; address of 1st pixel in display memory
      mov es, ax      ; extended segment set to video memory
      pop dx      ; juggle return address
      pop cx      ; line length
      pop ax      ; y coordinate
      pop bx      ; x coordinate
      push dx     ; restore return pointer
      mov dx, 0x0140 ; screenSize.x = 320 pixels
      mul dx         ; ax *= dx, y offset of pixel
      add ax, bx     ; add x offset of pixel
      mov dl, 0x0A   ; dl = color of pixel, 256 colours
      mov di, ax       ; di = offset of pixel  
    .line:
      mov [es:di], dl  ; write pixel to memory
      add di, 321      ; 1 row (320) + 1 horizontal (x) pixel
      ; mov [es:di], cl  ; creates a multicoloured line 
      ; add ax, cx       ; this actually creates a curve effect
      loop .line
      ret 

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     mov ax,  0x0013    ; mode 13h = 320x200 at 8 bits/pixel.
     int 0x10           ; bios int 10h, ah=0, al=video mode
     push 0             ; x coordinate
     push 40            ; y coordinate
     push 150           ; line length
     call diag.x

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

  * a demonstration of a few line drawing techniques 
  ---------------
   [org 0]
   jmp 07C0h:start

   demo.doc:
     db 'display different line types in video mode 13hex'
     dw $-demo.doc
   demo:
      dw 0
      db 4, 'demo'
   demo.x:
     ; DisplayMode 13h (320w x 200h pixels) number of colors:256 

      mov ax, 0xA000  ; address of 1st pixel in display memory
      mov es, ax      ; extended segment set to video memory
      pop dx      ; juggle return address
      pop ax      ; y coordinate
      pop bx      ; x coordinate
      push dx     ; restore return pointer
      mov dx, 0x0140 ; screenSize.x = 320 pixels
      mul dx         ; ax *= dx, y offset of pixel
      add ax, bx     ; add x offset of pixel
      mov di, ax     ; di = offset of pixel  
      mov dl, 0x0A   ; dl = color of pixel, 256 colours
      mov cx, 30     ; line length
      push di        ; save start pixel co-ordinates

    .dotted:
      mov [es:di], dl   ; write colour pixel to memory
      add di, 4        ; creates a dotted line
      loop .dotted

      pop di          ; go back to start position
      add di, 960     ; 3 rows down (3 * 320)
      push di         ; save start again
      mov cx, 100 

    .rainbow:
      mov [es:di], cl ; multi colour pixel to memory
      add di, 1       
      loop .rainbow

      pop di          ; go back to start position
      add di, 960     ; 3 rows down
      push di         ; save start again
      mov cx, 20 

    .curve:
      mov [es:di], dl ; write colour pixel to memory
      add di, cx      ; the step shrinks as cx counts down
      loop .curve

      ret 

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     mov ax,  0x0013    ; mode 13h = 320x200 at 8 bits/pixel.
     int 0x10           ; bios int 10h, ah=0, al=video mode
     push 10            ; x coordinate
     push 10            ; y coordinate
     call demo.x

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

  * display a vertical line at a point given on stack (x y length)
  ---------------
   [org 0]
   jmp 07C0h:start

   vline.doc:
     db 'Display a vertical line, 1 pixel wide in video mode 13h'
     dw $-vline.doc
   vline:
      dw 0
      db 5, 'vline'
   vline.x:
     ; DisplayMode 13h (320w x 200h pixels) number of colors:256 

      mov ax, 0xA000  ; address of 1st pixel in display memory
      mov es, ax      ; extended segment set to video memory
      pop dx      ; juggle return address
      pop cx      ; line length
      pop ax      ; y coordinate
      pop bx      ; x coordinate
      push dx     ; restore return pointer
      mov dx, 0x0140 ; screenSize.x = 320 pixels
      mul dx         ; ax *= dx, y offset of pixel
      add ax, bx     ; add x offset of pixel
      mov di, ax     ; di = offset of pixel  
      mov dl, 0x0A   ; dl = color of pixel, 256 colours
    .line:
      mov [es:di], dl   ; write colour pixel to memory
      add di, 320       ; down one row 
      loop .line

      ret 

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     mov ax,  0x0013    ; mode 13h = 320x200 at 8 bits/pixel.
     int 0x10           ; bios int 10h, ah=0, al=video mode
     push 10            ; x coordinate
     push 10            ; y coordinate
     push 50            ; line length
     call vline.x

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

  * display a horizontal line at a point given on stack (x y length)
  ---------------
   [org 0]
   jmp 07C0h:start

   hline.doc:
     db 'Display a horizontal line, 1 pixel wide in video mode 13h'
     dw $-hline.doc
   hline:
      dw 0
      db 5, 'hline'
   hline.x:
     ; DisplayMode 13h (320w x 200h pixels) number of colors:256 

      mov ax, 0xA000  ; address of 1st pixel in display memory
      mov es, ax      ; extended segment set to video memory
      pop dx      ; juggle return address
      pop cx      ; line length
      pop ax      ; y coordinate
      pop bx      ; x coordinate
      push dx     ; restore return pointer
      mov dx, 0x0140 ; screenSize.x = 320 pixels
      mul dx         ; ax *= dx, y offset of pixel
      add ax, bx     ; add x offset of pixel
      mov di, ax     ; di = offset of pixel  
      mov al, 0x0A   ; al = color of pixel, 256 colours
      cld             ; make stosb go forwards (left to right)
      rep stosb       ; store colour in video memory, loop while cx > 0

      ret 

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     mov ax,  0x0013    ; mode 13h = 320x200 at 8 bits/pixel.
     int 0x10           ; bios int 10h, ah=0, al=video mode
     push 10            ; x coordinate
     push 70            ; y coordinate
     push 150           ; line length
     call hline.x

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,


  The code below needs to be refined. The whirl vector just repeats
  the same four offsets twice. When the end of the four offsets is
  reached, the table index should be reset to 0 so that we can draw
  spirals of any length.

  * display a boxy spiral 
  ---------------
   [org 0]
   jmp 07C0h:start

   ; the offset vectors for drawing the sides of the box
   whirl.vector dw 1, 320, -1, -320, 1, 320, -1, -320
   whirl.doc:
     db 'Display a spiral'
     dw $-whirl.doc
   whirl:
      dw 0
      db 5, 'whirl'
   whirl.x:
     ; DisplayMode 13h (320w x 200h pixels) number of colors:256 

      mov ax, 0xA000  ; address of 1st pixel in display memory
      mov es, ax      ; extended segment set to video memory
      pop dx      ; juggle return address
      pop cx      ; spiral start width 
      pop ax      ; y coordinate (top left corner)
      pop bx      ; x coordinate (top left corner)
      push dx     ; restore return pointer
      mov dx, 0x0140 ; screenSize.x = 320 pixels
      mul dx         ; ax *= dx, y offset of pixel
      add ax, bx     ; add x offset of pixel
      mov di, ax     ; di = offset of pixel  
      mov al, 0x0A   ; al = color of pixel, 256 colours

      mov dx, cx      ; save box width/height
      xor bx, bx      ; bx := 0, bx is a loop counter and table offset

    .nextside:
      ;cmp word [whirl.vector+bx], -1
      ;jne .same
      sub dx, 2
    .same:
      mov cx, dx      ; restore side length counter
    .nextpixel:
      mov [es:di], al        ; write colour pixel to memory
      add di, [whirl.vector+bx] 
      ;add di, ax 
      loop .nextpixel
      add bx, 2                 ; point to next pixel vector
      cmp bx, 16 
      jne .nextside

    .exit:
      ret 

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     mov ax,  0x0013    ; mode 13h = 320x200 at 8 bits/pixel.
     int 0x10           ; bios int 10h, ah=0, al=video mode
     push 40            ; x coordinate of top left corner
     push 10            ; y coordinate of top left corner
     push 30            ; width and height of box 
     call whirl.x

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,
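
  One possible refinement of the spiral, as a sketch: keep only the
  four distinct offsets in the table, wrap the table index with
  'and bx, 6' and stop when the side length runs out. The stopping
  rule (side length of zero or less) is an assumption made for this
  example.

  * a boxy spiral of any length, using a wrapping table index
  ---------------
   [org 0]
   jmp 07C0h:start

   ; the four offset vectors: right, down, left, up
   whirl.vector dw 1, 320, -1, -320
   whirl.doc:
     db 'Display a spiral'
     dw $-whirl.doc
   whirl:
      dw 0
      db 5, 'whirl'
   whirl.x:
      mov ax, 0xA000  ; address of 1st pixel in display memory
      mov es, ax      ; extended segment set to video memory
      pop dx      ; juggle return address
      pop cx      ; spiral start width 
      pop ax      ; y coordinate (top left corner)
      pop bx      ; x coordinate (top left corner)
      push dx     ; restore return pointer
      mov dx, 0x0140 ; screenSize.x = 320 pixels
      mul dx         ; ax *= dx, y offset of pixel
      add ax, bx     ; add x offset of pixel
      mov di, ax     ; di = offset of pixel  
      mov al, 0x0A   ; al = color of pixel, 256 colours

      mov dx, cx      ; dx := current side length
      xor bx, bx      ; bx := 0, byte offset into whirl.vector

    .nextside:
      sub dx, 2       ; each side is 2 pixels shorter than the last
      jle .exit       ; stop when the side length is zero or less
      mov cx, dx      ; restore side length counter
    .nextpixel:
      mov [es:di], al        ; write colour pixel to memory
      add di, [whirl.vector+bx] 
      loop .nextpixel
      add bx, 2       ; point to next pixel vector
      and bx, 6       ; wrap 8 back to 0 (4 word entries)
      jmp .nextside

    .exit:
      ret 

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     mov ax,  0x0013    ; mode 13h = 320x200 at 8 bits/pixel.
     int 0x10           ; bios int 10h, ah=0, al=video mode
     push 40            ; x coordinate of top left corner
     push 10            ; y coordinate of top left corner
     push 60            ; start width of the spiral
     call whirl.x

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

  The 'and' trick only works because the table holds a power of
  two number of entries. For other table sizes compare bx with the
  table length and reset it to 0 instead.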

  Funnily enough, in qemu the box created below does not actually
  look square.


  The code below uses a loop and a table of pixel offsets (a
  vector) to make the code for drawing the box more concise.
  The good thing about this approach is that it can be used to
  draw any series of straight lines.

  * a more concise square box at a point given on stack (x y width)
  ---------------
   [org 0]
   jmp 07C0h:start

   ; the offset vectors for drawing the sides of the box
   box.vector dw 1, 320, -1, -320
   box.doc:
     db 'Display a square box in video mode 13'
     dw $-box.doc
   box:
      dw 0
      db 3, 'box'
   box.x:
     ; DisplayMode 13h (320w x 200h pixels) number of colors:256 

      mov ax, 0xA000  ; address of 1st pixel in display memory
      mov es, ax      ; extended segment set to video memory
      pop dx      ; juggle return address
      pop cx      ; box width and height 
      pop ax      ; y coordinate (top left corner)
      pop bx      ; x coordinate (top left corner)
      push dx     ; restore return pointer
      mov dx, 0x0140 ; screenSize.x = 320 pixels
      mul dx         ; ax *= dx, y offset of pixel
      add ax, bx     ; add x offset of pixel
      mov di, ax     ; di = offset of pixel  
      mov al, 0x0A   ; al = color of pixel, 256 colours

      mov dx, cx      ; save box width/height
      xor bx, bx      ; bx := 0, bx is a loop counter and table offset

    .nextside:
      mov cx, dx      ; restore side length counter
    .nextpixel:
      mov [es:di], al        ; write colour pixel to memory
      add di, [box.vector+bx]   ; step along the current side
      loop .nextpixel
      add bx, 2                 ; point to next pixel vector
      cmp bx, 8 
      jne .nextside

    .exit:
      ret 

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     mov ax,  0x0013    ; mode 13h = 320x200 at 8 bits/pixel.
     int 0x10           ; bios int 10h, ah=0, al=video mode
     push 40            ; x coordinate of top left corner
     push 10            ; y coordinate of top left corner
     push 50            ; width and height of box 
     call box.x

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

  The code below is very pedestrian. See the code above for a 
  better and more concise way to do this.

  * display a square box at a point given on stack (x y width)
  ---------------
   [org 0]
   jmp 07C0h:start

   box.doc:
     db 'Display a square box in video mode 13'
     dw $-box.doc
   box:
      dw 0
      db 3, 'box'
   box.x:
     ; DisplayMode 13h (320w x 200h pixels) number of colors:256 

      mov ax, 0xA000  ; address of 1st pixel in display memory
      mov es, ax      ; extended segment set to video memory
      pop dx      ; juggle return address
      pop cx      ; box width and height 
      pop ax      ; y coordinate (top left corner)
      pop bx      ; x coordinate (top left corner)
      push dx     ; restore return pointer
      mov dx, 0x0140 ; screenSize.x = 320 pixels
      mul dx         ; ax *= dx, y offset of pixel
      add ax, bx     ; add x offset of pixel
      mov di, ax     ; di = offset of pixel  
      mov al, 0x0A   ; al = color of pixel, 256 colours

      mov dx, cx      ; save box width/height

    .top:
      cld           ; go forwards (from left to right on screen)
      rep stosb     ; write the colour pixel in al to video
      mov cx, dx    ; restore box dimension

    .right:
      mov [es:di], al   ; write colour pixel to memory
      add di, 320       ; down one row 
      loop .right
      mov cx, dx        ; restore box dimension

    .bottom:
      std            ; go backwards (from right to left on screen)
      rep stosb      ; [es:di]-- <- AL register (al is the colour)
      mov cx, dx     ; restore box dimension

    .left:
      mov [es:di], al   ; write colour pixel to memory
      sub di, 320       ; up one row 
      loop .left

      cld            ; leave the direction flag clear for the caller
      ret 

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     mov ax,  0x0013    ; mode 13h = 320x200 at 8 bits/pixel.
     int 0x10           ; bios int 10h, ah=0, al=video mode
     push 40            ; x coordinate of top left corner
     push 10            ; y coordinate of top left corner
     push 50            ; width and height of box 
     call box.x

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

  * display a hexagon starting at point (x y) with line length l
  ---------------
   [org 0]
   jmp 07C0h:start

   ; the offset vectors for drawing the sides of the hexagon
   hexagon.vector dw 1, 321, 319, -1, -321, -319
   hexagon.doc:
     db 'Display a hexagon in video mode 13'
     dw $-hexagon.doc
   hexagon:
      dw 0
      db 7, 'hexagon'
   hexagon.x:
      mov ax, 0xA000  ; address of 1st pixel in display memory (mode 13H)
      mov es, ax      ; extended segment set to video memory
      pop dx      ; juggle return address
      pop cx      ; hexagon side length 
      pop ax      ; y coordinate of starting point 
      pop bx      ; x coordinate of starting point 
      push dx     ; restore function return pointer
      mov dx, 0x0140 ; screenSize.x = 320 pixels
      mul dx         ; ax *= dx, y offset of pixel
      add ax, bx     ; add x offset of pixel
      mov di, ax     ; di := offset of pixel in video memory
      mov al, 0x0A   ; al = color of pixel, 256 colours

      mov dx, cx      ; save hexagon side length
      xor bx, bx      ; bx := 0, bx is a loop counter and table offset

    .nextside:
      mov cx, dx      ; restore side length counter
    .nextpixel:
      mov [es:di], al        ; write colour pixel to memory
      add di, [hexagon.vector+bx]   ; step along the current side
      loop .nextpixel
      add bx, 2                 ; point to next pixel vector
      cmp bx, 12 
      jne .nextside

    .exit:
      ret 

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     mov ax,  0x0013    ; mode 13h = 320x200 at 8 bits/pixel.
     int 0x10           ; bios int 10h, ah=0, al=video mode
     push 40            ; x coordinate of starting point
     push 60            ; y coordinate of starting point
     push 5             ; side length of hexagon
     call hexagon.x

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

  * display an octagon starting at point (x y) with line length l
  ---------------
   [org 0]
   jmp 07C0h:start

   ; the offset vectors for drawing the sides of the octagon
   octagon.vector dw 1, 321, 320, 319, -1, -321, -320, -319
   octagon.doc:
     db 'Display an octagon in video mode 13'
     dw $-octagon.doc
   octagon:
      dw 0            ; link to previous fn 
      db 7, 'octagon'
   octagon.x:
      mov ax, 0xA000  ; address of 1st pixel in display memory (mode 13H)
      mov es, ax      ; extended segment set to video memory
      pop dx      ; juggle return address
      pop cx      ; octagon side length 
      pop ax      ; y coordinate of starting point 
      pop bx      ; x coordinate of starting point 
      push dx     ; restore function return pointer
      mov dx, 0x0140 ; screenSize.x = 320 pixels
      mul dx         ; ax *= dx, y offset of pixel
      add ax, bx     ; add x offset of pixel
      mov di, ax     ; di := offset of pixel in video memory
      mov al, 0x0A   ; al = color of pixel, 256 colours

      mov dx, cx      ; save octagon side length
      xor bx, bx      ; bx := 0, bx is a loop counter and table offset

    .nextside:
      mov cx, dx      ; restore side length counter
    .nextpixel:
      mov [es:di], al        ; write colour pixel to memory
      add di, [octagon.vector+bx]   ; step along the current side
      loop .nextpixel
      add bx, 2                 ; point to next pixel vector
      cmp bx, 16
      jne .nextside

    .exit:
      ret 

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     mov ax,  0x0013    ; mode 13h = 320x200 at 8 bits/pixel.
     int 0x10           ; bios int 10h, ah=0, al=video mode
     push 40            ; x coordinate of starting point
     push 60            ; y coordinate of starting point
     push 3             ; side length of octagon
     call octagon.x

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

  * display a line to any of the compass points starting at (x y) 
  ---------------
   [org 0]
   jmp 07C0h:start

   ; offsets by dir: no, ne, etc
   compass.vector dw -320, -319, 1, 321, 320, 319, -1, -321
   compass.doc:
     db 'Displays a line from (x y) length |l| to north, north east, east,'
     db '  south-east, south, etc. 1=north, 2=north east, 3=east ...' 
     db '  stack parameters (TOS: dir, length, y, x)'
     dw $-compass.doc
   compass:
      dw 0
      db 7, 'compass'
   compass.x:
      mov ax, 0xA000  ; address of 1st pixel in display memory (mode 13H)
      mov es, ax      ; extended segment set to video memory
      pop dx      ; juggle return address
      pop si      ; compass direction (1-8) 1=n, 2=ne, 3=e, 4=se etc
      pop cx      ; line length 
      pop ax      ; y coordinate of starting point 
      pop bx      ; x coordinate of starting point 
      push dx     ; restore function return pointer
      mov dx, 320    ; video screensize(x) = 320 pixels
      mul dx         ; ax *= dx, y offset of pixel
      add ax, bx     ; add x offset of pixel
      mov di, ax     ; di := offset of pixel in video memory
      mov al, 0x0A   ; al = color of pixel, 256 colours
      mov bx, si     ; get compass direction into bx

      dec bx         ; make the direction zero based (0-7)
      cmp bx, 7      ; only 8 directions (1-8)
      ja .exit       ; unsigned compare also rejects direction 0
      add bx, bx     ; bx := bx*2, since offset is word pointer

    .nextpixel:
      mov [es:di], al   ; write colour pixel to video memory
      add di, [compass.vector + bx]
      loop .nextpixel

    .exit:
      ret 

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     mov ax,  0x0013    ; mode 13h = 320x200 at 8 bits/pixel.
     int 0x10           ; bios int 10h, ah=0, al=video mode
     mov cx, 8          ; try all eight directions

   .nextdir:

     push cx
     push 100            ; x coordinate of starting point
     push 100            ; y coordinate of starting point
     push 10             ; line length 
     push cx             ; next direction
     call compass.x
     pop cx

     loop .nextdir

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

  
  * display an arrow to any of the compass points starting at (x y) 
  ---------------
   [org 0]
   jmp 07C0h:start

   ; offsets by dir: no, ne, etc
   arrow.offset dw -320, -319, 1, 321, 320, 319, -1, -321
   arrow.doc:
     db 'Displays an arrow from (x y) length |l| to north, north east, east'
     db '  south-east, south, etc. 1=north, 2=north east, 3=east ...' 
     db '  stack parameters (TOS: dir, length, y, x)'
     dw $-arrow.doc
   arrow:
      dw 0            ; link to previous
      db 5, 'arrow'
   arrow.x:
      mov ax, 0xA000  ; address of 1st pixel in display memory (mode 13H)
      mov es, ax      ; extended segment set to video memory
      pop dx      ; juggle return address
      pop si      ; compass direction (1-8) 1=n, 2=ne, 3=e, 4=se etc
      pop cx      ; line length 
      pop ax      ; y coordinate of starting point 
      pop bx      ; x coordinate of starting point 
      push dx     ; restore function return pointer
      mov dx, 320    ; video screensize(x) = 320 pixels
      mul dx         ; ax *= dx, y offset of pixel
      add ax, bx     ; add x offset of pixel
      mov di, ax     ; di := offset of pixel in video memory
      mov al, 0x0A   ; al = color of pixel, 256 colours
      mov bx, si     ; get compass direction into bx
        
      dec bx         ; make the direction zero based (0-7)
      cmp bx, 7      ; only 8 directions (1-8)
      ja .exit       ; unsigned compare also rejects direction 0
      add bx, bx     ; bx := bx*2, since offset is word pointer
      mov bx, [arrow.offset + bx]

    .nextpixel:
      mov [es:di], al   ; write colour pixel to memory
      add di, bx
      loop .nextpixel

      ; try to add the arrow head
      sub di, 322 
      mov [es:di], al   ; write colour pixel to memory
      sub di, 321 
      mov [es:di], al   ; write colour pixel to memory

    .exit:
      ret 

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     mov ax,  0x0013    ; mode 13h = 320x200 at 8 bits/pixel.
     int 0x10           ; bios int 10h, ah=0, al=video mode
     mov cx, 8          ; try all eight directions

   .nextdir:

     push cx
     push 100            ; x coordinate of starting point
     push 100            ; y coordinate of starting point
     push 10             ; line length 
     push cx             ; next direction
     call arrow.x
     pop cx

     loop .nextdir

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

  * make lines of colour in video mode 13h
  ---------------
   [org 0]
   jmp 07C0h:start

   ray:
      dw 0
      db 3, 'ray'
   ray.x:
     ; DisplayMode 13h
     ; screen size.x = 0x0140, (320x200 pixels)
     ; screen size.y = 0x00C8  (200 pixels high)
     ; number of colors = 0x0100
     ; address of pixel 0 = A000

      mov cx, 0xFFFF  ; draw lots of pixels
      mov ax, 0xA000  ; address of 1st pixel in display memory
      mov es, ax
   .pixel:
      mov di, cx       ; di = offset of next pixel  
      mov [es:di], cl  ; colour = low byte of the offset, 256 colours
      loop .pixel
      ret 

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     mov ax,  0x0013    ; mode 13h = 320x200 at 8 bits/pixel.
     int 0x10           ; bios int 10h, ah=0, al=video mode
     call ray.x

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,


FILLED SHAPES ....

  We can draw shapes by starting at a point and continuing along
  a line, but for filled shapes another technique is required.
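
  One simple technique, sketched below, is to fill the shape one
  row at a time with 'rep stosb'. The hypothetical word 'fill'
  draws a solid rectangle. Its name and parameter order are
  assumptions made for this example.

  * fill a rectangle row by row (x y width height)
  ---------------
   [org 0]
   jmp 07C0h:start

   fill.doc:
     db 'Fills a rectangle in video mode 13h.'
     db ' stack parameters (TOS: height, width, y, x)'
     dw $-fill.doc
   fill:
      dw 0
      db 4, 'fill'
   fill.x:
      mov ax, 0xA000  ; address of 1st pixel in display memory
      mov es, ax      ; extended segment set to video memory
      pop dx      ; juggle return address
      pop si      ; height in rows
      pop bp      ; width in pixels
      pop ax      ; y coordinate (top left corner)
      pop bx      ; x coordinate (top left corner)
      push dx     ; restore return pointer
      mov dx, 0x0140 ; screenSize.x = 320 pixels
      mul dx         ; ax *= dx, y offset of pixel
      add ax, bx     ; add x offset of pixel
      mov di, ax     ; di = offset of top left pixel  
      mov al, 0x0A   ; al = color of pixel, 256 colours
      test si, si    ; a height of 0 means nothing to draw
      jz .exit
      cld            ; make stosb go forwards (left to right)

    .nextrow:
      mov cx, bp     ; cx := number of pixels in one row
      rep stosb      ; fill the row, di ends just past it
      add di, 320    ; down one row 
      sub di, bp     ; and back to the left edge
      dec si         ; one row less to draw
      jnz .nextrow

    .exit:
      ret 

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     mov ax,  0x0013    ; mode 13h = 320x200 at 8 bits/pixel.
     int 0x10           ; bios int 10h, ah=0, al=video mode
     push 40            ; x coordinate of top left corner
     push 20            ; y coordinate of top left corner
     push 60            ; width of rectangle
     push 30            ; height of rectangle
     call fill.x

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

  Other filled shapes can be drawn the same way by changing the
  start and width of each row.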

ANIMATION

  The basic technique for animation is: display something, wait,
  erase it, display something else, and so on.

  cx=0 and dx=a2c2 (41666 microseconds) gives about 24 frames per
  second. Good.

  * wait for about 1 second (cx:dx == 1000000 microseconds)
  ---------
      mov cx, 0fh     ; cx:dx microseconds to wait
      mov dx, 4240h
      mov ah, 86h
      int 15h
  ,,,

  With dx=0, each unit of cx adds 65536 microseconds:

  cx=0x000F approx 1 second
  cx=0x0008 approx 1/2 second
  cx=0x0004 approx 1/4 second
  cx=0x0002 approx 1/8 second
  cx=0x0001 approx 1/16 second

  * a delay for about 24 frames per second
  ---------
      xor cx, cx 
      mov dx, 0xa2c2 
      mov ah, 86h
      int 15h
  ,,,
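
  The delay for another frame rate can be worked out by the
  assembler instead of by hand. In the sketch below FPS and DELAY
  are hypothetical names invented for this example. Setting FPS to
  24 gives the 0xa2c2 value used above.

  * work out the cx:dx delay from a frame rate at assembly time
  ---------
      FPS   equ 30              ; frames per second wanted
      DELAY equ 1000000 / FPS   ; microseconds per frame
      mov cx, DELAY >> 16       ; high word of the delay
      mov dx, DELAY & 0xFFFF    ; low word of the delay
      mov ah, 86h               ; bios wait function
      int 15h
  ,,,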

  * display an animated octagon 
  ---------------
   [org 0]
   jmp 07C0h:start

   ; the offset vectors for drawing the sides of the octagon
   octagon.vector dw 1, 321, 320, 319, -1, -321, -320, -319
   octagon.doc:
     db 'Display an octagon in video mode 13'
     dw $-octagon.doc
   octagon:
      dw 0            ; link to previous fn 
      db 7, 'octagon'
   octagon.x:
      mov ax, 0xA000  ; address of 1st pixel in display memory (mode 13H)
      mov es, ax      ; extended segment set to video memory
      pop dx      ; juggle return address
      pop si      ; colour
      pop cx      ; octagon side length 
      pop ax      ; y coordinate of starting point 
      pop bx      ; x coordinate of starting point 
      push dx     ; restore function return pointer
      mov dx, 0x0140 ; screenSize.x = 320 pixels
      mul dx         ; ax *= dx, y offset of pixel
      add ax, bx     ; add x offset of pixel
      mov di, ax     ; di := offset of pixel in video memory

      mov ax, si     ; colour in which to display 
      ;mov al, 0x0A   ; al = color of pixel, 256 colours

      mov dx, cx      ; save octagon side length
      xor bx, bx      ; bx := 0, bx is a loop counter and table offset

    .nextside:
      mov cx, dx      ; restore side length counter
    .nextpixel:
      mov [es:di], al        ; write colour pixel to memory
      add di, [octagon.vector+bx]   ; step along the current side
      loop .nextpixel
      add bx, 2                 ; point to next pixel vector
      cmp bx, 16
      jne .nextside

    .exit:
      ret 

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     mov ax,  0x0013    ; mode 13h = 320x200 at 8 bits/pixel.
     int 0x10           ; bios int 10h, ah=0, al=video mode

     mov cx, 200
   .nextframe: 

     mov ax, 10
     add ax, cx
     mov bp, ax        ; save x pos in bp
     push cx

     push ax           ; x coordinate of starting point
     push 60           ; y coordinate of starting point
     push 10           ; side length of octagon
     push 0x000B       ; octagon colour 
     ;push cx          ; changing colour hexagon (256 colours)
     
     call octagon.x
     xor cx, cx        ; cx := 0
     mov dx, 0xa2c2    ; cx:dx = 41666 microseconds, about 24 frames/sec
     mov ah, 86h       ; wait for timer function
     int 15h           ; bios interrupt 

     mov ax, bp
     push ax           ; x coordinate of starting point
     push 60           ; y coordinate of starting point
     push 10           ; side length of octagon
     push 0x00         ; octagon colour (black to erase)
     
     call octagon.x

     pop cx
     loop .nextframe

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

  
  We can use the colour black to erase the sprite in very
  simple animations.

  * display a static sprite
  ---------------
   [org 0]
   jmp 07C0h:start

   ; data in format: width, height, pixel colour data row by row 
   sprite.data db 3, 3, 10, 0, 10, 0, 10, 0, 11, 0, 11 
   sprite.doc:
     db 'Displays a sprite on the screen at given coordinates'
     db '  takes on stack: TOS: colour pointer-to-sprite y x'
     dw $-sprite.doc
   sprite:
      dw 0            ; link to previous fn 
      db 6, 'sprite'
   sprite.x:
      mov ax, 0xA000  ; address of 1st pixel in display memory (mode 13H)
      mov es, ax      ; extra segment set to video memory
      pop dx      ; juggle return address
      pop bp      ; colour
      pop si      ; pointer to sprite data 
      pop ax      ; y coordinate of starting point 
      pop bx      ; x coordinate 
      push dx     ; restore function return pointer
      mov dx, 0x0140 ; screenSize.x = 320 pixels
      mul dx         ; ax *= dx, y offset of pixel
      add ax, bx     ; add x offset of pixel
      mov di, ax     ; di := offset of pixel in video memory

      mov ax, bp     ; colour in which to display 
      ;mov al, 0x0A   ; al = color of pixel, 256 colours

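      cld            ; make movsb copy forwards
      xor cx, cx
      mov cl, [si]   ; cx := sprite width in pixels
      mov dx, cx     ; dx := width, kept for every row
      mov bl, [si+1] ; bl := sprite height (row counter)
      add si, 2      ; si now points to the pixel data

    .nextrow:
      mov cx, dx     ; pixels to copy in this row
    .nextpixel:
      movsb          ; copy pixel at ds:si to es:di (video memory)
      loop .nextpixel
      add di, 320    ; move down one screen row
      sub di, dx     ; and back to the left edge of the sprite
      dec bl
      jnz .nextrow

    .exit:
      ret 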

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     mov ax,  0x0013    ; mode 13h = 320x200 at 8 bits/pixel.
     int 0x10           ; bios int 10h, ah=0, al=video mode

     push 60           ; x coord 
     push 10           ; y coord 
     push sprite.data  ; pointer to pixel data 
     push 0x000B       ; sprite colour, not used  
     
     call sprite.x

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

MEMORY MAPPED GRAPHICS ....
 
  Writing pixels directly to video memory is much faster than the
  bios interrupt approach. It works provided that video memory is
  at its standard address, which for mode 13h is segment 0xA000.

  https://thiscouldbebetter.wordpress.com/2011/03/17/vga-mode-13h-in-assembly-with-direct-memory-writes/
    A good complete example of writing graphics into memory

   The code below could be greatly simplified by removing the 
   generality of the code and making it specific to mode 13h. 
   It also demonstrates another way of writing a bootloader,
   but it doesn't set up the stack or the segment registers, so it
   relies on the values that the bios leaves in them.

   * display all colours in video mode 13 using memory accessed graphics
   -------------------------
  
 use16       ; 16-bit mode

org 0x7C00 ; address of the boot sector

  BootStageOne:
  ;
  mov ah,0x00 ; reset disk
  mov dl,0    ; drive number
  int 0x13    ; call BIOS interrupt routine
  ;
  ; load sectors from disk using BIOS interrupt 0x13
  mov ah,0x02 ; function number: read sectors into memory
  mov al,0x10 ; number of sectors to read (more than we need)
  mov dl,0    ; drive number
  mov ch,0    ; cylinder number
  mov dh,0    ; head number
  mov cl,2    ; starting sector number
  mov bx,Main ; memory location to load to 
  int 0x13    ; call BIOS interrupt routine
  ;
  jmp Main    ; now that it's been loaded
  ;

PadOutSectorOneWithZeroes:
  ; pad out all but the last two bytes of the sector with zeroes
  times ((0x200 - 2) - ($ - $$)) db 0x00

BootSectorSignature:
  dw 0xAA55 ; these must be the last two bytes in the boot sector

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

  Main:
  ; set mode to VGA 13h and draw the default palette
  ;
  push DisplayModeInstance13h
  call DisplayModeSet
  ;
  mov ax,0                ; pixel x
  mov bx,0                ; pixel y
  mov cx,[NumberOfColors]

DrawEveryColorInPalette:
  ;
  push ax                 ; pixel x
  push bx                 ; pixel y
  mov dx,[NumberOfColors]
  sub dx,cx
  push dx                 ; pixel color index
  call DisplayPixelDrawXY 
  ;
  inc ax
  cmp ax,[ColorsPerRow]
  jb NewRow
    mov ax,0
    inc bx
  NewRow:
  loop DrawEveryColorInPalette
  ;
  jmp $   ; halt here: Main was entered with jmp, so there is no return address
  ;
  NumberOfColors: dw 0x0100
  ColorsPerRow: dw 0x0010

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

DisplayMode:
; +0 = number
; +2 = screen size in pixels
; +4 = number of colors
; +6 = address of pixel 0

DisplayModeCurrent:
    dw 0x0000

DisplayModeInstance13h:
dw 0x0013, DisplayModeInstance13hSize, 0x0100, 0xA000
DisplayModeInstance13hSize: dw 0x0140, 0x00C8 ; 320x200

DisplayModeSet:
  ; (displayModeToSet)
  ;
  push bp
  mov bp,sp
  push ax
  push si
  ;
  mov si,[bp+4] ; displayModeToSet
  ;
  mov ax,[si+0] ; displayModeToSet.number
  int 0x10
  ;
  mov [DisplayModeCurrent],si
   ;
  pop si
  pop ax
  pop bp
  ret 2

DisplayPixelDrawXY:
  ; (posX, posY, color)
  ;
  push bp
  mov bp,sp
  ;
  push ax
  push cx
  push dx
  push si
  push di
  push es
  ;
  mov si,[DisplayModeCurrent]
  ;
  mov es,[si+6]   ; address of display memory
  ;
  mov di,[si+2]   ; di = displayModeCurrent.screenSize
  mov cx,[di+0]   ; cx = screenSize.x
  mov ax,[bp+6]   ; ax = posY
  mul cx          ; ax *= cx
  add ax,[bp+8]   ; ax += posX
  mov di,ax       ; di = offset of pixel  
  ;
  mov dx,[bp+4]   ; dl = color of pixel
  ;
  mov [es:di],dl  ; write pixel to memory
  ;
  pop es
  pop di
  pop si
  pop dx
  pop cx
  pop ax
  ;
  pop bp
  ret 6

  PadOutSectorsAllWithZeroes:
  times (0x2000 - ($ - $$)) db 0x00

  ,,,,
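
   As noted above, the palette display becomes much shorter when it is
   written only for mode 13h. The following is a minimal sketch, not
   the code from the linked article. It assumes the usual mode 13h
   layout (320 bytes per screen row, video memory at segment 0xA000)
   and uses the same [org 0] boot sector skeleton as the other
   examples in this booklet.

   * a sketch of the palette display written only for mode 13h
   -------------------------
    [org 0]
    jmp 07C0h:start

    start:
      mov ax, 0x0013     ; mode 13h = 320x200 at 8 bits/pixel
      int 0x10           ; bios int 10h, ah=0, al=video mode
      mov ax, 0xA000     ; segment of video memory in mode 13h
      mov es, ax
      xor di, di         ; di := offset of pixel (0,0)
      xor al, al         ; al := colour index, starts at 0
      mov dx, 16         ; 16 rows of colours
    .nextrow:
      mov cx, 16         ; 16 colours per row
    .nextpixel:
      mov [es:di], al    ; write one pixel straight to video memory
      inc di             ; next pixel to the right
      inc al             ; next colour
      loop .nextpixel
      add di, 320-16     ; start of the next screen row
      dec dx
      jnz .nextrow

      jmp $                   ; halt here
      times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
      dw 0xAA55               ; The standard PC boot signature
   ,,,

   Everything fits in the boot sector, so no second stage has to be
   loaded from disk.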

CURSOR SHAPE

  * set text-mode cursor shape.
  >> int 10h, ah=01h 

  input:
  CH = cursor start line (bits 0-4) and options (bits 5-7).
  CL = bottom cursor line (bits 0-4).

  when bit 5 of CH is set to 0, the cursor is visible. when bit 5 is 1, the
  cursor is not visible.

  * hide blinking text cursor: 
  ----------------------------
    mov ch, 32
    mov ah, 1
    int 10h
  ,,,

  * show standard blinking text cursor: 
  -------------------------------------
    mov ch, 6
    mov cl, 7
    mov ah, 1
    int 10h
  ,,,

  * show box-shaped blinking text cursor: 
  ---------------------------------------
    mov ch, 0
    mov cl, 7
    mov ah, 1
    int 10h
    jmp $                   ; keep looping! 
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

  * show a box cursor while reading keys 
  ---------------------------------------------------------
  start:
    mov ch, 0   ; set up the cursor
    mov cl, 7
    mov ah, 1
    int 10h     ; display the box cursor
  .repeat: 
    mov ah, 0
    int 16h     ; read a key
    mov ah, 0eH
    int 10H     ; display the last key pressed
    jmp .repeat  

    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
 ,,,

   
  note: some bioses require CL to be >=7,
   otherwise the wrong cursor shape is displayed. 

COLOURS 

  16 colour mode uses a 4 bit encoding
  * * * *
  : I R G B
  where I=Intensity (eg dark or light green) and RGB means
  red, green, blue. In the background nibble of an attribute byte
  the Intensity bit is also the 'blinking' bit in some modes (to
  make text blink)

  So we can change the intensity of a colour (from light to
  dark or vice-versa) by toggling the top bit (bit 3) of the nibble.

  The colour print example below worked on its own but stopped
  working when it was included in a big bootloaded program, while
  normal ah=0x0E printing carried on working. The problem went away
  once the word was repositioned in the dictionary. A likely cause
  is the bh register: int 10h, ah=09h takes the display page number
  in bh, but bx is also used as the pointer to hextable for xlatb.
  If hextable sits at an offset above 0xFF and bh is not cleared
  again before the colour print, the characters go to a page that
  is not displayed.

  * print colour values in hex and binary 
  -------------
    [org 0]
    jmp 07C0h:start

    hextable db "0123456789ABCDEF"    ; digit translation table
    glyph equ 'a' 
    colourcode:
      dw 0
      db 10, 'colourcode'
    colourcode.x:
      mov cx, 0x000F     ; loop through the colour codes 15 down to 0

    .nextchar:
      mov ah, 0x0E   ; x86 int 0x10 type char function
      mov al, cl     ; the colour code to print in hex
      mov bx, hextable   ; pointer to digit translation table

      ; shr al, 4 would also work, since it fills with zeros
      rol al, 4      ; move the high nibble into the low nibble
      and al, 0x0F   ; keep only that nibble (high digit is printed first)
      xlatb          ; replace al with hex digit  al := [bx+al]
      int 10h        ; invoke bios 

      mov al, cl     ; lower nibble of the colour code 
      and al, 0x0F   ; keep only the low nibble
      xlatb          ; replace al with hex digit  al := [bx+al]
      int 10h        ; invoke bios 

      mov ah, 0x0E     ; separate with a space 
      mov al, ' ' 
      int 10h

      ; white on background colour
      mov ah, 09h    ; x86 bios colour print function
      mov al, glyph  ; the char to print 
      mov bh, 0      ; display page 0 (bx was the hextable pointer above)
      mov bl, cl     ; 
      shl bl, 4      ; make it a back colour (high nibble) 
      or bl, 0b00001111 ; foreground white
      push cx
      mov cx, 24      ; print chars, cursor stays at beginning
      int 10h        ; 
      pop cx         ; restore counter

      mov ah, 09h    ; x86 bios colour print function
      mov al, glyph  ; the char to print 
      mov bl, cl     ; 
      shl bl, 4      ; make it a back colour (high nibble) 
      or  bl, cl     ; foreground and back
      and bl, 0b01111111  ; make background dull 
      push cx
      mov cx, 20     ; print chars, cursor stays at beginning
      int 10h        ; 
      pop cx         ; restore counter

      mov ah, 09h    ; x86 bios colour print function
      mov al, glyph  ; the char to print 
      mov bl, cl     ; 
      shl bl, 4      ; make it a back colour (high nibble) 
      or  bl, cl     ; foreground and back
      and bl, 0b11110111  ; make foreground dull 
      push cx
      mov cx, 16     ; print chars, cursor stays at beginning
      int 10h        ; 
      pop cx         ; restore counter

      mov ah, 09h    ; x86 bios colour print function
      mov al, glyph  ; the char to print 
      mov bl, cl     ; 
      shl bl, 4      ; make it a back colour (high nibble) 
      or  bl, cl     ; foreground and back
      push cx
      mov cx, 12     ; print chars, cursor stays at beginning
      int 10h        ; 
      pop cx         ; restore counter

      ;  color  IRGBIRGB
      ;  bl color bits: intensity,red,green,blue,intensity,red,green,blue
      ;  16 foreground colours are printed first, and then 8 background
      ;  on top of them, covering the 1st 8, but not the second 8

      mov ah, 09h    ; colour print function
      mov al, glyph  ; the character to print
      mov bl, cl     ; cl is forground colour 0-15 
      push cx
      mov cx, 8      ; print chars, cursor stays at beginning
      int 10h        ; 
      pop cx         ; restore counter

      mov ah, 09h    ; x86 bios colour print function
      mov al, glyph  ; the char to print 
      mov bl, cl     ; 
      shl bl, 4      ; make it a back colour (high nibble) 
      push cx
      mov cx, 4      ; print chars, cursor stays at beginning
      int 10h        ; 
      pop cx         ; restore counter

      mov ah, 0x0E      ; print a new line after each colour
      mov al, 13
      int 10h
      mov al, 10
      int 10h

   .end:       
      cmp cx, 0
      je .exit
      dec cx
      jmp .nextchar
      ; loop .nextchar ; loop is out of range, short jump
    .exit:
      ret

    start:
      mov ax, cs
      mov ds, ax
      mov es, ax      ; stosw uses es segment reg
      call colourcode.x
      jmp $

    times 510-($-$$) db 0   ; Pad boot sector with 0s
    dw 0xAA55               ; MBR boot signature

  ,,,,

  * turn on the blink/intensity bit for a coloured character
  ------------
    mov ah, 9
    mov al, 'a'
    mov bh, 0
    mov bl, colour
    or bl, 10000000b
    mov cx, 1
    int 10h
  ,,,
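
  The intensity toggle described at the start of this section can be
  sketched as follows. This is only an illustration under the usual
  text mode assumptions (page 0, int 10h preserving bx): it prints two
  green stars and then overwrites the first one after flipping bit 3
  of the colour, so that a light green star sits next to a green one.

  * toggle the intensity bit of a foreground colour (a sketch)
  ------------
    mov ah, 09h          ; bios colour print function
    mov al, '*'          ; the character to print
    mov bh, 0            ; display page 0
    mov bl, 00000010b    ; green on black
    mov cx, 2            ; print 2 stars, the cursor does not move
    int 10h
    xor bl, 00001000b    ; toggle bit 3: green becomes light green
    mov ah, 09h          ; set ah and al again, in case the bios changed ax
    mov al, '*'
    mov cx, 1            ; overwrite only the first star
    int 10h
    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,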

COLOUR AND TEXT ....

 If we are in a graphics mode, eg 13h, then we can still write text
 to the screen with the bios, and bl determines the colour of the
 text. (I had assumed that it was not possible to write text to the
 screen in graphic video modes.)

 The bios in text mode is able to display text in 16 different
 colours.

 Write character and attribute at cursor position
 int 10h, ah=09h, 
   al=character, bh=page number, bl=color, 
   cx=number of times to print character   

 The character colour attribute is an 8 bit value in the BL register:
 the low 4 bits set the foreground colour,
 the high 4 bits set the background colour.  

  * print intense red (fg) on intense blue (background)
  --------------
              IRGBIRGB 
    mov BL, 0b10011100
  ,,,
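
  When the two colours are known separately, the attribute byte can
  be put together with a shift and an or. The sketch below assumes
  text mode and page 0. The names fg and bg are made up for this
  example and are not used elsewhere in the booklet.

  * build the attribute byte from separate colours (a sketch)
  --------------
    fg equ 0x0E          ; hypothetical name: yellow foreground
    bg equ 0x01          ; hypothetical name: blue background
    mov bl, bg
    shl bl, 4            ; background goes in the high nibble
    or  bl, fg           ; foreground goes in the low nibble, bl = 0x1E
    mov ah, 09h          ; bios colour print function
    mov al, '#'          ; the character to print
    mov bh, 0            ; display page 0
    mov cx, 3            ; print it 3 times
    int 10h
    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,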

 The cursor position is not changed after writing the characters.

 * print "====" in green at the current cursor position
 -----------------------------------------------------
    mov ah, 09h    ; the 'function' number
    mov al, '='    ; the character to print
    ;  color  IRGBIRGB
    ;  bl color bits: intensity,red,green,blue,intensity,red,green,blue

    mov bh, 0            ; first display page
    mov bl, 0b00000010   ; green on black
    mov cx, 4      ; do it 4 times (cursor stays where it was)
    int 10h        ; do it with a bios interrupt
    jmp $          ; loop forever
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
 ,,,,

 * print 2 characters advancing cursor 
 -----------------------------------------------------
    mov cx, 1      ; number of characters to print 
    mov ah, 09h    ; bios function colour print 
    mov al, '='    ; the character to print
    ;  color  IRGBIRGB
    mov bl, 0b00000010   ; green on black
    int 10h        ; do it with a bios interrupt
    mov ah, 03h  ; bios function: get cursor position into dx  
    int 10h      ; invoke bios
    mov ah, 02h  ; bios function: set cursor position specified in dx
    inc dl       ; increment cursor column by 1
    int 10h      ; invoke bios
    mov cx, 1    ; number of characters to print 
    mov ah, 09h  ; bios function colour print 
    mov al, '='  ; set the character again, in case the bios changed ax
    mov bl, 0b01101111   ; white on brown
    int 10h      ; colour print another =
    jmp $          ; loop forever
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
 ,,,,

 The code below relies on the fact that the bios function
 AH=09h,BL=colour,CX=char-count does not update the cursor
 position after printing to the screen. So each iteration of the loop actually 
 overwrites n-1 characters of the previous iteration

 * print digits 1-9 in 9 different colours 
 -----------------------------------------
  start:
  mov cx, 0x0009
  .again:
    mov ah, 09h    ; the 'function' number
    mov al, cl     ; the digit to print
    add al, '0'    ; convert the digit to ascii 
    mov bl, cl     ; use the CX counter to cycle thru 9 colours
    int 10h        ; do it with a bios interrupt
    loop .again
    jmp $          ; loop forever
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
 ,,,,

 * print digits 9 down to 1 in 9 different colours in graphics mode
 -----------------------------------------
  start:
    mov ah, 0      ; set graphics display mode function.
    mov al, 13h    ; mode 13h 
    int 10h        ; set it!

  mov cx, 0x0009
  .again:
    mov ah, 0Eh    ; the 'function' number
    mov al, cl     ; the digit to print
    add al, '0'    ; convert the digit to ascii 
    mov bl, cl     ; use the CX counter to cycle thru 9 colours
    int 10h        ; do it with a bios interrupt
    loop .again
    jmp $          ; loop forever
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
 ,,,,

 The code below uses a trick: on the 1st iteration 15 stars
 are printed in a colour, on the 2nd iteration 14 stars in a 
 different colour, but at the same location, thus overwriting
 all but the last of the 15 previous, and so on.

 * print 15 stars in 15 colours
 -----------------------------------------
  bigblock equ 219  ; ascii code for big block (not used here)
  start:
  mov cx, 0x000F
  .again:
    mov ah, 09h    ; the 'function' number
    mov al, '*'    ; the character to print a star 
    mov bl, cl     ; use the CX counter to cycle thru 16 colours
    int 10h        ; do it with a bios interrupt
    loop .again
    jmp $          ; loop forever
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
 ,,,,

 * print a whole screen of colourful blocks
 -----------------------------------------

  [ORG 0]

    jmp 07C0h:start         ; Goto segment 07C0

  bigblock equ 219  ; ascii code for big block
  blocks:
    dw 0
    db 6, 'blocks'
  blocks.x:
    mov cx, 0x0FEF
    .again:
      mov ah, 09h    ; the 'function' number
      mov al, bigblock  ; just a colour block 
      mov bl, cl     ; use the CX counter to cycle thru 16 colours
      ;mov bx, cx
      ;mov bl, [bx]    ; get random data from memory for random colours 
      int 10h        ; do it with a bios interrupt
      loop .again
    ret

  start:
    mov ax, cs
    mov ds, ax
    call blocks.x
    jmp $          ; loop forever
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
 ,,,,



 The word 'type' in forth is so named because it automatically
 advances the text cursor like a teletype. The code below does not
 advance the cursor. Instead it uses the trick of writing the last
 letter first: the last letter is printed n times, then the one
 before it n-1 times on top, and so on. The alternative is to update
 the cursor position after every character, which is more work.
 If the string length is > 15 then background colours start to get
 printed.

 * type some text in rainbow colours, forthstyle
 -----------------------------------------
  [ORG 0]

    jmp 07C0h:start         ; Goto segment 07C0

    buffer db 14, 'rainbowrainbow'

  ; (stack: text buffer addr - )
  typer:
    dw 0           ; link to next dict word or null
    db 5, 'typer'  ; fn counted name
  typer.h:
    pop ax              ; balance return ip
    pop si
    push ax
    xor bx, bx          ; bx := 0, so bh = 0 selects display page 0
    xor cx, cx          ; set cx:=0 
    mov cl, [si]        ; the character count, used by loop and colours
    add si, cx          ; set pointer to last char in string
    std                 ; make lodsb go in reverse
  .again:
    mov ah, 09h         ; the 'function' number
    mov bl, cl          ; use the CX counter to cycle thru 16 colours
    lodsb               ; get next char to al
    int 10h             ; do it with a bios interrupt
    loop .again
    ret

  start:
    mov ax, cs     ; the code segment is already correct (?!)
    mov ds, ax     ; set up data and extended segments
    mov es, ax
    push buffer
    call typer.h
    jmp $          ; loop forever
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature

 ,,,,


 * type some text in rainbow colours, indexing the buffer instead of lodsb
 -----------------------------------------
  [ORG 0]

    jmp 07C0h:start         ; Goto segment 07C0

    buffer db 14, 'rainbowrainbow'

  ; (stack: - ) this version reads the global buffer directly
  typer:
    dw 0           ; link to next dict word or null
    db 5, 'typer'  ; fn counted name
  typer.h:
    xor bx, bx          ; bx := 0
    xor cx, cx          ; set cx:=0
    mov cl, [buffer]    ; the character count, used by loop and colours
  .again:
    mov ah, 09h         ; the 'function' number
    mov bl, cl          ; use the CX counter to cycle thru 16 colours
    mov al, [buffer+bx] ; print last char first n times, then overwrite
    int 10h             ; do it with a bios interrupt
    loop .again

    ret

  start:
    mov ax, cs     ; the code segment is already correct (?!)
    mov ds, ax     ; set up data and extended segments
    mov es, ax
    call typer.h
    jmp $          ; loop forever
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
 ,,,,
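
 The alternative mentioned above, moving the cursor along after every
 character, might look like the sketch below for a counted string.
 It is only an illustration: it assumes text mode and page 0, and it
 saves cx on the stack because the 'get cursor position' function
 returns the cursor shape in cx.

 * type a counted string in colour by advancing the cursor (a sketch)
 -----------------------------------------
  [ORG 0]

    jmp 07C0h:start         ; Goto segment 07C0

    buffer db 7, 'rainbow'

  start:
    mov ax, cs
    mov ds, ax          ; lodsb reads from ds:si
    cld                 ; make lodsb go forwards
    mov si, buffer
    lodsb               ; al := the character count
    xor cx, cx
    mov cl, al          ; cx := loop counter
  .again:
    push cx             ; save the loop counter
    lodsb               ; al := next character
    mov bh, 0           ; display page 0
    mov bl, cl          ; use the counter as the colour
    mov ah, 09h         ; bios function colour print
    mov cx, 1           ; print the character once
    int 10h
    mov ah, 03h         ; get cursor position into dx (changes cx)
    int 10h
    inc dl              ; increment cursor column by 1
    mov ah, 02h         ; set cursor position specified in dx
    int 10h
    pop cx              ; restore the loop counter
    loop .again
    jmp $          ; loop forever
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
 ,,,,

 Unlike the overwrite trick, this leaves the cursor after the last
 character, so normal printing can carry on from there.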

  == Basic Bios Colours 
  .. HEX  BIN     COLOUR
  .. 0  0000  black
  .. 1  0001  blue
  .. 2  0010  green
  .. 3  0011  cyan
  .. 4  0100  red
  .. 5  0101  magenta
  .. 6  0110  brown
  .. 7  0111  light gray
  .. 8  1000  dark gray
  .. 9  1001  light blue
  .. A  1010  light green
  .. B  1011  light cyan
  .. C  1100  light red
  .. D  1101  light magenta
  .. E  1110  yellow
  .. F  1111  white
  ,,,


 * print "##" blue on white at the current cursor position
 -----------------------------------------------------
    mov ah, 0x09        ; bios function colour print 
    mov al, '#'         ; character to print 
    mov bl, 0b11110001  ; blue on white background
    mov cx, 2           ; how many times to print character     
    int 10h             ; invoke bios function 
    jmp $               ; infinite loop 
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
 ,,,,

 * print a triangle of stars in 15 different colours
 -----------------------------------------------------
  start:
  mov cx, 0xF     ; 15 colours (1 to 15)
  .again:
    mov ah, 09h   ; the 'function' number for colour print
    mov al, '*'   ; the character to print
    mov bx, cx    ; colour in cx counter at first page (bh=0)
    int 10h       ; do it with a bios interrupt
    mov ah, 0eH   ; teletype function 
    mov al, 10    ; a line feed: move the cursor down one row
    int 10h       ; do it
    loop .again
    jmp $         ; loop forever
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
 ,,,,

 * print a triangle of characters in 15 different colours
 -----------------------------------------------------

  [ORG 0]
    jmp 07C0h:start         ; Goto segment 07C0

  ; stack: - 
  tri.doc db 'a triangle of colourful characters'
            dw $-tri.doc
  tri:
    dw 0           ; link to next dict word or null
    db 3, 'tri'    ; counted name of function
  tri.x:
    mov cx, 0xF    ; 15 colours (1 to 15)
  .again:
    mov ah, 09h   ; the 'function' number for colour print
    mov al, cl    ; the character to print
    add al, 'A'-1 ; convert al to ascii A-O letters
    mov bx, cx    ; bl high nibble = bg colour, low nibble = fg colour 
                  ; colours bits: Intensity, Red, Green, Blue
                  ; bh: page number (0 here)
    int 10h       ; do it with a bios interrupt
    mov ah, 0eH   ; teletype function 
    mov al, 10    ; a line feed moves the cursor down one row 
    int 10h       ; x86 real-mode bios interrupt 
    loop .again   ; loop while cx > 0
    ret

  start:

    mov ax, cs     ; the code segment is already correct (?!)
    mov ds, ax     ; set up data and extended segments
    mov es, ax

    call tri.x

    jmp $          ; loop forever
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
 ,,,,

 * print a row of 15 characters in 15 different colours
 -----------------------------------------------------
  start:
  mov cx, 0xF     ; 15 colours (1 to 15)
  .again:
    push cx
    mov ah, 09h   ; the 'function' number for colour print
    mov al, 'Z'
    sub al, cl    ; the character to print
    mov bx, cx    ; colour in cx counter at first page (bh=0)
    mov cx, 1
    int 10h       ; do it with a bios interrupt
    mov dx, 0
    ;mov bh, 00h  ; assume page 0
    mov ah, 03h  ; get cursor position into dx  
    int 10h
    mov ah, 02h  ; set cursor position specified in dx
    inc dl       
    int 10h
    ;mov ah, 0eH   ; teletype function 
    ;mov al, 10    ; a form-feed 
    ;int 10h       ; do it
    pop cx
    loop .again
    jmp $         ; loop forever
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
 ,,,,

 * change whole screen colours to white on blue
 -----------------------------------------------------

  [ORG 0]
    jmp 07C0h:start         ; Goto segment 07C0

  ; stack: - 
  blue.doc db 'change video colours to white on blue'
            dw $-blue.doc
  blue:
    dw 0           ; link to next dict word or null
    db 4, 'blue'   ; fn counted name
  blue.x:
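    mov ah, 06h         ; scroll window up; al=0 blanks the whole window
    mov al, 0
    mov bh, 00011111b   ; attribute for the blanked cells: white on blue
    mov cx, 0           ; ch,cl = row,column of top left corner (0,0)
    mov dx, 184Fh       ; dh,dl = row,column of bottom right (24,79)
    int 10h
    mov ah, 0eH         ; print some character
    mov al, '#'
    mov bh, 0           ; display page 0
    int 10H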
    ret

  start:

    mov ax, cs     ; the code segment is already correct (?!)
    mov ds, ax     ; set up data and extended segments
    mov es, ax

    call blue.x

    jmp $          ; loop forever
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature

 ,,,,

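 The attempt below does not work, because int 10h, ah=0Bh with bh=0
 sets the background/border colour (only the border in text modes),
 not the colour of the text that is printed afterwards.

 * print blue then green (not working)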
 -----------------------------------------------------
    mov ah, 0Bh           
    mov bh, 0           
    mov bl, 00010000b    ; blue on black
    int 10h             
    mov ah, 0eH           ; print the character
    mov al, '#'         
    int 10h
    mov ah, 0Bh           
    mov bh, 0           
    mov bl, 00100000b    ; green on black
    int 10h             
    mov ah, 0eH           ; print the character
    mov al, '#'         
    int 10h
    jmp $   
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
 ,,,,


BACKGROUND COLOURS ....


 * print ribbons of background colours
 -----------------------------------------

  [ORG 0]
    jmp 07C0h:start         ; Goto segment 07C0

  ; stack: - 
  patch.doc db 'displays columns of colour'
            dw $-patch.doc
  patch:
    dw 0           ; link to next dict word or null
    db 5, 'patch'  ; fn counted name
  patch.x:
    mov cx, 0x05FF
  .again:
    mov ah, 09h    ; the 'function' number
    mov al, ' '    ; the character to print a space
    mov bl, cl     ; use the CX counter to cycle thru 16 colours
    shl bl, 4      ; the 4 top bits are background colour
    int 10h        ; x86 bios interrupt
    loop .again
    ret

  start:

    mov ax, cs     ; the code segment is already correct (?!)
    mov ds, ax     ; set up data and extended segments
    mov es, ax

    call patch.x

    jmp $          ; loop forever
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature

 ,,,,


 * print 16 background colours
 -----------------------------------------
  start:
  mov cx, 0x00FF
  .again:
    mov ah, 09h    ; the 'function' number
    mov al, ' '    ; the character to print a space
    mov bl, cl     ; use the CX counter to cycle thru 16 colours
    int 10h        ; do it with a bios interrupt
    loop .again
    jmp $          ; loop forever
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
 ,,,,

  
ARTFUL COLOURS ....

 * print 16 colours, with 16 backgrounds
 -----------------------------------------
  start:
  mov cx, 0x00FF
  .again:
    mov ah, 09h    ; the 'function' number
    mov al, '*'    ; the character to print a star 
    mov bl, cl     ; use the CX counter to cycle thru 16 colours
    int 10h        ; do it with a bios interrupt
    loop .again
    jmp $          ; loop forever
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
 ,,,,

 * print some ascii digits in colour by advancing cursor 
 -----------------------------------------------------
  jmp start
  start:
  mov cx, 9
  .again:
    push cx
    mov ah, 09h    ; bios function colour print 
    mov al, cl     ; the digit to print
    add al, '0'    ; convert digit to ascii
    mov bl, cl     ; colour in counter CX 
    mov cx, 1      ; number of characters to print 
    int 10h        ; invoke bios 
    mov ah, 03h  ; bios function: get cursor position into dx  
    int 10h      ; invoke bios
    mov ah, 02h  ; bios function: set cursor position specified in dx
    inc dl       ; increment cursor column by 1
    int 10h      ; invoke bios
    pop cx
    loop .again
    jmp $          ; loop forever
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
 ,,,,

 * print all ascii chars in colour by advancing cursor 
 -----------------------------------------------------
  [ORG 0]
    jmp 07C0h:start         ; Goto segment 07C0

    ; stack: - 
    asci.doc db 'colourful ascii in 16 columns'
              dw $-asci.doc
    asci:
      dw 0           ; link to next dict word or null
      db 4, 'asci'  ; fn counted name
    asci.x:
      mov cx, 0x00FF 
    .again:
      push cx
      mov ah, 09h    ; bios function colour print 
      mov al, 0xFF   ; print ascending order
      sub al, cl     ; the ascii char to print
      mov bl, cl     ; colour in counter CX 
      and bl, 0x0F   ; only print foreground colours
      mov cx, 1      ; number of characters to print 
      int 10h        ; invoke bios 
      mov ah, 03h  ; bios function: get cursor position into dx  
      int 10h      ; invoke bios
      mov ah, 02h  ; bios function: set cursor position specified in dx
      inc dl       ; increment cursor column by 1
      int 10h      ; invoke bios
      pop cx
      test cl, 0b00001111  ; 16 characters to a line
      jne .here
       mov ah, 0eH   ; bios 'teletype' function
       mov al, 10    ; line feed char
       int 10H       ; invoke bios 
       mov al, 13    ; carriage return char
       int 10H       ; invoke bios 
    .here:
      loop .again
      ret

  start:

    mov ax, cs     ; the code segment is already correct (?!)
    mov ds, ax     ; set up data and extended segments
    mov es, ax

    call asci.x

    jmp $          ; loop forever
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature

 ,,,,

CHESS 

  The start of a chess engine written in a forth style

  * print one chess square algebraically 
  ---------------
   [org 0]
   jmp 07C0h:start

   chess.board:
     db 0, 0, 0, 0, 0, 0, 0, 0
     db 0, 0, 0, 0, 0, 0, 0, 0
     db 0, 0, 0, 0, 0, 0, 0, 0
     db 0, 0, 0, 0, 0, 0, 0, 0
     db 0, 0, 0, 0, 0, 0, 0, 0
     db 0, 0, 0, 0, 0, 0, 0, 0
     db 0, 0, 0, 0, 0, 0, 0, 0
     db 0, 0, 0, 0, 0, 0, 0, 0

   printsquare.doc:
     db 'Prints in algebraic notation one chess square, given an offset'
     db 'takes offset on stack as parameter. Squares are 0-63, with '
     db 'a1==0, b1==1, a2==8 ...h8=63 '
     dw $-printsquare.doc
   printsquare:
     dw 0            ; link to previous
     db 11, 'printsquare'
   printsquare.x:
     pop dx    ; juggle return fn ip
     pop ax    ; get offset into ax
     push dx   ; restore return
     mov bl, 8 ; could be done faster with AND 0b00000111 etc
               ; but printing squares isnt time critical
     div bl    ; al:=quotient, ah:=remainder eg 1r2
     push ax   ; save quotient/remainder
     mov al, ah     ; remainder -> al for printing
     add al, 'a'    ; convert column to chess column (a-h)
     mov ah, 0x0e   ; int 10h 'print char' function
     int 10h        ; print char in al
     pop ax    ; restore quotient/remainder
     mov ah, 0x0e   ; int 10h 'print char' function
     add al, '1'    ; convert row to ascii digit
     int 10h        ; print char in al
     ret

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     push 63      ; eg h8  
     call printsquare.x
     push 0       ; eg a1  
     call printsquare.x
     push 11      ; eg d2  
     call printsquare.x

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,
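
  The comment in printsquare notes that the division could be done
  faster with bit operations. A possible version is sketched below.
  It relies on the square offset being in the range 0-63, so that
  the column is the low 3 bits and the row is the offset shifted
  right by 3. The test squares are the same as in the example above.

  * print chess squares using AND and SHR instead of DIV (a sketch)
  ---------------
   [org 0]
   jmp 07C0h:start

   printsquare.x:
     pop dx            ; juggle return fn ip
     pop ax            ; ax := square offset 0-63
     push dx           ; restore return
     push ax           ; save the offset
     and al, 00000111b ; column = offset mod 8
     add al, 'a'       ; convert column to chess column (a-h)
     mov ah, 0x0e      ; int 10h 'print char' function
     int 10h           ; print char in al
     pop ax            ; restore the offset
     shr al, 3         ; row = offset / 8
     add al, '1'       ; convert row to ascii digit
     mov ah, 0x0e      ; int 10h 'print char' function
     int 10h           ; print char in al
     ret

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     push 63      ; eg h8
     call printsquare.x
     push 0       ; eg a1
     call printsquare.x
     push 11      ; eg d2
     call printsquare.x

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,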

  
  * a sketch of how to write a chess engine in a forth style
  ---------------

  BITS 16
  [ORG 0]

   jmp 07C0h:load    ; Goto segment 07C0
     drive db 0      ; a variable to hold boot drive number
   load:
     mov ax, cs     ; the code segment is already correct (?!)
     mov ds, ax     ; set up data and extra segments
     mov es, ax
     mov [drive], dl ; save the boot drive number
     mov ax, 07C0h   ; Set up 4K stack space after this bootloader
     add ax, 288     ; (4096 + 512) / 16 bytes per paragraph
     mov ss, ax      ; with a 4K gap between stack and code
     mov sp, 4096

      ; save the DL register or else dont modify it
      ; it contains the number of the boot medium (hard disk,
      ; usb memory stick etc)
      ; The 'floppy' drive is NOT necessarily 0!!!

    reset:            ; Reset the floppy drive
      mov ax, 0       ; 
      mov dl, [drive] ; the boot drive number (eg for usb 128)
      int 13h         ;
      jc reset        ; ERROR => reset again
    read:
      mov ax, 1000h       ; ES:BX = 1000:0000
      mov es, ax          ; es:bx determines where data loaded to
      mov bx, 0           ;
      mov ah, 2           ; Load disk data to ES:BX
      mov al, 8           ; Load 8 sectors ie 512 bytes * 8 == 4K
      ; try mov cx, 0x0002 ; cylinder 0, sector 2
      mov ch, 0           ; Cylinder=0
      mov cl, 2           ; Sector=2 (sector 1 is the boot sector)
      mov dh, 0           ; Head=0
      mov dl, [drive]     ; 
      int 13h             ; Read!
    jc read             ; ERROR => Try again

    jmp 1000h:0000      ; Jump to the loaded code 

    times 510-($-$$) db 0   ; pad out the boot sector (512 bytes)
    dw 0AA55h               ; end with standard boot signature

    ; this below is the magic line to make the new memory offsets
    ; work. Or compile the 2 files separately
    ; https://forum.nasm.us/index.php?topic=2160.0 

    section stage2 vstart=0

    jmp start

    ; the code to be loaded and executed
    ; cs is ok because of far jump
    ; is ds and es ok ? no, but stack seems ok
   
   ; a list of chars which can be used to print the pieces for moves
   piece.char db '0PNBRQK   |'   ; first is empty, 10 is off the side
   ; characters for text board
   ;piece.render db '.PNBRQK   |'   ; first is empty, 10 is off the side
   piece.render db '.', 0x1E, 0xA4, 0x06, 0xCB, 'Q', 0x0B
                db '   |' 

   pawnglyph equ 0x1E
   kingglyph equ 0x0C
   rookglyph equ 0x01

   ; symbolic constants for pieces. same or as vectors
   ; we can get the offset from "pieces:" to get its appropriate "vector"

   pieces:
   empty equ 0
   pawnw equ 1
   knightw equ 2
   bishopw equ 3
   rookw equ 4   
   queenw equ 5
   kingw equ 6
   side equ 10

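   ; pawnb - 8 will give pawnw, if this is useful
   ; NOTE: knightb below has the same value (10) as 'side' above, so one
   ; of them needs a different value before black knights are used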
   pawnb equ 9
   knightb equ 10 
   bishopb equ 11 
   rookb equ 12 
   queenb equ 13 
   kingb equ 14 

   ; The vectors are what we can add to SI from-square
   ; to get DI to-square
   ; we need the vector table to look up the right vector
   ; the first element of the vectors is a count

   vectortable:
     dw 6     ; a count, makes piece symbolic value match table
     dw pawn.vector, knight.vector, bishop.vector  
     dw rook.vector, queen.vector, king.vector

   ; vectors are zero terminated for convenience. Offsets start at "north" and 
   ; go clockwise. These offsets are for a 12x8 board. The board being 12 bytes
   ; wide makes the "off side of board" check very efficient.

   pawn.vector dw 12, 0  ; and '20' but only on rank 2
   knight.vector dw 25, 14, -10, -23, -25, -14, 10, 23, 0 
   bishop.vector dw 13, -11, -13, 11, 0
   rook.vector dw 12, 1, -12, -1, 0
   queen.vector dw 12, 13, 1, -11, -12, -13, -1, 11, 0  ; zero terminated
   king.vector dw 12, 13, 1, -11, -12, -13, -1, 11, 0   ; zero terminated

   
   board.doc:
     db 'puts a pointer to the board data structure onto the stack '
     db ' board contains, turn, history, score and squares'
     dw $-board.doc
   board:
     dw 0            ; link to previous
     db 5, 'board'
   board.x:
     pop dx          ; juggle fn return pointer
     push board.data
     push dx
     ret

   board.data:
   turn: dw 0       ; who to play, white 0 or black 1
   history:         ; this is the depth search
     dw 0           ; count of moves
     dw 0,0,0,0,0   ; array of from-square, to-square
   score: dw 0      ; current board score
   squares:
     db side, side, 0, 0, rookw, 0, 0, 0, pawnw, 0, side, side
     db side, side, 0, 0, 0, 0, 0, 0, 0, 0, side, side
     db side, side, 0, 0, knightw, 0, 0, 0, 0, 0, side, side
     db side, side, 0, 0, 0, 0, 0, 0, 0, 0, side, side
     db side, side, 0, 0, 0, 0, 0, 0, 0, 0, side, side
     db side, side, 0, 0, 0, 0, knightw, 0, 0, 0, side, side
     db side, side, 0, 0, 0, 0, 0, 0, 0, 0, side, side
     db side, side, 0, 0, pawnw, 0, kingw, 0, 0, 0, side, side

   ;1E triangle pawn
   ;0C female symbol king

   printpiece.doc:
     db 'just prints asci version of piece value for debug'
     dw $-printpiece.doc
   printpiece:
     dw board            ; link to previous
     db 10, 'printpiece'
   printpiece.x:
     pop dx
     pop si    ; piece symbolic value (1-7 etc)
     push dx   ; restore fn

     add si, piece.char   ; 0=empty, 1=pawn ..., 10=side
     lodsb
     mov ah, 0x0e
     int 10h
     ret

   printvector.doc:
     db 'just print the piece vectors for debugging' 
     db ' takes symbolic piece value on the stack '
     db ' - modify to loop until zero termination '
     dw $-printvector.doc
   printvector:
     dw printpiece            ; link to previous
     db 11, 'printvector'
   printvector.x:
     ; just print each vector
     pop dx
     pop bx         ; piece symbolic value
     push dx
     shl bx, 1  ; bx := bx*2 , since vectortable pointer is word not byte
     mov si, [vectortable+bx] ;  

     ;xor cx, cx     ; set cx = 0, so that count in cl works
     ;lodsb          ; get count into al. al := [di]++ 
     ;mov cl, al     ; vector count into cl

     mov ah, 0x0e   ; int 10h 'print char' function
     mov al, '>'    ; some start char
     int 10h        ; print char in al
   .next:
     lodsw          ; get next vector into ax
     cmp ax, 0      ; zero terminated vectors
     je .exit       ; exit if at end of piece vector
     test ax, ax    ; see if ax is negative
     jns .print     ; if not negative just skip
     push ax        ; save move offset
     mov ah, 0x0e   ; int 10h 'print char' function
     mov al, '-'    ; print a negative indicator
     int 10h        ; print char in al
     pop ax         ; restore move offset 
     neg ax         ; make positive for printing 
   .print:
     ; piece jumps are at most 2 decimal digits so we 
     ; will assume here that the quotient after div 10
     ; is the 1st digit
     mov bl, 10     ; divide by 10
     ;xor ah, ah     ; not needed: div bl divides all of ax and ah is already 0
     div bl         ; do ax/10, ah=remainder, al=quotient
     mov bl, ah     ; just save ah
     mov ah, 0x0e   ; int 10h 'print char' function
     cmp al, 0      ; if 1st digit is zero, dont print it
     je .second
     add al, '0'    ; convert 1st digit to ascii
     int 10h        ; print char in al (quotient)
   .second:
     mov al, bl     ; print second digit
     add al, '0'    ; convert 2nd digit to ascii
     int 10h        ; print char in al
     mov al, ' '    ; separator for vectors
     int 10h        ; print char in al
     jmp .next
   .exit:
     ret

   printsquare.doc:
     db 'Prints in algebraic notation one chess square, given an offset'
     db 'takes offset on stack as parameter. Squares are 0-95 with'
     db ' valid squares 2-9, 14-21, 26-33 etc, '
     db ' all other offsets being off the side of the board. '
     db 'also prints piece on the square '
     dw $-printsquare.doc
   printsquare:
     dw printvector          ; link to previous
     db 11, 'printsquare'
   printsquare.x:
     pop dx    ; juggle return fn ip
     pop ax    ; get offset into ax
     push dx   ; restore return
     push ax   ; save square offset
            
     ; this is hacked together but demonstrates an important 
     ; technique. ie. getting the piece by value on a square on 
     ; the board
     xor bx, bx  ; set bx = 0.
     mov bx, ax  ; get square offset into bx
     mov bl, [squares+bx]   ; get the piece value into bl (bh stays 0 for 0-95)
     mov al, [piece.char+bx]
     mov ah, 0x0e   ; int 10h 'print char' function
     int 10h        ; print char in al

     pop ax    ; restore square offset

     ; valid squares 2-9, 14-21, 26-33 etc, 
     ; technique is subtract 12 repeatedly from offset, counting
     ; loops. the count is the rank and 2-9 offset is the file

     xor cx, cx    ; set loop counter = 0
     .again:
       inc cx
       cmp ax, 10
       jb .print
       sub ax, 12
     jmp .again
     .print:
       add al, 'a'-2  ; convert column to chess column (a-h)
       mov ah, 0x0e   ; int 10h 'print char' function
       int 10h        ; print char in al
       mov al, cl     ; board rank (1-8)
       add al, '0'    ; convert to asci digit
       int 10h        ; print char in al

     ret

   printmove.doc:
     db 'Prints in algebraic notation one chess move, given square offsets'
     db 'on the stack as parameters. [tos: to, from] Squares are 0-95, with '
     db 'a1==2, b1==3, h1==9 ...h8==93 '
     db ' need to add capture X to move printout'
     dw $-printmove.doc
   printmove:
     dw printsquare  ; link to previous
     db 9, 'printmove'
   printmove.x:
     pop dx    ; juggle return fn ip
     pop bx    ; get to-square offset (0-95) into bx
     pop ax    ; get from-square offset (0-95) into ax
     push dx   ; restore return

     push bx     ; save bx (to-square)
     push ax
     xor bx, bx  ; set bx = 0.
     mov bx, ax  ; get square offset into bx
     mov bl, [squares+bx]   ; get the piece value on square into bx
     mov al, [piece.char+bx]
     mov ah, 0x0e   ; int 10h 'print char' function
     int 10h        ; print char in al
     pop ax
     pop bx         ; restore to-square

     xor cx, cx    ; set loop counter = 0
     .again:
       inc cx
       cmp ax, 10  ; 9 is last square on 1st rank
       jb .print
       sub ax, 12
     jmp .again
     .print:
       add al, 'a'-2  ; convert column to chess column (a-h)
       mov ah, 0x0e   ; int 10h 'print char' function
       int 10h        ; print char in al
       mov al, cl     ; board rank (1-8)
       add al, '0'    ; convert to asci digit
       int 10h        ; print char in al

     mov ax, bx       ; get to-square into ax
     push ax          ; save ax (to-square offset)

     xor bx, bx  ; set bx = 0.
     mov bx, ax  ; get square offset into bx
     mov bl, [squares+bx]   ; get the piece value on square into bx
     
     mov al, '-'      ; default move separator (no capture) 
     cmp bl, side     ; check for move off the side of the board
     jne .empty
     mov al, '?'      ; illegal move indicator
     jmp .separator
   .empty:
     cmp bl, 0      ; is to-square empty?
     je .separator
     mov al, 'x'    ; capture indicator
   .separator:
     mov ah, 0x0e   ; int 10h bios 'print char' function
     int 10h        ; print square separator al
     pop ax         ; restore to-square offset

     xor cx, cx     ; set loop counter = 0
     .againx:
       inc cx
       cmp ax, 10  ; 9 is last square on 1st rank
       jb .printx
       sub ax, 12
     jmp .againx
     .printx:
       add al, 'a'-2  ; convert column to chess column (a-h)
       mov ah, 0x0e   ; int 10h 'print char' function
       int 10h        ; print char in al
       mov al, cl     ; board rank (1-8)
       add al, '0'    ; convert to asci digit
       int 10h        ; print char in al

     mov al, ' '    ; print a space after the move
     int 10h        ; print char in al
     
   .exit:
     ret

   
   textboard.doc:
     db 'Prints an asci chess board with a1 in the lower left hand'
     db 'corner. So print a8 -> h8, then a7 -> h7 etc '
     db 'maybe make custom glyphs and write them into bios asci '
     db 'memory to make a nicer text board'
     db ' also an asci box around this would be nice'
     dw $-textboard.doc
   textboard:
     dw printmove          ; link to previous
     db 9, 'textboard'
   textboard.x:
     mov dx, 86
   .nextrow: 
     mov cx, 8            ; 8 squares per row
     mov bx, piece.render ; translation table
     mov si, squares      ; square a8 on 12x8 board
     add si, dx           ; start of next row, eg 86, 74, 62, ... 14, 2
     mov ah, 0x0e         ; print char
   .nextsquare:
     lodsb                ; get 1st square into al
     xlatb                ; replace al with char in piece.render table
     int 10h
     loop .nextsquare
     mov al, 13           ; print new line
     int 10h
     mov al, 10
     int 10h
     sub dx, 12
     cmp dx, 0
     jg .nextrow 
   .exit:
     mov cx, 8         ; print algebraic file labels a-h 
   .letters:
     mov al, 'i'       ; 'h'+1, last file
     sub al, cl
     mov ah, 0x0e      ; print char
     int 10h
     loop .letters

     ret 

   twodigit.doc:
     db 'just prints a signed 2 digit number in decimal'
     dw $-twodigit.doc
   twodigit:
     dw textboard            ; link to previous
     db 6, '2digit'
   twodigit.x:
     pop dx         ; balance return pointer
     pop ax         ; get signed 2 digit number into ax 
     push dx        ; restore
   .next:
     cmp ax, 99     ; if its > 99 we cant show it here
     jg .toobig
     cmp ax, -99    ; if its < -99 we cant show it here
     jl .toosmall
     test ax, ax    ; see if ax is negative
     jns .print     ; if not negative just skip
     push ax        ; save ah and al
     mov ah, 0x0e   ; int 10h 'print char' function
     mov al, '-'    ; print a negative indicator
     int 10h        ; print char in al
     pop ax         ; restore ah and al (the digit)
     neg ax         ; make positive for printing 
   .print:
     ; we will assume here that the quotient after div 10
     ; is the 1st digit
     mov bl, 10     ; divide by 10
     ;xor ah, ah     ; not needed: div bl divides all of ax and ah is already 0
     div bl         ; do ax/10, ah=remainder, al=quotient
     mov bl, ah     ; just save ah
     mov ah, 0x0e   ; int 10h 'print char' function
     cmp al, 0      ; if 1st digit is zero, dont print it
     je .second
     add al, '0'    ; convert 1st digit to ascii
     int 10h        ; print char in al (quotient)
   .second:
     mov al, bl     ; print second digit
     add al, '0'    ; convert 2nd digit to ascii
     int 10h        ; print char in al
     mov al, ' '    ; separator for vectors
     int 10h        ; print char in al
   .exit:
     ret
   .toobig:
     mov ah, 0x0e   ; int 10h 'print char' function
     mov al, '!'    ; show a message indicating number too big 
     int 10h        ; print char in al
     mov al, '>'    ; 
     int 10h        ; print char in al
     mov al, ' '    ; 
     int 10h        ; print char in al
     ret
   .toosmall:
     mov ah, 0x0e   ; int 10h 'print char' function
     mov al, '!'    ; a message that number is too small (< -99) 
     int 10h        ; print char in al
     mov al, '<'    ; 
     int 10h        ; print char in al
     mov al, ' '    ; 
     int 10h        ; print char in al
     ret

   onepiece.doc:
     db 'finds all legal moves for one piece on a chess board'
     db 'Use bx as vector pointer. si as start square, di as end square.' 
     db ' still need to deal with black/white logic '
     dw $-onepiece.doc
   onepiece:
      dw twodigit       ; link to last fn
      db 6, '1piece'
   onepiece.x:
      ; now for next piece logic here, but put it in a 
      ; different function eg "allmoves". scan squares to find next
      ; piece of white or black. could use bp as a square pointer?
      
      pop dx 
      pop si         ; the piece square (0-95)
      push dx        ; restore fn return pointer
      xor bx, bx

      mov bl, [squares+si]    ; get the piece value on square into bx
      cmp bl, 0               ; empty square so find a piece 
      je .exit
      ;push bx
      ;call printvector.x
      ;ret
      shl bl, 1  ; bl := bl*2 , since vectortable pointer is word not byte
      mov bx, [vectortable+bx]  ; get piece vector

      sub bx, 2       ; otherwise will skip the first vector 
    .turn:            ; next direction for piece moves
      add bx, 2       ; bx is word pointer in move offset vector
      push si
      pop di             ; set di := si  (to-square starts at from-square)
      mov dx, [bx]       ; get move offset from piece vector
      cmp dx, 0          ; vectors are zero terminated 
      je .exit           ; no more turns, so exit or nextpiece

    .nextmove:
      add di, dx     ; eg 2 + 12 (rook move one rank north)

      ; find out what di is
      ;pusha 
      ;push di
      ;call twodigit.x
      ;popa
      
                   ; check legality etc  
      cmp di, 95   ; is move off top of board?
      jg .turn     ; move off board so try next move offset 

      cmp di, 0    ; is move off bottom of board?
      jl .turn     ; move off board so try next move offset 

      mov al, [squares+di]  ; get piece on to-square
      cmp al, 0          ; empty square, so the move is legal (no turn)
      je .showmove 
      cmp al, side       ; off the side of the board so illegal 
      je .turn           ; 
      cmp al, 8
      jl .turn           ; own piece so illegal move

      ; else opposition piece so legal move, but must turn
      ; again

    .showmove:
      inc cx    ; increment a move counter
      pusha     ; conserve all general purpose regs, since printmove mods them
      push si   ; from-square
      push di   ; to-square
      call printmove.x
      popa

      mov al, [squares+di]  ; get piece on to-square
      cmp al, 8       ; if destination square is opposition then must turn
      jg .turn

      ; logic to turn when it is a knight or king etc
      ; that is a piece that can only move one square

      mov al, [squares+si]    ; get the piece value on square into al
      cmp al, knightw         ; knights must turn 
      je .turn
      cmp al, kingw           ; kings must turn 
      je .turn
      cmp al, pawnw           ; pawns must turn ?
      je .turn

      jmp .nextmove  

    .exit:
      ;push si
      ;call printsquare.x
      ret 

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax

     ;mov [bos.n], sp   ; set up stack

     call textboard.x
     ;push 4      ; c1 - from-square 
     ;push 28     ; c3 - to-square. eg c1-c3
     ;call printmove.x

     push 16      ; c2 (an empty square)
     ;call printsquare.x
     push 17      ; d2 (an empty square)
     ;call printsquare.x

     push 4      ; c1 - from-square (with a rook on it)
     ;call onepiece.x

    jmp $                   ; halt here

  ,,,
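
  The 12x8 board used above means that the offset of a square is
  rank*12 + file + 2. The fragment below is only a minimal sketch of
  that arithmetic. The name 'squareoffset' and its register convention
  are assumptions made for illustration and are not part of the engine
  above.

  * a sketch: convert a file and rank to an offset on the 12x8 board
  ---------------
   ; in:  al = file (0-7 for a-h), ah = rank (0-7 for ranks 1-8)
   ; out: ax = offset into 'squares' (a1 == 2, h1 == 9, h8 == 93)
   squareoffset:
     push bx
     mov bl, al     ; save the file in bl
     xor bh, bh     ; bx := file
     mov al, ah     ; al := rank
     mov ah, 12     ; 12 bytes per board row
     mul ah         ; ax := al * ah == rank * 12
     add ax, bx     ; add the file
     add ax, 2      ; skip the 2 'side' bytes at the start of each row
     pop bx
     ret
  ,,,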


USEFUL PROCEDURES

  This section contains a set of hopefully useful procedures.

  * print a zero terminated string with address in the SI register 
  -----------------
   BITS 16

   jmp start
   message db 'A function to print',13,10,0
   start:
     mov ax, 07C0h    ; Set up 4K stack space after this bootloader
     add ax, 288      ; (4096 + 512) / 16 bytes per paragraph
     mov ss, ax
     mov sp, 4096
     mov ax, 07C0h    ; Set data segment to where we're loaded
     mov ds, ax

    mov si, message     ; Put string position into SI
    call prints          ; Call our string-printing routine

    hang: jmp hang           ; Jump here - infinite loop!

  ;# prints
  ;   output zero terminated string in SI to screen
  prints:      
    mov ah, 0Eh       ; int 10h 'print char' function

  .again:
    lodsb             ; Get character from string
    cmp al, 0
    je .done          ; If char is zero, end of string
    int 10h           ; Otherwise, print it
    jmp .again

  .done:
    ret

  times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
  dw 0xAA55               ; The standard PC boot signature
 ,,,

 * a procedure to print a character in colour and advance cursor 
 -----------------------------------------------------
  start:
    mov al, 'G'
    mov cx, 0xF
  .again:
    mov bl, cl     ; some colour
    call putcc
    loop .again
    jmp $         ; loop forever

  ; proc: print a coloured character (in AL) and colours (in BL)
  putcc:
    push ax
    push bx
    push cx
    push dx
    mov bh, 0     ; assume we are working in the first page 
    mov ah, 09h   ; the 'function' number for colour print
    mov cx, 1     ; print the character once
    int 10h       ; do it with a bios interrupt
    mov ah, 03h   ; get cursor position into dx  
    int 10h
    mov ah, 02h   ; set cursor position function 
    inc dl        ; increment the column position 
    int 10h
    pop dx
    pop cx
    pop bx
    pop ax
  ret

    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
 ,,,

 The code below cycles BL through the text mode colours 1 to 15 and
 resets it to 1 after the last colour (a modulus on BL would do the
 same job).

 * a procedure to 'rainbow' print some text (each letter a new colour) 
 -----------------------------------------------------
  jmp start
  message db '8086 in realmode rainbow!@#$%^&*', 0
  start:
    mov ax, 07C0h    ; Set up 4K stack space after this bootloader
    add ax, 288      ; (4096 + 512) / 16 bytes per paragraph
    mov ss, ax
    mov sp, 4096

    mov ax, 07C0h    ; Set data segment to where we're loaded
    mov ds, ax

    mov si, message    ; Put string position into SI
    call printcolour 
    jmp $         ; loop forever

  ; proc: print a string in rainbow colour 
  ; string address in SI
  printcolour:
   push bx
   .resetcolour:
    mov bl, 1      ; colours start from 1 because 0 is black
   .repeat:
     lodsb         ; Get character from string
     cmp al, 0     ; is the character byte 0
     je .done      ; If char is zero, end of string
     call putcc    ; Otherwise, print it in colour
     cmp bl, 15    ; if bl is at the last colour reset it
     je .resetcolour
     inc bl 
     jmp .repeat
   .done:
   pop bx
  ret

  ; print a coloured character (in AL) and colours (in BL)
  putcc:
    push ax       ; save registers to the stack
    push bx
    push cx
    push dx
    mov bh, 0     ; assume we are working in the first page 
    mov ah, 09h   ; the 'function' number for colour print
    mov cx, 1     ; print the character once
    int 10h       ; do it with a bios interrupt
    mov ah, 03h   ; get cursor position into dx  
    int 10h
    mov ah, 02h   ; set cursor position function 
    inc dl        ; increment the column position 
    int 10h
    pop dx
    pop cx
    pop bx
    pop ax
  ret
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
 ,,,

READING FROM DISKS

  The int 13h function AH=02h reads sectors from a 'floppy'
  (or a usb stick emulating a floppy) or a hard drive. This is probably
  the most important job of a 'bootloader': it must load more code
  from the disk in order to get past the 1 sector (512 byte) limit
  of the boot sector.

  Some pundits say that resetting and reading should be tried
  3 times. In the case of a real floppy the first read may not
  work because the device takes some time to 'spin up' etc.
  These factors should not apply to a usb memory stick.
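
  The retry advice above could be sketched as follows. This is a
  minimal example and not a definitive implementation: the variable
  names 'drive' and 'tries' are made up for illustration, and the
  buffer address 1000:0000 is simply the one used elsewhere in this
  booklet. It prints the first byte of sector 2, or '!' if all 3
  attempts fail.

  * a sketch: try to read a sector up to 3 times, resetting after errors
  -----------------
   BITS 16
   [ORG 0]
    jmp 07C0h:start     ; far jump so that cs = 07C0h
    drive db 0          ; boot drive number (hypothetical variable)
    tries db 3          ; attempts remaining (hypothetical variable)
   start:
    mov ax, cs
    mov ds, ax
    mov [drive], dl     ; the bios passes the boot drive number in dl
    mov ax, 07C0h       ; Set up 4K stack space after this bootloader
    add ax, 288         ; (4096 + 512) / 16 bytes per paragraph
    mov ss, ax
    mov sp, 4096
   read:
    mov ax, 1000h       ; ES:BX = 1000:0000
    mov es, ax
    xor bx, bx
    mov ax, 0201h       ; ah=02 read sectors, al=01 one sector
    mov cx, 0002h       ; ch=0 cylinder, cl=2 sector
    mov dh, 0           ; head 0
    mov dl, [drive]
    int 13h
    jnc loaded          ; carry clear => the read worked
    xor ax, ax          ; ah=0, reset disk system
    mov dl, [drive]
    int 13h
    dec byte [tries]    ; one attempt used up
    jnz read            ; try again while attempts remain
    mov al, '!'         ; all attempts failed
    jmp show
   loaded:
    mov al, [es:0]      ; first byte of the loaded sector
   show:
    mov ah, 0x0e        ; int 10h 'print char' function
    int 10h
    jmp $               ; halt here
    times 510-($-$$) db 0
    dw 0xAA55
    db 'some sample data', 0
  ,,,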

  * read and print some text which is in the 2nd sector 
  -----------------
   BITS 16

   start:
      mov ax, 07C0h   ; Set up 4K stack space after this bootloader
      add ax, 288     ; (4096 + 512) / 16 bytes per paragraph
      mov ss, ax
      mov sp, 4096
      mov ax, 07C0h   ; Set data segment to where we're loaded
      mov ds, ax

    reset:            ; Reset the floppy drive
      mov ax, 0       ;
      mov dl, 0       ; Drive=0 (=A)
      int 13h         ;
      jc reset        ; ERROR => reset again
    read:
      mov ax, 1000h       ; ES:BX = 1000:0000
      mov es, ax          ; es:bx determines where data loaded to
      mov bx, 0           ;
      mov ah, 2           ; Load disk data to ES:BX
      mov al, 5           ; Load 5 sectors
      mov ch, 0           ; Cylinder=0
      mov cl, 2           ; Sector=2
      mov dh, 0           ; Head=0
      mov dl, 0           ; Drive=0, 'floppy' (or usb key)
      int 13h             ; Read!
    jc read             ; ERROR => Try again

      mov al, [es:bx]   ; print 2 characters loaded
      mov ah, 0eh
      int 10h
      mov al, [es:bx+1]
      int 10h
      
    hang: jmp hang
    times 510-($-$$) db 0
    dw 0AA55h
    data db 'some sample data',0 
  ,,,

  * read 1 sector into a variable in the data segment 
  -----------------
   BITS 16
   jmp start
   start:
      mov ax, 07C0h   ; Set up 4K stack space after this bootloader
      add ax, 288     ; (4096 + 512) / 16 bytes per paragraph
      mov ss, ax
      mov sp, 4096
      mov ax, 07C0h   ; Set data segment to where we're loaded
      mov ds, ax

    reset:            ; Reset the floppy drive
      mov ax, 0       ;
      mov dl, 0       ; Drive=0 (=A)
      int 13h         ;
      jc reset        ; ERROR => reset again
    read:
      mov ax, ds      ; ES:BX = this data segment, message variable
      mov es, ax      ; es:bx determines where data loaded to
      mov bx, message ; load into the 'message' variable buffer
      mov ah, 2       ; Load disk data to ES:BX
      mov al, 1       ; Load 1 sector, 512 bytes
      mov ch, 0       ; Cylinder=0
      mov cl, 2       ; Sector=2
      mov dh, 0       ; Head=0
      mov dl, 0       ; Drive=0, 'floppy' (or usb key)
      int 13h         ; Read!
    jc read           ; ERROR => Try again

      mov ah, 0eh
      mov al, [message]  ; print 2 characters loaded
      int 10h
      mov al, [message+1]
      int 10h
      
    hang: jmp hang
    times 510-($-$$) db 0
    dw 0AA55h
    data db 'loaded data',0 
    message times 512 db 0
  ,,,

  * read 1 sector and print out the string data
  -----------------
   BITS 16
   jmp start
   message.reset db 'resetting the floppy',13,10,0
   message.read  db 'reading 1 sector',13,10,0
   start:
      mov ax, 07C0h   ; Set up 4K stack space after this bootloader
      add ax, 288     ; (4096 + 512) / 16 bytes per paragraph
      mov ss, ax
      mov sp, 4096
      mov ax, 07C0h   ; Set data segment to where we're loaded
      mov ds, ax

    reset:            ; Reset the floppy drive
      mov si, message.reset
      call prints
      mov ax, 0       ;
      mov dl, 0       ; Drive=0 (=A)
      int 13h         ;
      jc reset        ; ERROR => reset again
    read:
      mov si, message.read
      call prints
      mov ax, ds      ; ES:BX = this data segment, message variable
      mov es, ax      ; es:bx determines where data loaded to
      mov bx, message ; load into the 'message' variable buffer
      mov ah, 2       ; Load disk data to ES:BX
      mov al, 1       ; Load 1 sector, 512 bytes
      mov ch, 0       ; Cylinder=0
      mov cl, 2       ; Sector=2
      mov dh, 0       ; Head=0
      mov dl, 0       ; Drive=0, 'floppy' (or usb key)
      int 13h         ; Read!
    jc read           ; ERROR => Try again

      mov si, message
      call prints

    hang: jmp hang
    %include 'prints.asm'
    times 510-($-$$) db 0
    dw 0AA55h
    data db 'loaded data',0 
    message times 512 db 0

  ,,,

WRITING TO DISKS

   The bios contains functions (under INT 13h) for writing
   to the 'floppy' disk (nowadays a usb memory stick which
   is emulating a floppy) or to a hard disk. We must be VERY
   careful when writing to a hard disk, or we may end
   up with a computer that will not boot. 
   The same applies to the floppy,
   but the consequences are perhaps less catastrophic.

   * write to hard disk. Dont do it!!! You wont have a working computer
   ----------
     xor ax, ax
     mov es, ax    ; ES <- 0
     mov cx, 1     ; cylinder 0, sector 1
     mov dx, 0080h ; DH = 0 (head), drive = 80h (0th hard disk)
     mov bx, 5000h ; offset of the buffer (ES:BX = 0000:5000)
     mov ax, 0301h ; AH = 03 (disk write), AL = 01 (number of sectors to write)
     ;int 13h

   ,,,

   The code below should check that we are not writing to 
   a hard disk (eg DL=80h) because doing so will probably 
   render the computer unusable.

   * write to the boot medium (a usb stick hopefully) 
   ----------
   ; see the read disk section for some better code
   ; to do this
   xor ax, ax
   mov es, ax    ; ES := 0
   mov cx, 1     ; cylinder 0, sector 1
   mov dh, 0     ; head 0
   mov dl, 0     ; 1st floppy but not usb memory stick
   mov bx, 5000h ; offset of the buffer (ES:BX = 0000:5000)
   mov ah, 03    ; disk write
   mov al, 01    ; write only 1 sector (512 bytes)
   int 13h
   ,,,
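
   The check mentioned above could look something like the fragment
   below. It is only a sketch and assumes that DL already holds the
   drive number to be written to; the label 'nowrite' is made up for
   illustration. Note that a usb stick booted with hard drive
   emulation also reports 80h, so this check would refuse to write to
   it as well.

   * a sketch: only write when the drive number is not a hard disk
   ----------
    ; assumes dl already holds the drive number to write to
    test dl, 80h    ; bit 7 is set for bios hard disk numbers (80h, 81h ...)
    jnz nowrite     ; looks like a hard disk, so refuse to write
    xor ax, ax
    mov es, ax      ; ES := 0
    mov bx, 5000h   ; ES:BX = 0000:5000, the buffer to write from
    mov cx, 1       ; cylinder 0, sector 1
    mov dh, 0       ; head 0
    mov ax, 0301h   ; AH = 03 (disk write), AL = 01 (one sector)
    int 13h
    jmp $           ; halt here after the write
   nowrite:
    mov bh, 0       ; first display page
    mov ah, 0Eh     ; int 10h 'print char' function
    mov al, '!'     ; show that the write was refused
    int 10h
    jmp $           ; halt here
   ,,,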

WRITE TO FLOPPY OR USB ....

  The happy answer is that a simple technique allows the same boot sector
  code to access a floppy disk image on a USB flash drive whether it was
  booted with floppy disk emulation or hard drive emulation. If dl=80h
  (hard drive emulation), ask the bios for the drive parameters.

  * get drive parameters
  -------------
    int 13h, ah=8, dl=drive number
    Return:
    cl (bits 0-5)=maximum sector number (same as number of sectors per track)
    dh=maximum head number (just add 1 to get number of heads)
  ,,,

  This returned information describes the geometry of the emulated device
  (if dl=0 then it's standard floppy disk geometry - 18 sectors per track
  and 2 heads). This can be used to calculate the required Cylinder Head
  Sector information required for:

  READ SECTOR(S)
  int 13h, ah=2

  WRITE SECTOR(S)
  int 13h, ah=3
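
  The calculation described above could be sketched as below. This is
  a minimal example under stated assumptions, not a definitive
  implementation: the names 'getgeometry', 'lbatochs', 'spt' and
  'heads' are made up for illustration, and only the low 8 bits of
  the cylinder number are handled, which is enough for a floppy sized
  image. With the floppy defaults, logical sector 18 becomes
  cylinder 0, head 1, sector 1.

  * a sketch: get the geometry and convert a logical sector to CHS
  -------------
   ; in: dl = drive number. fills in 'spt' and 'heads'
   ; destroys ax, bx, cx and dx
   getgeometry:
     push es        ; ah=8 can change es:di for floppies
     push di
     mov ah, 8      ; int 13h 'get drive parameters' function
     int 13h
     pop di
     pop es
     jc .done       ; on error keep the floppy defaults below
     and cx, 3Fh    ; bits 0-5 of cl = maximum sector number
     mov [spt], cx
     mov dl, dh     ; dh = maximum head number
     xor dh, dh
     inc dx         ; number of heads = maximum head number + 1
     mov [heads], dx
   .done:
     ret

   ; in:  ax = logical sector number (0 is the boot sector)
   ; out: ch = cylinder, cl = sector, dh = head (ax and dl are destroyed)
   lbatochs:
     xor dx, dx
     div word [spt]    ; ax = lba / spt, dx = lba mod spt
     mov cl, dl
     inc cl            ; sector numbers start at 1
     xor dx, dx
     div word [heads]  ; ax = cylinder, dx = head
     mov dh, dl
     mov ch, al        ; low 8 bits of the cylinder only
     ret

   spt   dw 18      ; sectors per track (1.44M floppy default)
   heads dw 2       ; number of heads (1.44M floppy default)
  ,,,

  Since lbatochs destroys dl, the drive number has to be loaded into
  dl again before calling int 13h with ah=2 or ah=3.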

CMOS

  http://wiki.osdev.org/CMOS
    good cmos and realtime clock information

  http://vitaly_filatov.tripod.com/ng/asm/asm_029.3.html
    more timer info.

REAL TIME CLOCK RTC ....

  http://stackoverflow.com/questions/3215878/what-are-in-out-instructions-in-x86-used-for
     excellent low level device examples in assembler

  https://github.com/cirosantilli/x86-bare-metal-examples/blob/9a24f92f36a45abb3f8c37aafc0c3ee9b15563ab/in_rtc.S
    complete assembler example

  The real time clock keeps track of the time even when the 
  computer is turned off. It is located on the cmos chip of
  x86 computers.

  Reading the rtc information from the cmos is simple: write a 
  register number to port 0x70 and read the value back from port 0x71.
  
  But we should check that an update is not in progress. So we can
  block while the highest bit of register 0x0A is set.

  * wait while a cmos rtc update is in progress
  -------
    do {
      out_byte(0x70, 0x0A);          // select status register A
    } while (in_byte(0x71) & 0x80);  // bit 7 set => update in progress
  ,,,


  Another trick is to check the values twice in succession and 
  wait for equal results to be returned to make sure garbage is
  not being read.
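
  The trick of reading twice could be sketched like this for the
  seconds register. The label 'readseconds' is made up for
  illustration. The same pattern would apply to the other registers.

  * a sketch: read the rtc seconds twice until both readings agree
  -------
   ; out: al = seconds as stored in the cmos (often BCD), ah is destroyed
   readseconds:
     mov al, 0x00     ; select the seconds register
     out 0x70, al
     in al, 0x71
     mov ah, al       ; keep the first reading
     mov al, 0x00     ; select the seconds register again
     out 0x70, al
     in al, 0x71
     cmp al, ah       ; did both readings agree?
     jne readseconds  ; no, so try again
     ret
  ,,,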

  * check if the cmos rtc uses binary coded decimal results
  ----------
     out_byte(0x70, 0x0B);            // select status register B
     binary = in_byte(0x71) & 0x04;   // bit 2 set => binary, clear => BCD
     // if binary is 0 the values are BCD and must be converted
  ,,,

  * converting BCD (binary coded decimal) to binary
  ------------------
    second = (second & 0x0F) + ((second / 16) * 10);
    // the other values are converted the same way, except the hour,
    // which keeps its highest bit (the pm flag in 12 hour mode)
    hour = ( (hour & 0x0F) + (((hour & 0x70) / 16) * 10) ) | (hour & 0x80);
  ,,,
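
  In assembler the same conversion can be done with the 'aad'
  instruction, which computes al := ah*10 + al. The fragment below is
  a minimal sketch. The label 'bcdtobin' is made up for illustration,
  and for the hour the caller would still need to deal with the
  highest bit separately, as in the formula above.

  * a sketch: convert a packed BCD byte in al to binary
  ------------------
   ; in:  al = packed BCD value (eg 0x59)
   ; out: al = binary value (eg 59), ah = 0
   bcdtobin:
     mov ah, al
     shr ah, 4      ; ah := high digit (tens)
     and al, 0x0F   ; al := low digit (units)
     aad            ; al := ah*10 + al, ah := 0
     ret
  ,,,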

  * get rtc info from the cmos
  --------------
    out_byte(0x70, 0x00);  second = in_byte(0x71);
    out_byte(0x70, 0x02);  minute = in_byte(0x71);
    out_byte(0x70, 0x04);  hour   = in_byte(0x71);
    out_byte(0x70, 0x07);  day    = in_byte(0x71);   // day of month
    out_byte(0x70, 0x08);  month  = in_byte(0x71);
    out_byte(0x70, 0x09);  year   = in_byte(0x71);
  ,,,


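  For testing, it is enough to read just the seconds from the rtc.
  
  The seconds and hour appear to be binary coded decimal (BCD).
  It is also worth checking whether the real time clock is set to
  UTC or local time.

  Because the cmos clock is often in BCD format, we can use dothexbyte
  to print the time: each hex digit of a BCD byte is one decimal digit.

  Gotcha! dothexbyte modifies the cx register, so save cx around the
  call if it is being used as a loop counter.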
  
  * a very terse way to print the time, but harder to read
  -------------------
     mov cx, 3          ; 3 components of the time (hour:minute:seconds)
   .next:
     mov al, cl         ; al loops 321: 0x04 hour, 0x02 minute, 0x00 seconds
     sub al, 1          ; al loops 210
     shl al, 1          ; al loops 420, which is correct for cmos select reg
     out 0x70, al       ; select the rtc register (hour, minute or seconds)
     in al, 0x71        ; get data from cmos data reg
     push cx            ; gotcha! dothexbyte modifies the cx counter
     push ax
     call dothexbyte.x  ; print bcd hour:minute:seconds
     pop cx             ; restore cx counter
     mov al, ':'
     mov ah, 0x0E
     int 0x10
     loop .next
  ,,,

  * print the current time from the cmos real time clock 
  ------------------
   BITS 16
   [ORG 0]
    jmp 07C0h:start    
    
    hextable db "0123456789ABCDEF"    ; digit translation table

    ; **
    dw 0 
    dothexbyte:
      dw 0       ; link
      db 8, '.hexbyte'
    dothexbyte.x:
      pop bx     ; fn return address
      pop dx     ; the number to print (parameter on stack)
      push bx    ; restore return address
      mov ah, 0x0E ; bios teletype function 
      mov bx, hextable   ; translation table
      mov cx, 2          ; number of digits to print
      .again:
        rol dl, 4      ; rotate left 4 bits (print highest first)
        mov al, dl     ; bits to convert to hex digit
        and al, 0x0F   ; only lower 4 bits relevant
        xlatb          ; replace al with hex digit in translation table
        int 10H        ; invoke bios print function
        loop .again
      ret
      ; *

   ; **
   time.doc:
     db 'Displays the time from the cmos real time clock', 13, 10
     db 'eg: time /displays current time', 13, 10
     db 'See also: clock', 13, 10
     dw $-time.doc
   time:
     dw dothexbyte      ; link
     db 4, 'time'
   time.x:

     mov al, 0x0A       ; status register A: is an update in progress?
     out 0x70, al       ; address reg
     in al, 0x71        ; get data from cmos data reg
     test al, 0x80      ; high bit set means updating (result not used here)
  
     mov al, 0x04       ; select Hour
     out 0x70, al       ; address rtc hour register
     in al, 0x71        ; get data from cmos data reg
     push ax
     call dothexbyte.x  ; print bcd hour 
     mov al, ':'
     mov ah, 0x0E
     int 0x10

     mov al, 0x02       ; select minutes
     out 0x70, al       ; address rtc minute register
     in al, 0x71        ; get data from cmos data reg
     push ax
     call dothexbyte.x  ; print minutes
     mov al, ':'
     mov ah, 0x0E
     int 0x10

     mov al, 0x00       ; select seconds
     out 0x70, al       ; address reg
     in al, 0x71        ; get data from data reg
     xor ah, ah         ; set ah := 0
     push ax            ; put seconds on stack
     call dothexbyte.x  ; print number of seconds

     ret
     ; *

  start:

     mov ax, cs
     mov ds, ax
     mov es, ax
     call time.x       ; time.x takes no parameter on the stack

    jmp $                   ; loop forever or hlt ?
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

  
  * get real time clock info from the cmos chip, working code
  ------------------
   BITS 16
   [ORG 0]
    jmp 07C0h:start    
    
    selectRegister equ 0x70   ;  cmos rtc select register 
    dataRegister equ 0x71     ;  cmos rtc data register 

    hextable db "0123456789ABCDEF"    ; digit translation table
    dothexbyte.doc:
       db 'displays a 1 byte number in hex format', 13, 10
       db 'eg: 255 .hexbyte  /displays FF', 13, 10
       db 'Stack: (n --> )', 13, 10
       db 'See also .hex .s . .byte ...', 13, 10
       dw $-dothexbyte.doc
    dothexbyte:
      dw 0
      db 8, '.hexbyte'
    dothexbyte.x:
      pop bx     ; fn return address
      pop dx     ; the number to print (parameter on stack)
      push bx    ; restore return address
      mov ah, 0x0E ; bios teletype function 
      mov bx, hextable   ; translation table
      mov cx, 2          ; number of digits to print
      .again:
        rol dl, 4      ; rotate left 4 bits (print highest first)
        mov al, dl     ; bits to convert to hex digit
        and al, 0x0F   ; only lower 4 bits relevant
        xlatb          ; replace al with hex digit in translation table
        int 10H        ; invoke bios print function
        loop .again
      ret


   sec db 0        ; saved seconds
   position dw 0   ; row, column where clock will show 
                   ; ie high byte= row, low byte = column

   clock.doc:
     db 'Displays updating time and date from the cmos real time clock', 13, 10
     db 'at x,y row column position ', 13, 10
     db 'eg: 0902 clock /displays clock at row 9, col 2', 13, 10
     dw $-clock.doc
   clock:
     dw 0
     db 5, 'clock'
   clock.x:

     pop dx               ; juggle return fn
     pop word [position]  ; parameter where clock will be shown
     push dx

   .updating:
     mov al, 0x0A       ; check if an update in progress 
     out 0x70, al       ; address reg
     in al, 0x71        ; get data from cmos data reg
     test al, 0x80      ; is high bit set?

    ; push ax
    ; call dothex.x
    ; jne .updating    ; makes an infinite loop in qemu

    mov ah, 02h  ; x86 bios: set cursor position specified in dx
    mov dx, [position] 
    int 10h      ; bios interrupt 

    mov al, 0x00       ; select seconds
    out 0x70, al       ; address reg
    in al, 0x71        ; get data from data reg
    cmp al, [sec]      ; only print if seconds have changed
    je .updating
    mov [sec], al      ; save current seconds

    mov al, 0x04       ; select Hour
    out 0x70, al       ; address rtc hour register
    in al, 0x71        ; get data from cmos data reg
    ; print hour in al in hex format
    push ax
    call dothexbyte.x
    mov al, ':'
    mov ah, 0x0E
    int 0x10

    mov al, 0x02       ; select minutes
    out 0x70, al       ; address rtc minute register
    in al, 0x71        ; get data from cmos data reg
    push ax
    call dothexbyte.x  ; print minutes
    mov al, ':'
    mov ah, 0x0E
    int 0x10

    mov al, [sec]      ; get saved seconds
    xor ah, ah         ; set ah := 0
    push ax            ; put seconds on stack
    call dothexbyte.x  ; print number of seconds

    mov al, ' '        ; print a space between time and date
    mov ah, 0x0E
    int 0x10

    mov al, 0x07        ; Day of month
    out 0x70, al        ; cmos select reg
    in al, 0x71         ; cmos data reg  
    push ax
    call dothexbyte.x   ; print day of month
    mov al, '/'
    mov ah, 0x0E
    int 0x10

    mov al, 0x08         ; Month
    out 0x70, al        ; cmos select reg
    in al, 0x71         ; cmos data reg  
    push ax
    call dothexbyte.x   ; print month 
    mov al, '/'
    mov ah, 0x0E
    int 0x10

    mov al, 0x09        ; Year
    out 0x70, al        ; cmos select reg
    in al, 0x71         ; cmos data reg  
    push ax
    call dothexbyte.x   ; print year in hex
    mov al, ' '
    mov ah, 0x0E
    int 0x10

    mov ah, 0x01     ; x86 bios check if keypress available
    int 0x16      
    jz .updating     ; loop if no keypress
  .exit:
    ret

  start:

    mov ax, cs
    mov ds, ax
    mov es, ax
    push 0x140F       ; show clock at row 20, column 15
    call clock.x

    jmp $                   ; loop forever or hlt ?
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

SHUTDOWN COMPUTER APM

  http://wiki.osdev.org/Shutdown

  http://stackoverflow.com/questions/21463908/x86-instructions-to-power-off-computer-in-real-mode
  http://stackoverflow.com/questions/678458/shutdown-the-computer-using-assembly
  http://stackoverflow.com/questions/3145569/how-to-power-down-the-computer-from-a-freestanding-environment

  https://github.com/cirosantilli/x86-bare-metal-examples/blob/master/apm_shutdown.S
     good gas example of shutting down
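
  For reference, the APM calls used by examples like the one linked
  above can be written in nasm roughly as follows. This is a sketch
  only: it assumes the bios still offers APM (qemu does, many modern
  machines only offer ACPI), and it ignores most error returns.

  ```nasm
   BITS 16
   [ORG 0]
    jmp 07C0h:start

   start:
     mov ax, 0x5300     ; apm installation check
     xor bx, bx         ; device 0 = the apm bios itself
     int 0x15
     jc .fail           ; carry set: no apm bios available

     mov ax, 0x5301     ; connect the real mode interface
     xor bx, bx
     int 0x15           ; an 'already connected' error is ignored here

     mov ax, 0x530E     ; tell the bios which apm version we drive
     xor bx, bx
     mov cx, 0x0102     ; version 1.2
     int 0x15

     mov ax, 0x5307     ; set power state
     mov bx, 0x0001     ; device 1 = all devices
     mov cx, 0x0003     ; state 3 = off
     int 0x15

   .fail:
     jmp $              ; only reached if the power off did not happen
     times 510-($-$$) db 0
     dw 0xAA55
  ```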

TIME AND TIMERS AND PIT

  https://github.com/cirosantilli/x86-bare-metal-examples/blob/9a24f92f36a45abb3f8c37aafc0c3ee9b15563ab/in_pit.S
     complete example of pit usage in assembler

  Pit is the programmable interval timer.

  INT 08H is the timer interrupt. With the default bios programming
  of the pit it fires about 18.2 times per second, that is roughly
  every 55 milliseconds, but see below for an easier way to time code.

  INT 1Ah / AH = 00h (GET SYSTEM TIME) returns the number of clock
  ticks (1/18.2 s each) since midnight in CX:DX.

  If you don’t actually need to use the timer tick interrupt directly,
  there is a much easier alternative. You could code the program to
  poll the count of the timer ticks since midnight that the BIOS
  maintains in the DWORD at offset address 6Ch in the BIOS data area,
  located at segment address 40h. The polling loop could compare the
  count to the value saved on the previous loop, and if the count had
  changed, indicating that a timer tick had occurred, save the count
  (for use in the next loop), call Interrupt 1Ah, etc, and continue
  looping.
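
  The polling loop described above might be sketched like this. It
  watches only the low word of the tick count at 0040:006C and prints
  a dot for each of ten ticks (a bit over half a second in total). It
  assumes the bios teletype call preserves CX and DX, which is the
  usual behaviour.

  ```nasm
   BITS 16
   [ORG 0]
    jmp 07C0h:start

   start:
     mov ax, 0x0040     ; segment of the bios data area
     mov es, ax
     sti                ; the count only advances with interrupts enabled
     mov cx, 10         ; number of ticks to wait for
     mov dx, [es:0x6C]  ; low word of the tick count, saved for comparison
   .poll:
     mov ax, [es:0x6C]
     cmp ax, dx
     je .poll           ; count unchanged: no tick yet
     mov dx, ax         ; save the count for the next comparison
     mov ax, 0x0E2E     ; ah = bios teletype function, al = '.'
     xor bx, bx         ; display page 0
     int 0x10
     loop .poll

     jmp $
     times 510-($-$$) db 0
     dw 0xAA55
  ```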

CLOCK TICKS ....

  INT 1Ah / AH = 00h - get system time.
  return:
  CX:DX = number of clock ticks since midnight.

  AL = midnight counter, advanced each time midnight passes.
  notes:
  there are approximately 18.20648 clock ticks per second,
  and 1800B0h per 24 hours. 
  AL is not set by the emulator.


  The last digit of the tick count would be handy for getting random
  numbers. Take care with the division: 'div bl' needs the quotient
  to fit in AL, so any value above 2559 in AX causes a divide
  overflow. The 16 bit 'div bx' used below avoids that.

  * print the last decimal digit of the clock tick count
  ---------------
   [org 0]
   jmp 07C0h:start

   ticks.doc:
     db 'Clock ticks since midnight (18.2 ticks per second)'
     dw $-ticks.doc
   ticks:
      dw 0            ; link to previous
      db 5, 'ticks'
   ticks.x:
      mov  ah, 00h
      int  1Ah
      ; result in cx:dx
      ret 

   start:
     mov ax, cs
     mov ds, ax
     mov es, ax
     ;mov cx, 10

   .next:
     ;push cx       ; save counter 
     ;call ticks.x
     mov  ah, 00h
     int  1Ah      ; cx:dx := clock ticks since midnight
     mov ax, dx    ; low word of the tick count
     xor dx, dx    ; dx:ax is the 32 bit dividend for div bx
     mov bx, 10
     div bx        ; ax:=quotient, dx:=remainder (0 to 9)
     mov al, dl    ; get ready to print last digit of ticks
     add al, '0'   ; convert to digit
     ;mov al, 'Z'   ; debug: print a fixed character instead
     mov ah, 0eH   ; bios print char in al
     int 10H       ; print it

     ;pop cx        ; restore counter
     ;loop .next

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

TIMING CODE ....


  * time some code 
  -------------
   [ORG 0]
   jmp 07C0h:start        ; start label in segment 07C0

  tick.doc:
    db ' Pushes number of clock ticks on stack'
    dw $-tick.doc
  tick:
    dw 0           ; link to previous dict word or null
    db 4, 'tick'   ; forth counted name
  tick.x:
    mov ah, 00h  ; interrupts to get system time        
    int 1Ah      ; cx:dx now holds number of clock ticks since midnight      
    pop bx     ; balance return pointer
    push dx    ; push result clock ticks on stack
    push bx    ; restore return pointer

  .exit:
    ret

  start:
    mov ax, cs     ; the code segment is already correct (?!)
    mov ds, ax     ; set up data and extended segments
    mov es, ax     ; print with stosw

    call tick.x
    mov cx, -1     ; loop lots of times and time it
  .again:
    push cx
    pop cx
    loop .again

    call tick.x
    pop ax         ; last timer
    pop bx         ; first timer
    sub ax, bx     ; last - first is time taken to do code
    mov ah, 0x0e ; print char func
    add al, '0'  ; convert to ascii digit (only right for differences 0-9)
    int 10h

    jmp $          ; loop forever
    times 510-($-$$) db 0   
    dw 0xAA55              
  ,,,
 
WAITING AND DOING NOTHING FOR SOME TIME ....

  * wait one second (1000000 microseconds)
  ----------
   mov     cx, 0fh
   mov     dx, 4240h
   mov     ah, 86h
   int     15h
 ,,,,
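
  The CX:DX pair above is just the 32 bit number 1000000 (0x000F4240)
  split into two words. For other delays nasm can do the split at
  assembly time, as in this sketch. DELAY_US is a made-up name for the
  example, and the carry flag test is there because not every bios
  implements this function.

  ```nasm
   BITS 16
   [ORG 0]
    jmp 07C0h:start

   DELAY_US equ 500000            ; hypothetical delay: half a second

   start:
     mov cx, DELAY_US >> 16       ; high word of the microsecond count
     mov dx, DELAY_US & 0xFFFF    ; low word of the microsecond count
     mov ah, 0x86                 ; bios wait function
     int 0x15
     jc .unsupported              ; carry set: the bios did not wait
     mov ax, 0x0E21               ; print '!' once the wait has finished
     xor bx, bx                   ; display page 0
     int 0x10
   .unsupported:
     jmp $
     times 510-($-$$) db 0
     dw 0xAA55
  ```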

  * wait for 1 second 
  ---------------
   [org 0]
   jmp 07C0h:start

   sleep.doc:
     db 'Just waits for some time'
     dw $-sleep.doc
   sleep:
      dw 0            ; link to previous
      db 4, 'wait'
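   sleep.x:
      push cx          ; gotcha! cx is also the caller's loop counter
      mov  cx, 0fh     ; cx:dx microseconds to wait (0x000F4240 = 1000000)
      mov  dx, 4240h
      mov  ah, 86h
      int  15h
      pop  cx          ; restore the loop counter
      ret 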

   start:

     mov ax, cs
     mov ds, ax
     mov es, ax

     mov cx, 10
   .next:
     mov al, 'Z'   ; print something
     mov ah, 0eH   ; teletype AL bios function
     int 10H
     call sleep.x
     loop .next

    jmp $                   ; halt here
    times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
    dw 0xAA55               ; The standard PC boot signature
  ,,,

BIOS

  In real mode, the bios provides all sorts of useful functions
  for reading, writing and displaying.
  
 * get information about the current bios
 ----------------------
  mov ah, 0C0h   ; nasm needs the leading 0, otherwise C0h is read as a label
  int 15h
  ; on success (carry clear) es:bx points to a table of information
  ,,,

GOTCHAS

  * watch out, dividend is only 1 byte! AH gets whatever byte follows it
  ------------
    mov AX, [dividend]   ; loads 2 bytes: AL = 54, AH = the next byte in memory
    jmp $
    dividend db 54
  ,,,
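
  One way round the gotcha above is to load the byte into AL and clear
  AH explicitly. The sketch below does that and then divides by a
  second byte variable (divisor is a name invented for this example),
  printing the quotient and remainder as digits.

  ```nasm
   BITS 16
   [ORG 0]
    jmp 07C0h:start

    dividend db 54
    divisor  db 10           ; hypothetical second variable

   start:
     mov ax, cs
     mov ds, ax
     mov al, [dividend]      ; load exactly one byte
     xor ah, ah              ; clear the high half, so ax = 54
     div byte [divisor]      ; al = quotient (5), ah = remainder (4)
     add ax, 0x3030          ; turn both into ascii digits
     mov dx, ax
     xor bx, bx              ; display page 0
     mov ah, 0x0E
     mov al, dl              ; quotient digit
     int 0x10
     mov ah, 0x0E
     mov al, dh              ; remainder digit
     int 0x10

     jmp $
     times 510-($-$$) db 0
     dw 0xAA55
  ```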

FASM
  
  Fasm is the 'flat assembler' and appears to be less actively
  maintained than 'nasm'.

ASSEMBLY AND NASM 

  Nasm is the 'netwide assembler' and appears actively maintained.
  Assembly language programming has the reputation of being an
  egregious, wrongheaded pursuit, the domain of casino card counters
  and their ilk. But it's really not that bad.

LABELS ....

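  Labels may be local (starting with a dot) or non local. Nasm
  attaches each local label to the non local label before it (so
  '.again' after 'start:' is really 'start.again'), which is why a
  local label needs a non local one before it.

  * error, nasm doesnt like this
  -----
    .again:
    jmp .again
  ,,,

  * ok nasm is happy 
  -----
   start:
    .again:
    jmp .again
  ,,,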
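
  Because a local label belongs to the non local label before it, the
  same local name can be reused in different routines. The sketch
  below (the label 'second' is invented for the example) uses '.again'
  twice and should print AAABB.

  ```nasm
   BITS 16
   [ORG 0]
    jmp 07C0h:start

   start:
     mov cx, 3
   .again:              ; full name: start.again
     mov ax, 0x0E41     ; ah = bios teletype function, al = 'A'
     xor bx, bx         ; display page 0
     int 0x10
     loop .again

   second:
     mov cx, 2
   .again:              ; full name: second.again, no clash with start.again
     mov ax, 0x0E42     ; ah = bios teletype function, al = 'B'
     xor bx, bx
     int 0x10
     loop .again

     jmp $
     times 510-($-$$) db 0
     dw 0xAA55
  ```

  From outside a routine a local label can still be reached by its
  full name, eg 'jmp start.again'.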

ASSEMBLING AND ORGANISING WITH NASM ....

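   The program below seems to work even without initializing
   the stack, which looks odd, since calling a procedure needs one.
   The likely reason is that the bios usually leaves SS:SP pointing
   at usable memory, but setting up your own stack, as done here,
   is safer.

   * a program which includes a procedure in a separate file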
   --------
    jmp start
    start:
     mov ax, 07C0h    ; Set up 4K stack space after this bootloader
     add ax, 288      ; (4096 + 512) / 16 bytes per paragraph
     mov ss, ax
     mov sp, 4096
     mov ax, 07C0h    ; Set data segment to where we're loaded
     mov ds, ax

     mov al, 0xEE 
     mov bl, 2 
     call printi8
    hang: jmp hang

    %include 'printi8.asm'
    times 510-($-$$) db 0   ; Fill the file with 0's
    dw 0AA55h               ; End the file with AA55
  ,,,

VARIABLES ....

  Variables in assembly language are just buffers of initialised
  or reserved but uninitialised data. No more, no less. The rest
  is up to you.
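
  A small sketch of what that means in practice: the three 'variables'
  below (names invented for the example) are nothing more than labelled
  bytes, and the instructions have to say how many bytes they want.
  The program should print BBB.

  ```nasm
   BITS 16
   [ORG 0]
    jmp 07C0h:start

    letter  db 'A'           ; one initialised byte
    count   dw 3             ; one initialised word (2 bytes)
    buffer  times 8 db 0     ; eight zeroed bytes of scratch space

   start:
     mov ax, cs
     mov ds, ax
     mov al, [letter]
     mov [buffer], al        ; copy the byte into the buffer
     inc byte [buffer]       ; the size must be stated: a label has no type
     mov cx, [count]         ; cx is 16 bit, so 2 bytes are read
   .next:
     mov al, [buffer]        ; 'B'
     mov ah, 0x0E            ; bios teletype function
     xor bx, bx              ; display page 0
     int 0x10
     loop .next

     jmp $
     times 510-($-$$) db 0
     dw 0xAA55
  ```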

  * move the value in the DL register into memory 
  ------------
   jmp start
   drive db 0 
   start:
      mov ax, 07C0h    ; first set data-segment=code-segment
      mov ds, ax       ; so that [drive] points to where it should
      mov [drive], dl
      mov ah, 0eh
      mov al, [drive]  ; prints the value as an ascii character
      int 10h          ; not useful but better than nothing
   hang: jmp hang

  ,,,

  * print the first two characters of a string
  ------------
   jmp start
   message db 'hello!'
   start:
      mov ax, 07C0h       ; set data-segment so that [message] is found
      mov ds, ax
      mov ah, 0eh
      mov al, [message]   
      int 10h
      mov al, [message+1]
      int 10h
   hang: jmp hang

  ,,,

VIM AND ASM

  Vim can be used to compile and run bootable assembly code
  with qemu. Use control alt F to exit full screen in qemu.
  Use left control alt to get the mouse back from qemu.

  * a command to write the next assembly procedure to its own file 
  >> command! -nargs=1 Asp /^ *[a-z0-9]\+:/,/^ *ret *$/w <args>.asm

  * map the key sequence ';as' to compile the whole file with nasm 
  >> map ;as :!nasm -f bin % -o %:r.bin<cr>

  In the examples below, the complete assembly program is 
  supposed to be within 2 'markers' within a document. The markers
  are '---' on a line by itself and ',,,' on a line by itself. These
  2 markers mark the beginning and end of the assembly program
  within the document.

  * just compile an assembly program within a document 
  >> map ;cc :?^ *---?+1,/,,,/-1w ! ( cat - ) > test.asm; nasm -fbin -o test.bin test.asm; 
 
  * compile and run a fragment of boot assembly with nasm and qemu, fullscreen
  >> map ;aa :?^ *---?+1,/,,,/-1w ! ( cat - ) > test.asm; nasm -fbin -o test.bin test.asm; mkdosfs -C test.flp 1440; dd status=noxfer conv=notrunc if=test.bin of=test.flp; qemu-system-i386 -full-screen -no-frame -fda test.flp
 
  * compile and run a fragment inserting bootload code
  >> map ,B :?^ *---?+1,/,,,/-1w ! ( cat - ) \| sed '/\[bootload\]/r bootload.asm' > test.asm; nasm -fbin -o test.bin test.asm; mkdosfs -C test.flp 1440; dd status=noxfer conv=notrunc if=test.bin of=test.flp; qemu-system-i386 -no-frame -fda test.flp

  The sed 'r' command above inserts the named file after each line
  matching the pattern, eg: sed '/\[bootload\]/r bootload.txt'
 
  * compile and run whole file with qemu
  >> map ,f :!nasm -fbin -o %:r.bin %; sudo mkdosfs -C test.flp 1440; dd status=noxfer conv=notrunc if=%:r.bin of=test.flp; qemu-system-i386 -no-frame -fda test.flp

  * build fragments and compile
  >> map ,B :!./build.pl > all.asm; nasm -fbin -o all.bin all.asm; sudo mkdosfs -C test.flp 1440; dd status=noxfer conv=notrunc if=all.bin of=test.flp; qemu-system-i386 -no-frame -fda test.flp

 * no qemu window decorations, stop with control-c
 >> qemu-system-i386 -no-frame test.flp

 * qemu fullscreen, stop with control-c
 >> qemu-system-i386 -full-screen test.flp

 The command lines above may have a problem if the test.flp file
 already exists and is no good, since mkdosfs -C will not overwrite
 it. Delete the stale image first (rm -f test.flp).

  The following is useful for determining how much space is 
  left within a boot file (which is limited to 512 bytes)

  * see how big a compiled file is without 512 byte padding 
  >> map ;bb :?^ *---?+1,/,,,/-1w ! ( sed -n '/times/\!p' ) > test.asm; nasm -fbin -o test.bin test.asm; ls -la
 
  
DOCUMENT NOTES

  This section contains some meta information about the 
  document.

DANIELS NASM BOOT TIPS xxx

  http://home.swipnet.se/smaffy/asm/info/nasmBoot.txt
  author: Daniel Marjamäki (daniel.marjamaki@home.se)

  The basics
  ----------
  These are the rules that you must follow:
    - The BIOS will load your bootloader at address 07C00h.
      Sadly, the segment and offset vary.
    - Bootstraps must be compiled as plain binary files.
    - The filesize for the plain binary file must be 512
      bytes.
    - The file must end with AA55h.


A minimal bootstrap
-------------------
This bootstrap just hangs:

    ; HANG.ASM
    ; A minimal bootstrap

    hang:                   ; Hang!
            jmp hang

    times 510-($-$$) db 0   ; Fill the file with 0's
    dw 0AA55h               ; End the file with AA55

    note: 
      $ means the current offset in the assembled machine code.
      $$ means the offset of the beginning of the current section,
      so $-$$ is the number of bytes assembled so far.

The last instruction puts AA55 at the end of the file. 

To compile the bootstrap, use this command:
    nasm hang.asm -o hang.bin

If you want to test the bootstrap, you must first put it on the first
sector on a floppy disk. You can for example use 'dd' or 'rawrite'.
When the bootstrap is on the floppy, test it by restarting your
computer with the floppy inserted. The computer should hang then.


The memory problem
------------------
  There is a memory problem.
  As I've written, bootstraps are always loaded to address
  07C00. We don't know what segment and offset the BIOS has
  put us in. The segment can be anything between 0000 and 
  07C0. This is a problem when we want to use variables.
  The solution is simple. Begin your bootstrap by jumping
  to your bootstrap, but jump to a known segment.

Here is an example:

    ; JUMP.ASM
    ; Make a jump and then hang

    ; Tell the compiler that this is offset 0.
    ; It isn't offset 0, but it will be after the jump.
    [ORG 0]

            jmp 07C0h:start         ; Goto segment 07C0

    start:
            ; Update the segment registers
            mov ax, cs
            mov ds, ax
            mov es, ax

    hang:                           ; Hang!
            jmp hang

    times 510-($-$$) db 0
    dw 0AA55h

If you compile and test this bootstrap, there will be no
visible difference to the minimal bootstrap presented
earlier. The computer will just hang.
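
The same problem can also be solved the other way round: tell the
assembler that the code sits at offset 7C00h and jump to segment 0000.
The sketch below is not from the original text. It assumes segment 0
for all data and prints a short message to show that variables are
found correctly.

```nasm
    ; JUMP0.ASM (hypothetical file name)
    ; Same idea as JUMP.ASM, but with segment 0000 and offset 7C00h

    BITS 16
    [ORG 0x7C00]

            jmp 0x0000:start        ; Goto segment 0000

    msg     db 'ok', 0

    start:
            ; Update the segment registers
            xor ax, ax
            mov ds, ax
            mov es, ax

            cld                     ; lodsb must count upwards
            mov si, msg
    print:
            lodsb                   ; AL=memory contents at DS:SI
            test al, al             ; If AL=0 then hang
            jz hang
            mov ah, 0Eh             ; Print AL
            xor bx, bx              ; display page 0
            int 10h
            jmp print

    hang:                           ; Hang!
            jmp hang

    times 510-($-$$) db 0
    dw 0AA55h
```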


Solutions to the exercises
--------------------------

1. 

    ; 1.ASM
    ; Print "====" on the screen and hang

    ; Tell the compiler that this is offset 0.
    ; It isn't offset 0, but it will be after the jump.
    [ORG 0]

            jmp 07C0h:start     ; Goto segment 07C0

    start:
            ; Update the segment registers
            mov ax, cs
            mov ds, ax
            mov es, ax

            mov ah, 9           ; Print "===="
            mov al, '='         ;
            mov bx, 7           ;
            mov cx, 4           ;
            int 10h             ;

    hang:                       ; Hang!
            jmp hang

    times 510-($-$$) db 0
    dw 0AA55h


2. 

    ; 2.ASM
    ; Print "Hello Cyberspace!" on the screen and hang

    ; Tell the compiler that this is offset 0.
    ; It isn't offset 0, but it will be after the jump.
    [ORG 0]

            jmp 07C0h:start     ; Goto segment 07C0

    ; Declare the string that will be printed (the 0 marks its end)
    msg     db  'Hello Cyberspace!', 0


    start:
            ; Update the segment registers
            mov ax, cs
            mov ds, ax
            mov es, ax


            mov si, msg     ; Print msg
    print:
            lodsb           ; AL=memory contents at DS:SI

            cmp al, 0       ; If AL=0 then hang
            je hang

            mov ah, 0Eh     ; Print AL
            mov bx, 7
            int 10h

            jmp print       ; Print next character


    hang:                   ; Hang!
            jmp hang


    times 510-($-$$) db 0
    dw 0AA55h


3.

    ; 3.ASM
    ; Load a program off the disk and jump to it

    ; Tell the compiler that this is offset 0.
    ; It isn't offset 0, but it will be after the jump.
    [ORG 0]

            jmp 07C0h:start     ; Goto segment 07C0

    start:
            ; Update the segment registers
            mov ax, cs
            mov ds, ax
            mov es, ax


    reset:                      ; Reset the floppy drive
            mov ax, 0           ;
            mov dl, 0           ; Drive=0 (=A)
            int 13h             ;
            jc reset            ; ERROR => reset again


    read:
            mov ax, 1000h       ; ES:BX = 1000:0000
            mov es, ax          ;
            mov bx, 0           ;

            mov ah, 2           ; Load disk data to ES:BX
            mov al, 5           ; Load 5 sectors
            mov ch, 0           ; Cylinder=0
            mov cl, 2           ; Sector=2
            mov dh, 0           ; Head=0
            mov dl, 0           ; Drive=0
            int 13h             ; Read!

            jc read             ; ERROR => Try again


            jmp 1000h:0000      ; Jump to the program


    times 510-($-$$) db 0
    dw 0AA55h



This is a small loadable program.

    ; PROG.ASM

            mov ah, 9
            mov al, '='
            mov bx, 7
            mov cx, 10
            int 10h

    hang:
            jmp hang


This program creates a disk image file that contains both
the bootstrap and the small loadable program.

    ; IMAGE.ASM
    ; Disk image

    %include '3.asm'
    %include 'prog.asm'




MIKEOS SIMPLE OS GUIDE

How to write a simple operating system

   This document shows you how to write and build your first operating
   system in x86 assembly language. It explains what you need, the
   fundamentals of the PC boot process and assembly language, and how to
   take it further. The resulting OS will be very small (fitting into a
   bootloader) and have very few features, but it's a starting point for
   you to explore further.

   After you have read the guide, see the MikeOS project for a bigger
   x86 assembly language OS that you can explore to expand your skills.

Requirements

   Prior programming experience is essential. If you've done some coding
   in a high-level language like PHP or Java, that's good, but ideally
   you'll have some knowledge of a lower-level language like C, especially
   on the subject of memory and pointers.

   For this guide we're using Linux. OS development is certainly possible
   on Windows, but it's so much easier on Linux as you can get a complete
   development toolchain in a few mouse-clicks/commands. Linux is also
   really good for making floppy disk and CD-ROM images - you don't need
   to install loads of fiddly programs.

   Installing Linux is very easy these days; grab Ubuntu and install it in
   VMware or VirtualBox if you don't want to dual-boot. When you're in
   Ubuntu, get all the tools you need to follow this guide by entering
   this in a terminal window:

sudo apt-get install build-essential qemu nasm

   This gets you the development toolchain (compiler etc.), QEMU PC
   emulator and the NASM assembler, which converts assembly language into
   raw machine code executable files.

PC PRIMER ....

   If you're writing an OS for x86 PCs (the best choice, due to the huge
   amount of documentation available), you'll need to understand the
   basics of how a PC starts up. Fortunately, you don't need to dwell on
   complicated subjects such as graphics drivers and network protocols, as
   you'll be focusing on the essential parts first.

   When a PC is powered-up, it starts executing the BIOS (Basic
   Input/Output System), which is essentially a mini-OS built into the
   system. It performs a few hardware tests (eg memory checks) and
   typically spurts out a graphic (eg Dell logo) or diagnostic text to the
   screen. Then, when it's done, it starts to load your operating system
   from any media it can find. Many PCs jump to the hard drive and start
   executing code they find in the Master Boot Record (MBR), a 512-byte
   section at the start of the hard drive; some try to find executable
   code on a floppy disk (boot sector) or CD-ROM.

   This all depends on the boot order - you can normally specify it in the
   BIOS options screen. The BIOS loads 512 bytes from the chosen media
   into its memory, and begins executing it. This is the bootloader, the
   small program that then loads the main OS kernel or a larger boot
   program (eg GRUB/LILO for Linux systems). This 512 byte bootloader has
   two special numbers at the end to tell the BIOS that it's a boot sector -
   we'll cover that later.

   Note that PCs have an interesting feature for booting. Historically,
   most PCs had a floppy drive, so the BIOS was configured to boot from
   that device. Today, however, many PCs don't have a floppy drive - only
   a CD-ROM - so a hack was developed to cater for this. When you're
   booting from a CD-ROM, it can emulate a floppy disk; the BIOS reads the
   CD-ROM drive, loads in a chunk of data, and executes it as if it was a
   floppy disk. This is incredibly useful for us OS developers, as we can
   make floppy disk versions of our OS, but still boot it on CD-only
   machines. (Floppy disks are really easy to work with, whereas CD-ROM
   filesystems are much more complicated.)

   So, to recap, the boot process is:
    1. Power on: the PC starts up and begins executing the BIOS code.
    2. The BIOS looks for various media such as a floppy disk or hard
       drive.
    3. The BIOS loads a 512 byte boot sector from the specified media and
       begins executing it.
    4. Those 512 bytes then go on to load the OS itself, or a more complex
       bootloader.

   For MikeOS, we have the 512-byte bootloader, which we write to a floppy
   disk image file (a virtual floppy). We can then inject that floppy
   image into a CD, for PCs that only have CD-ROM drives. Either way, the
   BIOS loads it as if it was on a floppy, and starts executing it. We
   have control of the system!

Assembly language primer

   Most modern operating systems are written in C/C++. That's very useful
   when portability and code-maintainability are crucial, but it adds an
   extra layer of complexity to the proceedings. For your very first OS,
   you're better off sticking with assembly language, as used in MikeOS.
   It's more verbose and non-portable, but you don't have to worry about
   compilers and linkers. Besides, you need a bit of assembly to
   kick-start any OS.

   Assembly language (or colloquially "asm") is a textual way of
   representing the instructions that a CPU executes. For instance, an
   instruction to move some memory in the CPU may be 11001001 01101110 -
   but that's hardly memorable! So assembly provides mnemonics to
   substitute for these instructions, such as mov ax, 30. They correlate
   directly with machine-code CPU instructions, but without the
   meaningless binary numbers.

   Like most programming languages, assembly is a list of instructions
   followed in order. You can jump around between various places and set
   up subroutines/functions, but it's much more minimal than C# and
   friends. You can't just print "Hello world" to the screen - the CPU has
   no concept of what a screen is! Instead, you work with memory,
   manipulating chunks of RAM, performing arithmetic on them and putting
   the results in the right place. Sounds scary? It's a bit alien at
   first, but it's not hard to grasp.

   At the assembly language level, there is no such thing as variables in
   the high-level language sense. What you do have, however, is a set of
   registers, which are on-CPU memory stores. You can put numbers into
   these registers and perform calculations on them. In 16-bit mode, these
   registers can hold numbers between 0 and 65535. Here's a list of the
   fundamental registers on a typical x86 CPU:

   AX, BX, CX, DX
     General-purpose registers for storing numbers that you're using.
     For instance, you may use AX to store the character that has been
     pressed on the keyboard, while using CX to act as a counter in a
     loop. (Note: these 16-bit registers can be split into 8-bit
     registers such as AH/AL, BH/BL etc.)

   SI, DI
     Source and destination data index registers. These point to
     places in memory for retrieving and storing data.

   SP
     The Stack Pointer (explained in a moment).

   IP (sometimes CP)
     The Instruction/Code Pointer. This contains the location in
     memory of the instruction being executed. When an instruction
     has finished, it is incremented and moves on to the next
     instruction. You can change the contents of this register to
     move around in your code.
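
   As a small illustration of the AH/AL split mentioned above (this
   sketch is not part of the original guide), loading one 16-bit number
   into AX sets both 8-bit halves at once. The program should print AB.

   ```nasm
    BITS 16
    [ORG 0]
     jmp 07C0h:start

    start:
      mov ax, 0x4142     ; ah = 0x41 ('A'), al = 0x42 ('B')
      mov dx, ax         ; keep a copy: dh = 'A', dl = 'B'
      xor bx, bx         ; display page 0
      mov ah, 0x0E       ; bios teletype function
      mov al, dh         ; high byte of the number
      int 0x10
      mov ah, 0x0E
      mov al, dl         ; low byte of the number
      int 0x10

      jmp $
      times 510-($-$$) db 0
      dw 0xAA55
   ```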

   So you can use these registers to store numbers as you work - a bit
   like variables, but they're much more fixed in size and purpose. There
   are a few others, notably segment registers. Due to limitations in old
   PCs, memory was handled in 64K chunks called segments. This is a really
   messy subject, but thankfully you don't have to worry about it - for
   the time being, your OS will be less than a kilobyte anyway! In MikeOS,
   we limit ourselves to a single 64K segment so that we don't have to
   mess around with segment registers.

   The stack is an area of your main RAM used for storing temporary
   information. It's called a stack because numbers are stacked one-on-top
   of another. Imagine a Pringles tube: if you put in a playing card, an
   iPod Shuffle and a beermat, you'll pull them out in the reverse order
   (beermat, then iPod, and finally playing card). It's the same with
   numbers: if you push the numbers 5, 7 and 15 onto the stack, you will
   pop them out as 15 first, then 7, and lastly 5. In assembly, you can
   push registers onto the stack and pop them out later - it's useful when
   you want to store temporarily the value of a register while you use
   that register for something else.
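
   The 5, 7, 15 example can be tried out with a sketch like the one
   below (not part of the original guide). It sets up its own stack,
   pushes the three numbers and prints each one as a hex digit when it
   is popped, so the output should be F75.

   ```nasm
    BITS 16
    [ORG 0]
     jmp 07C0h:start

     hextable db "0123456789ABCDEF"

    start:
      mov ax, cs
      mov ds, ax
      mov ss, ax         ; put the stack in the same segment
      mov sp, 0xFFFE     ; top of that 64K segment, far above this code

      push word 5
      push word 7
      push word 15       ; last in, so first out

      mov cx, 3          ; three numbers to pop
    .next:
      pop ax             ; 15 first, then 7, then 5
      mov bx, hextable
      xlatb              ; al := hex digit for the number in al
      mov ah, 0x0E       ; bios teletype function
      xor bx, bx         ; display page 0
      int 0x10
      loop .next

      jmp $
      times 510-($-$$) db 0
      dw 0xAA55
   ```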

   PC memory can be viewed as a linear line of pigeon-holes ranging from
   byte 0 to whatever you have installed (millions of bytes on modern
   machines). At byte number 53,634,246 in your RAM, for instance, you may
   have your web browser code to view this document. But whereas we humans
   count in powers of 10 (10, 100, 1000 etc. - decimal), computers are
   better off with powers of two (because they're based on binary). So we
   use hexadecimal, which is base 16, as a way of representing numbers.
   See this chart to understand:
   Decimal     0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
   Hexadecimal 0 1 2 3 4 5 6 7 8 9 A  B  C  D  E  F  10 11 12 13 14

   As you can see, whereas our normal decimal system uses 0 - 9,
   hexadecimal uses 0 - F in counting. It's a bit weird at first, but
   you'll get the hang of it. In assembly programming, we identify
   hexadecimal (hex) numbers by tagging a 'h' onto the end - so 0Ah is hex
   for the number 10. (You can also denote hexadecimal in assembly by
   prefixing the number with 0x - for instance, 0x0A.)
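
   For illustration, the following lines all load the same value (ten)
   into AL. This is a small sketch in NASM syntax:

        mov al, 10       ; decimal
        mov al, 0Ah      ; hexadecimal with the 'h' suffix
        mov al, 0x0A     ; hexadecimal with the '0x' prefix

   With the 'h' suffix, NASM expects the number to begin with a digit,
   so FF in hex is written 0FFh (FFh on its own would be read as a
   label name).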

   Let's finish off with a few common assembly instructions. These move
   data around, compare values and perform calculations. They're the
   building blocks of your OS - there are hundreds of instructions, but
   you don't have to memorise them all, because the most important handful
   are used 90% of the time.

     * mov -- Copies a value from one register or memory location to
       another. For instance, mov ax, 30 places the number 30 into the AX
       register. Using square brackets, you can get the number at the
       memory location pointed to by the register. For instance, if BX
       contains 80, then mov ax, [bx] means "get the number in memory
       location 80, and put it into AX". You can move numbers between
       registers too: mov bx, cx.

     * add / sub -- Adds a number to a register, or subtracts one from
       it. add ax, 0FFh adds FF in hexadecimal (255 in our normal
       decimal) to the AX register. (A hex number written with the 'h'
       suffix must start with a digit, hence the leading zero.) You can
       use sub in the same way: sub dx, 50.

     * cmp -- Compares a register with a number. cmp cx, 12 compares the
       CX register with the number 12. It then updates a special register
       on the CPU called FLAGS, which contains information about the last
       operation. In this case, if the number 12 is bigger than the value
       in CX, the comparison generates a negative result, and notes that
       negative in the FLAGS register. We can use this in the following
       instructions...

     * jmp / jg / jl... -- Jump to a different part of the code. jmp
       label jumps (GOTOs) to the part of our source code where we have
       label: written. But there's more - you can jump conditionally,
       based on the CPU flags set in the previous command. For instance,
       if a cmp instruction determined that a register held a smaller
       value than the one with which it was compared, you can act on that
       with jl label (jump if less-than to label). Similarly, jge label
       jumps to 'label' in the code if the value in the cmp was
       greater-than or equal to its compared number.

     * int -- Interrupt the program and jump to a specified place in
       memory. Operating systems set up interrupts which are analogous to
       subroutines in high-level languages. For instance, in MS-DOS, the
       21h interrupt provides DOS services (such as opening a file).
       Typically, you put a value in the AX register, then call an
       interrupt and wait for a result (passed back in a register too).
       When you're writing an OS from scratch, you can call the BIOS with
       int 10h, int 13h, int 14h or int 16h to perform tasks like
       printing strings, reading sectors from a floppy disk etc.

   Let's look at some of these instructions in a little more detail.
   Consider the following code snippet:
        mov bx, 1000h
        mov ax, [bx]
        cmp ax, 50
        jge label
        ...

label:
        mov ax, 10

   In the first instruction, we move the number 1000h into the BX
   register. Then, in the second instruction, we store in AX whatever is
   in the memory location pointed to by BX. This is what the [bx] means:
   if we just did mov ax, bx it'd simply copy the number 1000h into the AX
   register. But by using square brackets, we're saying: don't just copy
   the contents of BX into AX, but copy the contents of the memory address
   to which BX points. Given that BX contains 1000h, this instruction
   says: find whatever is at memory location 1000h, and put it into AX.

   So, if the two bytes of memory starting at location 1000h hold the
   number 37, then that number 37 will be put into the AX register via our
   second instruction (AX is 16 bits wide, so a whole word is read).
   Next up, we use cmp to compare the number in AX with the number 50 (the
   decimal number 50 - we didn't suffix it with 'h'). The following jge
   instruction acts on the cmp comparison, which has set the FLAGS
   register as described earlier. The jge label says: if the result from
   the previous comparison is greater than or equal, jump to the part of
   the code denoted by label:. So if the number in AX is greater than or
   equal to 50, execution jumps to label:. If not, execution continues at
   the '...' stage.
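
   The same cmp-then-jump pattern also gives a loop. A minimal sketch
   (count_loop is just an illustrative label name):

        mov cx, 0        ; start counting from zero

   count_loop:
        add cx, 1        ; CX = CX + 1
        cmp cx, 12       ; compare CX with 12
        jl count_loop    ; while CX is less than 12, go round again

                         ; execution arrives here with CX = 12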

   One last thing: you can insert data into a program with the db (define
   byte) directive. For instance, this defines a series of bytes with the
   number zero at the end, representing a string:
        mylabel: db 'Message here', 0

   In our assembly code, we know that a string of characters, terminated
   by a zero, can be found at the mylabel: position. We could also set up
   a single byte to use somewhat like a variable:
        foo: db 0

   Now foo: points at a single byte in the code, which in the case of
   MikeOS will be writable as the OS is copied completely to RAM. So you
   could have this instruction:
        mov al, [foo]

   This moves the byte pointed to by foo into the AL register.
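
   As a small sketch building on that (assuming DS already points at
   the segment holding foo), the byte can be read, changed and written
   back much like a variable:

        mov al, [foo]    ; load the current value of foo into AL
        add al, 1        ; change it
        mov [foo], al    ; store the new value back at foo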

   That's the essentials of x86 PC assembly language, and enough to get
   you started. When writing an OS, though, you'll need to learn much more
   as you progress, so see the Resources section for links to more
   in-depth assembly tutorials.
     __________________________________________________________________

Your first OS

   Now you're ready to write your first operating system kernel! Of
   course, this is going to be extremely bare-bones, just a 512-byte boot
   sector as described earlier, but it's a starting point for you to
   expand further. Paste the following code into a file called myfirst.asm
   and save it into your home directory - this is the source code to your
   first OS.

        BITS 16

   start:
        mov ax, 07C0h    ; Set up 4K stack space after this bootloader
        add ax, 288      ; (4096 + 512) / 16 bytes per paragraph
        mov ss, ax
        mov sp, 4096

        mov ax, 07C0h    ; Set data segment to where we're loaded
        mov ds, ax

        mov si, text_string     ; Put string position into SI
        call print_string       ; Call our string-printing routine

        jmp $                   ; Jump here - infinite loop!


        text_string db 'This is my cool new OS!', 0


   print_string:       ; Routine: output string in SI to screen
      mov ah, 0Eh      ; int 10h 'print char' function

   .repeat:
      lodsb           ; Get character from string
      cmp al, 0
      je .done        ; If char is zero, end of string
      int 10h         ; Otherwise, print it
      jmp .repeat

   .done:
      ret

     times 510-($-$$) db 0   ; Pad remainder of boot sector with 0s
     dw 0xAA55               ; The standard PC boot signature

   Let's step through this. The BITS 16 line isn't an x86 CPU instruction;
   it just tells the NASM assembler that we're working in 16-bit mode.
   NASM can then translate the following instructions into raw x86 binary.
   Then we have the start: label, which isn't strictly needed as execution
   begins right at the start of the file anyway, but it's a good marker.
   From here onwards, note that the semicolon (;) character is used to
   denote non-executable text comments - we can put anything there.

   The following six lines of code aren't really of interest to us - they
   simply set up the segment registers so that the stack pointer (SP)
   knows where our handy stack of temporary data is, and where the data
   segment (DS) is located. As mentioned, segments are a hideously messy
   way of handling memory from the old 16-bit days, but we just set up the
   segment registers and forget about them. (The references to 07C0h are
   the equivalent segment location at which the BIOS loads our code, so we
   start from there.)
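
   For the curious, the arithmetic behind those numbers can be worked
   through. In 16-bit real mode a physical address is the segment value
   times 16, plus the offset. The figures below are derived from the
   code above and written as NASM comments:

        ; 07C0h * 16            = 7C00h  (where the BIOS loads the boot sector)
        ; 288 decimal           = 0120h  ((4096 + 512) / 16 paragraphs)
        ; 07C0h + 0120h         = 08E0h  (the value placed in SS)
        ; 08E0h * 16            = 8E00h  (physical base of the stack segment)
        ; 8E00h + 4096 (1000h)  = 9E00h  (initial SS:SP, the top of the stack)

   The stack grows downwards from 9E00h towards 8E00h, well clear of
   the 512 bytes of boot code that sit at 7C00h to 7DFFh.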

   The next part is where the fun happens. The mov si, text_string line
   says: copy the location of the text string below into the SI register.
   Simple enough! Then we use call, which is like a GOSUB in BASIC or a
   function call in C. It means: jump to the specified section of code,
   but prepare to come back here when we're done.

   How does the code know how to do that? Well, when we use a call
   instruction, the CPU increments the position of the IP (Instruction
   Pointer) register and pushes it onto the stack. You may recall from the
   previous explanation of the stack that it's a last-in first-out memory
   storage mechanism. All that business with the stack pointer (SP) and
   stack segment (SS) at the start cleared a space for the stack, so that
   we can drop temporary numbers there without overwriting our code.

   So, the call print_string says: jump to the print_string routine, but
   push the location of the next instruction onto the stack, so we can pop
   it off later and resume execution here. Execution has jumped over to
   print_string: - this routine uses the BIOS to output text to the
   screen. First we put 0Eh into the AH register (the upper byte of AX).
   Then we have a lodsb (load string byte) instruction, which retrieves a
   byte of data from the location pointed to by SI, and stores it in AL
   (the lower byte of AX). Next we use cmp to check if that byte is zero -
   if so, it's the end of the string and we quit printing (jump to the
   .done label).

   If it's not zero, we call int 10h (interrupt our code and go to the
   BIOS), which reads the value in the AH register (0Eh) we set up before.
   Ah, says the BIOS - 0Eh in the AH register means "print the character
   in the AL register to the screen!". So the BIOS prints the first
   character in our string, and returns from the int call. We then jump to
   the .repeat label, which starts the process again - lodsb to load the
   next byte from SI (it increments SI each time), see if it's zero and
   decide what to do. The ret at the end of our string-printing routine
   means: "we've finished here - return back to the place where we were
   called by popping the code location from the stack back into the IP
   register".
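
   The call and ret pairing can be seen in isolation in the following
   sketch. The routine name set_value and the value 1234h are made up
   for illustration:

        call set_value   ; push address of the next line, jump to set_value
        mov bx, ax       ; execution resumes here after ret, so BX = 1234h
        jmp $            ; stop here

   set_value:
        mov ax, 1234h    ; do some work
        ret              ; pop the saved address back into IP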

   So there we have a demonstration of a loop, in a standalone routine.
   You can see that the text_string label is alongside a stream of
   characters, which we insert into our OS using db. The text is in
   apostrophes so that NASM knows it's not code, and at the end we have a
   zero to tell our print_string routine that we're at the end.

   Let's recap: we start off by setting up the segment registers so that
   our OS knows where the stack and our data reside. Then
   we point the SI register at a string in our OS binary, and call our
   string-printing routine. This routine scans through the characters
   pointed to by SI and displays them until it finds a zero, at which
   point it returns back into the code that called it. Then the jmp $ line
   says: keep jumping to the same line. (The '$' in NASM denotes the
   current point of code.) This sets up an infinite loop, so that the
   message is displayed and our OS doesn't try to execute the following
   string!

   The final two lines are interesting. For a PC to recognise a valid
   floppy disk boot sector, it has to be exactly 512 bytes in size and end
   with the bytes 55h and AAh, in that order (the boot signature). So the
   first of these lines says: pad out our resulting binary file with zeros
   until it is 510 bytes in size. Then the second line uses dw (define a
   word - two bytes) to add the word AA55h, which the x86 stores low byte
   first, giving the bytes 55h and AAh on disk. Voila: a 512 byte boot
   file with the correct numbers at the end for the BIOS to recognise.
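
   Spelled out as a sketch, here are those two lines again with fuller
   comments. In NASM, $ is the current position and $$ is the start of
   the section, so $-$$ is the number of bytes emitted so far:

        times 510-($-$$) db 0   ; emit (510 - bytes so far) zero bytes
        dw 0xAA55               ; low byte first: 55h at offset 510, AAh at 511

   Writing db 0x55, 0xAA in place of the dw line would produce the same
   two bytes.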

   Let's build our new OS. In a terminal window, in your home directory,
   enter:

nasm -f bin -o myfirst.bin myfirst.asm

   Here we assemble the code from our text file into a raw binary file of
   machine-code instructions. With the -f bin flag, we tell NASM that we
   want a plain binary file (not a complicated Linux executable - we want
   it as plain as possible!). The -o myfirst.bin part tells NASM to
   generate the resulting binary in a file called myfirst.bin.

   Now we need a virtual floppy disk image to which we can write our
   bootloader-sized kernel. Copy mikeos.flp from the disk_images/
   directory of the MikeOS bundle into your home directory, and rename it
   myfirst.flp. Then enter:

dd status=noxfer conv=notrunc if=myfirst.bin of=myfirst.flp

   This uses the 'dd' utility to directly copy our kernel to the first
   sector of the floppy disk image. When it's done, we can boot our new OS
   using the QEMU PC emulator as follows:

qemu-system-i386 -fda myfirst.flp

   And there you are! Your OS will boot up in a virtual PC. If you want to
   use it on a real PC, you can write the floppy disk image to a real
   floppy and boot from it, or generate a CD-ROM ISO image. For the
   latter, make a new directory called cdiso and move the myfirst.flp file
   into it. Then, in your home directory, enter:

mkisofs -o myfirst.iso -b myfirst.flp cdiso/

   This generates a CD-ROM ISO image called myfirst.iso with bootable
   floppy disk emulation, using the virtual floppy disk image from before.
   Now you can burn that ISO to a CD-R and boot your PC from it! (Note
   that you need to burn it as a direct ISO image and not just copy it
   onto a disc.)

   Next you'll want to improve your OS - explore the MikeOS source code to
   get some inspiration. Remember that bootloaders are limited to 512
   bytes, so if you want to do a lot more, you'll need to make your
   bootloader load a separate file from the disk and begin executing it,
   in the same fashion as MikeOS.

Going further

   So, you've now got a very simple bootloader-based operating system
   running. What next? Here are some ideas:
     * Add more routines -- You already have print_string in your kernel.
       You could add routines to get strings, move the cursor etc. Search
       the internet for BIOS calls which you can use to achieve these.

     * Load files -- The bootloader is limited to 512 bytes, so you don't
       have much room. You could make the bootloader load subsequent
       sectors on the disk into RAM, and jump to that point to continue
       execution. Or you could read up on FAT12, the filesystem used on
       floppy drives, and implement that. (See
       source/bootload/bootload.asm in the MikeOS .zip for an
       implementation.)
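
   Two rough sketches of those ideas follow, in NASM syntax. The label
   names and the buffer location are assumptions made for illustration,
   not code taken from MikeOS. The first is a routine that waits for a
   key and echoes it to the screen:

   wait_key_echo:
        mov ah, 00h      ; int 16h function 00h: wait for a keypress
        int 16h          ; returns the ASCII code in AL
        mov ah, 0Eh      ; int 10h 'print char' function, as in print_string
        int 10h          ; print the character in AL
        ret

   The second reads one more sector from the boot disk into the memory
   just after the boot sector. It assumes DL still holds the drive
   number that the BIOS passed in at boot:

        mov ax, 07C0h
        mov es, ax       ; ES:BX is the destination buffer
        mov bx, 0200h    ; 07C0h:0200h, the 512 bytes after the boot sector
        mov ah, 02h      ; int 13h function 02h: read sectors
        mov al, 1        ; number of sectors to read
        mov ch, 0        ; cylinder 0
        mov cl, 2        ; sector 2 (sectors are numbered from 1)
        mov dh, 0        ; head 0
        int 13h
        jc disk_error    ; carry flag set means the read failed

        jmp $            ; success: the sector is now in memory at ES:BX

   disk_error:
        jmp $            ; a real OS would print a message or retry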


DOCUMENT-NOTES:
 
  # this section contains information about the document and
  # will not normally be printed.

  # A small (16x16) icon image to identify the book
  document-icon:

  # A larger image to identify or illustrate the title page
  document-image:

  # what sort of document is this
  document-type: book

  # in what kind of state (good or bad) is this document 
  document-quality: just begun
  
  document-history:
  @@ 10 nov 2011
     started this booklet after seeing the 'mikeos' site
     and realising that writing a bootable program in realmode
     x86 code should not be that difficult. Also found a good page
     describing how to load a program from a floppy image and
     execute it (to overcome the 512 byte boot sector limit)
  @@ 17 march 2015
     a little bit more work, trying to amplify the forth section
  @@ july 2016
     The mikeos site seems to be unmaintained now, but I have
     made progress. I started writing simple forth words and 
     have progressed to 'find' which is the most complicated 
     so far. My interest is in total minimalism. In 400 bytes
     we can have something useful, eg a hex converter.
  * 28 july 2016 
     Finally got a working extensible forth-like system 
     bootloading correctly, with an absolute minimum of 
     words like find, accept, exec, type, count. Just enough
     to enter and execute commands. No compiling words yet.
  * august 2016
     Ongoing work. Building up forth style functions. Thinking
     about token-threaded (opcode) virtual machine, for portability
     Universal naming for functions, eg code.debug.dump with the 
     prefix put in a table header.
  * oct 2016
     a few animations etc.
  * november 2016
     created a separate bootloader file so as not to clutter up
     examples with bootloading code.
  * 24 november 2016
     extensive forth coding this month. created a simple byte
     code system. worked on the "core" forth system (a non-compiling
     system). discovered a tricky bug where an extra zero was 
     being put on the stack by the interp: word. hence plus and dump
     were not working. realised that mixing the data stack with the
     machine stack is tricky... on the stack the first and last items
     are return function pointers put there by x86 "call" instructions.
     Found another even trickier bug involving the bstack word (removed)
     causing list: to fail ...

  # who wrote this
  authors: mjbishop
  # a short description of the contents, possibly used for doc lists
  short-description:
  # A computer language which is contained in the document, if any
  code-language: forth, asm, nasm, x86
  # the script which will be used to produce html (a webpage)
  make-html: ./book-html.sh
  # the script which will produce 'LaTeX' output (for printing, pdf etc)
  make-latex: ./booktolatex.cgi