&& Another Plaintext Document Format

INTRODUCTION

This document is about a type of "markdown" format that I use for my own
documentation. It also serves as a test document for the scripts
"mark.latex.pss" and "mark.html.pss" etc. Those scripts use a
word-by-word parsing technique with the pep/nom pattern engine to
recognise the patterns in the document.

This document should also include "insignificant" patterns, in order to
test the ability of the mark scripts to reduce tokens properly.

Since this is a mark-down format, errors in formatting should not cause
the parser to crash or to produce no output. The parser should recover
gracefully in all cases and at least produce plain-text output (even if
it is not properly formatted).

This format is simply one that I enjoy using, and I don't claim that it
has any general merit or that it is in any way "better" than the
standard markdown or CommonMark format.

IMAGES

Images are inserted into a document with a double square bracket [[
followed by an image file name such as ../img/parsetree.png
The image tokens like [[ ]] and ../img/parsetree.png need to be space
delimited. An example image:

[[ ../img/parsetree.png ]]

Images can also have a quoted caption, a width specifier and a page
position (float) indicator, in that order. Apart from the image file
name, these attributes are all optional, so we can specify just the
position indicator, for example:

[[ ../img/parsetree.png <<< ]]

The position indicators are currently >>> (float right), <<< (float left)
and ccc (center align). Image widths can be measured in % pt cm mm or em

* an example image format.
>> [[ ../img/parsetree.png """a parsetree""" 30% >>> ]]

apple2e.png logo.spirals.png pp.interactive.screenshot.png
logo.circles.jpg logo.tricircle.png logo.lang.ibm.png parsetree.png

* an image with 50% page width
[[ ../img/apple2e.png 50% ]]

Old versions of LaTeX don't always like dots in file names, but
mark.latex.pss should deal with this.

* a centered image at 40% width
[[ ../img/logo.circles.jpg """A centered image""" 40% ccc ]]

* an image floating left at 20% width
[[ ../img/logo.spirals.png """Spirals""" 20% <<< ]]

* an image floating right at 40% width
[[ ../img/logo.lang.ibm.png 40% >>> ]]

* an image with a width of 60pt (points)
[[ ../img/apple2e.png 60pt ]]

Whitespace in the image format should be ignored (except within quotes).
So the following is a valid image format.

-----
[[
  ../img/logo.tricircle.png   >>>
]]
,,,,

See if it works

[[
  ../img/logo.tricircle.png   >>>
]]

CAPTIONS FOR IMAGES ....

Captions for an image can be provided after the file name with text
within """ (3 quotes). Only single line captions are allowed at the
moment. An example is

[[ ../img/apple2e.png """The Apple 2e logo""" 4cm >>> ]]

CODE LINES AND CODE BLOCKS

Code blocks begin with at least 3 '-' characters starting a line and end
with at least 3 ',' characters.
So ---- and ,,,,, do not delimit a code block here, because the ---- does
not start a line. Code lines are marked by >> also starting a line.

Code lines and code blocks can be preceded by a line starting with an
asterisk * . The asterisk line is considered to be the description of
the following code. Here are some examples:

* some logo code to make a square.
>> repeat 4 [ fd 40 rt 90 ]

* logo code to make an octagon
-----
repeat 8 [ fd 40 rt 45 ]
,,,

SECTION HEADINGS

A section heading is a line written entirely in upper case. Quotes are
also allowed in section headings, like this
>> COMMAND "PUSH"

"SUBSECTION" HEADINGS ....

Subsections have the same format as section headings but end the line
with 4 dots like this ....
Currently no sub-sub-section headings are supported, because I don't use
them.

LINKS AND FILE NAMES

File names and folders should be linked or formatted automatically, by
looking at the file name extension and maybe a leading slash.
So /books/pars/index.txt should be marked up as a file name.

LISTS

This "plain-text" format supports ordered, unordered and definition
lists but not, currently, nested lists. Ordered lists start with o/- at
the beginning of a line, and each item starts with a - dash character.
Unordered lists start with u/- and definition lists start with d/-
Lists are terminated with a blank line.

ORDERED LISTS ....

o/- First item in an ordered list
- second item
- within lists other markup should be rendered
  like filenames such as file.png and even images

* empty items may make a list nest in LaTeX.
>> need to investigate.

o/- A list item with a code block
-----
repeat 4 [ rt 90 ]
,,,
- second item containing a code line and description
* markup in lists
>> continue; break;
- In the nomdoc format an emline* token is problematic because it
  looks forward for the next token, which causes problems in lists
* here is trouble.
- within lists other markup should be rendered
  like filenames such as file.png and even images

UNORDERED LISTS ....

u/- unordered lists start with u/-
- each item has a "-" dash and can
  span multiple lines
- the list is terminated with a blank line.
- lists cannot be nested at the moment.

DEFINITION LISTS ....

d/- term: definition
- item: each item (term and definition) begins with a dash "-"
  character, and can span multiple lines.
- colon: The ":" character is used to delimit the definition term
  from the definition. The newline character can also be used to
  start the definition.
  Definitions can contain other markup like code lines
>> repeat 8 [ rt 45 fd 20 ]
  And also filenames like mark.format.txt
- no definition:
- delimiter: starts with a "d/-"
- the definition list ends with: a blank line (a line consisting of
  only whitespace).

If a list has nothing in it, it should produce an empty list
u/-

SPECIAL WORDS

It can be enjoyable to mark up certain words like LaTeX with special
formatting, maybe even including a small icon. Candidates would be
Instagram, CommonMark, LaTeX and Pep.

DATE LISTS

Often I write a series of entries under dates, in a format such
as 12 aug 2022 on a line by itself. These can be parsed and translated
just like the ordered, unordered and definition lists. They need to end
with a special token, since they can contain blank lines. Dates
like jan 2000 on a new line should also become dates, although a date
without a day number is not currently valid.

Month names can be a 3 letter abbreviation like:
Jan Feb Mar Apr may Jun jul aug sept oct nov DEC
or else the full month name in English like:
January February March April may june july august September october
november december

The recognition of the month name is case insensitive, so
JAN FEB marCH APRil MAY JUNE should all be seen as month names.
However, invalid dates like 33 jan 2001 should not be parsed as dates.

A date list must begin with a blank line and end with a special token
like [/dates] or [/date] or maybe just a double blank line. The date
must be in the order day month year, so Mar 23 2000 is not considered a
valid date, nor is 1999 Aug 20.
35 Mar 2000 is also not a valid date, because the day number is out of
range.

Following is a set of test lists, to ensure that the datelist* token is
parsed correctly.
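Before the test lists, the date rules described in this section can be
cross-checked with a minimal Python sketch. This is not the code used by
mark.latex.pss or mark.html.pss, which are pep/nom scripts. The function
name parse_date_start, the acceptance of both "sep" and "sept", the
tolerance for leading whitespace and the per-month limit on the day
number are all assumptions made for this example; the text above only
says that the day number must be in range.

```python
import calendar
import re

# Month names from the text, compared in lower case.
ABBREVIATIONS = ["jan", "feb", "mar", "apr", "may", "jun",
                 "jul", "aug", "sep", "oct", "nov", "dec"]
FULL_NAMES = ["january", "february", "march", "april", "may", "june",
              "july", "august", "september", "october", "november",
              "december"]

MONTHS = {name: number for number, name in enumerate(ABBREVIATIONS, start=1)}
MONTHS.update({name: number for number, name in enumerate(FULL_NAMES, start=1)})
MONTHS["sept"] = 9  # assumption: "sept" (as listed in the text) and "sep" both count

# A date must start the line, in the order: day month year.
# Allowing leading whitespace is an assumption of this sketch.
DATE_START = re.compile(r"\s*(\d{1,2})\s+([A-Za-z]+)\s+(\d{4})\b")


def parse_date_start(line):
    """Return (day, month, year) if the line starts with a valid date, else None."""
    match = DATE_START.match(line)
    if match is None:
        return None
    day = int(match.group(1))
    month = MONTHS.get(match.group(2).lower())  # case-insensitive month lookup
    year = int(match.group(3))
    if month is None:
        return None
    # Assumption: the day is checked against the real length of the month.
    days_in_month = calendar.monthrange(year, month)[1]
    if not 1 <= day <= days_in_month:
        return None
    return (day, month, year)
```

Text is allowed after the date, so a line such as the "24 August 2022"
entry further down is accepted, while a date that appears in the middle
of a line is ignored because the pattern is anchored at the line start.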
* a completely empty datelist

10 aug 2001
[/date]

* a list with a single word

10 aug 2001
test
[/date]

* a list with a single word and blank line

10 aug 2001
test

[/date]

* a list with 2 empty dates

10 aug 2001
11 aug 2001
[/date]

* a date list with blank lines

10 aug 2001
blank lines

11 august 2001

12 aug 2001

[/date]

* a list with a code line in it

3 aug 2001
>> include code in list
second list. see mark.format.txt for info.
Maybe include lists in datelists. But not star lines at the moment.
4 aug 2001
[/date]

* a datelist with a star line just before the end

2 aug 2001
* news flash
[/date]

* datelist with a list in a date
* a datelist with star/code block

2 aug 2001
u/- things done
- debug
- think

3 aug 2001
>> sed s///g
[/date]

1 Jan 2010
Worked on this system. Blank lines should be allowed within date
lists. The date-list has no special start token, just a valid date.
The first date in the list needs to be on a line by itself and with a
blank line above it. The next date items can have text after them.
-----
some code
,,,,
24 August 2022 Thought about a logo language parser and drawing with it
in Tcl/Tk or Java.

25 aug 2022 worked on this file mark.format.txt to provide documentation
for the mark.latex.pss script.

AUG 2022 dates without a day number are not (currently) valid dates.
But the date needs to start a line so that:
1 jan 2022 is valid even though there is this text after it.

31 DEC 2022
The date list has a special end token "[/dates]"
[/dates]

GLOSSARIES AND "FAQS"

Although I don't use these lists much, it could be handy to have them in
the format. The translation should be similar to definition lists.

See the "palindrome.pss" file or the /tr folder.

" multiword quoted text""and"
" no end quote """"
Open the document "mark.html.pss" or inspect pep.c

www.glintbox.org /file.txt [[ /test.html ]]