This is another version of a script to transform a “plain” (minimal markup)
text document into simple html and css. I intend to only add the structures
which I use frequently. This script is grammatically simpler than an earlier
attempt called /eg/mark.html.pss and it feels easier to maintain and extend.
This script has been very successful and is used for rendering the
HTML at www.nomlang.org.
This formatting script relies heavily on /site.blog.css to actually make
the html look good or ok.
GRAMMAR TOKENS USED BY THE SCRIPT
space* white-space and newlines
word* one space delimited word
text* any text including html tags as added
quoted* text between double quotes “like this” but one line only.
url* anything starting with http:// https:// www. etc
file* a filename
upper.heading* an uppercase heading line.
item* an item of a list
endlist* reverse shift reductions of lists
The success of this html formatting script is probably owing to
the small number of grammar tokens. For example there is no newline*
or blankline* token although I do use the accumulator to determine
if the current word is the first word on the line. Also, lists are
terminated by a “blank” line (empty, or only space) but I don't
actually create a parse token for it.
The relative simplicity of this script grammar is probably due
to it using a word-by-word parsing technique which means that
few grammar tokens are required.
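The first-word check described above can be illustrated with a rough
sketch. It is written in Python rather than nom, purely as an illustration:
the function name lex and the (token, attribute) pairs are assumptions of
the sketch and are not taken from the script. A per-line word counter stands
in for the accumulator, and a blank line is noticed because the count is
still zero, not because a dedicated token exists.

```python
def lex(text):
    """Return a list of (token, attribute) pairs, read word by word.

    Hypothetical illustration only: the real script is a nom program and
    emits many more token types than the three shown here.
    """
    tokens = []
    for line in text.split("\n"):
        words_on_line = 0  # plays the role of the accumulator
        for word in line.split():
            words_on_line += 1
            # '-' only starts a list item when it is the first word
            if words_on_line == 1 and word == "-":
                tokens.append(("item*", word))
            else:
                tokens.append(("word*", word))
        if words_on_line == 0:
            # blank line: no blankline* token, just a different attribute
            tokens.append(("space*", "\n\n"))
        else:
            tokens.append(("space*", "\n"))
    return tokens


if __name__ == "__main__":
    for token in lex("- one two\n\nthree"):
        print(token)
```

Keeping the line position in a counter instead of in the grammar is what
lets the token set stay small in this sketch.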
TODO
Substitute archaic spellings like applyed and ribband, which I like in
some cases.
all upper case headings which I have used in many documents
Make the text “add command” and “nom scripts” etc format the command
word like 'add' as <code> or something similar.
Think about how to make “quoted”. or “quoted”, (ending in a dot or a
comma) parse properly.
definition lists ?
DONE
put a “title=” attribute in links to give a hint for things like oed:
lookup.
unordered lists. eg
- one
- two
end with blank line
images, sort of (need to make a width for the <figure> tag)
OTHER FILES
/site.blog.css
actually makes the html look ok.
books/pars/www/blog.sh
A bash script which contains a set of functions which use this
script to manage blog websites.
books/pars/eg/make.html.header.pss
Generates the html header and banner for the page (contains css).
NOTES
U+1F674 🙴 heavy ampersand ornament
U+1F675 🙵 swash ampersand ornament
The development of this proceeded very quickly. In a few hours I
had significant syntax implemented. This was much faster than
mark.html.pss and mark.latex.pss.
This is part of an effort to create a ℙ𝕖𝕡 🙵 ℕ𝕠𝕞 based blog with
rss feed (pars/www/blog.sh) as well as the shrob.org blog
and the makethespoon.org site.
It would be nice to have an “until 'ab','cd','ef'” syntax
so that we could parse one-line quotes etc. Eg
until '"','\n';
We can do
whilenot ["\n]
but that has its own problems.
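What a multi-delimiter “until” might do can be sketched as follows. This is
Python, not nom, and the function read_until_any is a hypothetical helper
invented for the illustration; nothing like it exists in the script. It reads
from a start position up to and including whichever delimiter occurs first,
which is what stops a one-line quote from running on past the end of its line.

```python
def read_until_any(text, start, delimiters):
    """Return (chunk, delimiter) for the earliest delimiter at or after start.

    chunk includes the delimiter. If no delimiter occurs, chunk is the rest
    of the text and delimiter is the empty string. Hypothetical helper.
    """
    best_index = -1
    best_delim = ""
    for delim in delimiters:
        index = text.find(delim, start)
        if index != -1 and (best_index == -1 or index < best_index):
            best_index, best_delim = index, delim
    if best_index == -1:
        return text[start:], ""
    end = best_index + len(best_delim)
    return text[start:end], best_delim


if __name__ == "__main__":
    # start just after the opening double quote (index 4)
    print(read_until_any('say "hello world" now', 5, ['"', "\n"]))
    # unterminated quote: the newline ends the read instead
    print(read_until_any('say "hello\nworld', 5, ['"', "\n"]))
```

Returning the delimiter that matched lets the caller tell a properly closed
quote from one that was cut off by the end of the line.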
MARKUP FORMAT
See /eg/text.tohtml.format.html (.txt) for detailed info about
the minimal markup format that this script recognises and
formats. Supported are images (with size, placement and captions),
lists, headings, links, convenience links etc.
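As a rough illustration of the image attributes, here is a Python sketch
(not nom) that turns markup such as <:o:3:<<:imagename.ext> into a style
string, a css class and a filename. The values follow what the script below
appears to do: o or O for a circle, r and R for rounded corners, a digit 1-5
for a width in multiples of 5em with a 10em default, and >>, << or cc for
placement. The function name image_style and the returned tuple are
assumptions. Unlike the script, the sketch does not care about attribute
order and does not handle filenames that contain a colon.

```python
def image_style(markup):
    """Parse '<:corners:width:float:file.ext>' into (style, css_class, filename).

    Hypothetical sketch: order-independent, unlike the nom script.
    """
    fields = markup[1:-1].split(":")  # drop the outer < and >
    filename = fields[-1]
    attributes = [field for field in fields[:-1] if field]
    radius = {"o": "50%", "O": "50%", "r": "5%", "R": "15%"}
    placement = {">>": "float-right", "<<": "float-left", "cc": "center"}
    style = ""
    css_class = ""
    width = "10em"  # default width when no digit is given
    for attribute in attributes:
        if attribute in radius:
            style += "border-radius:%s;" % radius[attribute]
        elif attribute in ("1", "2", "3", "4", "5"):
            width = "%dem" % (5 * int(attribute))
        elif attribute in placement:
            css_class = placement[attribute]
    style += "width:%s;" % width
    return style, css_class, filename


if __name__ == "__main__":
    print(image_style("<:o:3:<<:imagename.ext>"))
    print(image_style("<name.jpg>"))
```

Restricting the width to a single digit keeps every field a fixed,
easily tested string, which is the point of the multiples-of-5em format.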
STATUS
18 mar 2025
marking up lots of formats, see /eg/text.tohtml.format.html
Script seems to be working well with links, headings and emphasis.
Unordered lists seem to work.
Still missing some lists and all capital heading lines.
HISTORY
26 april 2025
adding uppercase lines as headings to this formatter.
24 april 2025
making quotes and emphasised text work even when there is a
trailing dot or comma.
18 march 2025
Doing some crazy reverse reductions with lists. There is no real
start-token for an unordered list (just the '-' word starting a
line, but that is the item* token). So have to reduce the list
when I get to the end of it.
6 march 2025
Added schema for rosettacode.org problems.
24 feb 2025
added a schema for word lookups.
reformed the way oed: and urbandict: links are rendered, by rendering
in the parsing phase, not the lexing phase
21 feb 2025
added a url schema for writing about nom syntax
20 feb 2025
really struggled with the images. Had to change the width format
to multiples of 5em, because variable length fields were too
hard to extract without regexes
13 feb 2025
added attributes to images eg <:o:3:<<:imagename.ext>. The
parsing is working but I have to fix the css and maybe also
the <figure> <img> tag interaction.
11 feb 2025
added forced line breaks
and horizontal rules
9 feb 2025
Added some html curly quotes to quoted text which is not a
link. Added ordinal number superscripts in english (which is
a bit silly really). Working on images.
4 Feb 2025
Began this script, created some useful syntax
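The reverse list reduction mentioned in the 18 march 2025 entry can be
sketched like this. It is Python, not nom, and only an approximation: the
stack of (token, attribute) pairs, the function name reduce_list and the
exact html in the attributes are assumptions made for the illustration.
Because a list has no start token, nothing is reduced until endlist* arrives;
then item* text* endlist* is folded into endlist* repeatedly, working
backwards, and what is left is wrapped and becomes plain text*.

```python
def reduce_list(stack):
    """Fold a finished list on a parse stack of (token, attribute) pairs.

    The top of the stack is the last element. Hypothetical sketch of the
    reverse reduction: item* text* endlist* -> endlist*, then -> text*.
    """
    while (len(stack) >= 3 and
           [name for name, _ in stack[-3:]] == ["item*", "text*", "endlist*"]):
        (_, item), (_, text), (_, rest) = stack[-3:]
        # the newest item is absorbed first, so the list grows from the end
        stack[-3:] = [("endlist*", item + text + "</li>" + rest)]
    if stack and stack[-1][0] == "endlist*":
        _, body = stack.pop()
        stack.append(("text*", "<ul>" + body + "</ul>"))
    return stack


if __name__ == "__main__":
    stack = [
        ("text*", "intro"),
        ("item*", "<li>"), ("text*", "one"),
        ("item*", "<li>"), ("text*", "two"),
        ("endlist*", ""),
    ]
    print(reduce_list(stack))
```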
#
read;# newlines and empty lines[\n]{clear;add"\n";put;clear;pop;# check for last all uppercase line heading"upper.heading*"{clear;add"
";get;add"
";put;clear;add"text*";push;}push;# no words on previous line, so this is a blank lineclear;count;"0"{clear;# check here for end of listpop;pop;"item*text*"{push;push;add"\n\n";put;clear;add"endlist*";push;.reparse}push;push;add"
\n";put;clear;}# set accumulator == 0 so that we can count words # per line (and know which is the first word)clear;zero;nochars;add"space*";push;.reparse}# parse space, but maybe [:space:] would be better.[ \t]{while[ \t];clear;add" ";put;clear;add"space*";push;.reparse}# ignore other types of space[:space:]{clear;.restart}# everything else is a word![:space:]{# read word and increment word counterwhilenot[:space:];a+;# here parse image files in format before we# change > < to entities.# --------------#B"<" {# E".jpg>",E".jpeg>",E".png>",E".gif>" {# clip; clop; put; clear;# add "imagefile*"; push; .reparse# }#}# here we build an html tag from minimal and optional markup# attributes are <:corners:float:width:filename.ext># this code is quite tricky. See also /eg/imagetext.tohtml.pss # the .img extension is a fake extension for random images.B"<".E">".!"<>"{E".png>",E".jpg>",E".jpeg>",E".bmp>",E".gif>",E".img>"{# an example image text format may be # <:0:4:>>:/image/name.gif> or # The order of the attributes is important but the attributes # are optional eg: <:<<:r:20pt:name.jpg> wont work because the # float attribute '<<' comes before the rounded corner attribute 'r'clip;clop;put;clear;# put a faint grey border around images.?# add "add";swap;# we use swap to juggle the built html and the original# minimal markup text.# :0: is the circle image (avatar) indicator,# allow the first colon to be missingB":O:",B":o:",B"O:",B"o:"{swap;add"border-radius:50%;";swap;B":"{clop;}clop;}# small rounded corners on the imageB":r:",B"r:"{swap;add"border-radius:5%;";swap;B":"{clop;}clop;}# large rounded corners B":R:",B"R:"{swap;add"border-radius:15%;";swap;B":"{clop;}clop;}# width spec multiple of 5em, allow missing 1st colonB":1:",B":2:",B":3:",B":4:",B":5:",B"1:",B"2:",B"3:",B"4:",B"5:"{B":"{clop;}B"1:"{swap;add"width:5em;";}B"2:"{swap;add"width:10em;";}B"3:"{swap;add"width:15em;";}B"4:"{swap;add"width:20em;";}B"5:"{swap;add"width:25em;";}swap;clop;}# add a default 
widthswap;!E"em;"{add"width:10em;";}# finish off the style attributeadd"' ";swap;# the float right indicator, it needs to come after :0:B":>>:",B">>:"{swap;add"class='float-right' ";swap;B":"{clop;}clop;clop;}# float left B":<<:",B">>:"{swap;add"class='float-left' ";swap;B":"{clop;}clop;clop;}# centre indicator B":cc:"{swap;add"class='center' ";swap;clop;clop;clop;}B":"{clop;}# build the html image src= attribute. # swap;add" title='";get;add"' src='";get;add"'/>";swap;clear;clear;add"imagefile*";push;.reparse}}# make < and > html entities because they will wreck our page# but not if is >> as 1st word!">>"{replace">"">";replace"<""<";}# some curly quotes, why not? A half hearted attempt for english# promote the language with italics"nom"{clear;add"nom";}"Nom"{clear;add"Nom";}"pep"{clear;add"pep";}"pep/nom","pep and nom"{#clear; add "ℙ𝕖𝕡/ℕ𝕠𝕞";clear;add"ℙ𝕖𝕡 🙵 ℕ𝕠𝕞";# U+1F675 # swash ampersand ornament# U+1F671 🙱 heavy script ligature et ornament# U+1F672 🙲 ligature open et ornament# U+1F673 🙳 heavy ligature open et ornament# U+1F674 🙴 heavy ampersand ornament# U+1F675 🙵 swash ampersand ornament}
#*
I had the idea to substitute archaic words and spellings for
modern ones more or less randomly to add a bit of spice to the
text... but not sure if I will
# mark up nom parse tokens as being something special which end in# * . This should not clash with emphasised text because that is parsed# with nom://whilenot E"*".!"*".!B"*".!"#*"{put;clear;add"";get;add"";}# insert some apostrophes"doesnt","isnt","cant","arent","couldnt","didnt","hasnt","havent","shouldnt","mustnt","wasnt"{replace"nt""n't";}"lets","thats","whats"{replace"ts""t's";}"I'm","you're","he's","she's","it's","we're","they're","aren't","can't","couldn't","didn't","doesn't","hadn't","hasn't","haven't","isn't","mightn't","mustn't","oughtn't","shouldn't","wasn't","weren't","won't","wouldn't","I've","you've","he's","she's","it's","we've","they've","I'd","you'd","he'd","she'd","it'd","we'd","they'd","I'll","you'll","he'll","she'll","it'll","we'll","they'll","there's","that's","what's","who's","where's","when's","why's","how's"{# curly vs straight apostrophe# replace "'" "’";replace"'""'";}# some common typos for apostrophe contractions in english"wouldnt","shouldnt","wont","dont","Im","theyre","wasnt","werent","arent","cant","didnt","doesnt","havent","hasnt","isnt","couldnt"{replace"Im""I’m";replace"nt""n’t";}# now do t's english typos or fast typing. I know, very english centric# but I write in english, so there"thats","whats","its"{replace"ts""t’s";}put;# make nice arrows out of -> and --> and <- and <--"->"{clear;add"→";}"-->"{clear;add"↦";}"<-"{clear;add"⇽";}"<--"{clear;add"⇦";}# typeset No. Number abbreviation."No.","no."{clear;add"N;add" text-decoration-thickness:2px;";add" font-style:bold; vertical-align:0.30em;'>o";}# ordinals in english, very perfunctory but sort of fun. 
# eg: 1st, 2nd, 301rd[0123456789stndrdth]{E"1st"{# check matches [0-9]*1stclip;clip;clip;"",[0-9]{clear;get;clip;clip;add"st";}}E"2nd"{# check matches [0-9]*2ndclip;clip;clip;"",[0-9]{clear;get;clip;clip;add"nd";}}E"3rd"{clip;clip;clip;"",[0-9]{clear;get;clip;clip;add"rd";}}E"th"{# check matches [0-9]*[4-9]th clip;clip;!"".!E"1".!E"2".!E"3".[0-9]{add"th";}}}put;clear;count;# deal with ">>" when not first word!"1"{clear;get;">>"{clear;add">>";put;}clear;count;}# check if this is the first word on the line# because several markup elements (as in markdown) need to be# the 1st word to be significant."1"{clear;get;# a one line comment, just ignored at the moment."#:"{clear;whilenot[\n];clear;.reparse}# uppercase headings. First word must be at least 3 characters[A-Z]{clip;clip;!""{clear;add"upper.heading*";push;.reparse}clear;get;}# unordered list items."-"{clear;add"\n
";put;clear;add"item*";push;.reparse}# asterix as first word on line marks the description of # a code line or block which follows (like a caption)# format this later in 2 token parsing. # starlines are used as captions for code and also citations# for quotations."*"{clear;whilenot[\n];replace">"">";replace"<""<";# primitive markup techniquereplace" nom "" ℕ𝕠𝕞 ";replace" pep "" ℙ𝕖𝕡 ";replace" pep/nom "" ℙ𝕖𝕡/ℕ𝕠𝕞 ";put;clear;add"starline*";push;.reparse}# document or page title "&&"{clear;whilenot[\n];replace">"">";replace"<""<";replace" Pep and Nom"" ℙ𝕖𝕡 🙴 ℕ𝕠𝕞";replace" nom "" Nom ";replace" pep "" Pep ";replace" Nom "" ℕ𝕠𝕞 ";replace" Pep "" ℙ𝕖𝕡 ";# U+1F674 🙴 heavy ampersand ornamentput;clear;add"\n";add"
";get;add"
\n";put;clear;add"text*";push;.reparse}# markdown style headings. I would prefer to use one # as # a comment."#"{clear;whilenot[\n];replace">"">";replace"<""<";put;clear;add"\n";add"
";get;add"
\n";put;clear;add"text*";push;.reparse}# headings to capital case"##"{clear;whilenot[\n];cap;replace">"">";replace"<""<";replace"Pep and Nom""ℙ𝕖𝕡 🙴 ℕ𝕠𝕞";replace" nom "" ℕ𝕠𝕞 ";replace" pep "" ℙ𝕖𝕡 ";replace" pep/nom "" ℙ𝕖𝕡/ℕ𝕠𝕞 ";put;clear;add"\n";add"
";get;add"
\n";put;clear;add"text*";push;.reparse}"###"{clear;whilenot[\n];cap;replace">"">";replace"<""<";replace" nom "" ℕ𝕠𝕞 ";replace" pep "" ℙ𝕖𝕡 ";replace" pep/nom "" ℙ𝕖𝕡/ℕ𝕠𝕞 ";put;clear;add"\n";add"
";get;add"
\n";put;clear;add"text*";push;.reparse}# one line of code etc">>"{clear;whilenot[\n];replace">"">";replace"<""<";put;clear;add"
\n";get;add"\n
\n";put;clear;add"codeline*";push;.reparse}# horizontal rules >-------- (> is already >)B">---"{# ensure matches regex ">[-]{3,}"clop;clop;clop;clop;[-]{clear;add"\n";put;clear;add"text*";push;.reparse}}# codeblocks begin with --- or ---- etcB"---".[-]{clear;until",,,";clip;clip;clip;replace">"">";replace"<""<";put;clear;add"
\n";get;add"
\n";put;clear;while[,];clear;add"codeblock*";push;.reparse}# need a nom codeblock for postprocessing to colourise# the nom code. # nom codeblocks begin with ---+ or ----+ etcB"---".[-+].E"+"{clear;until",,,";clip;clip;clip;# > is used in the parse label so I want # eg/nom.snippet.tohtml.pss to render it# replace ">" ">"; replace"<""<";put;clear;add"
\n";get;add"
\n";put;clear;while[,];clear;add"codeblock*";push;.reparse}# multiline quotes, start and end with 3 quotes """. Starting """ # must be first on line. The only problem is that they can chew up the # whole doc. This may be rendered with a big curly quote at the # beginning. If this is preceded by a star line, then that is the # author of the quotation. The html
will be added later
# during 2 grammar token parsing.B'"""'{clop;clop;clop;until'"""';clip;clip;clip;replace">"">";replace"<""<";put;clear;add"blockquote*";push;.reparse}}clear;get;# force a line break with '>>' (but not first word on line), # could be a way to imitate lists">>"{# todo bug? need to clear; I dont know why this worksclear;add" \n";put;clear;add"text*";push;.reparse}# just some chess stuff, why not"[chess:king]","[chess:queen]","[chess:rook]"{clip;clop;replace"chess:""";replace"king""♚";replace"queen""♛";replace"rook""♜";put;clear;add"";get;add"";put;clear;add"text*";push;.reparse}# insert links to the good nom translation scripts "[nom:translation.links]"{clear;add"
dart |
go |
java |
javascript |
ruby |
python |
tcl |
c
";put;clear;add"text*";push;.reparse}# todo: some "etymological word": for example # [heuristic] -- insert the greek derivation and the explanation# eg: from 'today' that which we learn each day# [idiot] from greek 'private' # these words would be good as "footnotes" maybe "[cryptic]","[heuristic]","[idiot]","[epistemology]"{clip;clop;put;clear;add";add" title='##'>";get;add"";swap;"cryptic"{swap;replace"##""hidden: from latin 'cryptus'";}"heuristic"{swap;replace"##"" εὑρίσκω: what is found.";}"epistemology"{swap;replace"##"" ἐπιστήμη: theory of knowledge ";}"idiot"{swap;replace"##"" ἰδιώτης: private citizen";}put;clear;add"text*";push;.reparse}# sort of tech terms that aren't acronyms...# do ?"[stdin]","[stdout]","[malloc]","[realloc]"{clip;clop;put;clear;add"\n";add";add" title='##'>";get;add"";swap;# put < > around these terms "stdin"{swap;replace"##""standard input stream";}"stdout"{swap;replace"##""standard output stream";}"malloc"{swap;replace"##""c memory allocation torture";}"realloc"{swap;replace"##""more c memory torture";}put;clear;add"text*";push;.reparse}# some explanatory "titles" (tooltips) for acronyms ?# a less verbose way, also do ?"[html]","[csv]","[ast]","[bnf]","[ebnf]","[xbnf]","[pep]","[nom]","[json]","[html]","[xml]","[http]","[gnu]","[rfc]","[faq]","[man]","[awk]","[sed]","[grep]","[groff]","[eqn]","[lisp]","[latex]","[forth]","[unix]","[linux]","[minix]","[vim]","[java]","[c++]","[python]","[perl]","[awk]","[lua]","[wren]","[logo]","[go]","[dart]","[rust]","[tcl]","[antlr]","[gcc]","[tcc]","[utf8]","[utf16]","[unicode]","[eof]","[bash]","[markdown]"{clip;clop;upper;put;clear;add"\n";add";add" title='##'>";get;add"";replace"NOM""ℕ𝕠𝕞";replace"PEP""ℙ𝕖𝕡";# a silly attempt at the LATEX logoreplace"LATEX""LATEX";swap;"HTML"{swap;replace"##""Hyper-text Markup Language";}"CSV"{swap;replace"##""Comma Separate Values";}"PEP"{swap;replace"##""Parsing Engine for Patterns";}"AST"{swap;replace"##""Abstract Syntax 
Tree";}"BNF"{swap;replace"##""Backus-Naur Form";}"EBNF"{swap;replace"##""Extended Backus-Naur Form";}"XBNF"{swap;replace"##""Any random BNF format";}"NOM"{swap;replace"##""Nom Parsing Language";}"JSON"{swap;replace"##""Javascript object notation";}"HTML"{swap;replace"##""Hyper-text Markup Language, aka tag soup";}"XML"{swap;replace"##""Extensible Markup Language";}"HTTP"{swap;replace"##""Hyper-text Transport Protocol";}"GNU"{swap;replace"##""Gnu is not Unix, silly acronym.";}"RFC"{swap;replace"##""Request For Comments";}"FAQ"{swap;replace"##""Frequently Asked Questions";}"MAN"{swap;replace"##""Unix Manual (Doc) Pages";}"AWK"{swap;replace"##""AWK Programming language";}"SED"{swap;replace"##""Text Stream Editor";}"GREP"{swap;replace"##""Search Text Files: g/regex/p";}"GROFF"{swap;replace"##""Old unix typesetting system";}"EQN"{swap;replace"##""Old unix formula typesetting system";}"LISP"{swap;replace"##""List Processing Language";}"LATEX"{# a silly attempt at the LATEX logoreplace"A""A";replace"E""E";swap;replace"##""The LaTeX text processing system";}"FORTH"{swap;replace"##""The Incomparable Forth 'Language'";}"UNIX"{swap;replace"##""The Unix Operating System";}"LINUX"{swap;replace"##""The successor to minix";}"MINIX"{swap;replace"##""The Minix Minimal Unix Operating System";}"VIM"{swap;replace"##""Vi Improved Text Editor";}"JAVA"{swap;replace"##""Java Programming Language";}"C++"{swap;replace"##""Object-Oriented C";}"LUA"{swap;replace"##""An embeddable script language";}"WREN"{swap;replace"##""R. 
Nystrom's language";}"PYTHON"{swap;replace"##""A strangely popular indent language";}"PERL"{swap;replace"##""Larry Wall's shell script language";}"AWK"{swap;replace"##""Text-data processing mini-language";}"LOGO"{swap;replace"##""The turtle drawing language";}"GO"{swap;replace"##""Google's C Language Replacement";}"RUST"{swap;replace"##""The Rust System Language (c-ish)";}"DART"{swap;replace"##""Google's Application Language";}"TCL"{swap;replace"##""Tool Control Language";}"ANTLR"{swap;replace"##""Another Tool for Language Recognition";}"GCC"{swap;replace"##""The Gnu C Compiler";}"TCC"{swap;replace"##""Bellard's Tiny C Compiler";}"UTF8"{swap;replace"##""Unicode Text Format 8";}"UTF16"{swap;replace"##""Unicode Text Format 16";}"UNICODE"{swap;replace"##""The Universal Language Code";}"EOF"{swap;replace"##""End-Of-File (input-stream)";}"BASH"{swap;replace"##""Unix [B]ourne [A]gain [Sh]ell";}"MARKDOWN"{swap;replace"##""non-distracting text documents";}put;clear;add"text*";push;.reparse}# urls, we need to add html formatting later because of the# "text" http://dada.org syntax There are a lot of "fake" schemas # here for convenience.B"rosetta:",B"urbandict:",B"oed:",B"wp:",B"nom:",B"nomsyn:",B"nomsf:",B"pep:",B"http://",B"https://",B"nntp://",B"file://",B"www."{!"rosetta:".!"urbandict:".!"oed:".!"wp:".!"nom:".!"nomsyn:".!"nomsf:".!"pep:".!"http://".!"https://".!"nntp://".!"file://".!"www."{B"file://"{replace"file://""";put;}# make the fake schema wp:// or wp: wikipedia links after wp:// should# just be the wikipedia page name# better to parse this in the E"url*".!"url*" block so that # we can make a nice visible link text for the wikipedia page.# ie. 
do the same as the nom:// fake urlB"wp:"{clop;clop;clop;B"//"{clop;clop;}# I dont like writing underscoresreplace".""_";put;clear;add"https://en.wikipedia.org/wiki/";get;put;}# schema for oed eg oed:// with search# oxford english dictionaryB"oed:"{# allow trailing dot or commaE".",E","{clip;}!B"oed://"{replace"oed:""oed://";}put;}#add "https://www.oed.com/search/dictionary/?scope=Entries&q=";# schema for the urban dictionary, just because it can be fun,# and anyway, nom is a language thing, and we like language.# this should be parse in quoted*url* etcB"urbandict:"{# allow trailing dot or commaE".",E","{clip;}!B"urbandict://"{replace"urbandict:""urbandict://";}put;}# rosettacode.org problemsB"rosetta:"{# allow trailing dot or commaE".",E","{clip;}!B"rosetta://"{replace"rosetta:""rosetta://";}replace".""_";put;}# add "https://www.urbandictionary.com/define.php?term=";# this is just a convenience so I dont have to type out the url# to the pep/nom sourceforge site everytimeB"nomsf:",B"nomsf://"{E".",E","{clip;}# allow trailing ./,replace"nomsf:""";B"//"{clop;clop;}put;clear;add"https://bumble.sf.net/books/pars/";get;put;}# convenience schema, this time for nom language commands # eg: push pop get putB"nom:"{# allow trailing dot or commaE".",E","{clip;}# add the url later, much easier.!B"nom://"{replace"nom:""nom://";}put;}# another convenience schema, nom syntax documentation# eg: blocks, tests, parselabel B"nomsyn:"{# add the url later, much easier.!B"nomsyn://"{replace"nomsyn:""nomsyn://";}put;}# pep virtual machine structure eg: stack, tape, peep B"pep:"{E".",E","{clip;}# allow trailing ./,!B"pep://"{replace"pep:""pep://";}put;clear;}# add a schema to www. 
urlsB"www."{clear;add"http://";get;put;}clear;add"url*";push;.reparse}}# a fake uri schema syntax eg google:"pratt parsers"# --> https://www.google.com/search?q=distance+colombia+to+tasmania# this is separate to the code above because it has to read ahead# in the input streamB"google:",B"google://"{replace"google://""";replace"google:""";# read until next " or newlineB'"'{clop;whilenot[\n"];#replace ">" ">"; replace "<" "<";replace" ""+";put;clear;add"https://www.google.com/search?q=";get;put;clear;!(eof){read;[\n]{zero;nochars;}}clear;add"url*";push;.reparse}}# local files with no schema, imagefile tokens have already been parsedE".h",E".c",E".a",E".txt",E".doc",E".py",E".rb",E".rs",E".java",E".class",E".xml",E".json",E".tcl",E".tk",E".sw",E".js",E".go",E".pp",E".pss",E".cpp",E".pl",E".html",E".pdf",E".tex",E".sh",E".css",E".out",E".log",E".png",E".jpg",E".jpeg",E".bmp",E".mp3",E".wav",E".aux",E".tar",E".gz",E"/"{# not very elegant all this. maybe an ee test would be good # (begins with but not equal to) or change the delim to . and push!".h",!".c",!".a",!".txt",!".doc",!".py",!".rb",!".rs",!".java",!".class",!".xml",!".json",!".tcl",!".tk",!".sw",!".js",!".go",!".pp",!".pss",!".cpp",!".pl",!".html",!".pdf",!".tex",!".sh",!".css",!".out",!".log",!".png",!".jpg",!".jpeg",!".bmp",!".mp3",!".wav",!".aux",!".tar",!".gz",!"/"{# just automatically link filename beginning with /# because they are probably weblinksB"/"{clear;add"url*";push;.reparse}!B"http://".!B"https://".!B"nntp://".!B"file://".!B"www."{clear;add"file*";push;.reparse}}}# multiple words quoted text,eg: "sych mor", maximum one line B'"'.!'"'.!'""'.!'"""'.!E'"'{# single quoted word with trailing . or , etc. This can't be part# of a URL link E'",',E'".',E'"!',E'":'{# leave the trailing ,.:! 
etc and use replaceclop;replace">"">";replace"<""<";replace'",'"”,";replace'".'"”.";replace'"!'"”!";replace'":'"”:";put;clear;add"“";get;add"\n";put;clear;add"text*";push;.reparse}clop;whilenot[\n"];replace">"">";replace"<""<";put;clear;# The code below is not great, but is required because we# dont have "until 'ab','cd','ef'" syntax. ie multiple end delimiters# all this is to prevent multiline quotes (which could eat up the # whole document.!(eof){read;[\n]{zero;nochars;}}clear;add"quoted*";push;.reparse}# single quoted word, multiline quotes (blockquotes) may begin with# """. These can be the link text for a hyperlink which is why I do# not format them immediately.B'"'.!'"'.!'""'.!'"""'.E'"'{clip;clop;replace">"">";replace"<""<";put;clear;add"quoted*";push;.reparse}# or B'**'.![*] ? B'**'.!'**'.!'****'.!"***"{# single bold emphasised word eg: **strong**E'**'{clip;clip;clop;clop;replace">"">";replace"<""<";put;clear;add"";get;add"\n";put;clear;add"text*";push;.reparse}# single bold emphasised word with trailing , or .E'**,',E'**.'{clop;clop;replace">"">";replace"<""<";replace"**."".";replace"**,"",";add"\n";put;clear;add"";get;put;clear;add"text*";push;.reparse}}# bold emphasised multi-word text between **double asterixes**# single line maximum, multiple wordsB"**"{clop;clop;whilenot[\n*];# find the next * if its there. This is clumsy code because we# cant say "until '**','\n';" which would be better# actually this code accepts ** text* with only one terminating # asterix, but its not important. It's a text format...replace">"">";replace"<""<";put;clear;add"";get;add"\n";put;clear;# If there is some emphasised text immediately on the next line# this will not be good, but we aren't flying an aeroplane.while[*];clear;add"text*";push;.reparse}# emphasised italic text between *two asterixes*# single line maximum, multiple wordsB"*".!"*".!E"*"{# single emphasised word, with trailing ,. 
etcE'*,',E'*.'{clop;replace">"">";replace"<""<";replace"*."".";replace"*,"",";add"\n";put;clear;add"";get;put;clear;add"text*";push;.reparse}clop;whilenot[\n*];replace">"">";replace"<""<";put;clear;add"";get;add"";put;clear;# could i just use "while [*];" here?!(eof){read;[\n]{zero;nochars;}}clear;add"text*";push;.reparse}# single emphasised word, no special grammar token needed.B'*'.!'*'.!'**'{E'*'{clip;clop;replace">"">";replace"<""<";put;clear;add"";get;add"";put;clear;add"text*";push;.reparse}}clear;add"word*";push;}!""{clear;# just delete weird characters, we don't need them.# but probably should investigate further
#*
add "!
  An unexpected character '"; get; add "'"; add " in text input was encountered at \n";
add " line "; lines; add " char "; chars; add " \n";
add " Check the 'lexical parsing' phase of the script \n";
add " pars/eg/text.tohtml.pss "; add " This is the section of the script above the parse> label \n";
print; quit;
*#
}parse># for debugging, add % as a latex comment.# add "\n"; print; clear;# -----------------# 2 tokens parse reductionspop;pop;# a list at the end of the document, with no blankline to# terminate it.(eof){"item*text*"{push;push;add"\n\n";put;clear;add"endlist*";push;.reparse}# resolve upper case lines at the end of the fileB"upper.heading*",E"upper.heading*"{replace"upper.heading*""text*";push;push;.reparse}}# starline*codeline* or starline*codeblock* is significant"starline*space*"{# dont really need this spaceclear;get;++;get;--;put;clear;add"starline*";push;.reparse}# This is another use for starline, as the "citation" or author# for a multiline quote ("""..."""). This has to go above here because# starlines are about to disappearE"blockquote*".!"blockquote*"{B"starline*"{clear;add"
\n";++;get;--;add"";get;add"\n";add"
\n";put;clear;add"text*";push;.reparse}# blockquote with no citation, treat the unknown 1st token as # text.clear;get;add"\n
\n";++;get;--;add"
\n";put;clear;add"text*";push;.reparse}# a caption followed by some codeB"starline*".!"starline*"{E"codeline*",E"codeblock*"{clear;add"\n";add"\n";get;add"\n";++;get;--;add"";put;clear;add"text*";push;.reparse}# format star-lines, then reduce to text (token no longer needed)replace"starline*""text*";push;push;# state;--;--;add"\n";get;add"\n\n";put;clear;# dont need to transfer attribute++;++;.reparse}# format and reduce image files, but the tag has already# been built above so we can just add the figure and caption if # requiredE"imagefile*".!"imagefile*"{# "link text" http://abc syntaxB"quoted*"{# the problem here is that needs to set the width
# and alignment not the image, but the tag has already# been built. Maybe we can just accept that captioned images# are not going to look much good.clear;++;get;# check if image is floating left or right# a hack to get around no "contains" test eg C"float-right"replace"float-right""";!(==){clear;add"\n\n ";}(==){clear;add"\n\n ";}get;--;add"\n \n ";get;add"\n ";add"\n\n";put;clear;add"text*";push;.reparse}# image with no caption clear;get;++;get;--;add"\n";put;clear;add"text*";push;.reparse}# format and reduce urlsE"url*"{# "link text" http://abc syntaxB"quoted*"{clear;++;get;--;# deal with nom schema, nom: has been normalised to nom://# nom command filenames in format nom..txtB"oed://"{replace"oed://""";++;put;clear;add";get;--;add"' title='oxford english dictionary'>";get;add"";put;clear;add"text*";push;.reparse}B"urbandict://"{replace"urbandict://""";++;put;clear;add";get;--;add"' title='urban dictionary'>";get;add"";put;clear;add"text*";push;.reparse}# the rosettacode.org problems # eg https://rosettacode.org/wiki/Balanced_bracketsB"rosetta://"{replace"rosetta://""";++;put;clear;add";get;--;add"' title='rosetta-code problem'>";get;add"";put;clear;add"text*";push;.reparse}B"nom://"{replace"nom://""";++;put;clear;#todo put a title='nom stack command' hereadd";get;add"'";add" href='http://nomlang.org/doc/commands/nom.";get;--;add".html'>";get;add"";put;clear;add"text*";push;.reparse}B"nomsyn://"{replace"nomsyn://""";++;put;clear;add";get;add"'";add" href='http://nomlang.org/doc/syntax/nom.syntax.";get;--;add".html'>";get;add"";put;clear;add"text*";push;.reparse}# pep machine filenames in format pep..txtB"pep://"{replace"pep://""";++;put;clear;add";get;add"'";add" href='http://nomlang.org/doc/machine/pep.";get;--;add".html'>";get;add"";put;clear;add"text*";push;.reparse}# other "quote" url:// formatsclear;add";++;get;--;add"'>";get;add"";put;clear;add"text*";push;.reparse}# plain url link, add html link to textclear;++;get;--;# make a nice link to the 
OEDB"oed://"{replace"oed://""";++;put;--;clear;get;add";++;get;add"' title='oxford english dictionary'>";get;add"";--;put;clear;add"text*";push;.reparse}# what about multiple wordsB"urbandict://"{replace"urbandict://""";++;put;--;clear;get;add";++;get;add"' title='urban dictionary'>";get;add"";--;put;clear;add"text*";push;.reparse}B"rosetta://"{replace"rosetta://""";++;put;--;clear;get;add";++;get;add"' title='rosetta-code problem'>";get;add"";--;put;clear;add"text*";push;.reparse}B"nom://"{# is nom://-- valid? yes and also nom://minusminus # mark this up as because it is.replace"nom://""";++;put;--;clear;get;add"";add";++;get;add"'";add" href='http://nomlang.org/doc/commands/nom.";get;add".html'>";# allow nom://++ and nom://a+ syntax etcreplace"++.html""plusplus.html";replace"--.html""minusminus.html";replace"a+.html""aplus.html";replace"a-.html""aminus.html";get;add"";# allow nom://plusplus syntax etc (make visible link correct)replace"html'>plusplus""html'>++";replace"html'>minusminus""html'>--";replace"html'>aplus""html'>a+";replace"html'>aminus""html'>a-";replace"html'>reparse""html'>.reparse";replace"html'>restart""html'>.restart";--;put;clear;add"text*";push;.reparse}B"nomsyn://"{replace"nomsyn://""";++;put;--;clear;get;add";++;get;add"'";add" href='http://nomlang.org/doc/syntax/nom.syntax.";get;add".html'>";# allow nomsyn://parse> syntax etc, but > has already been# made into > for html.replace"parse>.html""parselabel.html";replace"class.html""classes.html";get;add"";# allow nom://parselabel syntax etc (make visible link correct)replace"html'>parselabel""html'>parse>";--;put;clear;add"text*";push;.reparse}B"pep://"{replace"pep://""";++;put;--;clear;get;add";++;get;add"'";add" href='http://nomlang.org/doc/machine/pep.";get;add".html'>";get;add"";--;put;clear;add"text*";push;.reparse}clear;get;add";++;get;add"'>";swap;# remove the https:// etc from the visible link because# they look ugly in the 
text.replace"https""";replace"http""";replace"nntp""";replace"://""";swap;get;--;add"";put;clear;add"text*";push;.reparse}# "text" file.txt syntax to be linked"quoted*file*"{clear;add";++;get;--;add"'>";get;add"";put;clear;add"text*";push;.reparse}# reduce file* grammar tokens separately so we can html format themB"file*".!"file*"{replace"file*""text*";push;push;--;--;add"";get;add"";put;++;++;clear;.reparse}# quoted*url* or quoted*file* is significant"quoted*space*"{clear;get;++;get;--;put;clear;add"quoted*";push;.reparse}# reduce "quoted" separately so we can add some html curly quotes# the !"quoted*" clause is supposed to ensure 2 tokens (this should only# really be a problem if the "quoted" is the first word of the document)B"quoted*".!"quoted*"{!E"url*".!E"file*"{clear;# The quoted attribute may have a space (or many?) at the end# so need to put it after the curly quotes# add some html curly quotes and get saved spaceadd"“";get;add"”";# remove the space just before the last quote (which is added# during "space*" reductions.replace" ”""”";# add a space to separate from next word.add" ";++;get;--;put;clear;add"text*";push;.reparse}}# all capital lines are headings or ending with '....'"upper.heading*text*"{# check if following text is also uppercase clear;++;get;--;replace" """;[A-Z]{clear;get;++;get;--;put;clear;add"upper.heading*";push;.reparse}# following text not upper so not a heading.clear;get;++;get;--;put;clear;add"text*";push;.reparse}# tokens to reduce to text# codeline, codeblock, word, text, space, quoted,#B"word*",B"text*",B"space*",B"quoted*",B"codeline*",B"codeblock*" {B"word*",B"text*",B"space*",B"codeline*",B"codeblock*"{# need to conserve quoted at endE"word*",E"text*",E"space*",E"codeline*",E"codeblock*"{# check that there really are 2 tokens (not one)push;!""{pop;clear;get;++;get;--;# just playing with auto formatting of certain word# sequences.E"read command"{replace"read command""read command";replace"add command""add 
command";}put;clear;add"text*";push;.reparse}pop;}}# -------------------# 3 token token reductionspop;# this is crazy reverse reduction"item*text*endlist*"{clear;get;++;get;++;get;--;--;put;clear;add"endlist*";push;.reparse}# have reduced the whole list (in reverse) so just make text # there could be 1,2, or 3 tokens here. need to get the endlist attribute# and add a start
tag
E"endlist*".!B"item*text*"{# more succinct way to get the last token when there is a variable# number of tokens. A clever trick.push;!""{push;!""{push;}}pop;clear;add"\n";add"
";get;replace"
" "
\n";put;clear;add"text*";push;.reparse}# sometimes we might get text*text*something* (eg in lists)B"text*text*".!"text*text*"{replace"text*text*""text*";push;push;--;--;get;++;get;--;put;clear;# transfer unknown token attribute++;++;get;--;put;clear;# realign tape pointer++;.reparse}push;push;push;(eof){
#*
add "<!-- final stack:" ; print; clear;
unstack; add " -->\n"; print; clear;
stack;
add "<!-- html rendered by Nom script (www.nomlang.org): -->\n" ;
add "<!-- bumble.sf.net/books/pars/eg/text.tohtml.pss -->\n" ;
add "<!-- pep -f eg/text.tohtml.pss file.txt -->\n" ;
add "<!-- see eg/text.tohtml.format.txt for text format -->\n" ;
*#
print;clear;# print the rendered htmlpop;clear;get;print;quit;# The html header and footer are made in pars/www/blog.sh}