#*

  This script translates 'sed' (the unix stream editor) scripts into
  java source code.

STATUS

  Syntax is parsing well and there is lots of functionality, but all
  branching commands are still missing; they may be impossible, or too
  difficult to bother with, in java. Also after s/// for nth occurrence
  substitution. Syntax like "s#a#A#;" for substitutions is not supported
  yet. The 0~8 gnu sed syntax (every 8th line) is supported. The syntax
  a\ c\ i\ is now being parsed with a special .restart trick.

NOTES

  Writing a translator for c++ might be more authentic because c++
  supports the 'goto' statement, which seems to be needed for the
  branching commands in (gnu) sed.

  It is possible to parse a\ etc by issuing a restart after every char
  and checking for \n without \\\n. This is an interesting parsing
  technique used here for the 1st time. The workspace is not cleared and
  .restart is issued to get the text for a\ one character at a time.
  See the block B"a\\",B"c\\",B"i\\" {...}

DONE

  changed gnu regex to java, eg \1 \2 -> $1 $2 in the replacement (but
    only 9 backrefs at the moment); also \(\) group -> ()
  wrote the s///w filename; syntax: write to file if a sub has occurred
  made a\ i\ and c\ work properly
  support /rx/I (case-insensitive address) and /rx/M multiline
    (But what is the meaning of "M" really?)
  wrote s///4; to sub only the 4th occurrence, or "s///g4;" to sub the
    4th occurrence and all occurrences after that.

TODO

  Allow a different delimiter for s/// eg "s#a#b#gi" or even s{a}{b}gi
    etc. I could use the same technique that was used for a\ c\ i\ but
    it would be much easier to use the new until; syntax.
  make all branching commands work eg : b t T etc. This could be almost
    impossible in a language that doesn't have goto ...? But c++ does
    have goto.

BUGS

  Some regex patterns may be java's, not gnu sed's.

NOTES

  Adapting this script for an interpreted language would allow sed
  scripts to be executed directly within the target language. But not
  for java, which has to be compiled.
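  The s///4 and s///g4 behaviour listed under DONE (replace only the nth
  match, or the nth match and every later one) can be illustrated with a
  small stand-alone Java sketch built on Matcher.find() and
  appendReplacement(). The class NthSub and the method substituteNth are
  hypothetical names used only for illustration; this is not the code
  the script generates. The sketch assumes n >= 1 and a Java-style
  replacement string ($1 rather than sed's \1).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only (not generated by this script).
class NthSub {

    // Sketch of sed's s/rx/rep/N (global == false) and s/rx/rep/Ng
    // (global == true). Assumes n >= 1 and a Java-style replacement
    // string, i.e. $1 rather than sed's \1.
    static String substituteNth(String text, String regex, String replacement,
                                int n, boolean global) {
        Matcher m = Pattern.compile(regex).matcher(text);
        StringBuffer out = new StringBuffer();
        int count = 0;
        while (m.find()) {
            count++;
            if (count == n || (global && count > n)) {
                // the nth match (and, with g, every later one) is replaced
                m.appendReplacement(out, replacement);
            } else {
                // earlier matches are copied back unchanged, quoted so that
                // any $ or \ in the matched text stays literal
                m.appendReplacement(out, Matcher.quoteReplacement(m.group()));
            }
            if (!global && count == n) {
                break; // only the nth match is wanted
            }
        }
        m.appendTail(out); // copy the rest of the input unchanged
        return out.toString();
    }

    public static void main(String[] args) {
        // prints a-a-a-X-a
        System.out.println(substituteNth("a-a-a-a-a", "a", "X", 4, false));
        // prints a-a-a-X-X
        System.out.println(substituteNth("a-a-a-a-a", "a", "X", 4, true));
    }
}
```

  If there are fewer than n matches, every match is copied back
  unchanged, so the text is returned as-is, which matches what sed does
  when the nth occurrence does not exist.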
  string.matches and Pattern.matches match the whole input string! So I
  need to add .* to the front and back of regular expressions.

  The script uses a similar strategy to tr/translate.java.pss. Each
  machine command is a method, except trivial commands, for which
  'in-line' code can be generated.

  The file 'sed1line.txt' can be used to test this script. This script
  recognises a very large subset of gnu sed commands at the moment.
  Also, it does not parse the regular expressions.

  Currently there is a difficulty for the pep machine in dealing with
  the sed syntax 's#a#A#p', that is, where alternative delimiters are
  used for substitutions. This could be solved with a new 'until'
  command that looks to the tapecell for the stop condition (text).
  Initially, I will only allow standard s/a/A/p syntax.

HISTORY

  26 July 2022
    Added the "s/a/A/3;" syntax for doing only the nth substitution.
    Also "s/a/A/3g;" to sub all occurrences from the 3rd onwards.
  23 July 2022
    Added a\ i\ c\ syntax. There may be slight differences with how
    whitespace is handled in gnu sed with a\ etc.
  15 July 2022
    Added s///w syntax.
  5 July 2022
    Added 0~8 step-range syntax. Also added ranges with single
    commands, which had been forgotten.
  4 July 2022
    Adding a/c/i commands but not a\ syntax yet. Converting to using a
    run() method for translating the script to java source code.
    However, this is not so useful for a 'compiled' language such as
    java. For a script language it would be very useful, because the
    translated sed script could be executed immediately with piped
    input.
    But the sed script would have to be read from file, or a string (not -1) {\n";
    # D deletes the pattern space up to and including the first newline
    add " this.patternSpace.delete(0, this.patternSpace.indexOf(\"\\n\")+1);\n";
    add " this.readNext = false; if (true) continue; \n";
    add "} else { this.patternSpace.setLength(0); continue; } /* 'd' */";
  }
  "F" {
    # F: print input filename + newline
    # maybe unsupported in java
    clear; add 'this.output.write("\n"); /* F */';
  }
  "g" {
    # g: replace patt-space with hold-space
    clear; add "this.patternSpace.setLength(0); \n";
    add "this.patternSpace.append(this.holdSpace); /* 'g' */";
  }
  "G" {
    # G: append a newline and the hold-space to patt-space
    clear; add "this.patternSpace.append(\"\\n\" + this.holdSpace); /* 'G' */";
  }
  "h" {
    # h: replace hold-space with patt-space
    clear; add "this.holdSpace.setLength(0); \n";
    add "this.holdSpace.append(this.patternSpace); /* 'h' */";
  }
  "H" {
    # H: append a newline and the patt-space to hold-space
    clear; add "this.holdSpace.append(\"\\n\" + this.patternSpace); /* 'H' */";
  }
  "l" {
    # print pattern-space unambiguously, synonym for p ?
    clear; add "this.output.write(this.patternSpace.toString()+'\\n'); /* 'l' */";
  }
  "n" {
    # n: print patt-space (if autoprint is on), get next line into patt-space
    clear; add "if (this.autoPrint) { \n";
    add " this.output.write(this.patternSpace.toString()+'\\n');\n}\n";
    add "this.patternSpace.setLength(0);\n";
    add "this.readLine(); /* 'n' */";
  }
  "N" {
    # N: append a newline and the next line to patt-space
    clear; add "this.patternSpace.append('\\n'); ";
    add "this.readLine(); /* 'N' */";
  }
  "p" {
    clear; add "this.output.write(this.patternSpace.toString()+'\\n'); /* 'p' */";
  }
  "P" {
    # P: print pattern-space up to 1st newline
    clear; add 'if (this.patternSpace.indexOf("\\n") > -1) {\n';
    add ' this.output.write(\n';
    add ' this.patternSpace.substring(0, \n';
    add ' this.patternSpace.indexOf("\\n"))+\'\\n\');\n';
    add "} else { this.output.write(this.patternSpace.toString()+'\\n'); }";
  }
  "x" {
    # x: swap pattern-space with hold-space
    clear; add "this.swap(); /* x */";
  }
  "z" {
    # z: delete pattern-space, NO restart
    clear; add "this.patternSpace.setLength(0); /* z */";
  }
  put; clear; add "action*"; push; .reparse
}

# M and I are modifiers to selectors (multiline and case insensitive)
# eg /apple/Ip; or /A/M,/b/I{p;p}
"M","I" {
  "I" { clear; add "(?i)"; put; }
  "M" { clear; add "(?m)"; put; }
  clear; add "mod*"; push; .reparse
}

# patterns - only execute commands if lines match
# line numbers are also selectors
[0-9] {
  while [0-9]; put; clear; add "number*"; push; .reparse
}

# $ is the last line of the file
"$" {
  put; clear; add "number*"; push; .reparse
}

# patterns - only execute commands if lines match
"/" {
  # save line/char number for error message
  clear; add "near line/char "; lines; add ":"; chars; put; clear;
  until "/";
  !E"/" {
    clear; add "Missing '/' to terminate "; get; add "?\n"; print; quit;
  }
  clip;
  # java .matches method matches whole string not substring
  # so we need to add .* at beginning and end, but not if regex
  # begins with ^ or ends with $.
  # complicated hey
  !E"$" { add ".*$"; }
  !B"^" { put; clear; add "^.*"; get; }
  put; clear;
  # add any delimiter for pattern here, or none
  add '"'; get; add '"'; put; clear;
  add "pattern*"; push; .reparse
}

# read transliteration commands
"y" {
  # save line/char number for error message
  clear; add "near line "; lines; add ", char "; chars; put; clear;
  # allow spaces between 'y' and '/' although gnu sed doesn't
  until "/";
  !E"/",![ /] {
    clear; add "Missing '/' after 'y' transliterate command\n";
    add "Or trailing characters "; get; add "\n"; print; quit;
  }
  # save line/char number for error message
  clear; add "near line "; lines; add ", char "; chars; put; clear;
  until "/";
  !E"/" {
    clear; add "Missing 2nd '/' after 'y' transliterate command "; get;
    add "\n"; print; quit;
  }
  "/" {
    clear; add "Sed syntax error? \n";
    add " Empty regex after 'y' transliterate command "; get; add "\n";
    print; quit;
  }
  # replace pattern found
  clip; put; clear; add 'this.transliterate("'; get; add '", "'; put; clear;
  # save line/char number for error message
  add "near line "; lines; add ", char "; chars; ++; put; --; clear;
  until "/";
  !E"/" {
    clear; add "Missing 3rd '/' after 'y' transliterate command "; get;
    add "\n"; print; quit;
  }
  clip; swap; get; add '"); /* y */ ';
  # y/// does not have modifiers (unlike s///)
  put; clear; add "action*"; push; .reparse
}

# this is an artificial block, created by the code below
# which reads multiline append/changes/inserts one char at a time
B"a\\",B"c\\",B"i\\" {
  # print; print; print;
  E"\\\n" {
    # turn multiline into java single line with \n
    # \ means continue text on next line.
    clip; clip; add "\\n"; .restart
  }
  # end of stream means we are finished, so add a dummy
  # \n
  (eof) { add "\n"; }
  E"\n".!E"\\\n" {
    # finished!
    # the !E"\\\n" above is unnecessary (already checked)
    # but I will leave it for clarity
    clip; replace "\n" "\\n";
    B"a\\" {
      clop; clop; put; clear;
      add "this.patternSpace.append('\\n'+\""; get; add '");';
    }
    B"c\\" {
      clop; clop; put; clear;
      add "this.patternSpace.setLength(0);\n";
      add "this.patternSpace.append(\""; get; add '");';
    }
    B"i\\" {
      clop; clop; put; clear;
      add "this.patternSpace.insert(0, \""; get; add '"+\'\\n\');';
    }
    put; clear; add "action*;*"; push; push; .reparse
  }
  .restart
}

# the add/change/insert commands have 2 forms:
# a text, or a\
"a","c","i" {
  # ignore intervening space if any
  put; clear; while [ \t\f]; clear;
  (eof) {
    clear; add "Sed syntax error? (near line:char "; lines; add ":"; chars;
    add ")\n"; add " No argument for '"; get; add "' command.\n";
    print; quit;
  }
  # also handle the a\ multiline form here
  # The following are ok: 'a\ text' 'a \ text '
  # 'a \ text \
  #  text'
  # So a\ can be terminated by eof, or \n without \\
  # strategy: read one char, check for \\, if so restart
  # and write a block "a\\","i\\" etc, and read one char
  # at a time until it ends with \n but not \\\n
  # if the first non-whitespace char is "\" then we need to read
  # the input stream until it ends with \n but not \\\n. This
  # is the a\ i\ c\ syntax. This is tricky with pep at the moment.
  # allowing logic syntax for 'until' would solve this. eg
  #   until "\n".!"\\\n";
  # or allow 2 args to until;
  read;
  B"\\" {
    swap; get;
    #print; print; print;
    # now should be a\\ or c\\ or i\\
    # this will be handled by the block above.
    .restart
  }
  (eof)."\n" {
    clear; add "[Sed syntax error?] (near line:char "; lines; add ":"; chars;
    add ")\n"; add " No argument for '"; get; add "' command.\n";
    print; quit;
  }
  until "\n";
  (eof) {
    E"\n" { clip; }
    "" {
      clear; add "[Sed syntax error?] (near line:char "; lines; add ":"; chars;
      add ")\n"; add " No argument for '"; get; add "' command.\n";
      print; quit;
    }
  }
  replace "\n" "\\n"; swap;
  "a" {
    clear; add "this.patternSpace.append('\\n'+\""; get; add '");';
  }
  "c" {
    clear; add "this.patternSpace.setLength(0);\n";
    add "this.patternSpace.append(\""; get; add '");';
  }
  "i" {
    clear; add "this.patternSpace.insert(0, \""; get; add '"+\'\\n\');';
  }
  # should work, because 'this' starts with 't' not a/c/i
  put; clear; add "action*;*"; push; push; .reparse
}

# various commands that have an optional word parameter
# e has two variants
# "e" { replace "e" "e; # exec patt-space command and replace"; }
"b","e","q","Q","t","T" {
  # ignore intervening space if any
  put; clear; while [ ]; clear;
  # A bit more permissive than gnu-sed, which doesn't allow
  # read to end in ';'.
  whilenot [ ;}];
  # word parameters are optional to these commands
  # just add a space to separate command from parameter
  !"" { swap; add " "; swap; }
  swap; get;
  # hard to implement because java has no goto ?
  # or try to use labelled loops??
  B"b" {
    clear;
    # todo: 'b' branch to