#*

  Parse a bash history file which has some explanatory 
  comments above or below the timestamp for commands. This is a format 
  that I use in my bash history file to remind me of something that I did
  a while ago, and also to make it easier to search for the command.
  
  This script will transform the history file into a perl/python/ruby array in
  order to eliminate duplicate and simple commands, while conserving comments
  and timestamps. I find comments above commands in the history file very
  useful for remembering how I did something a long time ago.

  eg in ruby to remove duplicate objects
  >> users = User.find(user_list.map(&:user_id).uniq)
   or just users.uniq

   create a ruby array with
   -----
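   class Record
    attr_reader :comment, :timestamp, :command

    def initialize(comment, time, command)
     @comment = comment
     @timestamp = time
     @command = command
    end

    # define equality on comment and command only, not the timestamp
    # (Array#uniq compares with eql? and hash, not with ==)
    def eql?(other)
      command == other.command && comment == other.comment
    end
    alias_method :==, :eql?

    def hash
      [comment, command].hash
    end

    def display
      # print out in bash history format
    end
   end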

   aa = [
     Record.new("comment", timestamp, command),
     Record.new("comment", timestamp, command),
     Record.new("comment", timestamp, command),
     ...
   ]
   aa = aa.uniq
   ,,,

  Appears to be more or less working, and takes only about 1 second for a
  35000-line history file, eg:
  >> pep -f pars/eg/bash.history.pss ~/.bash_history

TESTING

  Use the helper functions in helpers.pars.sh to translate to other
  languages and run.

  * translate to python and run
  ---
    pep -f tr/translate.py.pss eg/bash.history.pss > eg/py/bash.history.pss
    chmod a+x eg/py/bash.history.pss
    cat ~/.bash_history | eg/py/bash.history.pss > test1.txt
    # now compare with the output from the interpreted script
    pep -f eg/bash.history.pss ~/.bash_history > test2.txt
    vimdiff test1.txt test2.txt
  ,,,

NOTES

  Sudden thought: there is really no reason to parse all the "records" before 
  printing them. For the sake of speed and memory use, we can just print
  records as we find them.
  
  We can use the "empty startset" technique here, because
  a recordset is just a set of records. The next step is to fork this 
  script to create a ruby array and make the array unique where the 
  comment and command are the same (but not the timestamp). See the
  skeleton code above. This should keep my .bash_history file quite
  lean and clean and useful for reference purposes.
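
  A minimal Ruby sketch of the two ideas above: print records as they are
  found, and treat two records as duplicates when comment and command match,
  whatever the timestamp. The names HistoryRecord, dedupe_key, to_history and
  emit_unique are hypothetical and not part of this script, and the sketch
  assumes each record has already been split into its three parts.

```ruby
require "set"

# Hypothetical record type: one history entry with an optional comment.
HistoryRecord = Struct.new(:comment, :timestamp, :command) do
  # Duplicates are decided by comment and command only, not the timestamp.
  def dedupe_key
    [comment, command]
  end

  # Render in comment/timestamp/command order, skipping a missing comment.
  def to_history
    [comment, timestamp, command].compact.join("\n")
  end
end

# Print each record the first time its key is seen. Only the set of keys
# is held in memory, never the full list of records.
def emit_unique(records, out = $stdout)
  seen = Set.new
  records.each do |rec|
    # Set#add? returns nil when the key is already present.
    next unless seen.add?(rec.dedupe_key)
    out.puts rec.to_history
  end
end

# Made-up sample data for illustration only.
records = [
  HistoryRecord.new("# remove duplicate objects", "#1584230400", "users.uniq"),
  HistoryRecord.new("# remove duplicate objects", "#1655000000", "users.uniq"),
  HistoryRecord.new(nil, "#1655000100", "rsync -av src/ dest/"),
]
emit_unique(records)
```

  This keeps the first occurrence of each comment/command pair. Keeping the
  most recent timestamp instead would require holding the records until the
  end of the file, which gives up the streaming advantage.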

HISTORY

  29 june 2022
    Tested with different translators: java, go, python, ruby, c,
    tcl. All seem to work, but with a small variation in the number
    of commands eliminated (about +/- 10).

  18 june 2021

    Re-examining this to make it more useful. Marking trivial commands and
    only removing them if they have no attached comment. Also outputting in
    a standardized order: comment/timestamp/command. Removing all commands
    of 4 characters or fewer. Working on this makes me want to have a
    shell command syntax, eg "shell;", which would execute the workspace as
    a shell command. Why not? It would make pep a more generally useful
    scripting tool. The workspace would be replaced with the output of the
    command.

  27 july 2020
     
    Testing with pars/tr/translate.java.pss; seems to be working.

  26 march 2020

    Revising the script. Found a bug with duplicated timestamps and with
    the timestamp*comment*timestamp* sequence. Also, ignoring trivial commands.

  15 March 2020
    Began this script.

*#
  begin { 
    # the empty recordset trick to simplify the grammar rules
    add "recordset*"; push; 
  }
  read; 
  [\n] { 
    # just to debug
    # lines; print;
    clear; 
  }
  whilenot [\n]; 
  # ignore blank lines
  "",[:space:] { clear; .reparse }
  put;
  B"#".!"#" { 
    [#0123456789] {
      clear; add "timestamp*"; push; .reparse
    }
    clear; add "comment*"; push; .reparse
  }

  # tag the command as trivial if it is very short or very common,
  # for later removal. If there is a comment above it we may keep it anyway

  # tag as trivial all commands of fewer than 5 characters
  clip; clip; clip; clip;
  "" { clear; add "trivial*"; push; .reparse }

  clear; get;
  B"df ","df",B"du ",B"mv ",B"cp ",B"less ",B"vim ",B"rm ",B"mkdir ",
  B"find ",B"locate ",B"cd ","cd",B"ls ","ls","pwd","hist","books","bk","ho",
  "updatedb","bashrc","vimrc","os","cos","ccos","make" { 
    clear; add "trivial*"; push; .reparse
  }

  clear; add "command*"; push;

parse>
  # for debugging
  # add "line "; lines; add " char "; chars; add ": "; print; clear; 
  #add "line "; lines; add ": "; print; clear; 
  #unstack; print; stack; add "\n"; print; clear;

  # ----------------
  # 2 tokens
  pop; pop; 

  # ignore duplicated timestamps (keep only the later one)
  "timestamp*timestamp*" {
    clear; ++; get; --; put; clear;
    add "timestamp*"; push; .reparse
  }

  # handle multiline comments
  "comment*comment*" {
    clear; get; add "\n"; ++; get; --; put; clear;
    add "comment*"; push; .reparse
  }

  # don't need record*record* because an initial recordset always exists
  #"record*record*","recordset*record*" {
  "recordset*record*" {
    clear; get; add "\n"; ++; get; --; put; clear;
    # debug code
    # a+; count; add " record!\n"; print; clear;
    add "recordset*"; push; .reparse
  }

  # this will be compiled differently from r*r*
  "recordset*command*" {
    clear; get; add "\n"; ++; get; --; put; clear;
    add "recordset*"; push; .reparse
  }

  "recordset*trivial*" {
    a+; # count filtered commands
    clear; add "recordset*"; push; .reparse
  }

  (eof) {
    # clean up trailing comments etc
    "recordset*timestamp*","recordset*comment*" {
      clear; add "recordset*record*"; push; push; .reparse 
    }
  }
  # 3 tokens
  pop;

  # remove trivial commands without comments
  "recordset*timestamp*trivial*" {
    a+; # count filtered commands
    clear; add "recordset*"; push; .reparse
  }

  # ignore duplicated timestamps: keep the comment and the later timestamp
  "timestamp*comment*timestamp*" {
    clear; ++; get; --; put; clear; ++; ++; get; --; put; --; clear;
    add "comment*timestamp*"; push; push;  .reparse
  }

  # amalgamate comments before and after the timestamp
  "comment*timestamp*comment*" {
    clear; 
    get; ++; ++; add "\n"; get; --; --; put; clear;
    add "comment*timestamp*"; push; push; .reparse
  }

  "comment*timestamp*command*","comment*timestamp*trivial*" {
    clear; get; add "\n"; ++; get; add "\n"; ++; get; --; --; put; clear;
    add "record*"; push; .reparse
  }

  # don't remove trivial commands with comments
  "timestamp*comment*command*","timestamp*comment*trivial*" {
    clear; 
    # switch the order to make comment precede timestamp
    ++; get; add "\n"; --; get; add "\n"; 
    ++; ++; get; --; --; put; clear;
    add "record*"; push; .reparse
  }

  "recordset*timestamp*command*" {
    clear; ++; get; add "\n"; ++; get; --; put; --; clear;
    add "recordset*record*"; push; push; .reparse
  }

  # resolve commands and trivial commands with comments
  "recordset*comment*command*","recordset*comment*trivial*" {
    clear; ++; get; add "\n"; ++; get; --; put; --; clear;
    add "recordset*record*"; push; push; .reparse
  }

  push; push; push;

  (eof) {
     pop; pop;
     !"recordset*" {
       push; push; add "# History file did not parse well!\n"; print; clear;
       add "# Parse stack was: "; print; clear; unstack; add "\n"; print;
       quit;
     }
     "recordset*" { 
       clear; get; 
       add "\n# History file parsed and filtered by pars/eg/bash.history.pss \n"; 
       add "# "; count; add " trivial commands (without preceding comments) were removed.\n"; 
       print;
     }
  }