bumble.sf.net language and parsing

plain text

welcome to the jumble project **

[faq]

q: What is the Jumble project?
Jumble is an attempt to write a text transformation engine, similar to those used by wiki applications to turn 'plain text' documents into rendered visual documents. At this stage, a rendered visual document means an HTML web page, but hopefully other formats will be supported, such as pdf, TeX, man pages and DocBook xml, which is an extremely ambitious list.

q: How does this system differ from the numerous wiki transformation engines and various text2html scripts that exist in all sorts of colours, shapes and sizes?
From a technical point of view, this project differs in that an attempt is made to parse the document rather than just use patterns and regular expressions to transform it. From a user's perspective, this system does not use a standard wiki markup language but tries to use the minimal possible semantic markup which will allow the parser code to understand what sort of document and text is being dealt with. This approach means that the source text documents remain, in some sense, clean, in that they do not become weighed down with codes and hashes and funny punctuation marks. I believe this is an advantage in that the authors of the documents are not distracted by all these wiki codes.

q: What is the current state of the project?
The code contains some useful components for recognizing and rendering various structures in plain text documents, including FAQ blocks, lists, links, paragraph breaks, etc. On a document level the faq component is at a useful stage.

q: How can I use the code?
You can use the code here to transform a text file in a 'frequently asked questions' format into an HTML web page.
This is possible by downloading the file link://Web.jar (about 60k) and then running the following command in a DOS box:

    java -cp Web.jar FaqDocument textfile > webpage.html

or, if you only have the Microsoft virtual machine, you can try, for example:

    jview /cp Web.jar FaqDocument textfile > webpage.html

This will create a web page in an faq format and leave the text file unchanged. To see what format the text file should be in, you can look at the source for the current page at link://web-faq.txt
If you do not have any Java Virtual Machine (Runtime Engine) then you cannot use this code until you get one.

q: Is there a wiki here?
No, but there are a few tools which could be used to create a wiki. There is a component to create faq lists in HTML and there is a component to create directory listings. This means that there is some code to do wiki style transformations of text, but there is no interface for entering that text.

q: What is a wiki?
The wiki was invented by Ward Cunningham http://c2.com/ and is a way to edit web pages without using HTML. The philosophy of a wiki is that any visitor to a web page should be allowed to edit it, using a normal HTML form, although in practice sites normally place restrictions on the way that visitors can edit the site.

q: What is the edit directory?
The link:///edit/ directory contains the beginnings of a text editor written in Java. The editor is orientated towards saving to an ssh server via sftp, mainly because this is the only way to save to the SourceForge server.

q: How do I write an faq document?
Look at the file link://web-faq.txt which is the source for the web page (assuming that you are reading the page http://bumble.sf.net/lang/web/web-faq.html ). This will show you the format for writing an faq document such as this one. The format is reasonably simple.

q: What are these strange codes with square brackets in the text files?
These are a way to provide some structure in an unstructured text document. The codes indicate to the transformation engine the sort of text it is dealing with. For example, this faq document is enclosed in [ faq] [ /faq] tags to indicate to the transformation engine that it is dealing with an FAQ style document. HTML also uses tags, but many more. The idea of this transformation engine is to use as few tags as possible and to make them of a semantic nature rather than of a visual or layout nature. For example, the FAQ tags say something about the 'meaning' of the text within the tags rather than saying anything about how the document should be laid out or formatted when it is displayed visually. The idea of this is to remove from the writer the burden of having to decide how the document should look while he or she is attempting to write. The writer can decide how the document should look afterwards. Also, if you look at the link://web-faq.txt 'text file' which these HTML pages were generated from, you will see that the 'source' is quite clean. That is to say, there are very few tags in the text files, which makes them easier to read and, I believe, easier to maintain. This is based on the principle that it is better and more creative to think about one sort of thing at a time.

q: Are there any similar systems to this available?
There are many wiki systems available, many of them far more powerful than the current system. There are also some systems which emphasize having minimal tags in the source files, for example the http://daringfireball.net/projects/markdown/ 'Markdown' system. The Markdown system seems to have the same philosophy as the current system. There also appears to be a http://www.michelf.com/projects/php-markdown/ 'Php Markdown'. These systems are no doubt much more advanced than the current one, and you would be well advised to use them if you want to create a web site. The current system is only in its initial stages and may not be continued.
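Returning briefly to the square-bracket tags: the sketch below, in Java, gives a rough idea of how a program might locate the text enclosed by the faq tags and pick out the questions inside it by scanning rather than by regular expressions. It is only an illustration and is not the project's actual FaqDocument code. The class name FaqSketch, the method names, and the assumption that each question sits on its own line beginning with "q:" are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the project's FaqDocument class.
class FaqSketch {

    static final String OPEN = "[faq]";
    static final String CLOSE = "[/faq]";

    // Returns the text between the first opening faq tag and the next
    // closing faq tag, or null if no complete faq block is present.
    static String faqBody(String document) {
        int start = document.indexOf(OPEN);
        if (start < 0) {
            return null;
        }
        int bodyStart = start + OPEN.length();
        int end = document.indexOf(CLOSE, bodyStart);
        if (end < 0) {
            return null;
        }
        return document.substring(bodyStart, end);
    }

    // Collects the questions in a faq body. Assumption: a question is any
    // line that starts with "q:" once surrounding whitespace is trimmed.
    static List<String> questions(String body) {
        List<String> found = new ArrayList<>();
        for (String line : body.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("q:")) {
                found.add(trimmed.substring(2).trim());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "intro\n[faq]\nq: What is a wiki?\nA way to edit web pages.\n"
                + "q: Is there a wiki here?\nno.\n[/faq]\n";
        List<String> qs = questions(faqBody(text));
        System.out.println(qs.size() + " questions");
        for (String q : qs) {
            System.out.println(" - " + q);
        }
    }
}
```

Because the block is found by its tags and the questions by their markers, the program can report facts about the document, such as how many questions it holds, without knowing anything about how the page will eventually look.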
It appears that Markdown still allows the writer to put formatting code into the text document. Therefore I feel that my system has a slightly different outlook from Markdown. This system attempts to encourage the writer to think about semantic content rather than visual content, but does not force the writer to categorize his or her writing.

q: Why not use XML for the document format?

XML is a strict format which requires, or at least encourages, the author to make decisions about the categories of information which his or her writing will be dealing with. Often writers do not wish to make these sorts of decisions, or are not able to make them, because their ideas about the nature of what they are writing about are vague and will develop during the course of their writing. For this reason I prefer a non-strict format which attempts to make guesses about the semantic content of the text.

q: Is it possible to change the format for the faq document?
Most wiki systems and text transformations use regular expressions, but this system actually parses the text document to find the structures. This is slower and more complex than a regular expression system, but it allows more precise control of the way the document is transformed and allows a type of 'query' to be made of the document about its content. For example, the FAQ class can determine how many questions there are in the document and how many answers, whereas a regular expression system would have difficulty in finding that information. This means that the syntax of the text documents is determined by the parsing which the objects do of the document, and so to change the document syntax requires changing those parsing routines. In some cases this is simple, but in other cases not so simple.

q: What other documents are available?
There is a brief link:///bumble-faq.html 'faq' in the top level directory. That file describes the overall bent of this site. There is also an FAQ in the link:///lang directory of this site.

q: What does the '[dir]' tag mean?
This code instructs the transformation engine to insert a directory listing in the generated HTML document. The directory listing is the listing of a directory on the computer where the transformation engine is run, which would usually be the web server.

q: What does the '[image]' tag mean?
This tag instructs the transformation engine to insert an image in the rendered document.

q: What does the '[webdir]' tag mean?
This tag allows the insertion of a set of links from another web page in the rendered document. For example, '[webdir http://www.yahoo.com]' would insert all the links from the yahoo page into the rendered document. Please note that this component is only in a development stage. For example, the links from the page are not transformed to make them usable from a different server.

q: How can I stop a tag from being transformed by the code?
You can enclose the tag in single quote characters as I have done in the examples above. (Actually only the leading quote matters.) If the tags had not been enclosed in quotes, they would have been transformed by the engine.

q: Is this page dynamically generated?
No, which means that any file listing which occurs on this page may not be entirely up to date.

q: What is in the /edit/ directory?
This directory contains the beginnings of a text editor written in Java. The editor is orientated towards saving to an ssh server via sftp, mainly because this is the only way to save to the SourceForge server.

q: Can I use lists in documents?
There is a link://PlainList.java class which will recognize and render lists in HTML, but I have left it out of the faq document class for reasons of simplicity. There is also a link://PlainListDocument.java class, which is a document that can contain some text and a list, but this is not really that useful. The syntax of a list is, for example-