bumble.sf.net language and parsing


about the code in this folder

[source-forge-logo] [webdir http://bumble.sf.net/lang/web/ ] [faq]

q: What are the files in this directory for?

This directory contains code useful for web applications. The main point of the code is to transform text documents into other formats.

q: How can I use the code in this directory?

You can use the code here to transform a text file in a 'frequently asked questions' format into an html web page. To do this, download the file link://Web.jar (about 60k) and then run the following command in a dos box:

  java -cp Web.jar FaqDocument textfile > webpage.html

or, if you only have the Microsoft Virtual Machine, you can try, for example:

  jview /cp Web.jar FaqDocument textfile > webpage.html

This will create a webpage in an faq format and leave the text file unchanged. To see what sort of format the text file should be in, you can look at the source for the current page at link://web-faq.txt

If you do not have any Java Virtual Machine (Runtime Engine) then you cannot use this code until you get one.

To create a glossary style document you can use the GlossaryDocument class, for example:

  java -cp Web.jar GlossaryDocument gloss-file > webpage.html

The gloss-file is a normal text file which contains a list of definitions for a set of phrases or words. The format of the text file is best seen by looking at link:///lang/web/doc-glossary.txt . To see how it will be rendered in html you can look at the file link:///lang/web/doc-glossary.html . These glossary classes are only in a developmental stage and do not produce very nice output.

q: Can I transform to any format other than html?

The classes contain some support for transforming to docbook xml but this is still in a developmental phase.

q: Is there a wiki here?

No, but there are a few tools which could be used to create a wiki. There is a component to create faq lists in HTML and there is a component to create directory listings.
This means that there is some code to do wiki style transformations of text but there is no interface for entering that text.

q: What is a wiki?

The wiki was invented by Ward Cunningham http://c2.com/ and is a way to edit web pages without using html. The philosophy of a wiki is that any visitor to a webpage should be allowed to edit it, using a normal html form, although in practice a site normally places restrictions on the way that visitors can edit the site.

q: What is the edit directory?

This link:///edit/ directory contains the beginnings of a text editor written in java. The editor is oriented towards saving on an ssh server via sftp, mainly because this is the only way to save to the sourceforge server.

q: How do I write an faq document?

Look at the file link://web-faq.txt which is the source for the webpage (assuming that you are reading the page http://bumble.sf.net/lang/web/web-faq.html ). This will show you the format for writing an faq document such as this one. The format is reasonably simple.

q: What are the codes in square brackets in the text files?

These are a way to provide some structure in an unstructured text document. The codes indicate to the transformation engine the sort of text which it is dealing with. For example this faq document is enclosed in [ faq] [ /faq] tags to indicate to the transformation engine that it is dealing with an FAQ style document. Html also uses tags, but many more. The idea of this transformation engine is to use as few tags as possible and to make them of a semantic nature rather than of a visual or layout nature. For example, the FAQ tags say something about the 'meaning' of the text within the tags rather than saying anything about how the document should be laid out or formatted when it is displayed visually. The idea of this is to remove from the writer the burden of having to decide how the document should look while he or she is attempting to write.
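As a rough illustration of the square-bracket codes described above, the following Java sketch scans a piece of text and lists the codes it finds. This is not the code from Web.jar. The class name TagScanSketch and the method findTags are hypothetical and exist only for this example, and the sample text is made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of Web.jar: locates square-bracket codes in a text.
class TagScanSketch {

    /** Returns the trimmed contents of each [...] code found in the text, in order. */
    static List<String> findTags(String text) {
        List<String> tags = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int open = text.indexOf('[', pos);
            if (open < 0) {
                break; // no more opening brackets
            }
            int close = text.indexOf(']', open + 1);
            if (close < 0) {
                break; // an unterminated bracket is not treated as a code
            }
            tags.add(text.substring(open + 1, close).trim());
            pos = close + 1;
        }
        return tags;
    }

    public static void main(String[] args) {
        // Made-up sample in the style of an faq text file.
        String source = "[faq]\n"
                + "q: What is a wiki?\n"
                + "A way to edit web pages without using html.\n"
                + "[/faq]\n";
        for (String tag : findTags(source)) {
            System.out.println(tag);
        }
    }
}
```

Run on the sample text, this prints the two codes which enclose the faq section (faq and /faq). A real transformation engine would go on to decide what each code means; this sketch only locates them.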
The writer can decide how the document should look afterwards. Also, if you look at the link://web-faq.txt 'text file' which these HTML pages were generated from, you will see that the 'source' is quite clean. That is to say, there are very few tags in the text files, which makes them easier to read and, I believe, easier to maintain. This is based on the principle that it is better and more creative to think about one sort of thing at a time.

q: Are there any similar systems to this available?

There are many wiki systems available, many of them far more powerful than the current system. There are also some systems which emphasize having minimal tags in the source files, for example the http://daringfireball.net/projects/markdown/ 'Markdown' system. The Markdown system seems to have the same philosophy as the current system. There also appears to be a http://www.michelf.com/projects/php-markdown/ 'Php Markdown'. These systems are no doubt much more advanced than the current one, and you would be well advised to use them if you want to create a web site. The current system is only in its initial stages and may not be continued.

It appears that Markdown still allows the writer to put formatting code into the text document. Therefore I feel that my system has a slightly different outlook from Markdown. This system attempts to encourage the writer to think about semantic content rather than visual content, but does not force the writer to categorize his or her writing.

http://paginas.fe.up.pt/~villate/parsewiki/ This is a system which can transform to other formats apart from html but uses a normal style wiki syntax.

q: Why not use XML for the document format?

Xml is a strict format which requires, or at least encourages, the author to make decisions about the categories of information which his or her writing will be dealing with. But often writers do not wish to make these sorts of decisions, or are not able to make them, because their ideas about the nature of what they are writing about are vague and will develop during the course of their writing. For this reason I prefer a non-strict format which attempts to make guesses about the semantic content of the text.

q: Is it possible to change the format for the faq document?

Most wiki systems and text transformations use regular expressions but this system actually parses the text document to find the structures. This is slower and more complex than a regular expression system but it allows more precise control of the way the document is transformed and allows a type of 'query' to be made of the document about its content. For example the FAQ class can determine how many questions and how many answers there are in the document, whereas a regular expression system would have difficulty in finding that information. This means that the syntax of the text documents is determined by the parsing which the objects do of the document, and so changing the document syntax requires changing those parsing routines. In some cases this is simple but in other cases not so simple.

q: What other documents are available?

There is a brief link:///bumble-faq.html 'faq' in the top level directory. That file describes the overall bent of this site. There is also an FAQ in the link:///lang directory of this site.

q: What does the '[dir]' tag mean?

This code instructs the transformation engine to insert a directory listing in the generated html document. The directory listing is the listing of a directory on the computer where the transformation engine is run, which would usually be the web-server.

q: What does the '[image]' tag mean?
This tag instructs the transformation engine to insert an image in the rendered document. For example, '[image http://server.net/logo.gif]' should insert the logo image in the faq document. However the exact location in the document is not really controllable at the moment.

q: What does the '[webdir]' tag mean?

This tag allows the insertion of a set of links from another web-page in the rendered document. For example, '[webdir http://www.yahoo.com]' would insert all the links from the yahoo page into the rendered document. Please note that this component is only in a development stage. For example the links from the page are not transformed to make them usable from a different server.

q: What does the '[gloss]' tag mean?

This tag starts a glossary section.

q: What does the '[howto]' tag mean?

This tag starts a procedure section. A procedure section contains a series of steps which represent a set of instructions for how to do something. This tag and its functionality are not properly implemented.

q: How can I stop a tag from being transformed by the code?

You can enclose the tag in single quote characters as I have done in the examples above. (Actually only the leading quote matters.) If they had not been enclosed in quotes they would have been transformed by the code engine.

q: Can I put footnotes in a document?

In theory this should be possible and is handled by the link://FootnoteSection.java class and other classes, but the functionality is still in a developmental phase.

q: Is this page dynamically generated?

No, which means that any file listing which appears on this page may not be entirely up-to-date.

q: What is in the /edit/ directory?

This directory contains the beginnings of a text editor written in java. The editor is oriented towards saving on an ssh server via sftp, mainly because this is the only way to save to the sourceforge server.

q: Can I use lists in documents?
There is a link://PlainList.java class which will recognize and render lists in Html but I have left it out of the faq document class for reasons of simplicity. There is also a link://PlainListDocument.java class, a document which can contain some text and a list, but this is not really that useful. The syntax of a list is, for example-