Web Development using Linux

Table of Contents

last revision
27 October 2011, 6:33pm
book quality
just begun


This book seeks to provide recipes for developing web sites using the Linux operating system, with an emphasis on command line tools. This book is not about html and css, since that is covered in the html-css-book.txt.

Visual Page Creation Wysiwyg ‹↑›

www: kompozer
a graphical tool for creating pages. successor of nvu
www: bluefish
an html editor

Images ‹↑›

www: linux-image-book.txt
more comprehensive information about managing images with tools available for the Linux operating system.

Image Compression ‹↑›

Image Resizing ‹↑›

convert all '.html' files from iso-8859-1 to utf-8

   for x in $(find . -name '*.html')
     do iconv -f ISO-8859-1 -t UTF-8 "$x" > "$x.utf8" && mv "$x.utf8" "$x"
   done
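
The loop above splits the output of find on whitespace, so it breaks on file names which contain spaces. A more defensive variant might look like the sketch below. The function name html_to_utf8 is made up for this example, and it assumes that every matching file really is ISO-8859-1 (running it twice on the same files would double-encode the text).

```shell
# html_to_utf8 [dir] : convert every *.html file below dir (default: .) to utf-8
html_to_utf8() {
  find "${1:-.}" -type f -name '*.html' -print0 |
  while IFS= read -r -d '' f; do
    # only replace the original if the conversion succeeded
    if iconv -f ISO-8859-1 -t UTF-8 "$f" > "$f.utf8"; then
      mv "$f.utf8" "$f"
    else
      rm -f "$f.utf8"
      echo "skipped: $f" >&2
    fi
  done
}
```

To try it, call 'html_to_utf8 ~/site' on a copy of the web-site folder first.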

Managing Images ‹↑›

Feh is a flexible and fast tool for managing images which will be included in a web-site. See the linux-image-book for more details.

use feh


Svg Scalable Vector Graphics ‹↑›

Video ‹↑›

www: ffmpeg
recode video files into different formats
tools for image compression
webpack -
pngout -

video formats
Digital video - used by camcorders

install kino on linux

 sudo apt-get install kino

Animations ‹↑›

www: synfig studio
a tool to create animations without manual 'tweening' (that is, without drawing every in-between image)

File Transfer ‹↑›

video tools for web-development
stills2dv - creates videos from still images
kino - edit digital video data

Web Site Mirroring ‹↑›

Download all images from a site

 wget -r -l1 --no-parent -nH -nd -P/tmp -A".gif,.jpg" http://example.com/images

Web Site Upload ‹↑›

use rsync or sitecopy

Posting Data ‹↑›

Submit data to a HTML form with POST method and save the response

 curl -sd 'rid=value&submit=SUBMIT' <URL> > out.html
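
Form values which contain spaces, '&' or '=' have to be url-encoded before they are placed in the data string given to 'curl -d'. The sketch below is one way to do that in plain bash. The function name urlencode is an assumption of this example and it is only meant for ascii input. (curl also has a --data-urlencode option which does the encoding itself.)

```shell
# urlencode string : print the string with unsafe characters as %XX (ascii input only)
urlencode() {
  local s=$1 out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case "$c" in
      [a-zA-Z0-9.~_-]) out+=$c ;;                        # safe characters pass through
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;          # everything else becomes %XX
    esac
  done
  printf '%s\n' "$out"
}
```

The result can then be used as: curl -sd "rid=$(urlencode 'some value')&submit=SUBMIT" followed by the form url.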

post with a proxy and authentication

 curl -F name='../htdocs/notes/'$1 -F contents='<'$1 -u user:upass -x prox.net:8080 -U bob:proxpass http://serv.net/save.cgi

Html Stuff ‹↑›

Html Links ‹↑›

get the links from a page

 lynx -dump -listonly www.server.net/page.html

find urls within an html file (most of them anyway)

 grep -E 'https?://([[:alpha:]]([-[:alnum:]]+[[:alnum:]])*\.)+[[:alpha:]]{2,3}(:[0-9]+)?(/([-[:alnum:]_/.]*(\?[^[:space:]]+)?)?)?'
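
When only the urls themselves are wanted (rather than the whole matching lines), the -o option of grep can be combined with a much cruder pattern. This is a sketch under the assumption that urls in the page are delimited by quotes, angle brackets or spaces; the function name extract_urls is made up here.

```shell
# extract_urls [file...] : print each http/https url on its own line
# reads standard input when no file is given
extract_urls() {
  grep -Eo "https?://[^\"'<> ]+" "$@"
}
```

For example: extract_urls page.html | sort -u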

Entities ‹↑›

encode HTML entities

 perl -MHTML::Entities -ne 'print encode_entities($_)' /tmp/subor.txt

or use xmlstarlet to encode entities.
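
If neither the perl module nor xmlstarlet is installed, the most important entities can be encoded with sed alone. This is a minimal sketch which only handles the four characters that are special in html text and attributes; the function name html_escape is an assumption of this example.

```shell
# html_escape : filter which encodes & < > and " as html entities
html_escape() {
  # the ampersand must be replaced first, or the other entities would be re-encoded
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}
```

For example: html_escape < /tmp/subor.txt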

Bash And Web Development ‹↑›

Using the bash shell to develop web-sites may be quite efficient, if unconventional.

possibly the simplest way to create a web-page from text

 cat file.txt | (echo '<html><body><pre>'; cat -; echo '</pre></body></html>')

a simpler way

 (echo '<html><body><pre>'; cat file.txt; echo '</pre></body></html>')

Templating With Bash ‹↑›

a simple template technique with bash

 export a=b; echo -e 'one\ntwo\nand <x>' | (echo 'cat << EE';sed 's/<x>/$a/g'; echo 'EE') | bash

use the technique above to substitute the date into the template

 cat template | (echo 'cat << EE';sed 's/<date>/$(date)/g'; echo 'EE') | bash
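
The here-document technique passes the whole template through bash, so any '$' or backtick in the template text is also evaluated. Where that is not wanted, a sed-only substitution is an alternative. The sketch below assumes that the placeholder name is a plain word and that the value contains no newlines; the function name fill_template is made up for this example.

```shell
# fill_template name value : filter which replaces every <name> with value
fill_template() {
  local name=$1 value=$2
  # escape the characters which are special in a sed replacement (\ & and the | delimiter)
  value=$(printf '%s' "$value" | sed -e 's/[\\&|]/\\&/g')
  sed "s|<$name>|$value|g"
}
```

For example: fill_template date "$(date)" < template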

Folder Listings ‹↑›

list only folders

 ls -d */ | (echo '<ul class="fol">'; cat -; echo '</ul>')

make an html directory listing out of the current folder

 echo "echo -e \<li\>"{$(echo * | tr ' ' ',')}"\</li\>" | bash

list all files and folders, no links

 a=$(echo *); echo 'echo -e "\n<li>"{'${a//" "/,}'}"</li>"' | bash

list only folders, no links

 a=$(echo */); echo 'echo -e "\n<li>"{'${a//" "/,}'}"</li>"' | bash

a for loop method to list only sub-folders as an html list

    echo "<ul class=fol>"
    for d in $(ls -d */); do
      echo "<li>$d</li>"
    done
    echo "</ul>"

another for loop method to list only sub-folders as an html list

    echo "<ul class=fol>"
    for d in */; do
      echo " <li>$d</li>"
    done
    echo "</ul>"

list subfolders as html links

    echo "<ul class=fol>"
    for d in */; do
      echo "<li><a href='$d'>$d</a></li>"
    done
    echo "</ul>"

list subfolders as html links using a brace loop

    echo "<ul class=fol>"
    for d in */
      { echo " <li><a href='$d'>$d</a></li>"; } 
    echo "</ul>"
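
The loops above can be wrapped in a function which takes the folder as a parameter and copes with an empty folder. This is only a sketch: the function name html_folder_list is an assumption, and folder names are not html-escaped.

```shell
# html_folder_list [dir] : print the sub-folders of dir (default: .) as an html list of links
html_folder_list() {
  local dir=${1:-.} d name
  echo '<ul class="fol">'
  for d in "$dir"/*/; do
    [ -d "$d" ] || continue          # skips the unexpanded pattern when there are no sub-folders
    name=${d%/}                      # strip the trailing slash
    name=${name##*/}                 # strip the leading path
    printf ' <li><a href="%s/">%s</a></li>\n' "$name" "$name"
  done
  echo '</ul>'
}
```

For example: html_folder_list /var/www > index.html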

Bash Cgi Programming ‹↑›

While it is most common for Cgi web-scripts to be written in the Perl language, it is also possible to write them using the normal Bash shell scripting language. Whether this is a good idea is completely another question...

www: http://en.wikipedia.org/wiki/Internet_media_type#List_of_common_media_types
A list of common "media types" (such as "text/html") which are used in the "Content-Type:" field of the Cgi script.
o- use the "2>&1" idiom at the end of script lines to redirect error messages to the "standard output" (which in the case of a Cgi script is the web-browser of the script visitor). This allows you, the developer, to see what is going wrong with your bash cgi script.
o- using "here" documents with a bash cgi script is a simple way to produce content.

The Bash Cgi Gotchas ‹↑›

o- the content-type line has to be before /anything/ or else nothing is printed.

there must be an empty line after the "content-type" line.

    echo "Content-Type: text/html"
    echo "<html>...</html>"
    # This is Incorrect because there is no blank line after the content type
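
One way to avoid forgetting the blank line is to print the header with a small function which always emits it. A minimal sketch; the function name cgi_header and its default value are assumptions of this example.

```shell
# cgi_header [content-type] : print the Content-Type line followed by the required empty line
cgi_header() {
  printf 'Content-Type: %s\n\n' "${1:-text/html; charset=utf-8}"
}
```

A script would then start with 'cgi_header' (or 'cgi_header text/plain') before printing anything else.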

trying to call the perl CGI module twice doesn't work

      file=$(perl -e "use CGI qw(:standard); print param('file')")
      name=$(perl -e "use CGI qw(:standard); print param('name')")

      # Incorrect, the "name" variable is not set because the 
      # 'cgi' module reads all the form posted data from standard input
      # in one go, leaving nothing to read the second time around.
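
One way around this gotcha is to read the posted data once into a shell variable and then pick the fields out of the saved copy. The sketch below does this in plain bash; the function name form_value is made up for this example, and the values it returns are still url-encoded.

```shell
# form_value name data : print the (still url-encoded) value of field 'name'
# from form data of the shape 'a=1&b=2'; returns 1 if the field is absent
form_value() {
  local name=$1 pairs pair
  IFS='&' read -r -a pairs <<< "$2"
  for pair in "${pairs[@]}"; do
    if [ "${pair%%=*}" = "$name" ]; then
      printf '%s\n' "${pair#*=}"
      return 0
    fi
  done
  return 1
}
```

In a cgi script this might be used as: postdata=$(cat); file=$(form_value file "$postdata"); name=$(form_value name "$postdata")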

Cgi Bash Examples ‹↑›

make a cgi script to display the query string

   echo "Content-Type: text/html"
   echo ""
   echo "<html><body>the query string is $QUERY_STRING</body></html>" 2>&1

a bash cgi script indicating that the character set is "utf8"

  echo "Content-Type: text/html; charset=utf-8"
  echo ""
  echo "<html><body>A Bash UTF8 Cgi Script!</body></html>" 2>&1

show error messages in the browser generated by a cgi script line

 ech "this is a mistake" 2>&1

show error messages in the browser with output redirection

 ech "this is a mistake" 2>&1 >save.txt
note that the 2>&1 should come before the file redirection
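
The effect of the order of the two redirections can be checked outside of a web server. In this sketch the function names are made up and the file name is a parameter; 'noisy' simply writes one line to each output stream.

```shell
# noisy : write one line to standard output and one to standard error
noisy() { echo "normal output"; echo "error message" >&2; }

# 2>&1 first: errors go to the original standard output (the browser),
# and only the normal output is saved in the file
errors_to_browser() { noisy 2>&1 >"$1"; }

# file redirection first: both streams end up in the file
errors_to_file() { noisy >"$1" 2>&1; }
```

Running 'errors_to_browser save.txt' prints only the error message, while 'errors_to_file save.txt' prints nothing.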

a cgi script which displays several environment variables

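     # in a real script the here-document lines must not be indented
     cat << ENDxxx
     Content-Type: text/html; charset=utf-8

     <!-- The empty line above this one is essential -->
     <html><head><title>A bash cgi script</title></head>
     <body><pre>SERVER_NAME = $SERVER_NAME
     QUERY_STRING = $QUERY_STRING</pre></body></html>
     ENDxxx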

Getting And Decoding Form Data ‹↑›

www: http://oinkzwurgl.org/bash_cgi
bash functions for decoding cgi form data
www: http://www.fpx.de/fp/Software/ProcCGIsh.html
a bash script and c program for decoding cgi form data. The c program needs to be compiled.
Data sent from an html form to a web server can be sent in 2 different ways: in the query string of the url (the 'GET' method) or in the body of the HTTP request (the 'POST' method). If the data is 'posted' then the cgi script receives it on the standard input. Either way, data sent from an html form has to be url-decoded.

data "posted" from an html form can be read from the standard input

 read -r postdata           # the content length should be checked first
 postdata=$(</dev/stdin)    # the same
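
Reading and decoding the posted data can be sketched in plain bash as below. Both function names are assumptions of this example. read_post_body relies on the CONTENT_LENGTH variable which the web server sets for a POST request, and urldecode uses the common 'printf %b' idiom, which will also interpret any backslash sequences present in the data.

```shell
# read_post_body : read exactly CONTENT_LENGTH bytes of posted data from standard input
read_post_body() {
  head -c "${CONTENT_LENGTH:-0}"
}

# urldecode string : turn '+' into a space and %XX into the corresponding byte
urldecode() {
  local s=${1//+/ }
  printf '%b' "${s//%/\\x}"
}
```

In a cgi script: postdata=$(read_post_body); then call urldecode on each value taken from it.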

use perl to get and decode data posted from an html form

 file=$(perl -e "use CGI qw(:standard); print param('file')")
the CGI module takes care of url-decoding the form data

tools for file transfer over the net
ftp - the old file transfer tool
rsync - transfer only changed or new files
sftp - an interactive secure version of ftp
scp - a non-interactive secure copy
sitecopy - synchronize a remote site with what is local

some environment variables and example values

     SERVER_SOFTWARE = Apache/2.0.54 (Fedora)
     SERVER_NAME = www.comp.leeds.ac.uk
     SERVER_PORT = 80
     HTTP_ACCEPT = 'text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5'
     PATH_INFO = 
     SCRIPT_NAME = /cgi-bin/Perl/environment-example

cgi environment variables and meanings
DOCUMENT_ROOT - The root directory of your server
HTTP_COOKIE - The visitor's cookie, if one is set
HTTP_HOST - The hostname of your server
HTTP_REFERER - The URL of the page that called your script
HTTP_USER_AGENT - The browser type of the visitor
HTTPS - "on" if the script is being called through a secure server
PATH - The system path your server is running under
QUERY_STRING - The query string (see GET, below)
REMOTE_ADDR - The IP address of the visitor
REMOTE_HOST - The hostname of the visitor (if server has reverse-name-lookups on; otherwise this is the IP address again)
REMOTE_PORT - The port the visitor is connected to on the web server
REMOTE_USER - The visitor's username (for .htaccess-protected pages)
REQUEST_URI - The interpreted pathname of the requested document or CGI (relative to the document root)
SCRIPT_FILENAME - The full pathname of the current CGI
SCRIPT_NAME - The interpreted pathname of the current CGI (relative to the document root)
SERVER_ADMIN - The email address for your server's webmaster
SERVER_NAME - Your server's fully qualified domain name (e.g. www.cgi101.com)
SERVER_PORT - The port number your server is listening on
SERVER_SOFTWARE - The server software you're using (such as Apache 1.3)
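
A few of these variables can be printed from a bash cgi script with a loop and indirect expansion. This is a sketch only: the function name print_cgi_env and the choice of variables are assumptions, and the values are not html-escaped.

```shell
# print_cgi_env : print some cgi environment variables inside a <pre> block
print_cgi_env() {
  local v
  echo '<pre>'
  for v in SERVER_SOFTWARE SERVER_NAME SERVER_PORT QUERY_STRING REMOTE_ADDR HTTP_USER_AGENT; do
    # ${!v} expands to the value of the variable whose name is stored in v
    printf '%s = %s\n' "$v" "${!v:-(not set)}"
  done
  echo '</pre>'
}
```

In a script this would be called after the Content-Type line and the empty line have been printed.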

Xml And Html ‹↑›

xml and html are closely related mark-up languages.

Xhtml ‹↑›

cgi Content-Type: values
text/html for html content
text/plain for plain text
text/plain; charset=utf-8 plain text in the utf8 encoding
text/xml xml text

Validation ‹↑›

Validation is the process of confirming that a particular html, xhtml, or css document actually conforms to the technical specifications for that document. Historically browsers have been designed to display more or less correctly even html and xhtml pages which contain 'mistakes' or syntax which does not conform to the technical specification for that type of document.

www: http://validator.w3.org
validates xhtml
www: http://jigsaw.w3.org/css-validator
a css validator
www: tidy.sourceforge.net
A command line html validator and pretty printer.
install the html tidy program and docs on a debian type linux

 sudo apt-get install tidy tidy-doc

view the html tidy documentation

 ls /usr/share/doc/tidy-doc/htmldoc
 xdg-open /usr/share/doc/tidy-doc/htmldoc/Overview.html

test out tidy printing errors and warnings

 echo "<html></html>" | tidy -e - 2>&1 | less

Curl Stuff ‹↑›

getting a page via an authenticating proxy server

 curl -x proxy.utas.edu.au:8080 -U bobj http://www.server.net

get a page via authenticating proxy server as user 'bob' password 'sec'

 curl -x proxy.net:8080 -U bob:sec http://www.server.net

Supplying the password in this manner is possibly not a good idea from a security point of view.

download a text file through a proxy and edit it with vim

  function edn {
    curl -x proxy.org.au:8080 -U bob:pass www.serv.net/a.txt -o ~/notes.txt
    vim ~/notes.txt
  }

upload a file to a webserver with a cgi script and http authentication

 function up { [ -z "$1" ] && echo 'no parameter' && return 1; curl -F name="../htdocs/notes/$1" -F contents="<$1" -u user:upass -x prox.net:8080 -U bob:proxpass http://serv.net/save.cgi; }

Perl Tricks ‹↑›

xmlstarlet - queries and edits xml from the command line

a perl mechanize example

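    use WWW::Mechanize;
    my ($url, $file) = @ARGV;   # assumed here: url and output file given as arguments
    my $mech = WWW::Mechanize->new();

    # navigate to the main page
    $mech->get($url);

    # follow a link that contains the text 'download this'
    $mech->follow_link( text_regex => qr/download this/i );

    # submit a POST form, to log into the site
    $mech->submit_form(
      with_fields => {
        username => 'mungo',
        password => 'lost-and-alone',
      }
    );

    # save the results as a file
    $mech->save_content($file);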

Php ‹↑›

Testing php configuration

 php -r 'phpinfo();'

get the urls from a webpage


Crawling ‹↑›

Load Testing ‹↑›

useful modules
ScriptableBrowser (simpletest)

Stuff Used In Google Chrome ‹↑›

funkload - web testing

Forum Sites ‹↑›

www: ask.metafilter.com
visited by knowledgeable people

Serving Pages ‹↑›

share the current directory tree (via http) at http://$HOSTNAME:8000/

 python -m SimpleHTTPServer     # python 2
 python3 -m http.server         # python 3

create a webserver to share all files in /tmp/mydocs on port 8081

 wbox servermode webroot /tmp/mydocs

create a webserver to share all files in /tmp/mydocs on port 8080

 wbox servermode serverport 8080 webroot /tmp/mydocs

share a file through port 80 with netcat

 nc -w 5 -v -l -p 80 < file.ext

Cgi Servers ‹↑›

a simple cgi server

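 python -m CGIHTTPServer 8080         # python 2
 python3 -m http.server --cgi 8080    # python 3 (the --cgi option is deprecated in recent versions)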

some kind of perl cgi server


Small Webservers ‹↑›

www: http://ask.metafilter.com/65481/Help-me-find-a-cool-little-unix-http-utility-I-cant-remember
some good things about mini web servers
www: http://hping.org/wbox/
site for wbox
tools used in google chrome
JSCRE, libjpeg, libpng, libxml, libxslt, LZMA SDK, modp_b64,
Mozilla interface to Java Plugin APIs,
npapi, nspr, nss, Pthreads for win32, sqlite,
tlslite, V8 assembler, WebKit, WTL, zlib

Wbox ‹↑›

show how long each part of a webpage takes to generate

 wbox nowhere.net/page.html timesplit 1

show the http header information for a page

 wbox www.google.it/notexistingpage.html 1 showhdr

Mac Osx ‹↑›

Installed by default are php, curl,

Templating ‹↑›

www: http://www.perl.com/pub/a/2001/08/21/templating.html
an article about using templating systems with perl
small quick and easy webservers
wbox - http server
thttpd - small web server
mini_httpd - same author as thttpd but smaller
webfs - serves a file system from the web
busybox httpd command - small webservery thing

www: http://template-toolkit.org/
the site for the template toolkit

Template Toolkit ‹↑›

example statement using dot notation.

 How are things in [% customer.address.city %]?

a for loop

 [% FOREACH list %]
   <a href="[% url %]"><b>[% name %]</b></a>
 [% END %]

Html Template ‹↑›

example loop with html::template

 <TMPL_LOOP list>
   <a href="<TMPL_VAR url>"><b><TMPL_VAR name></b></a>
 </TMPL_LOOP>

Mason ‹↑›

Seems to be in active development. It can be run without a webserver

www: http://www.masonhq.com/
the official site
install mason using apt-get (unchecked)

 apt-get install libmason-perl

Perl Stuff ‹↑›

Template Toolkit - almost active development perl, and python
HTML::Mason - ? callback style active as of 2010
Embperl - embed perl into webpages, stopped 2006
HTML::Template - perl template module
Text::Template - a general purpose templater
Apache::ASP - use asp with apache stopped 2004
CGI::FastTemplate - another one

Web Simple ‹↑›

www: http://search.cpan.org/~mstrout/Web-Simple-0.002/lib/Web/Simple.pm
the documentation for web simple
Developed in 2009. Can create a webapplication without a webserver

www: hobbs
at stackoverflow.com, a knowledgeable perl web person

Cpan Tool Crash Course ‹↑›

start a cpan shell to install mason

 perl -MCPAN -e 'shell'

some interesting perl web modules
Dancer - perl web apps with good examples
Web::Simple - simple web apps
Catalyst - big web apps

www: http://www.livejournal.com/doc/server/lj.install.perl_setup.modules.html
a list of potentially useful modules used with livejournal
upgrade cpan but could cause problems ???
    perl -MCPAN -e shell
    cpan> install Bundle::CPAN
    cpan> reload cpan

Getting Web Pages From The Command Line ‹↑›

http get of a web page via proxy server with login credentials

 curl -U username[:password] -x proxy:proxyport webpage

use netcat to get a webpage

 printf 'GET / HTTP/1.0\r\n\r\n' | nc -v www.somewebsite.com 80
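
An http request ends with an empty line, and each line is terminated by a carriage return and a line feed. Some servers also expect a 'Host:' header. The sketch below builds such a request with printf; the function name http_get_request is made up for this example.

```shell
# http_get_request host [path] : print a minimal http 1.0 GET request
http_get_request() {
  # each header line ends in \r\n and the request ends with an empty line
  printf 'GET %s HTTP/1.0\r\nHost: %s\r\n\r\n' "${2:-/}" "$1"
}
```

For example: http_get_request www.somewebsite.com / | nc www.somewebsite.com 80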

get a webpage with the console php tool

 php -r "echo file_get_contents('http://metafilter.com/');"

get a webpage with perl

 perl -MHTTP::Client -e 'print HTTP::Client->new()->get("http://localhost/path")'

a simpler way to do the same

 perl -MLWP::Simple -e 'getprint("http://www.metafilter.com/")'

Load Testing ‹↑›

perform 4 queries per second (with 4 processes) on the local webserver

 wbox http://localhost clients 4

Blog Engines ‹↑›

simple cpan shell
h - show help
get - get the source for a module
make - compile ? the module
test - test a module
install - do all of get make, test
clean - get rid of a badly installed module
look - see whats happening
readme - see whats going on

Cms Systems ‹↑›

serendipity - a blog engine with database backend
bilboblog - a simple engine

Notes ‹↑›

google use python

www: http://unixmages.com/
an unrelated but interesting site

Jargon ‹↑›

a small summary
redmine - popular small software orientated (ruby
www: the net and the web
The net is the internet and all its protocols such as ftp, ssh, http etc, whereas the web consists of just the http and https protocols and therefore webpages.
www: download
transfer files from a net server to your local computer
www: markup
a way of embedding extra information in a document with the use of 'tags' or 'codes'. Html is an example of markup.
www: minimalist mark-up
minimalist mark-up is a markup language which attempts to use the minimum number of tags possible to embed in the document. The purpose of this system is not to interfere psychologically with the human readers and writers of the document who may be distracted and confused by bulky and numerous tags. An example of minimalist markup is 'markdown' or most 'wiki' markup languages.
www: wiki
Is a web site or document which many or most web-visitors are able to edit.
www: web 2.0
The term encapsulates the idea of the web as a collaborative medium for the compilation and consumption of information, rather than a 'uni-directional' information source. The television is an example of a uni-directional information source since the viewer is unable to affect its transmissions, apart from changing channel or switching off. The web 2.0 is multi-directional because the user can create and modify the content (through the use of wikis, and collaborative sites).
www: tags
mark-up languages often use 'tags' in order to embed information in text documents. Usually there is a start tag and an end tag which surround the information to which they pertain. For example in the text "<em>tree</em>" the word 'tree' is surrounded by the html start tag "<em>" and end tag "</em>". This system is used by various mark-up languages such as html, xml and sgml.
www: semantic markup
the use of mark-up codes or tags to indicate 'semantic' information rather than 'visual display or layout' information. The phrase 'semantic information' may be a tautology. Examples of 'visual display' markup are the html <big> and <small> tags, which indicate to the document rendering software that the text should be displayed bigger or smaller than the normal sized text. Semantic markup is considered to be 'good' in the w3c world and among ideologists, since information is not lost in a forest of meaningless layout tags.
www: plain text
plain text is the basis of the internet and web. Plain text is any data which represents a stream of characters in a human writing system which is encoded electronically according to a standard 'text encoding'. Examples of these encodings are utf8, ascii, and latin1. Modern unicode text encodings (such as utf8) can represent any character in any human language which has an established writing system.
www: separation of content and presentation
a much bandied about phrase.
www: style sheets
documents which indicate how other documents should be displayed. A number of style sheet languages exist but the most common is 'css' (cascading style sheets).
www: cms - content management system
A content management system is supposedly a way to quickly build websites focussing on the content of that site rather than web-development issues and design issues. There is a dizzying array of free and open-source cms systems available.
www: version control system
a way of managing changes to files made by potentially many people, simultaneously
www: database
a way of storing data which is supposedly secure and convenient. Many database systems exist, with the simplest being a text file database, and the most common being a 'relational' database which stores information in a series of linked tables. Databases are often used to store the content of web-sites.
www: content
This is a much used web-term which relates to the information which a web-site contains as opposed to the visual or technical design of the site


web jargon