Web Development using Linux

last revision
27 October 2011, 6:33pm
book quality
just begun


This book seeks to provide recipes for developing web sites using the Linux operating system, with an emphasis on command line tools. It does not deal with html and css, since those are covered in html-css-book.txt.

Images ‹↑›

Image Compression ‹↑›

Image Resizing ‹↑›

Convert a bunch of HTML files from ISO-8859-1 to UTF-8 (file names with spaces are handled)

 find . -name '*.html' -exec sh -c 'iconv -f ISO-8859-1 -t UTF-8 "$1" > "$1.utf8" && mv "$1.utf8" "$1"' _ {} \;

File Transfer ‹↑›

tools for image compression
webpack -
pngout -

Web Site Mirroring ‹↑›

Download all images from a site

 wget -r -l1 --no-parent -nH -nd -P/tmp -A".gif,.jpg" http://example.com/images

Web Site Upload ‹↑›

use rsync or sitecopy

Posting Data ‹↑›

Submit data to an HTML form with the POST method and save the response (the address of the form is in $url)

 curl -sd 'rid=value&submit=SUBMIT' "$url" > out.html

post with a proxy and authentication

 curl -F "name=../htdocs/notes/$1" -F "contents=<$1" -u user:upass -x prox.net:8080 -U bob:proxpass http://serv.net/save.cgi

Html Stuff ‹↑›

Html Links ‹↑›

get the links from a page

 lynx -dump -listonly www.server.net/page.html

find urls within an html file (most of them anyway)

 grep -Eo 'https?://([[:alpha:]]([-[:alnum:]]+[[:alnum:]])*\.)+[[:alpha:]]{2,3}(:[0-9]+)?(/([-[:alnum:]/_.]*(\?[^[:space:]]+)?)?)?'
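
A looser alternative, as a sketch only: instead of checking the shape of the whole url, take everything after "http://" or "https://" up to the first quote, angle bracket or space. The function name list_urls is made up for this example.

```shell
# list_urls is a hypothetical helper: print each http/https url found
# on standard input, one per line, without duplicates
list_urls() {
  grep -Eo "https?://[^\"'<> ]+" | sort -u
}
```

Used as "list_urls < page.html". Urls that are relative, or that are built by javascript, are not found.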

Entities ‹↑›

encode HTML entities

 perl -MHTML::Entities -ne 'print encode_entities($_)' /tmp/subor.txt

or use xmlstarlet to encode entities.
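
If neither perl nor xmlstarlet is at hand, the four characters that matter most can be escaped with sed. This is a minimal sketch; the function name html_escape is made up here, and unlike encode_entities it leaves non-ascii characters alone.

```shell
# html_escape is a hypothetical helper: escape the characters that are
# special in html. The ampersand must be replaced first, otherwise the
# entities produced by the later rules would be escaped a second time.
html_escape() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}
```

Used as "html_escape < /tmp/subor.txt".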

Bash And Web Development ‹↑›

Using the bash shell to develop web sites may be quite efficient, if unconventional.

possibly the simplest way to create a webpage from text

 cat page.txt | (echo '<html><body><pre>'; cat -; echo '</pre></body></html>')

Templating With Bash ‹↑›

a simple template technique with bash

 export a=b; echo -e 'one\ntwo\nand <x>' | (echo 'cat << EE';sed 's/<x>/$a/g'; echo 'EE') | bash

use the technique above to substitute the date into the template

 cat template | (echo 'cat << EE';sed 's/<date>/$(date)/g'; echo 'EE') | bash
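
The "here document" technique runs the whole template through bash, so any "$(...)" or backtick that happens to be in the template text is executed as well. Where that is not wanted, the placeholder can be replaced with bash parameter expansion instead. A minimal sketch; the function name fill_template is made up for this example.

```shell
# fill_template is a hypothetical helper: read a template on standard
# input and replace every <name> placeholder with a value, without
# passing the template text through a second shell.
# usage: fill_template name value < template
fill_template() {
  local name=$1 value=$2 line
  while IFS= read -r line || [ -n "$line" ]; do
    # the quotes make both the placeholder and the value literal text
    line=${line//"<$name>"/"$value"}
    printf '%s\n' "$line"
  done
}
```

Used as: fill_template date "$(date)" < template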

Folder Listings ‹↑›

list only folders

 ls -d */ | (echo '<ul class="fol">'; cat -; echo '</ul>')

make an html directory listing out of the current folder

 echo "echo -e \<li\>"{$(echo * | tr ' ' ',')}"\</li\>" | bash

list all files and folders, no links

 a=$(echo *); echo 'echo -e "\n<li>"{'${a//" "/,}'}"</li>"' | bash

list only folders, no links

 a=$(echo */); echo 'echo -e "\n<li>"{'${a//" "/,}'}"</li>"' | bash

a for loop method to list only sub-folders as an html list

    echo "<ul class=fol>"
    for d in $(ls -d */); do
      echo "<li>$d</li>"
    done
    echo "</ul>"

another for loop method to list only sub-folders as an html list

    echo "<ul class=fol>"
    for d in */; do
      echo "<li>$d</li>"
    done
    echo "</ul>"

list subfolders as html links

    echo "<ul class=fol>"
    for d in $(ls -d */); do
      echo "<li><a href='$d'>$d</a></li>"
    done
    echo "</ul>"
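
The loops over "$(ls -d */)" split folder names at spaces. A sketch of the same listing as a function which loops over the glob directly, so that names with spaces survive; the name folder_list is made up here, and the folder names are neither html-escaped nor url-encoded.

```shell
# folder_list is a hypothetical helper: print the sub-folders of the
# directory given as $1 (default: the current directory) as an html
# list of links
folder_list() {
  local dir=${1:-.} d
  echo '<ul class="fol">'
  for d in "$dir"/*/; do
    [ -d "$d" ] || continue        # the glob matched nothing
    d=${d%/}; d=${d##*/}           # strip trailing slash and leading path
    printf '<li><a href="%s/">%s</a></li>\n' "$d" "$d"
  done
  echo '</ul>'
}
```

Used as "folder_list > index.html" or "folder_list /var/www/notes".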

Bash Cgi Programming ‹↑›

While it is most common for Cgi web-scripts to be written in the Perl language, it is also possible to write them using the normal Bash shell scripting language. Whether this is a good idea is completely another question...

www: http://en.wikipedia.org/wiki/Internet_media_type#List_of_common_media_types
A list of common "media types" (such as "text/html") which are used in the "Content-Type:" field of the Cgi script.
o- use the "2>&1" idiom at the end of script lines to redirect error messages to the "standard output" (which in the case of a Cgi script is sent to the web-browser of the script visitor). This allows you, the developer, to see what is going wrong with your bash cgi script.
o- using "here" documents with a bash cgi script is a simple way to produce content.

The Bash Cgi Gotchas ‹↑›

o- the content-type line has to be before /anything/ or else nothing is printed.

there must be an empty line after the "content-type" line.

    echo "Content-Type: text/html"
    echo "<html>...</html>"
    # This is incorrect because there is no blank line after the content type
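
For comparison, a sketch of a correct version, with the header and its blank line printed by one small function. The names cgi_header and cgi_page are made up for this example.

```shell
# cgi_header is a hypothetical helper: print the Content-Type line
# followed by the empty line that ends the http headers
cgi_header() {
  printf 'Content-Type: %s\n\n' "${1:-text/html}"
}

# cgi_page is also hypothetical: the header first, then the document
cgi_page() {
  cgi_header 'text/html; charset=utf-8'
  echo '<html><body>hello</body></html>'
}
```

A cgi script would define the two functions and then simply call cgi_page.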

trying to call the perl CGI module twice doesn't work

      file=$(perl -e "use CGI qw(:standard); print param('file')")
      name=$(perl -e "use CGI qw(:standard); print param('name')")

      # Incorrect, the "name" variable is not set because the 
      # 'cgi' module reads all the form posted data from standard input
      # in one go, leaving nothing to read the second time around.
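
One way around this, sketched in plain bash: read the standard input once into a variable and look up each field in that variable. The function name form_value is made up for this example, and the values it prints are still url-encoded.

```shell
# form_value is a hypothetical helper: print the raw (still url-encoded)
# value of field $1 from the form data given as $2
form_value() {
  local pairs pair
  IFS='&' read -r -a pairs <<< "$2"     # split "a=1&b=2" at each "&"
  for pair in "${pairs[@]}"; do
    if [ "${pair%%=*}" = "$1" ]; then   # the text before the first "="
      printf '%s\n' "${pair#*=}"        # the text after the first "="
      return 0
    fi
  done
  return 1
}

# in the cgi script: read standard input once, then look up each field
#   postdata=$(cat)
#   file=$(form_value file "$postdata")
#   name=$(form_value name "$postdata")
```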

Cgi Bash Examples ‹↑›

make a cgi script to display the query string

   echo "Content-Type: text/html"
   echo ""
   echo "<html><body>the query string is $QUERY_STRING</body></html>" 2>&1

a bash cgi script indicating that the character set is "utf8"

  echo "Content-Type: text/html; charset=utf-8"
  echo ""
  echo "<html><body>A Bash UTF8 Cgi Script!</body></html>" 2>&1

show error messages in the browser generated by a cgi script line

 ech "this is a mistake" 2>&1

show error messages in the browser with output redirection

 ech "this is a mistake" 2>&1 >save.txt
note that the 2>&1 must come before the file redirection: the error messages then go to the browser, and only the normal output goes to save.txt
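
The effect of the order can be tried outside of a cgi script. In this sketch all three function names are made up, and "both" stands in for any command that writes to the two streams.

```shell
# both is a stand-in command (hypothetical) that writes one line to
# standard output and one line to standard error
both() { echo 'normal output'; echo 'error message' >&2; }

# errors follow the original standard output (the browser, in a cgi
# script); only the normal output ends up in the file $1
show_errors() { both 2>&1 >"$1"; }

# the file is opened first, so both streams end up in the file $1
hide_errors() { both >"$1" 2>&1; }
```

Redirections are processed from left to right, which is why the two functions behave differently.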

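a cgi script which displays several environment variables

     cat << ENDxxx
     Content-Type: text/html; charset=utf-8

     <!-- The empty line above this one is essential -->
     <html><head><title>A bash cgi script</title></head>
     <body><pre>
     SERVER_NAME = $SERVER_NAME
     REMOTE_ADDR = $REMOTE_ADDR
     </pre></body></html>
     ENDxxx
note that in the real script the lines of the here document, and the final ENDxxx, must start in the first column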

Getting And Decoding Form Data ‹↑›

www: http://oinkzwurgl.org/bash_cgi
bash functions for decoding cgi form data
www: http://www.fpx.de/fp/Software/ProcCGIsh.html
a bash script and c program for decoding cgi form data. The c program needs to be compiled.
Data sent from an html form to a web server can be sent in 2 different ways: in the query string of the url (the 'GET' method) or in the body of the HTTP request (the 'POST' method). If the data is sent with 'GET', the cgi script finds it in the QUERY_STRING environment variable. If the data is 'posted', the cgi script receives it on the standard input. In both cases the data has to be url decoded.
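
Url decoding can be done in bash itself. A minimal sketch; the function name urldecode is made up here, and it assumes that the input contains no literal backslashes (a browser sends those as %5C).

```shell
# urldecode is a hypothetical helper: decode one url-encoded string.
# A "+" stands for a space and %XX for the byte with hex value XX.
urldecode() {
  local s=${1//+/ }            # plus signs become spaces
  printf '%b\n' "${s//%/\\x}"  # turn %XX into \xXX and let printf expand it
}
```

Used as: value=$(urldecode "$QUERY_STRING")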

data "posted" from an html form can be read from the standard input

 read postdata   # the content length should first be checked
 postdata=$(</dev/stdin)  # the same

use perl to get and decode data posted from an html form

 file=$(perl -e "use CGI qw(:standard); print param('file')")
the CGI module takes care of url-decoding the form data

tools for file transfer over the net
ftp - the old file transfer tool
rsync - transfer only changed or new files
sftp - an interactive secure version of ftp
scp - a non-interactive secure copy (over ssh)
sitecopy - synchronize a remote site with what is local

some environment variables and example values

     SERVER_SOFTWARE = Apache/2.0.54 (Fedora)
     SERVER_NAME = www.comp.leeds.ac.uk
     SERVER_PORT = 80
     HTTP_ACCEPT = 'text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5'
     PATH_INFO = 
     SCRIPT_NAME = /cgi-bin/Perl/environment-example

cgi environment variables and meanings
DOCUMENT_ROOT - The root directory of your server
HTTP_COOKIE - The visitor's cookie, if one is set
HTTP_HOST - The hostname of your server
HTTP_REFERER - The URL of the page that called your script
HTTP_USER_AGENT - The browser type of the visitor
HTTPS - "on" if the script is being called through a secure server
PATH - The system path your server is running under
QUERY_STRING - The query string (used by the GET method)
REMOTE_ADDR - The IP address of the visitor
REMOTE_HOST - The hostname of the visitor (if server has reverse-name-lookups on; otherwise this is the IP address again)
REMOTE_PORT - The port on the visitor's machine from which the connection to the web server was made
REMOTE_USER - The visitor's username (for .htaccess-protected pages)
REQUEST_URI - The interpreted pathname of the requested document or CGI (relative to the document root)
SCRIPT_FILENAME - The full pathname of the current CGI
SCRIPT_NAME - The interpreted pathname of the current CGI (relative to the document root)
SERVER_ADMIN - The email address for your server's webmaster
SERVER_NAME - Your server's fully qualified domain name (e.g. www.cgi101.com)
SERVER_PORT - The port number your server is listening on
SERVER_SOFTWARE - The server software you're using (such as Apache 1.3)

Xhtml ‹↑›

cgi Content-Type: values
text/html - for html content
text/plain - for plain text
text/plain; charset=utf-8 - plain text in the utf8 encoding
text/xml - xml text
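
A cgi script which serves different kinds of files could choose the value from the file name. A sketch using only the media types listed above; the function name content_type and the mapping from file extensions are assumptions made for this example.

```shell
# content_type is a hypothetical helper: choose a Content-Type value
# from a file name, using only the media types listed above
content_type() {
  case "$1" in
    *.html|*.htm) echo 'text/html' ;;
    *.xml)        echo 'text/xml' ;;
    *.txt)        echo 'text/plain; charset=utf-8' ;;
    *)            echo 'text/plain' ;;
  esac
}
```

Used as: echo "Content-Type: $(content_type "$file")"; echo ""; cat "$file"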

Curl Stuff ‹↑›

getting a page via an authenticating proxy server

 curl -x proxy.utas.edu.au:8080 -U bobj http://www.server.net

getting a page via an authenticating proxy server as user 'bobj' with password 'asecret'

 curl -x proxy.net:8080 -U bobj:asecret http://www.server.net

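Supplying the password in this manner is not a good idea from a security point of view: it can be seen by other users in the process list and it is saved in the shell history. If only the user name is given, as in the previous recipe, curl prompts for the password.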

download a text file through a proxy and edit it with vim

  function edn {
    curl -x proxy.org.au:8080 -U bob:pass www.serv.net/a.txt -o ~/notes.txt
    vim ~/notes.txt
  }

upload a file to a webserver with a cgi script and http authentication

    function up {
      [ -z "$1" ] && echo 'no parameter' && return 1
      curl -F "name=../htdocs/notes/$1" -F "contents=<$1" -u user:upass -x prox.net:8080 -U bob:proxpass http://serv.net/save.cgi
    }

Perl Tricks ‹↑›

xmlstarlet - queries and edits xml from the command line

a perl mechanize example

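    use WWW::Mechanize;
    my $mech = WWW::Mechanize->new();

    # navigate to the main page (www.example.com is only a placeholder)
    $mech->get('http://www.example.com/');

    # follow a link that contains the text 'download this'
    $mech->follow_link( text_regex => qr/download this/i );

    # submit a POST form, to log into the site
    $mech->submit_form(
      with_fields => {
        username => 'mungo',
        password => 'lost-and-alone',
      }
    );

    # save the results as a file (the file name is only a placeholder)
    $mech->save_content('results.html');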

Php ‹↑›

Testing php configuration

 php -r 'phpinfo();'

get the urls from a webpage


Crawling ‹↑›

Load Testing ‹↑›

useful modules
ScriptableBrowser (simpletest)

Stuff Used In Google Chrome ‹↑›

bsdiff, bspatch, bzip2, dtoa, hunspell, ICU, JSCRE, libjpeg, libpng, libxml, libxslt, LZMA SDK, modp_b64, Mozilla interface to Java Plugin APIs, npapi, nspr, nss, Pthreads for win32, sqlite, tlslite, V8 assembler, WebKit, WTL, zlib

Forum Sites ‹↑›

www: ask.metafilter.com
visited by knowledgeable people

Serving Pages ‹↑›

share the current directory tree (via http) at http://$HOSTNAME:8000/

 python3 -m http.server
with python 2 the command was: python -m SimpleHTTPServer

create a webserver to share all files in /tmp/mydocs on port 8081

 wbox servermode webroot /tmp/mydocs

create a webserver to share all files in /tmp/mydocs on port 8080

 wbox servermode serverport 8080 webroot /tmp/mydocs

share a file through port 80 (the raw file is sent, without http headers)

 nc -w 5 -v -l -p 80 < file.ext

Cgi Servers ‹↑›

a simple cgi server

 python3 -m http.server --cgi 8080
with python 2 the command was: python -m CGIHTTPServer 8080

some kind of perl cgi server


Small Webservers ‹↑›

www: http://ask.metafilter.com/65481/Help-me-find-a-cool-little-unix-http-utility-I-cant-remember
some good things about mini web servers
www: http://hping.org/wbox/
site for wbox
funkload - web testing

Wbox ‹↑›

developed from 2007 - 2009 ...

show how long each part of a webpage takes to generate

 wbox nowhere.net/page.html timesplit 1

show the http header information for a page

 wbox www.google.it/notexistingpage.html 1 showhdr

Mac Osx ‹↑›

Installed by default are php, curl,

Templating ‹↑›

www: http://www.perl.com/pub/a/2001/08/21/templating.html
an article about using templating systems with perl
small quick and easy webservers
wbox - http server
thttpd - small web server
mini_httpd - same author as thttpd but smaller
webfs - serves a file system from the web
busybox httpd command - small webservery thing

www: http://template-toolkit.org/
the site for the template toolkit

Template Toolkit ‹↑›

example statement using dot notation.

 How are things in [% customer.address.city %]?

a for loop

 [% FOREACH list %] <a href="[% url %]"><b>[% name %]</b></a> [% END %]

Html Template ‹↑›

example loop with html::template

 <TMPL_LOOP list> <a href="<TMPL_VAR url>"><b><TMPL_VAR name></b></A> </TMPL_LOOP>

Mason ‹↑›

Seems to be in active development. It can be run without a webserver

www: http://www.masonhq.com/
the official site
install mason using apt-get
 apt-get install libmason-perl  ??? unchecked

Perl Stuff ‹↑›

Template Toolkit - almost active development; perl and python
HTML::Mason - callback style (?); active as of 2010
Embperl - embed perl into web pages; stopped 2006
HTML::Template - perl template module
Text::Template - a general purpose templater
Apache::ASP - use asp with apache; stopped 2004
CGI::FastTemplate - another one

Web Simple ‹↑›

www: http://search.cpan.org/~mstrout/Web-Simple-0.002/lib/Web/Simple.pm
the documentation for web simple
Developed in 2009. Can create a webapplication without a webserver

www: hobbs
a knowledgeable perl web person at stackoverflow.com

Cpan Tool Crash Course ‹↑›

start a cpan shell to install mason

 perl -MCPAN -e 'shell'

some interesting perl web modules
Dancer - perl web apps with good examples
Web::Simple - simple web apps
Catalyst - big web apps

www: http://www.livejournal.com/doc/server/lj.install.perl_setup.modules.html
a list of potentially useful modules used with livejournal
upgrade cpan but could cause problems ???
    perl -MCPAN -e shell
    cpan> install Bundle::CPAN
    cpan> reload cpan

Getting Web Pages From The Command Line ‹↑›

http get of a web page via proxy server with login credentials

 curl -U username[:password] -x proxy:proxyport webpage

use netcat to get a webpage

 printf 'GET / HTTP/1.0\r\n\r\n' | nc -v www.somewebsite.com 80
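
Many servers host several sites on one address and need a "Host:" line to pick the right one. A sketch of a function which builds such a request; the name http_get_request is made up for this example.

```shell
# http_get_request is a hypothetical helper: print a minimal HTTP/1.0
# request for path $2 (default "/") on host $1. Every line ends in
# carriage return + newline and an empty line closes the request.
http_get_request() {
  printf 'GET %s HTTP/1.0\r\nHost: %s\r\n\r\n' "${2:-/}" "$1"
}
```

Used as: http_get_request www.somewebsite.com /index.html | nc -v www.somewebsite.com 80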

get a webpage with the console php tool

 php -r "readfile('http://metafilter.com/');"

get a webpage with perl

 perl -MHTTP::Client -e 'print HTTP::Client->new()->get("http://localhost/path")'

a simpler way to do the same

 perl -MLWP::Simple -e 'getprint("http://www.metafilter.com/")'

Load Testing ‹↑›

perform 4 queries per second (with 4 processes) on the local webserver

 wbox http://localhost clients 4

Blog Engines ‹↑›

simple cpan shell
h - show help
get - download the source for a module
make - build the module
test - build and test a module
install - do all of get, make and test, then install the module
clean - run 'make clean' in the build directory of a module
look - open a shell in the build directory of a module
readme - display the README file of a module

Notes ‹↑›

google uses python

www: http://unixmages.com/
an unrelated but interesting site
serendipity - a blog engine with database backend