&& The Linux Operating System
-----------------------------:

Quote: "I'm doing a (free) operating system, just a hobby,..." (Linus Torvalds)

This booklet is designed to help with common tasks on a Linux system. It is
presented as a series of "recipes" for accomplishing those tasks. Each recipe
consists of a plain English one-line description, followed by the Linux
command which carries out the task. The document is focused on performing
tasks in Linux using the 'command line' or 'console'. The format of the
booklet was largely inspired by the "Linux Cookbook" www.dsl.org/cookbook

@@ http://github.com/himanshuc/nixhacker/tree/master
a good list of resources

@@ http://www.pixelbeat.org/cmdline.html
some command line recipes

@@ http://www.shell-fu.org/
some command line recipes

@@ http://www.commandlinefu.com/
a very good site with lots of command-line tips

@@ http://dsl.org/cookbook/cookbook_toc.html
A very good Linux User "cookbook" and the primary inspiration for the
booklets on this site.

GOOD BOOKS

@@ The UNIX Environment (Andrew Walker, Wiley 1984)
@@ The Linux Cookbook (Michael Stutz, No Starch Press)
@@ The Unix Programming Environment (Kernighan et al)

ONLINE BOOKS ....

@@ http://www.catb.org/~esr/writings/taoup/html/
@@ http://www.faqs.org/docs/artu/
a philosophical book about the unix operating system by Eric Raymond (2003)

@@ http://www.linfo.org/onlinebooks.html
a list of online linux books

GETTING HELP

The traditional Unix help system is called 'man' or 'manual' pages, and they
can be good. It is one of the ironies and frustrations of Unix that a man
page only really becomes helpful and interesting once one already knows what
a program does and the basics of how to use it.
* show the short help description for all programs
>> whatis -r '.*'
>> for i in $(ls $(echo $PATH | tr ':' ' ')); do whatis $i; done | less

* search for all programs which have something to do with 'java'
>> whatis -r '.*' | grep java
>> whatis -r 'java'
>> whatis -r java

* view the 'manual' page for the wc (word/line/character count) command
>> man wc

Sadly, 'man' pages are often written in a cryptic way and are short on
examples. The examples, if there are any, are almost always right at the
end of the man page.

* view the manual page for a command in 'section' 4 (see the list of sections elsewhere)
>> man 4 command

* show what software is available to install, relating to 'page layout'
>> apt-cache search "page layout"
>> apt-cache search "page layout" | grep -v '^lib' ##(exclude code libraries)

* search the names and short descriptions of all man pages for the word 'dig'
>> man -k dig ##(the full text of the pages is not searched)

* find documentation in pdf format (for latex packages, among others)
>> find /usr/share/doc/ -name '*.pdf' ##(on a debian system, at least)

AN INTRODUCTION TO THE LINUX COMMAND LINE

UNIX CULTURE SHOCK ....

Users who have no experience with a Unix-style operating system, but are
familiar with the Microsoft Windows operating system, will experience a kind
of 'culture shock' when they begin to use Linux. This is true even if the
user is familiar with using the 'command-line' on a Microsoft computer. This
section attempts to point out some of the things which may seem baffling and
illogical to the new Linux user.

@@ You can't just download an 'exe' and click on it to run it
However you can easily install new software with 'sudo apt-get install'
followed by the name of the package.

@@ file names don't have to have extensions (such as .txt), but can
To an experienced Microsoft command line user, the idea of just calling a
text file 'report' rather than 'report.txt' is rather disorientating. "How
will the computer know what type of file it is?" the Microsoft user thinks.
File names can and do begin with a dot '.'
File names can begin with a tilde '~'

@@ Linux commands are very short and cryptic
Why is the 'list files' command called 'ls' and not 'list'? It would be so
much more memorable as 'list'. This is the "Unix way". If you like typing
you can do
>> alias list='ls'

@@ The folder hierarchy in Linux seems very cryptic and you can't just put files anywhere

BASIC USER COMMANDS ....

* log in to the system with a username of 'bob'
>> ubuntu login: bob

* log out of the system
>> logout

* switch to the fourth virtual console
>> press [ALT]-[F4]

* switch from the fourth to the third virtual console
>> press [ALT]-[<-]

* switch from X to the first virtual console
>> press [CTRL]-[ALT]-[F1]

* run the hostname tool to find the name of the computer
>> hostname

* output the version of the hostname tool
>> hostname --version

* run hostname, specifying that the host name be read from the file 'host.info'
>> hostname -F host.info

* change your password
>> passwd

* output your username
>> whoami

* see who is currently logged in
>> who

* see who is currently logged in and what they are doing
>> w

* display information about recent use of the system
>> last | less

* find out when the user 'mjb' last logged in
>> last mjb

NOTE: The last tool gets its data from the system file '/var/log/wtmp'; the
last line of output tells how far this file goes back.

BASIC PROCESS COMMANDS ....
* list the processes in your current shell session
>> ps

* list all the processes that user 'hst' has running on the system
>> ps -u hst

With your own user name, this command is useful for listing all of your own
processes.

* list all of the processes and give their user-names
>> ps aux ##(there could be a lot of output, even on single user systems)

* display a continually updated display of the current system processes
>> top

* list all the processes containing a reference to an 'sbin' directory
>> ps aux | grep sbin

* list any processes whose output line contains '13' (for example in the process ID)
>> ps aux | grep 13

* list the process whose PID is 344
>> ps -p 344

BASIC SOFTWARE COMMANDS ....

* Run skype using your GTK theme
>> skype --disable-cleanlooks -style GTK

* output a list of programs that pertain to consoles
>> apropos consoles

* output a list of all tools whose manual page names or short descriptions refer to consoles
>> man -k consoles ##(the same as 'apropos')

* list all of the software packages installed on the system
>> dpkg -l

* list all of the packages whose name or description contains the text 'edit', regardless of case
>> dpkg -l | grep -i edit

* peruse descriptions of the packages that are available
>> less /var/lib/dpkg/available

* get a description of the who tool
>> whatis who

* view the manual page for w
>> man w

* view all of the Info manuals on the system
>> info

* read the Info documentation for the tar tool
>> info tar

This command opens a copy of The GNU tar Manual in info.

To read the contents of a file written in Info format, give the name of
the file with the '-f' option.

* read 'faq.info', an Info file in the current directory
>> info -f faq.info

* read 'faq.info', an Info file in the current directory, beginning with the node Text
>> info -n 'Text' -f faq.info

* view the HTML version of the Debian FAQ in the lynx Web browser
>> lynx /usr/share/doc/debian/FAQ/debian-faq.html

The above only works on a Debian flavour of Linux.
* view the compressed text version of the Debian FAQ in zless
>> zless /usr/share/doc/debian/FAQ/debian-faq.txt.gz

BASIC COMMAND LINE USAGE ....

* repeat the last command entered
>> [up-arrow] [enter]

The [up-arrow] key moves the last command you typed back to the input line,
and [enter] executes it.

* retrieve the last command you entered with the string 'grep' in it
>> [control]-r ##(press the [control] key and the 'r' key, then type 'grep')
>> (reverse-i-search)'': grep

* put the 3rd-last command you entered with 'grep' in it on the input line
>> [control]-r grep [control]-r [control]-r
>> (reverse-i-search)'': grep

Each further press of [control]-r steps back to the next older match.

* clear the screen and then log out of the system
>> clear; logout

* run the hostname command twice
>> hostname; hostname

* redirect the standard input of apropos to come from the file 'keywords'
>> apropos < keywords

* redirect standard output of a command to the file 'commands'
>> apropos shell bash > commands

* append the standard output of apropos shells to the file 'commands'
>> apropos shells >> commands

* redirect the standard error of apropos shell bash to 'command.error'
>> apropos shell bash 2> command.error

* perform a long task in the background, saving all messages to 'img.txt'
>> (find / | xargs file | grep image) &>~/img.txt &

In the command above, both the error messages and all normal output of the
commands will be redirected to the 'img.txt' text file in the user's home
folder. The parentheses make the redirection apply to the whole pipeline;
without them only the messages of the last command (grep) would be saved.

* append the error output of a command to an existing file 'command.error'
>> apropos shells 2>> command.error

* redirect the standard output and standard error to the file 'commands'
>> apropos shells &> commands

* view the output of "apropos bash shell shells" in a 'pager' program
>> apropos bash shell shells | less

A pager program allows you to view output from a program one (screen) page
at a time. (pagers: more/less/most)

* run the command apropos shell > shell-commands as a background job
>> apropos shell > shell-commands &

* run job 4 in the background
>> bg %4

Trivia: running a job in the background is sometimes called "backgrounding"
or "amping off" a job.
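
The redirection recipes above can be tried safely in a throw-away folder.
What follows is only a sketch: the helper function 'both' and all of the file
names are made up for the example, and the portable form '> file 2>&1' is
used where bash would also accept '&> file'.

```shell
# work in a temporary folder so that no real files are touched
tmp=$(mktemp -d)

# 'both' is a made-up helper: it writes one line to standard output
# and one line to standard error
both() { echo "normal output"; echo "error output" >&2; }

# send the two streams to two separate files
both > "$tmp/out.txt" 2> "$tmp/err.txt"

# append a second run to the same two files
both >> "$tmp/out.txt" 2>> "$tmp/err.txt"

# send both streams to one file (in bash, '&> file' does the same)
both > "$tmp/all.txt" 2>&1

# run a command as a background job, then wait for it to finish
(echo "background output" > "$tmp/bg.txt") &
wait

# count the lines which ended up in each file
out_lines=$(grep -c '' "$tmp/out.txt")
all_lines=$(grep -c '' "$tmp/all.txt")
echo "out.txt: $out_lines lines, all.txt: $all_lines lines"
```

Inside a script the '%4' style job numbers are usually not available, which
is why the sketch uses 'wait' rather than 'fg' or 'bg'.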
* bring the most recent background job to the foreground
>> fg

* bring job 3 to the foreground
>> fg %3

* list your jobs
>> jobs

* kill job number 2
>> kill %2

* to interrupt a running command use [control] c
>> find / -name '*e*'
>> [control] c

* search your command history for the text 'apropos'
>> history | grep apropos

* specify the second-to-the-last command in your history
>> [up-arrow] [up-arrow]

Trivia: '!', the exclamation mark, is sometimes called "bang"

* run history event number 1 (the first command in the history list)
>> !1

* run the last command executed
>> !!

* create a script of a shell session and save it to the file 'log.1'
>> script log.1

SIMPLE FILE COMMANDS ....

* create the file 'new.txt' in the current directory
>> touch new.txt

* create the file 'new' in the 'work/docs' subdirectory of the current directory
>> touch work/docs/new

On a Unix-style computer, such as Apple OSX or Linux, files do not have to
have 'extensions' to their names, unlike on Microsoft Windows.

* make a new directory called 'work' in the current working directory
>> mkdir work

* create the 'work/completed/2001' directory
>> mkdir -p work/completed/2001
>> mkdir --parents work/completed/2001 ##(the same)

If the 'work' and 'completed' folders do not exist, then they will be
created.

SIMPLE FOLDER COMMANDS ....
* change the current working directory to '/usr/share/doc'
>> cd /usr/share/doc

* return to the directory you were last in
>> cd -

* determine what the current working directory is
>> pwd

* list the contents of 'work', a subdirectory in the current directory
>> ls work

* list the contents of the '/usr/share/doc' directory
>> ls /usr/share/doc

* list the contents of the directory so that directories and executables are distinguished from other files
>> ls -F

* output a verbose listing of the '/usr/share/doc/bash' directory
>> ls -l /usr/share/doc/bash

* output a recursive listing of the current folder and sub-folders
>> ls -R

* list all of the files on the system
>> ls -R /

* list the files in the 'log' directory sorted with newest first
>> ls -t /var/log

The '/var/log' folder usually contains 'log' files which record the
activities of software running on the computer. Type 'man hier' for more
information about the Linux folder structure.

* list all files in the current directory (including 'hidden' files)
>> ls -a

== important 'ls' options
.. -a - display all files, including hidden ones
.. -F - display the file types of files
.. -R - list all folders and subfolders
.. -t - sort the displayed files by modification time, newest first
..

* output a tree graph of your home directory and all its subdirectories
>> tree ~ ##(this shows files as well as folders)

* show just the start of the folder tree for '/usr/local'
>> tree -d /usr/local | head -20

[>>0.3 image/eg-tree.png]

* peruse a tree graph of the '/usr/local' directory tree
>> tree -d /usr/local | less

COPYING FILES ....
* copy the file 'old' to the file 'new'
>> cp old new

* copy files preserving the file attributes
>> cp -p file.txt new-copy.txt

* copy the folder 'reports' verbosely (showing what is being done)
>> cp -vr reports ~/new/

* copy a folder tree verbosely, but only copy files which are newer
>> cp -vur docs/office ~/docs

* copy the folder 'public_html', and subfolders, to 'private_html'
>> cp -R public_html private_html

With the '-R' option, symbolic links are copied as links and are not
followed. The "man" page for GNU cp states that -r and -R are equivalent.

* make an archive copy of the directory tree 'public_html' to 'private_html'
>> cp -a public_html private_html

MOVING FILES ....

* move the file 'notes' in the current working directory to '../play'
>> mv notes ../play

* move the file '/usr/tmp/notes' to the current working directory
>> mv /usr/tmp/notes .

* move the directory 'work' in the current working directory to 'play'
>> mv work play

* rename the file 'notes' to 'notes.old'
>> mv notes notes.old

NOTE: to rename multiple files use 'rename'

DELETING FILES ....

* remove the file 'notes' in the current working directory
>> rm notes

* remove the directory 'waste' and all of its contents
>> rm -R waste

* remove the empty directory 'empty'
>> rmdir empty

* use tab completion to remove the file 'No Way' in the current directory
>> rm No[TAB] ##(the shell completes the name to 'No\ Way')

* delete the file '^Acat' in a directory that also contains the files 'cat' and 'dog'
>> rm -i ?cat ##(rm: remove '^Acat'? y )

* remove the file '-cat' from the current directory
>> rm -- -cat

SYMBOLIC LINKS ....
* create a hard link called 'emerald-city' to the file 'seattle'
>> ln seattle emerald-city

* create a symbolic link called 'emerald-city' to the file 'seattle'
>> ln -s seattle emerald-city

* list all files in the '/usr/bin' directory that have the text 'tex' anywhere in their name
>> ls /usr/bin/*tex*

* copy all files whose names end with '.txt' to the 'doc' subdirectory
>> cp *.txt doc

* output a verbose listing of all files whose names end with either a '.txt' or '.text' extension, sorting the list so that newer files are listed first
>> ls -lt *.txt *.text

* remove all files in the current working directory that begin with a hyphen and have the text 'out' somewhere else in their file name
>> rm -- -*out*

* join text files whose names begin with an 'a' followed by 2 or more characters
>> cat a??*

The original files are unchanged, but the joined together text files are
displayed on the screen.

THE BASH PROMPT ....

* change your shell prompt to 'Your wish is my command: '
>> PS1='Your wish is my command: '

* change your prompt to the default bash prompt (the current folder)
>> PS1='\w $ '

* change the prompt to the current date, a space, the hostname in parentheses, and a '>'
>> PS1='\d (\h)>'

* clear the screen every time you log out
>> clear ##(put this in the file '.bash_logout')

INSTALLING SOFTWARE

* search for games software which is available to install
>> apt-cache search game

* update the repository cache
>> sudo apt-get update

Add new repositories to the file '/etc/apt/sources.list'

SOUND

A linux system is capable of playing, recording and editing sound (audio)
files with a variety of open-source software. This section only provides
what should be a very succinct overview of the most important sound tasks
with linux. For more detailed information, please consult the booklet
listed below.

@@ http://bumble.sourceforge.net/books/linux-sound/
A more comprehensive introduction to using audio with the linux operating
system.
* Synthesize text as speech
>> echo "hello world " | festival --tts

* find the duration of the audio file 's.wav' in hours/minutes/seconds
>> soxi -d s.wav
>> soxi s.wav | grep -i duration ##(the same)

RECORDING AUDIO ....

* record a 'wav' file from the microphone, saving it to 'hello.wav'
>> rec hello.wav

This begins an 8,000 Hz, monaural 8-bit WAV recording, which is not very
good quality.

* make a high-fidelity recording from the mic and save it to 'goodbye.wav'
>> rec -s w -c 2 -r 44100 goodbye.wav

PLAYING AUDIO ....

* play the MP3 stream at the url
>> mpg321 http://example.net/broadcast/live.mp3

CONVERTING AUDIO FILE FORMATS ....

Converting sound files from one format to another is a common task.

* translate an audio file in Sun AU format to a Microsoft WAV file
>> sox recital.au recital.wav

* convert 'old.mp3' into a wav file 'new.wav' (a new file is created)
>> mpg321 -w new.wav old.mp3 ##(the file 'old.mp3' is unchanged)
>> mpg123 -w new.wav old.mp3 ##(the same)

* encode an MP3 file from a WAV file called 'september-wind.wav'
>> lame september-wind.wav september-wind.mp3

EDITING SOUND ....

== tools for editing sound
.. audacity - a good graphical sound editor
.. sox - a command line audio editor
..

* join the audio files 'a.wav' and 'b.wav' together and save as 'new.wav'
>> sox a.wav b.wav new.wav

CONVERTING AUDIO FILES ....

TRANSLATION

== translation tools
.. youtranslate - uses web services such as google
..

UNICODE

* Find UTF-8 text files misinterpreted as ISO 8859-1 due to a Byte Order Mark (BOM)
>> grep -rl $'\xEF\xBB\xBF' . ##(lists files under the current folder containing the BOM bytes)

* show the current locale (language and character encoding)
>> locale

* show a hexdump of a text file
>> hd file.txt
>> hexdump file.txt ##(the format is a little different)

TEXT FILE ENCODINGS

* Convert a file from ISO-8859-1 (or whatever) to UTF-8 (or whatever)
>> tcs -f 8859-1 -t utf /some/file

* Convert filenames from ISO-8859-1 to UTF-8
>> convmv -r -f ISO-8859-1 -t UTF-8 --notest *

* Detect encoding of the text file 'file.txt'
>> file -i file.txt ##(-i is the 'mime' switch, but it also shows encoding)

* convert file from utf8 (no bom) to utf16 (with 'bom')
>> recode UTF8..UTF-16LE linux-utf8-file.txt

* convert all '.php' files to the utf-8 text encoding
>> find . -name "*.php" -exec iconv -f ISO-8859-1 -t UTF-8 {} -o ../newf/{} \;

* find utf-8 encoded text files misinterpreted as iso 8859-1
>> find -type f | while read a;do [ "`head -c3 -- "${a}"`" == $'\xef\xbb\xbf' ] && echo "match: ${a}";done

* Fix UTF-8 text files misinterpreted as ISO 8859-1 due to a Byte Order Mark (BOM)
>> perl -i -pe 's/\xef\xbb\xbf//g' file.txt ##(the file is changed in place)

* Convert file type to unix utf-8
>> ex some_file "+set ff=unix fileencoding=utf-8" "+x"

* Convert one file from ISO-8859-1 to UTF-8.
>> iconv --from-code=ISO-8859-1 --to-code=UTF-8 iso.txt > utf.txt

SPELL CHECKING

== spell checking programs
.. spell, a non interactive spell checker
.. ispell, a veteran program
.. aspell, the gnu version
.. myspell, the open-office spell checker
.. hunspell, based on myspell
.. spellutils, debian package to selectively spell check
..

== editors with spell checking
.. vim, type ':set spell' to activate, and 'z=' to correct
.. emacs,
..
* search for all debian packages which have something to do with spelling
>> apt-cache search spell

* spell check the file 'lecture'
>> spell lecture ##(prints a list of badly spelled words)

* print all misspelled words in all ".txt" files with line numbers
>> spell -n -o *.txt

* spell check the file 'ch.1.txt', writing the misspellings to the file 'bad.sp'
>> spell ch.1.txt > bad.sp

* check the spelling of a word on the command line
>> echo 'is this korrect ?' | spell

This prints 'korrect' since it is badly spelled

* output a sorted list of the misspelled words from 'lecture.draft'
>> spell lecture.draft | sort | uniq

ISPELL ....

'ispell' is an older and simpler program than 'aspell'

* interactively spell check 'report.txt'
>> ispell report.txt

* install a British English dictionary for the "ispell" spell checker
>> sudo apt-get install ibritish

* check and correct the spelling interactively in document "report.txt"
>> ispell report.txt ##(when a misspelling is found, type the number of the replacement)

* spell check "file.txt" using a british english dictionary
>> ispell -d british file.txt

* spell check a document written in spanish (using a spanish dictionary)
>> ispell -d spanish archivo.txt

* show what dictionaries are available locally for ispell
>> ls /usr/lib/ispell/

* the ispell dictionaries are all called "i[language-name]"
>> dictionary files: icatalan, ibrazilian ...
* spell check and correct "thesis.tex" which is a LaTeX format document
>> ispell -t thesis.tex ##(ispell ignores the latex mark-up codes)

ASPELL

aspell is a more modern and capable spell checking program

@@ http://aspell.net/
the official site

@@ http://aspell.net/man-html/index.html
A usage manual for aspell

* show options for aspell and available dictionaries
>> aspell help | less

* show locally available dictionaries for aspell
>> aspell dicts

* install a British and American English dictionary for aspell
>> sudo apt-get install aspell-en

* install a spanish dictionary for aspell
>> sudo apt-get install aspell-es

* show all debian packages and dictionaries for aspell
>> apt-cache search aspell

* interactively check the spelling of the file "chapter.txt"
>> aspell -c chapter.txt
>> aspell check chapter.txt ##(the same)

ASPELL WITH OTHER LANGUAGES ....

* check the spelling of "chapter.txt" using British English spelling
>> aspell -d british -c chapter.txt
>> aspell -d en_GB -c chapter.txt ##(this is the same)

* check the spelling of "chapter.txt" using a Spanish dictionary
>> aspell -d spanish -c chapter.txt
>> aspell -d es -c chapter.txt ##(this is the same)

* check spelling in the comments in the shell script (lines starting with "#")
>> aspell --mode=comment -c script.sh ##(!!doesn't work on my version)

* check the spelling in the tex/latex file "chapter.tex"
>> aspell -t -c chapter.tex

* show available filters for spell-checking particular types of files
>> aspell filters
>> aspell dump filters ##(the same)

* spell check a file skipping (ignoring) lines which start with '>'
>> aspell --mode=email check book.txt
>> aspell --mode=email -c book.txt ##(the same)
>> aspell -e -c book.txt

* create a vim "mapping" to use aspell within vim
>> map TT :w!<CR>:!aspell check %<CR>:e! %<CR>

* spell check a file but only between a "*" character and the end of the line
>> aspell --add-filter=context --add-context-delimiters="* \0" -c francisco.txt ##(doesn't really work)

------------------------------------------------------------------
TEXT FILES
------------------------------------------------------------------

VIEWING TEXT FILES ....

== text file viewing tools
.. less - a text file pager
.. most - a more capable pager
..

* print a specific line (here the 5th) from a file
>> awk 'FNR==5' file.txt

* set the default pager to be the 'most' program
>> sudo update-alternatives --set pager /usr/bin/most

* View non-printing characters with cat
>> cat -v -t -e file.txt

* See non printable characters like tabulations, CRLF, LF line endings
>> od -c file.txt | grep --color '\\.'

LESS ....

@@ http://www.greenwoodsoftware.com/less
the homepage for less

The humble 'less' program is worthy of a second look. Less allows one to
peruse and search a text file, but not alter it. I am documenting version
429 (year 2008). Less uses vi-like keys to move around and search.

* view the text file 'doc.txt' one screen page at a time
>> less doc.txt

* view the text file 'days.txt' starting at the end
>> less +G days.txt

== some common 'less' commands
.. [space-bar] - forward one window
.. [esc] + [space-bar] - forward one window (with multiple files)
.. b - back one window
.. j - down one line (the same as 'vim')
.. k - up one line (the same as 'vim')
.. F - go to end of file and 'follow' new data (like tail -f)
.. G - go to the last line of the file
.. g - go to the first line of the file
.. /pattern - Search forward for (N-th) matching line.
.. ?pattern - Search backward for (N-th) matching line.
.. n - Repeat previous search (for N-th occurrence).
.. N - Repeat previous search in reverse direction.
.. ESC-n - Repeat previous search, spanning files.
.. v - edit the current file with $VISUAL or $EDITOR
..

== some less command line switches
.. -i - when searching within less, ignore case, unless search has uppercase
.. -I - when searching within less, ignore case.
.. -G - don't highlight matches when searching within less
..

STARTING LESS ....

* view the file 'long.txt' and make searches within less case-insensitive
>> less -I long.txt

* view the file 'long.txt', with 'semi' case-insensitive searching
>> less -i long.txt ##(searches with capital letters are case-sensitive)

* make an alias which will make less always semi case-insensitive
>> alias less='less -i'

* within less turn on or off case-insensitive searching
>> -I [enter]

* within less see whether searches are case-sensitive or not
>> _I

* view the output of 'grep' starting at the first line which has 'science' in it
>> grep tree forest.txt | less +/science

* follow the end of the log file 'tcp.log' showing new data as it enters
>> less +F tcp.log ##(this is like 'tail -f' but allows more perusal)

* Search for a whole word in less
>> /\bTERM\b

* go to the 80% position in the file (that is, 80% towards the end)
>> 80p ##(the number is typed before the 'p'; '80%' does the same)

* display less commands
>> h

LESS WITH MULTIPLE FILES ........

* search multiple files for the text 'tree'
>> less *.txt (then type) /*tree

LESS BOOKMARKS ........

Less bookmarks work in the same way as 'vi' or 'vim' bookmarks

* mark the current top-of-screen position in the text file as bookmark 'x'
>> mx ##(any single letter can be used as a bookmark)

* jump to the bookmark x
>> 'x

* save text from current top-of-screen to the bookmark 'x' in file 'save.txt'
>> |x cat > save.txt

* jump to where you just were (before going to a bookmark)
>> ''

* edit the current file (but the variable $EDITOR or $VISUAL must be set)
>> v

ANALYSING LANGUAGE ....

DICTIONARIES ....

* Look up the definition of a word
>> curl dict://dict.org/d:something

WORDNET ....
* get help for wordnet
>> man wnintro
>> man wn

* show a list of word senses available for the word 'browse'
>> wn browse -over

* output a list of words from the dictionary that begin with the string 'homew'
>> look homew ##(prints something like 'homeward' and 'homework' ...)

* list words in the dictionary containing the string 'dont' regardless of case
>> grep -i dont /usr/share/dict/words

* list all words in the dictionary that end with 'ing'
>> grep 'ing$' /usr/share/dict/words

* list all of the words that are composed only of vowels
>> grep -i '^[aeiou]*$' /usr/share/dict/words

* output a list of words that rhyme with 'friend' (search the dictionary for lines ending with 'end')
>> grep 'end$' /usr/share/dict/words

* search the WordNet dictionary for nouns that begin with 'homew'
>> wn homew -grepn

* search the WordNet dictionary for nouns and adjectives that begin with 'homew'
>> wn homew -grepn -grepa

* list the definitions of the word 'slope'
>> wn slope -over

* output all of the synonyms (same meaning) for the noun 'break'
>> wn break -synsn

* output all of the synonyms for the verb 'break'
>> wn break -synsv

* output all of the antonyms (opposite meaning) for the adjective 'sad'
>> wn sad -antsa

A hypernym of a word is a related term whose meaning is more general

* output all of the hypernyms for the noun 'cat'
>> wn cat -hypen

Debian 'dict'

* check file 'dissertation' for clichés or other misused phrases
>> diction dissertation | less

* check file 'dissertation' for clichés or other misused phrases, and write the output to a file called 'dissertation.diction'
>> diction dissertation > dissertation.diction

If you don't specify a file name, diction reads text from the standard
input.

* output all lines containing double words in the file 'dissertation'
>> diction dissertation | grep 'Double word'

* check the readability of the file 'dissertation'
>> style dissertation

Like diction, style reads text from the standard input if no file is given

* output all sentences in the file 'dissertation' whose ARI is greater than a value of 20
>> style -r 20 dissertation

* output all sentences longer than 14 words in the file 'dissertation'
>> style -l 14 dissertation

* output the number of lines, words, and characters in file 'outline'
>> wc outline

* output the number of characters in file 'classified.ad'
>> wc -c classified.ad

Use wc with the '-w' option to specify that just the number of words be
output.

* output the number of words in the file 'story'
>> wc -w story

* output the combined number of words for all the files with a '.txt' file name extension in the current directory
>> cat *.txt | wc -w

* output the number of lines in the file 'outline'
>> wc -l outline

* output a word-frequency list of the text file 'naked_lunch'
>> tr ' ' '\n' < naked_lunch | sort | uniq -c

* output a count of the number of unique words in the text file 'naked_lunch'
>> tr ' ' '\n' < naked_lunch | sort | uniq -c | wc -l

* rank the files report.a, report.b, report.c in order of relevance to keywords 'saving' and 'profit'
>> rel "(saving & profit)" report.a report.b report.c

* output a list of any files containing either 'invitation' or 'request' in the '~/mail' directory, ranked in order of relevancy
>> rel "(invitation | request)" ~/mail

* output a list of any files containing 'invitation' and not 'wedding' in the '~/mail' directory, ranked in order of relevancy
>> rel "(invitation ! wedding)" ~/mail

* output a list of any files containing 'invitation' and 'party' in the '~/mail' directory, ranked in order of relevancy
>> rel "(invitation & party)" ~/mail

WRAPPING TEXT LINES ....

* format a text file with lines 80 characters long
>> fmt -w 80 textfile ##(short lines lengthened)
>> fmt -s -w 80 textfile ##(short lines are not lengthened)

NOTE: the 'par' program can be used instead of 'fmt'

SPLITTING TEXT FILES ....
* split a file into a maximum of 10 files on lines containing '#200', '#400', '#600' etc with output files called "zz00", "zz01", etc
>> csplit -f zz file.txt "/^#1?[24680]00$/" {8} ##(the split occurs 'before' the line containing the match)

MERGING TEXT FILES ....

* join the lines of two sorted files which share a common first field
>> join file1.txt file2.txt > file3.txt

* merge the given files line by line
>> paste -d ',:' file1 file2 file3

CONVERTING OTHER FORMATS TO TEXT ....

* convert from html to text
>> lynx -dump http://url > textfile
>> links -dump http://url > textfile ##(may render tables)
>> w3m -dump http://url > textfile ##(may render tables better)

* convert the DOS line endings (CRLF) of the text file 'autoexec.bat' to Unix line endings (LF)
>> fromdos autoexec.bat
>> dos2unix autoexec.bat ##(the same)

* convert the Unix line endings of all the '.tex' files in the current directory to DOS line endings
>> todos *.tex
>> unix2dos *.tex ##(the same)

CONVERTING CHARACTER ENCODINGS ....

@@ http://asis.epfl.ch/GNU.MISC/recode-3.6/recode_3.html

* Convert encoding of given files from one encoding to another
>> iconv -f utf8 -t utf16 /path/to/file

* see also iconv (older)

* show possible conversions with the 'recode' tool
>> recode -l | less

* convert latin9 (western europe) character encoding to utf8
>> recode iso-8859-15..utf8 report.txt ##(the actual file is changed)

* convert from the local character set to the latin1 encoding saving to "new.txt"
>> recode ..lat1 < file.txt > new.txt ##(the original file is unchanged)

* convert to html
>> recode ..HTML < file.txt > file.html

* convert from utf8 to html with verbose output
>> recode -v u8..h < file.txt

* convert from MS Windows utf8 to the local character set
>> recode utf-8/CRLF.. file-to-change.txt

COMPARING AND PATCHING TEXT FILES ....

The process of 'patching' a text or code file is very important in the
world of open-source development (and therefore in the development of Linux
itself).
Patching allows non-linear changes to be made to a file and is usually used
in conjunction with 'diff'

* Use colordiff in side-by-side mode, and with automatic column widths
>> colordiff -yW"`tput cols`" /path/to/file1 /path/to/file2

* Compare a file with the output of a command or compare the output of two commands
>> vimdiff foo.c <(bzr cat -r revno:-2 foo.c)

* remote diff with side-by-side ordering.
>> ssh $HOST -l$USER cat /REMOTE/FILE | sdiff /LOCAL/FILE -

* Diff files on two remote hosts.
>> diff <(ssh alice cat /etc/apt/sources.list) <(ssh bob cat /etc/apt/sources.list)

* show lines that appear in both file1 and file2
>> comm -1 -2 <(sort file1) <(sort file2)

* find the extra lines in file2
>> diff file1 file2 | grep '^>' ##(the quotes stop the shell treating '>' as a redirection)

* find the extra lines in file1
>> diff file1 file2 | grep '^<'

* Compare a remote file with a local file
>> ssh user@host cat /path/to/remotefile | diff /path/to/localfile -

* Generate diff of first 500 lines of two files
>> diff <(head -500 product-feed.xml) <(head -500 product-feed.xml.old)

* compare the files 'manuscript.old' and 'manuscript.new'
>> diff manuscript.old manuscript.new

* peruse the files 'olive' and 'green' side by side indicating differences
>> sdiff olive green | less

* output a difference report for files 'tree', 'bush', and 'hedge' to the file 'arbol'
>> diff3 tree bush hedge > arbol

* update the original file 'manuscript.new' with the patchfile 'manuscript.diff'
>> patch manuscript.new manuscript.diff

* Colored diff ( via vim ) on 2 remote files on your local machine
>> vimdiff scp://root@server-foo.com//etc/snmp/snmpd.conf scp://root@server-bar.com//etc/snmp/snmpd.conf

* vimdiff to remotehost
>> vimdiff tera.py <(ssh -A testserver "cat tera.py")

SEARCHING TEXT ....

gnu grep special characters
-------------------------------
.. \< - matches beginning of a word
.. \> - matches the end of a word
.. \b - matches a word boundary
.. [:upper:] - matches upper case letters (unicode)
.. [:lower:] - matches lower case letters (unicode)
.. [:space:] - matches space characters
..
,,,

00- the "--" sequence may be "escaped" in grep, for example:
grep "\-\-" file.txt
-

* search interactively, ignoring case, all '.txt' files in this folder
>> cat *.txt | less -I ##(then type '/' to search)
>> less -I *.txt ##(then type '/*' to search, seems better)

* search for lines which begin with "#" in the text file "script"
>> grep '^#' script
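
The last recipe can be tried on a throw-away file. This is only a sketch:
the sample file and its contents are made up for the example.

```shell
# work in a temporary folder so that no real files are touched
tmp=$(mktemp -d)

# a small sample file called 'script' (its contents are invented)
printf '%s\n' '#!/bin/sh' '# print a greeting' 'echo hello  # trailing comment' > "$tmp/script"

# count the lines which begin with '#'
hash_lines=$(grep -c '^#' "$tmp/script")

# show only the lines which do not begin with '#'
code=$(grep -v '^#' "$tmp/script")

echo "$hash_lines lines begin with '#'"
echo "$code"
```

The '^' anchors the match to the start of the line, so the '#' which appears
later in the third line is not counted.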