File viewing commands :

We've already looked at less in detail. Here are some other file viewers available on Unix.

cat - short for concatenate, displays the contents of the specified file[s]. All files will be displayed in order given with no pauses. While this can be used to display the contents of a short file to the screen, it is most often used with output redirection (>) to concatenate or join a set of text files into a single file. Although it can be used on files containing binary data, doing so without redirection will most likely cause your screen to become unreadable. If you really need to examine the contents of a binary file, use less or od.
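
As a small sketch of the concatenation use described above, the following joins two short files into a third. The scratch directory and the file names (part1.txt, part2.txt, whole.txt) are made up for illustration.

```shell
# Scratch directory so no real files are touched (names are illustrative).
dir=$(mktemp -d)
printf 'alpha\n' > "$dir/part1.txt"
printf 'beta\n'  > "$dir/part2.txt"

# Join the files, in the order given, into one file.
cat "$dir/part1.txt" "$dir/part2.txt" > "$dir/whole.txt"

# With no redirection, cat writes straight to the screen.
cat "$dir/whole.txt"
```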

more - utility for viewing a text file one screen at a time from the top. less is modeled after more.

head - displays the first 10 lines of a file by default. The most useful option for head is -n, which allows you to change the number of lines to display. For example, to display only three lines, you can use either :

head -n 3 file or head -3 file

Assuming an author has followed good documentation practices, head is useful for looking at a file's introductory documentation.

tail - displays the last 10 lines of a file. Useful options are :

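-n +# display from line # on. tail -n +7 file will start displaying at line 7 of the file and continue to the end. Older systems also accept the short form tail +7 file, but some current versions treat +7 as a file name.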

-c # displays the last # bytes (characters). If the number is preceded by a +, then output will start at the #th byte of the file.

tail is most often used for examining log files. Since data is always being appended to a log and it may run to 100,000 lines, the current information of interest is usually at the end of the file, and tail is the command to retrieve it.
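
The head and tail options above can be tried on a small stand-in for a log file. This is only a sketch: the ten-line file is generated on the spot, and the variable names are arbitrary.

```shell
# Ten numbered lines stand in for a log file (contents are illustrative).
f=$(mktemp)
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > "$f"

first=$(head -n 3 "$f")     # lines 1 through 3
last=$(tail -n 2 "$f")      # lines 9 and 10
from7=$(tail -n +7 "$f")    # line 7 through the end of the file
bytes=$(tail -c 8 "$f")     # last 8 bytes: "line 10" plus its newline

printf '%s\n' "$first" "$last" "$from7" "$bytes"
```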

od - octal dump is used to examine binary files or files that have non-printable characters in them. od generates a table of values that represent each byte in the file. The left-hand column of the table is the offset of the first byte of data shown on that line. Unless overridden by an option, od will represent 16 bytes of file data per line. od allows the user to specify the radix or format in which the data should be displayed, and multiple radices may be specified. Some of the more useful options are :

-o display octal value (default).

-d display decimal value.

-v display all lines. If od finds long sequences of identical data, it will condense the repeated lines and represent them with a single *. This option suppresses that action.
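
To see the table layout and the effect of -v, a minimal sketch follows. It uses od -c (show each byte as a character or escape), which is not in the list above, and feeds od 48 zero bytes from /dev/zero so that every 16-byte line is identical. The variable names are arbitrary.

```shell
# Four bytes: A, B, C and a newline, shown one character per byte.
printf 'ABC\n' | od -c

# 48 identical bytes make three identical 16-byte lines.
# Without -v the repeats collapse to a single "*" line.
short=$(dd if=/dev/zero bs=48 count=1 2>/dev/null | od | wc -l)
full=$(dd if=/dev/zero bs=48 count=1 2>/dev/null | od -v | wc -l)
echo "lines without -v: $short, with -v: $full"
```

The last line of od output is always the final offset, which is why the counts are one higher than the number of data lines shown.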

pr - print - pr does not connect to the printer. Rather, pr paginates the specified file for printer output. pr will process a text file and present it so that it can be printed out with headers, page numbers, or other formatting. It assumes an 8 1/2 by 11 page of 66 lines, 56 of which hold text once the header and trailer are set aside. If output is sent to the screen, it will not pause between formatted pages. Various options allow you to modify the default layout.

The following lists the contents of /dev in a single column and pipes that output to pr, which formats it in 3 columns with page breaks. The final less allows you to view it a screen at a time.

ls -1 /dev | pr -3 | less

diff - compares two files and lists their differences by line number. This is useful when you need to find the differences between two files that should be the same. diff only generates output for the differences. Each difference is introduced by a line number or range and the type of difference: add (a), change (c), or delete (d). Lines from the first file on the command line are marked with < and lines from the second with >.

Some of diff's options :

-b treats runs of spaces as the same. So "hey there" is treated the same as "hey    there" but not the same as "heythere"

-w ignores spacing. "hey there" is the same as "hey    there" and also "heythere"

-q just generates a message that the files are different if that is true.

-e generates difference messages as usable ed commands.
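
The output format and the spacing options can be checked with a few throwaway files. This is a sketch only: the file names and the small check helper are made up for the example.

```shell
dir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$dir/a.txt"
printf 'one\n2\nthree\n'   > "$dir/b.txt"

# diff exits with status 1 when the files differ, hence the "|| true".
report=$(diff "$dir/a.txt" "$dir/b.txt" || true)
printf '%s\n' "$report"

# Three spellings of the same text with different spacing.
printf 'hey there\n'    > "$dir/s1.txt"
printf 'hey    there\n' > "$dir/s2.txt"
printf 'heythere\n'     > "$dir/s3.txt"

# check OPTION FILE1 FILE2: hypothetical helper that prints same or different.
check() {
    if diff "$1" "$2" "$3" > /dev/null; then echo same; else echo different; fi
}

b_runs=$(check -b "$dir/s1.txt" "$dir/s2.txt")   # runs of spaces compare equal
b_none=$(check -b "$dir/s1.txt" "$dir/s3.txt")   # space versus no space still differs
w_none=$(check -w "$dir/s1.txt" "$dir/s3.txt")   # all spacing ignored
echo "-b runs: $b_runs, -b none: $b_none, -w none: $w_none"
```

In the first report, 2c2 means line 2 of the first file was changed into line 2 of the second.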

comm - compares two sorted files and generates 3 columns of output: the 1st column holds lines unique to file1, the 2nd column holds lines unique to file2, and the 3rd column holds lines common to both.
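
A short sketch of comm on two small files follows. comm expects both inputs to be sorted already. The options -1, -2 and -3, which suppress the corresponding column, are not covered above; the file names and contents are invented for the example.

```shell
dir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$dir/f1"
printf 'banana\ncherry\ndate\n'  > "$dir/f2"

# Full three-column output; columns are indented with tabs.
comm "$dir/f1" "$dir/f2"

only1=$(comm -23 "$dir/f1" "$dir/f2")   # column 1 only: unique to f1
only2=$(comm -13 "$dir/f1" "$dir/f2")   # column 2 only: unique to f2
both=$(comm -12 "$dir/f1" "$dir/f2")    # column 3 only: common lines
echo "only in f1: $only1; only in f2: $only2"
```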

grep - global regular expression print. grep searches a file or files for all lines containing the text described by a specified regular expression pattern. We will look at regular expressions and grep in detail.
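
As a first taste, here is a minimal sketch of grep pulling matching lines out of a file. The sample log lines are invented, and the -c option (count matching lines instead of printing them) is an addition not described above.

```shell
f=$(mktemp)
printf 'error: disk full\nok: backup done\nerror: no route\n' > "$f"

# ^ anchors the pattern to the start of the line.
matches=$(grep '^error' "$f")
count=$(grep -c '^error' "$f")

printf '%s\n' "$matches"
echo "matching lines: $count"
```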

spell - obsolete - doesn't exist on our systems

scans the specified file for unrecognized words. If all words are recognized, then there is no output. Otherwise, it will list the words not recognized. The user then has to locate them in the text and correct them.

ispell - international spell, Linux version of spell. It is interactive: it allows the user to select from possible corrections and then corrects the file. ispell will make a backup of the old file under its name with a tilde appended. ispell also allows the user to build a private dictionary to supplement the system default one.

aspell - GNU speller. Interactive. Better at handling UTF-8 (Unicode). Will make approximate guesses. Users can include additional dictionaries. Words can be added to a local dictionary while checking.

aspell -c file - spell checks file. If changes are made, the original is backed up.

The following are more like editors but with very specific tasks :

sort - sort a file in alphabetical order. Input may be from a specified filename or names, by redirection from a file, or if no argument or redirection, directly from the keyboard. All output is sent to standard out unless redirection or the -o option is used.

Although sorting is generally alphabetical, various environment variables (the locale settings) will also affect the sort order, such as whether all upper case comes before lower case or the case is folded (all words starting with a or A first, then b or B, etc.)

sort can also sort in reverse order or on specific columns in the file.
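
The points above can be sketched as follows. LC_ALL=C is set on each command so the result does not depend on the local environment; in that setting all upper case sorts before lower case. The -r (reverse) and -k (sort key) options and the sample data are choices made for this example.

```shell
f=$(mktemp)
printf 'pear 3\napple 10\nfig 2\n' > "$f"

alpha=$(LC_ALL=C sort "$f")          # alphabetical on the whole line
rev=$(LC_ALL=C sort -r "$f")         # reverse order
bynum=$(LC_ALL=C sort -k2,2n "$f")   # numeric sort on the 2nd column

# In the C locale, upper case letters come before lower case ones.
cased=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort)

printf '%s\n' "$alpha" "$rev" "$bynum" "$cased"
```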

uniq - eliminates repeating consecutive lines in a file. If lines are duplicated elsewhere in the file, these will remain. To remove all duplication, use sort and its -u option, but keep in mind that the order of the lines will be destroyed.
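
The difference between the two approaches shows up on a four-line file in which a repeats both consecutively and later on. This is a sketch with made-up data.

```shell
f=$(mktemp)
printf 'a\na\nb\na\n' > "$f"

adjacent=$(uniq "$f")   # only the consecutive repeat is dropped: a, b, a
all=$(sort -u "$f")     # every duplicate is dropped, output is sorted: a, b

printf 'uniq:\n%s\nsort -u:\n%s\n' "$adjacent" "$all"
```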

tr - translate the characters in a file. tr generally takes two lists of characters, the 1st list is what to look for and the 2nd is what to change to. The lists are a 1 to 1 match and should have the same number of values.

tr does support range specifications for the lists and even symbolic representations. tr also recognizes several escape (backslash) characters such as horizontal tab (\t) or bell (\a). See the man page for full listing. To specify a range, specify starting character, followed by hyphen, followed by ending character. Character sequence must be in collating order.

You may be explicit : tr "aefgij" "1248AE"

If the search list is shorter, only those characters will be replaced. If the replacement list is shorter, then the last character in the replacement list is used for matches against the remaining characters of the search list.

It is generally safer to quote both ranges.

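tr does not accept file names as arguments. It reads only from standard input and writes only to standard output, so you must use redirection (or a pipe) for input and output. Example :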

Change ampersands to pluses :

tr "&" "+" < f1 > f1.plus

Change lower case to upper case :

tr "abcdefghijklmnopqrstuvwxyz" "ABCDEFGHIJKLMNOPQRSTUVWXYZ" < p.jcx > p.jcl

tr "a-z" "A-Z" < p.jcx > p.jcl

tr "[:lower:]" "[:upper:]" < p.jcx > p.jcl
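
To watch the translations work, they can be run on a one-line test file. This is a sketch: the input text is invented. Note that character classes are written with square brackets inside the quotes, as in "[:lower:]".

```shell
f=$(mktemp)
printf 'fading light\n' > "$f"

# Explicit lists: a->1, e->2, f->4, g->8, i->A, j->E; other characters pass through.
explicit=$(tr "aefgij" "1248AE" < "$f")

# Range and character-class forms of lower case to upper case.
range=$(tr "a-z" "A-Z" < "$f")
class=$(tr "[:lower:]" "[:upper:]" < "$f")

printf '%s\n' "$explicit" "$range" "$class"
```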

Some options :

-d deletes the listed characters. Takes a single list of characters rather than two matching lists.

-s squeeze repeating consecutive characters if listed.

Often when a file is moved from a Windows system to a Unix system, the carriage return, line feed pair is left in the file. Unix systems only require the line feed. The following strips the carriage return.

tr -d "\r" < file1 > file1.clean

tr -s "[:punct:]" < file1 > file1.good
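
Both options can be verified on small inputs. In this sketch the sample text and file names are made up; the byte counts from wc -c confirm that exactly the two carriage returns were removed. Since tr reads standard input, a pipe works as well as a redirected file.

```shell
# A two-line file with Windows line endings (carriage return + line feed).
f=$(mktemp)
printf 'one\r\ntwo\r\n' > "$f"

tr -d "\r" < "$f" > "$f.clean"
before=$(wc -c < "$f")         # 10 bytes
after=$(wc -c < "$f.clean")    # 8 bytes: both carriage returns are gone
echo "bytes before: $before, after: $after"

# Squeeze runs of repeated punctuation; the spaces are not punctuation and stay.
squeezed=$(printf 'wait...   what??\n' | tr -s "[:punct:]")
echo "$squeezed"
```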

ul - underlining. Designed to convert underscores to underlining based on the terminal being used. One use for this command is to strip out the highlighting and control characters from man pages redirected to files.
