Sed - 50 points
Due
Both hard copy and email (email must be in by close of lab).

There are 11 problems in this part of the assignment.

Create a text file that contains a set of sed commands that perform the following actions. The files to be edited will be found in /home/max/berezin/Data. Your command file should look something like:

where sout has the problem number appended to it for each problem.

Each problem should include one or more comment lines explaining what you are doing.

You can change the contents of the data variable if you wish to run the sed commands on your own data set, but set it back before handing in the assignment. You can test your answers against my data set by using diff to compare the sout files you generate with my copies, found in /home/max/berezin/Data/Ans. Create an alias sdiffit that is defined as follows.

Unless stated otherwise, when a string is described as a word, treat a word as a sequence of alphabetic, numeric, and underscore characters.

Each password entry uses a : as a field delimiter.

If you spot differences between your solutions and mine, please inform me and I'll check the datasets, my solution, and the interpretation of the problem.

Write a complete sed statement that edits the specified file. Different problems use different data files. Use the appropriate variable to reference the appropriate file.

sout.1 substitute the first occurrence of the word csh on a line with sh. Use the passwd file.
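Matching a whole word rather than a substring can be sketched as follows. This is a minimal illustration that assumes GNU sed, whose \< and \> anchors treat letters, digits, and underscore as word characters. The sample text and the words cat and dog are invented for the example and are not taken from the data files.

```shell
# Assumes GNU sed: \< and \> anchor the start and end of a word
# (letters, digits, underscore). The sample text is hypothetical.
sample='concat cat cat'

# Without a flag, only the first match on the line is replaced.
# "concat" is skipped because its "cat" does not start a word.
word_only=$(printf '%s\n' "$sample" | sed 's/\<cat\>/dog/')
echo "$word_only"
```

Other sed implementations may spell word boundaries differently, so check the local man page before relying on these anchors.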

sout.2 substitute the second occurrence of the string *NP* on a line with the string null field. Use the passwd file.

sout.3 substitute all occurrences of the character : on a line with ; Use the passwd file.
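The first three problems differ mainly in which occurrence is replaced, which the flag after the s command controls. A small sketch on an invented line (the strings *X* and Y are placeholders, not the assignment data):

```shell
# Hypothetical sample line. \* makes each asterisk literal in the pattern;
# an unescaped * would mean "zero or more of the previous character".
sample='*X*:a:*X*:b:*X*'

first=$(printf '%s\n' "$sample" | sed 's/\*X\*/Y/')    # no flag: first match only
second=$(printf '%s\n' "$sample" | sed 's/\*X\*/Y/2')  # numeric flag: that match only
every=$(printf '%s\n' "$sample" | sed 's/\*X\*/Y/g')   # g flag: every match

printf '%s\n' "$first" "$second" "$every"
```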

sout.4 swap the last two fields on a line. Fields are separated by a colon (:). Use the passwd file.
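Reordering fields is usually done with grouped sub-expressions and back-references. The sketch below shows the idea on an invented four-field line. It anchors the pattern to the end of the line with $ and describes each field as a run of non-colon characters.

```shell
# Hypothetical sample line with colon-separated fields.
sample='a:b:c:d'

# \( \) captures a field; \1 and \2 recall the captures in the replacement.
# The trailing $ forces the match to cover the final two fields.
swapped=$(printf '%s\n' "$sample" | sed 's/\([^:]*\):\([^:]*\)$/\2:\1/')
echo "$swapped"
```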

sout.5 remove *NP* from the second field. Colons must stay. Use the passwd file.

sout.6 remove the contents of the second field. Colons must stay. Use the passwd file.
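Editing one field counted from the start of the line follows the same pattern, anchored with ^ instead of $. A minimal sketch on an invented line, which keeps the first field and both colons while emptying the second field:

```shell
# Hypothetical sample line with colon-separated fields.
sample='a:b:c:d'

# ^\([^:]*\) captures field one; [^:]* then consumes field two,
# and the replacement writes back field one and the two colons.
emptied=$(printf '%s\n' "$sample" | sed 's/^\([^:]*\):[^:]*:/\1::/')
echo "$emptied"
```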

sout.7 If the line starts with t9, substitute usr2 with usr3. Use the passwd file.

sout.8 replace all occurrences of /home/mp with /usr/ux. Use the passwd file.

sout.9 remove any periods. Use the passwd file.
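Slashes and periods both need care, because / is the usual s delimiter and . is a regular-expression metacharacter. The sketch below uses an invented path to show one way around each. Choosing | as the delimiter is an illustrative choice, and any character not present in the pattern would do.

```shell
# Hypothetical sample path.
sample='/a/b/file.tar.gz'

# A different delimiter avoids escaping every slash in the pattern.
moved=$(printf '%s\n' "$sample" | sed 's|/a/b|/c|g')

# \. matches a literal period; a bare . would match any character.
no_dots=$(printf '%s\n' "$sample" | sed 's/\.//g')

printf '%s\n' "$moved" "$no_dots"
```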

sout.10 indent (one tab) all lines that do not begin with t9. Use the passwd file.
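Problems that apply an edit only to some lines rely on addresses. A pattern address in front of a command selects the matching lines, and a ! after the address selects the others. The sketch below uses two invented lines and the made-up prefix k1. The tab is produced with printf so the example does not depend on sed understanding \t.

```shell
# Hypothetical two-line input; k1 is a made-up prefix.
input=$(printf 'k1 old\nz2 old')

# Address /^k1/ limits the substitution to lines starting with k1.
selected=$(printf '%s\n' "$input" | sed '/^k1/s/old/new/')

# The ! inverts the address: every line NOT starting with k1 gets a tab.
tab=$(printf '\t')
indented=$(printf '%s\n' "$input" | sed '/^k1/!s/^/'"$tab"'/')

printf '%s\n' "$selected" "$indented"
```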

sout.11 (2) The syntax of the 5th field (4th field from end) in the passwd file is last_name first_name [middle name or initial]. Some entries have three words, some two, and some have multiple words and use commas. Use a single edit statement and remove the rightmost name or word from that field. Describe the string of interest from the end of the line. Use the : delimiter to parse fields and a space or comma to recognize the rightmost word of the field. Use the passwd file.
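Describing a field from the end of the line can be sketched on a smaller case. The example below is an assumption-laden illustration, not the layout of the real data: it uses an invented three-field line and removes the last word of the field that sits one field from the end. The trailing fields are captured so they can be written back unchanged.

```shell
# Hypothetical three-field line; the name field is second from the end.
sample='id:Smith John Q:/bin/sh'

# [ ,] is the separator in front of the last word, [^ ,:]* is the word,
# and \(:[^:]*\)$ captures the one field that follows, up to end of line.
trimmed=$(printf '%s\n' "$sample" | sed 's/[ ,][^ ,:]*\(:[^:]*\)$/\1/')
echo "$trimmed"
```

For a field further from the end, the captured group would need one more :[^:]* for each additional trailing field.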

sout.12 (23) Write a sequence of sed commands that takes the file /home/max/berezin/Data/sed.data and edits out the html markings after attempting to reformat the document to approximate the html formatting.

This will require several invocations of sed to complete the edit session. You can use -e to specify several different edit actions to perform. However, you will not be able to perform all of the edit actions in a single edit session, so redirect the output into a temporary file and use that file as the input to the next edit sequence.
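The staging described here can be sketched as two passes joined by a temporary file. The input text and the edits are invented, and mktemp is assumed to be available for creating the scratch file. A fixed file name in the working directory would serve equally well.

```shell
# Assumes mktemp exists on the system; the input text is hypothetical.
tmp=$(mktemp)

# Pass 1: two edit actions in one invocation, output redirected to the temp file.
printf 'alpha beta\n' | sed -e 's/alpha/A/' -e 's/beta/B/' > "$tmp"

# Pass 2: a second invocation reads the temp file as its input.
staged=$(sed 's/ /-/' "$tmp")

rm -f "$tmp"
echo "$staged"
```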

The data document is a real web page. However, the lines have been merged so that only the html markers influence the formatting. If you view this with a text editor, it will be hard to read.

Official things to do:

Replace each <p> marker with a blank line. Also remove any </p> end of paragraph markers. Keep in mind that </p> is not required and may not be matched to all <p>, so just remove them.

Break line at <br> provided it is not already at end of line.
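Putting a line break into the output means putting a newline into the replacement text. The portable spelling is a backslash followed by an actual newline, as in the sketch below. The sample line is invented, and dropping a <br> that already ends the line is only one reading of the rule above.

```shell
# Hypothetical one-line input.
html='one<p>two</p>three<br>four<br>'

# In a replacement, backslash followed by a real newline inserts a line break.
script='s|</p>||g
s/<br>$//
s/<p>/\
\
/g
s/<br>/\
/g'

broken=$(printf '%s\n' "$html" | sed "$script")
printf '%s\n' "$broken"
```

The output has a blank line where <p> stood and a single break where the inner <br> stood.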

Replace the <li> marker with an asterisk and space and double space the line being tagged.

Replace <b> and </b> with quotes; place a space before the opening quote and a space after the closing quote.

Break any lines longer than 79 characters into a set of lines around 60 characters long. Note the line may have to be reprocessed several times. See the hints below.

Indent all lines between the <ul> and </ul> markers and remove the markers. Also place a blank line at the <ul> and </ul> markers.
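A range address of the form /start/,/end/ applies a group of commands to every line from the first pattern through the second. The sketch below assumes that earlier passes have already put each <ul> and </ul> on a line of its own, and the four-space indent is an arbitrary choice for illustration.

```shell
# Hypothetical input in which the list markers already sit on their own lines.
input=$(printf 'a\n<ul>\nx\ny\n</ul>\nb')

# Inside the range every line is indented; a line holding only a marker
# is then emptied, which leaves a blank line in its place.
script='/<ul>/,/<\/ul>/{
s/^/    /
s|^ *</\{0,1\}ul> *$||
}'

listed=$(printf '%s\n' "$input" | sed "$script")
printf '%s\n' "$listed"
```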

Replace all occurrences of &lt; with < and &gt; with >
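The entity replacement is a pair of global substitutions. In the sketch below, on an invented line, note that & is an ordinary character in the pattern and is special only in the replacement, where it stands for the matched text.

```shell
# Hypothetical sample text containing both entities.
sample='a &lt; b &gt; c'

# & needs no escaping on the pattern side of s.
decoded=$(printf '%s\n' "$sample" | sed -e 's/&lt;/</g' -e 's/&gt;/>/g')
echo "$decoded"
```

Running this step after the tag edits is a sensible ordering, since otherwise the decoded < and > could be mistaken for html markers by later patterns.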

Hints:

Remove all spaces before the html tag start < and after the tag close >.
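This hint amounts to two global substitutions, sketched here on an invented line:

```shell
# Hypothetical sample text with spaces around the tags.
sample='a  <b> bold </b>  c'

# " *<" swallows any run of spaces before a tag start;
# "> *" swallows any run of spaces after a tag close.
tight=$(printf '%s\n' "$sample" | sed -e 's/ *</</g' -e 's/> */>/g')
echo "$tight"
```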

For the multicharacter line:

Test for a line of at least 79 characters
While line at least 79 characters long

Look at the N, P, and D to print and loop back to test. Not all three may be needed.

If done right, even a line of several hundred characters can be broken into lines of 60 to 80 characters long.

Remember:

N - joins the next line in the data file to the current line, leaving a newline between them.

P - prints (outputs) the contents of the current edit buffer up to and including the first newline character.

D - deletes the contents of the current edit buffer up to and including the first newline character, then branches to the top of the list of edit commands without reading a new line of data.
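The behavior of these commands can be seen in a scaled-down sketch. The thresholds here (test for 20 characters, break after roughly 10) are deliberately small stand-ins chosen for the example, and the input lines are invented. The first command pairs up lines with N. The second uses P and D to peel one short line off the front of a long one and loop back until the remainder is short enough.

```shell
# N appends the next input line; the embedded newline is then turned into a space.
joined=$(printf 'a\nb\nc\nd\n' | sed 'N;s/\n/ /')

# Hypothetical long line and deliberately small limits for the demo.
long='aaaa bbbb cccc dddd eeee ffff gggg hhhh iiii'

# While the buffer holds at least 20 characters: turn the first space found
# after 10 characters into a newline, print the part before it with P,
# delete that part with D, and restart the script on what is left.
wrap_demo='/.\{20\}/{
s/^\(.\{10\}[^ ]*\) /\1\
/
P
D
}'

wrapped=$(printf '%s\n' "$long" | sed "$wrap_demo")
printf '%s\n' "$joined" "$wrapped"
```

If a long line has no space to break at, the substitution fails, P prints the whole buffer, and D discards it and moves on to the next input line, so the loop still ends.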