grep Assignment
CSCI 330 Unix

Due 29 July 2007 11 PM.

72 points

For the following problems, create a file containing the complete grep commands and regular expressions that solve them. Each question indicates the source of its data.

There are instructions below on setting variables containing the path to some of the target files to save typing.

Put your command sequences in a shell script file that runs the commands under the bash shell. Redirect the output to the file name specified. To test your solution, compare your output to a set of output files in my directory. Your shell script file should start with :
#!/bin/bash

Make the file executable and invoke it to run your grep statements.

To simplify the grep statements, put the path and filename of each data file into a variable. The #!/bin/bash line above forces the commands to run under the bash shell. The proper syntax for setting the variables you will need is:

Linux may use Unicode for its character set. This may affect how upper/lower case characters and ranges are handled. For this assignment, we want to use the ASCII rules. To set character interpretation to ASCII, add the line :

to your script before the grep commands. Each time the script is run, it will temporarily set this shell variable, but only for the duration of the script.
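The setup described above might look like the following sketch. The variable name PASSWD, the made-up problem number 00, and the temporary stand-in data file are illustrative assumptions, and LC_ALL=C is one common way to select ASCII rules, assumed here rather than taken from the assignment.

```bash
#!/bin/bash
# Assumed setting: LC_ALL=C selects the ASCII ("C" locale) rules for case and ranges.
export LC_ALL=C

# Stand-in data so the sketch runs anywhere. In the assignment, the variable
# would instead hold the path to a file under /home/lx/berezin/Data.
workdir=$(mktemp -d)
PASSWD="$workdir/passwd"    # hypothetical variable name
cat > "$PASSWD" <<'EOF'
z912730:x:10009:108:JOHN BEREZINSKI:/home/lx/z912730:/bin/bash
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
EOF

# Problem 00 (made up): records whose login shell is bash.
grep 'bash$' "$PASSWD" > "$workdir/gout.00"
```

Because the variable is exported inside the script, it affects only the commands the script runs, not your login shell.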

For each problem, write a comment line describing what the command does. You may add additional comment lines. A comment line must start with a #. On a separate line, write the grep statement in the form. You will be docked 1/2 the question's points for missing or bad documentation.

where

Remember

To check your answers, run your shell script. It will return an error, generate one or more gout files, or both. Use diff to compare your output file with my output file. My output will be in /home/lx/berezin/ans. The easiest way to perform the comparison is to write a function called gdiffit. To define gdiffit, enter the following in a text file (name of your choice) :

  #!/bin/bash
  #call me ishmael

  # Compare your output for problem $1 with the reference output.
  gdiffit () {
    diff "gout.$1" "/home/lx/berezin/data/gout.$1"
  }

To define the function, use the source command :

To test your output, simply run gdiffit with the problem number :
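As a rough sketch of those two steps, the example below sources a function file and then calls gdiffit. The file name gdiff.sh, the ANSDIR variable, and the temporary directories are hypothetical stand-ins so the example runs anywhere. In the assignment, the function compares against my directory instead.

```bash
#!/bin/bash
# Stand-in working directory and answer directory (both hypothetical).
workdir=$(mktemp -d)
mkdir "$workdir/ans"
cd "$workdir" || exit 1

# Stand-in function file; ANSDIR replaces the instructor's directory.
cat > gdiff.sh <<'EOF'
gdiffit () {
  diff "gout.$1" "$ANSDIR/gout.$1"
}
EOF
ANSDIR="$workdir/ans"

# Pretend output and matching reference output for problem 01.
echo 'same line' > gout.01
echo 'same line' > "$ANSDIR/gout.01"

source ./gdiff.sh    # defines gdiffit in the current shell
gdiffit 01           # prints nothing when the two files match
status=$?            # 0 means no differences were found
```

If diff prints lines, those lines are where your output and the reference output disagree.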

For each of the following problems, write a complete grep statement that is runnable. Do not use grep options to solve a problem unless specifically told to do so. Do not use egrep.

Keep in mind the general structure of the passwd file is :

z912730:x:10009:108:JOHN BEREZINSKI:/home/lx/z912730:/bin/bash

A pared down copy of the passwd file has been copied to the Data directory. Search the passwd file in /home/lx/berezin/Data for all lines :

gout.01 that contain the password record for the user z912730.

gout.02 that contain the password records for all t90 ids, the faculty ids.

gout.03 that end with the word false. Some programs require a user id to work but never actually log in and don't need a default shell. For these, the admin uses /bin/false as the login shell. So look at the end of the line.

gout.04 for student users whose login ids are even. A student's login id consists of an initial z character and a series of digits.

gout.05 for users whose login ids are even. Remember the 1st field ends with a colon.

gout.06 for all user login ids that are 2 characters long. Remember the user id is the 1st field of the line. The delimiter after the user id is the colon (:).

gout.07 - Question removed. See question 12.

gout.08 where the home directory does not match the login id. Look for lines that also list the login id as the home directory. The login id appears at the beginning of the line and, if listed as a directory, will be immediately preceded by a forward slash (/). Use grouping and a back-reference. Use the inverse option.

gout.09 where there is no whitespace (either tabs or spaces) on the line.

gout.10 where the gecos field has at least a three part name in it. The gecos field is the 5th field in and usually contains the user's real name. It is also the only field with one or more spaces in it. So, search for lines where one or more spaces occur followed by one or more non-spaces followed by a space.

gout.11 where the gecos field has a two part name in it, but no more. Search for lines where one or more consecutive spaces occur followed by one or more non-spaces. Remember each field is separated by colons. You may need to include them in the regular expression to get the right match.

gout.12 where the user-id is even. The user id is the third field in. Because the zid and the group ids may also be even, you will have to use the beginning of the line as an initial delimiter. The number you are looking for occurs after two colons have been parsed.
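Two of the techniques named above, counting fields from the beginning of the line and inverting a back-reference match, can be sketched on made-up data. The records, the simplified two-field format in the second file, and the file names are assumptions for illustration only. They are not the contents of the real passwd file.

```bash
#!/bin/bash
export LC_ALL=C
workdir=$(mktemp -d)

# Made-up records in passwd format.
cat > "$workdir/passwd" <<'EOF'
root:x:0:0:root:/root:/bin/bash
z100001:x:10010:107:PAT DOE:/home/lx/z100001:/bin/bash
z100002:x:10009:108:SAM ROE:/home/lx/z100002:/bin/bash
EOF

# Skip two colon-terminated fields from the start of the line, then require
# digits ending in an even digit right before the third colon.
grep '^[^:]*:[^:]*:[0-9]*[02468]:' "$workdir/passwd" > "$workdir/even"

# Made-up two-field records: a name, then a directory.
cat > "$workdir/dirs" <<'EOF'
alice:/home/alice
bob:/home/robert
carol:/srv/carol
EOF

# \(...\) captures the name; \1 must reappear after a slash at the end of the line.
# -v inverts the match, keeping only lines where the directory differs from the name.
grep -v '^\([^:]*\):.*/\1$' "$workdir/dirs" > "$workdir/mismatch"
```

Here the file even holds the root and z100001 records, and mismatch holds only the bob line. Since [^:]* cannot cross a colon, each repetition consumes exactly one field.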

****

A listing of the /var/log directory has been stored in the file varlog in the Data directory. /var/log is where many programs store their log files. Log files are used to monitor and troubleshoot programs such as the web server or the login daemons.

Search the varlog file in /home/lx/berezin/Data for all lines :

gout.13 that contain the string log at the end of the line. The newline must immediately follow the g. The string may be part of a larger word.

gout.14 containing the word log at the end of the line. The newline must immediately follow the g.

gout.15 containing the string .log at the end of the line. The newline must immediately follow the g. The period must precede the string log. You must specifically search for the period.

gout.16 that do not contain a date in their name. Match lines that contain the digit sequence 2006 or 2007, and use the inverse option to eliminate those matches. This is a single grep.

gout.17 that contain the digit sequence representing the date 17 July 2007. It will be in the form yyyymmdd or yyyy-mm-dd, so there may be zero or one hyphen between the year and month and between the month and day. This character sequence is distinctive enough that you don't need to specify anchors. But do use the braces to set the number of hyphens at each position where they may exist.

gout.18 containing the string log that is part of a larger word. Use two greps. Search for the string log and pipe it to a grep that looks for lines that don't contain the word log (use the inverse on the second grep).
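The brace interval and the two-grep pipeline described above can be sketched on made-up names, using a different date and a different string than the problems ask for. The file names below are invented, and the \< and \> word anchors are an extension assumed to be available (GNU grep supports them).

```bash
#!/bin/bash
export LC_ALL=C
workdir=$(mktemp -d)

# Made-up file names standing in for a directory listing.
cat > "$workdir/names" <<'EOF'
messages-20060304
mail.2006-03-04
cron-2006-0304
secure-20060305
err
err.1
stderr
errors
EOF

# -\{0,1\} allows zero or one hyphen at each position inside the date 4 March 2006.
grep '2006-\{0,1\}03-\{0,1\}04' "$workdir/names" > "$workdir/dated"

# First grep keeps every line containing the string err. The second drops lines
# where err stands alone as a word, leaving only the larger words.
grep 'err' "$workdir/names" | grep -v '\<err\>' > "$workdir/partial"
```

The file dated holds the first three names, and partial holds stderr and errors. The name err.1 is dropped because the period ends the word err.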

The file numdata contains mostly lines of numbers in various configurations and quantities.

Search the file /home/lx/berezin/Data/numdata for lines that :

gout.19 contain only integer numbers, each consisting of one or more digits. Lines may be padded with spaces or tabs before or after the number(s). There may be more than one number on the line.

gout.20 contain only one number. The number is unsigned, but white space padding may exist before and/or after it.

gout.21 contain only one integer number, however this number may be negatively signed (single - sign or no sign). White space padding may exist before and/or after.

gout.22 contain only one number, however this number may be positively or negatively signed (single sign or no sign). White space padding may exist before and/or after.

gout.23 contain only one unsigned number. It may be a floating point number, so it may include a period in it.

Use a two grep command sequence. Start by looking at the beginning of the line for zero or more digits followed by zero or one period followed by one digit followed by zero or one period followed by zero or more digits to the end of the line.

Pipe this to a second grep with the invert option and exclude any lines in which two (or more) periods occur. The periods may be separated by one or more digits. Look for numbers that don't have white space padding before and/or after. (Don't look for white space.)
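The two-grep sequence for floating point numbers, along with a single grep for one optionally signed integer with padding, might look like the sketch below. The sample lines and file names are made up, and [[:blank:]] (space or tab) is one assumed way to write the white space class.

```bash
#!/bin/bash
export LC_ALL=C
workdir=$(mktemp -d)

# Made-up lines; printf '%s\n' writes each argument on its own line.
printf '%s\n' '42' '3.14' '.5' '1.2.3' '12a' ' 7' '.' > "$workdir/nums"

# First grep: optional digits, optional period, one required digit,
# optional period, optional digits, anchored at both ends of the line.
# Second grep: -v drops lines with two periods separated by zero or more digits.
grep '^[0-9]*\.\{0,1\}[0-9]\.\{0,1\}[0-9]*$' "$workdir/nums" |
  grep -v '\.[0-9]*\.' > "$workdir/floats"

# Made-up lines with space and tab padding (\t is a tab).
printf '  42\n-7\t\n+13\n4 5\n--3\n' > "$workdir/ints"

# Optional padding, at most one sign, one or more digits, optional padding.
grep '^[[:blank:]]*[-+]\{0,1\}[0-9][0-9]*[[:blank:]]*$' "$workdir/ints" > "$workdir/signed"
```

The file floats holds 42, 3.14 and .5, and signed holds the first three lines of ints. The required single digit in the first pattern is what rejects a line consisting of a lone period.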