This presents just the regular expression. 

Keep in mind for our testing, we are using a list of dictionary words and 
password filtering words. In some cases, the entries may have non-word
characters embedded, which makes them compound entries.

Suggestion: write a shell script like the one in the assignment, copy these
in, and run them. Look at the output.
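
A minimal sketch of such a script, assuming GNU grep. The six sample words
stand in for the real word list, and the try helper is a hypothetical name
made up for this illustration. The variable name data1 follows the grep -E
example later in these notes.

```shell
#!/bin/sh
# Sketch only: the sample words and the try() helper are assumptions,
# not part of the assignment.
data1=$(mktemp)
trap 'rm -f "$data1"' EXIT    # remove the temporary file when the script ends

cat > "$data1" <<'EOF'
book
kitchen
pH
can't
McDonald
understanding
EOF

# try DESCRIPTION PATTERN: print the description, then every matching line.
try() {
    echo "== $1"
    grep -- "$2" "$data1"
}

try "duplicate letters next to each other" '\([a-zA-Z]\)\1'
try "letter pair ck or tch" '\(ck\|tch\)'
try "words longer than 8 characters" '[a-zA-Z0-9]\{9\}'
```

Single quotes around each pattern keep the shell from touching the
backslashes. Replace the here-document with the real data file to test
against the full list.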


Word - any combination of alpha-numeric and underscore characters.
* For our dataset, don't worry about underscore.

Line - any combination of any character between beginning and end of line.
* may be alpha-numeric, punctuation, or spacing.

Alpha character - a-z or A-Z.

Alpha-numeric - a-z, A-Z, or 0-9.

Character - any character, including spaces and tabs, or punctuation.
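
The definitions above map onto bracket expressions roughly as follows. This
is a sketch with made-up sample lines, not part of the dataset. The ^ and $
anchors force the whole line to consist of the listed characters.

```shell
#!/bin/sh
# Sketch only: the sample lines are invented for illustration.
sample=$(printf '%s\n' 'abc' 'abc123' 'foo_bar' "can't" 'two words')

# Lines made up only of alpha characters.
alpha_only=$(printf '%s\n' "$sample" | grep '^[a-zA-Z]\{1,\}$')

# Lines made up only of alpha-numeric characters.
alnum_only=$(printf '%s\n' "$sample" | grep '^[a-zA-Z0-9]\{1,\}$')

# Lines made up only of word characters (alpha-numeric plus underscore).
word_only=$(printf '%s\n' "$sample" | grep '^[a-zA-Z0-9_]\{1,\}$')

# Lines with at least one non-word character, i.e. compound entries.
compound=$(printf '%s\n' "$sample" | grep '[^a-zA-Z0-9_]')

printf 'alpha only:\n%s\n\n' "$alpha_only"
printf 'alpha-numeric only:\n%s\n\n' "$alnum_only"
printf 'word characters only:\n%s\n\n' "$word_only"
printf 'compound entries:\n%s\n' "$compound"
```

Each class is a superset of the one before it, so every line matched by the
alpha test is also matched by the alpha-numeric and word tests.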


Find any words with duplicate letters next to each other.

\([a-zA-Z]\)\1

# Identify and remember an alpha character, then look for a match
# immediately after.

Find any words with two consecutive letters repeated anywhere in the line.

\([a-zA-Z][a-zA-Z]\).*\1

# Remember a pair of consecutive alpha characters, then scan the rest of
# the line for a match on both. .* represents zero or more of any other
# characters.

Find any words with two consecutive letters repeated anywhere else in the
line.

\([a-zA-Z]\)\([a-zA-Z]\)\([^\1]\|\1[^\2]\).*\1\2

# Remember two characters next to each other, followed by a character that
# is not the 1st character or, if it is, then the next character is not the
# second character. Once that has been confirmed, look for a match,
# skipping any non-matches until a match is found or end of line.
#
# Caution: grep does not interpret \1 or \2 inside [], so [^\1] really
# means "any character other than a backslash or the digit 1". As written,
# the pattern therefore only requires that at least one character separate
# the pair from its repeat. If your grep supports Perl-compatible
# expressions (GNU grep -P), a negative lookahead expresses the intent:
# ([a-zA-Z])([a-zA-Z])(?!\1\2).*\1\2
#
# The current regular expression library allows parentheses to be used
# both for remembering matched patterns and for specifying an or
# condition, (|), but in this basic form extensive use of backslashes is
# needed.

Find any words with two consecutive letters repeated anywhere else in the
line in reverse order.

\([a-zA-Z]\)\([a-zA-Z]\)\([^\2]\|\2[^\1]\).*\2\1

# Like the previous example, except the matches are swapped. The same
# caution about \1 and \2 inside [] applies.

Find any words with two consecutive letters repeated anywhere in the line
in reverse order.

\([a-zA-Z]\)\([a-zA-Z]\).*\2\1

# Simpler than the previous example because a swapped pair immediately
# after the pair being matched is acceptable.

Find any words with letter pair ck or tch in them.

\(ck\|tch\)

# Example of the OR feature of the regular expression library.
# Note, it is often possible to use the extended version of the regular
# expression library by using command options.
# However, some of the other regular expression features may not be
# available.

grep -E '(ck|tch)' "$data1"

Find any lines longer than 8 characters.

.\{9\}

# Look for any occurrence of 9 or more characters of any value on a line.

Find any words longer than 8 characters.

[a-zA-Z0-9]\{9\}
[[:alnum:]]\{9\}
\<[a-zA-Z0-9]\{9,\}\>

# Following the definition of a word, the specific word characters must
# occur consecutively. Because the acceptable characters are specifically
# defined, and a longer word is an acceptable match, the beginning and
# end of word markers are not needed, and in fact, would limit the
# possible matches unless the 3rd form shown was used.

Find any words with exactly 8 characters.

\<[a-zA-Z0-9]\{8\}\>

# This does limit the match to an 8 character word.

Find any words with 1st letter capitalized.

\<[A-Z]

# Note that it is possible that the word may be a single character.

Find any word with the 2nd letter capitalized and possibly others.

\<[a-z0-9][A-Z]

# Find the beginning of a word and look for a non-capitalized 1st word
# character, followed by the 2nd character being capitalized.
# Example: pH

Find any word with two capital characters but separated by at least one
non-capital word character.

[A-Z][a-z0-9]\+[A-Z]

# Search the line until a capital is found, followed by one or more
# non-capital word characters, followed by a capital.

Find any line with two capitals but separated by at least one not capital.

[A-Z][^A-Z]\+[A-Z]

# Find a capital letter, followed by one or more not capital characters
# (these could be lower case letters, punctuation, or spacing), followed
# by another capital.

Find any word ending with a d.

d\>

Find any 'line' ending with an s.

s$

Find any word ending with an s but the line ending with something else.

s\>.*[^s]$

# The \> guarantees that a word ending with s is found, followed by
# any number of additional characters as long as the last character on
# the line is not an s.

Find any words containing an apostrophe followed by exactly 3 characters.

'[[:alnum:]]\{3\}\>

# Find the apostrophe, followed by 3 word characters and then the end of
# the word.
# If you are doing this on a command line, take care with quoting.

Find any words with at least one non-alphanumeric character other than '
(apostrophe).

[^[:alnum:]']

# Find a character that is not a word character or an apostrophe. The not
# modifier (^) applies to all characters listed in [].
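
The quoting concern for the two apostrophe patterns can be sketched as
follows, assuming GNU grep (for \>) and a POSIX shell. The four sample words
are made up. An apostrophe cannot appear inside a single-quoted shell string,
so either use double quotes or step outside the single quotes for that one
character.

```shell
#!/bin/sh
# Sketch only: the sample words are invented for illustration.
words=$(printf '%s\n' "can't" "y'all" 'plain' 'e-mail')

# Double quotes let the apostrophe through. The backslashes survive
# because \{ \} and \> are not escape sequences inside double quotes.
three_after=$(printf '%s\n' "$words" | grep "'[[:alnum:]]\{3\}\>")

# Same pattern: an escaped apostrophe, then the rest in single quotes.
three_after_sq=$(printf '%s\n' "$words" | grep \''[[:alnum:]]\{3\}\>')

# Any character that is neither alpha-numeric nor an apostrophe.
other_punct=$(printf '%s\n' "$words" | grep "[^[:alnum:]']")

printf 'apostrophe + 3:\n%s\n\n' "$three_after"
printf 'other punctuation:\n%s\n' "$other_punct"
```

Double quotes are the easier choice here, but a pattern containing $ or a
backquote would need those characters escaped inside them.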