This presents just the regular expressions.

Keep in mind that for our testing we are using a list of dictionary words and
password-filtering words. In some cases, an entry may have non-word
characters embedded, which makes it a compound entry.

Suggestion: write a shell script like the one for the assignment, copy these
expressions into it, and run them. Look at the output.
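
One possible shape for such a script is sketched below. The sample entries, the temporary file, and the helper name try are assumptions made for illustration; the assignment's own script and data file may look different.

```shell
#!/bin/sh
# Hypothetical sample data; the real assignment supplies its own word list.
data1=$(mktemp)
trap 'rm -f "$data1"' EXIT
printf '%s\n' book Checklist "don't" pass-word watch > "$data1"

# try PATTERN: print the pattern, then every line of $data1 that matches it.
try() {
    printf '== %s\n' "$1"
    grep -e "$1" "$data1"
}

try '\(ck\|tch\)'   # lists Checklist and watch
try '.\{9\}'        # lists Checklist and pass-word
```

Each expression from the sections below can be pasted in as another try line.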


Word - any combination of alpha-numeric and underscore characters.
* For our dataset, don't worry about underscore.

Line - any combination of any character between beginning and end of line.
* may be alpha-numeric, punctuation, or spacing.

Alpha character - a-z or A-Z.

Alpha-numeric - a-z, A-Z, or 0-9.

Character - any character, including spaces and tabs, or punctuation.



Find any words with duplicate letters next to each other.

  \([a-zA-Z]\)\1

  # identify and remember an alpha character, then look for the same
  #   character immediately after.
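
A quick way to see the back-reference at work is to pipe a few words through grep. The four words below are invented for illustration and are not taken from the course data.

```shell
# Keep only the words that contain the same letter twice in a row.
doubles=$(printf '%s\n' book cat letter dog | grep '\([a-zA-Z]\)\1')
printf '%s\n' "$doubles"   # prints book and letter
```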

Find any words with two consecutive letters repeated anywhere in the line

  \([a-zA-Z][a-zA-Z]\).*\1

  # remember a pair of consecutive alpha characters, then scan the rest of 
  #   the line for a match on both. .* represents zero or more of any other 
  #   characters.
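
The same kind of check works for a remembered pair. The sample words are made up; note that the repeat may follow the first pair immediately, as in papa.

```shell
# Keep only the words in which some two-letter pair occurs again later.
pairs=$(printf '%s\n' banana regular murmur papa | grep '\([a-zA-Z][a-zA-Z]\).*\1')
printf '%s\n' "$pairs"   # prints banana, murmur, and papa
```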

Find any words with two consecutive letters repeated anywhere else in the line.

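  \([a-zA-Z]\)\([a-zA-Z]\)\([^\1]\|\1[^\2]\).*\1\2

  # Intent: remember two characters next to each other, followed by a
  #   character that is not the 1st character or, if it is, then the next
  #   character is not the 2nd character. Once that has been confirmed, look
  #   for a match, skipping any non-matches until a match is found or the
  #   end of the line is reached.
  #
  # Caution: a back-reference is not recognized inside a bracket expression.
  #   [^\1] matches any character other than a backslash or the digit 1, so
  #   in practice the third group mostly just requires at least one
  #   character between the two occurrences of the pair.
  #
  # The current regular expression library allows parentheses to be used
  #   for both remembering matched patterns and for specifying an OR
  #   condition, (|), but in a basic regular expression each of these
  #   needs a backslash.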
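
One detail of the expression above is easy to verify by experiment: inside a bracket expression, grep treats \1 as the two literal characters backslash and 1, not as a back-reference. The sketch below uses invented two-character entries. If [^\1] meant "not the remembered character", the entry aa would be rejected.

```shell
# Remember the letter a, then require one character from [^\1].
kept=$(printf '%s\n' aa ab a1 | grep '\(a\)[^\1]')
printf '%s\n' "$kept"   # prints aa and ab; a1 is rejected because 1 is excluded
```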

Find any words with two consecutive letters repeated anywhere else in the line
in reverse order.

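  \([a-zA-Z]\)\([a-zA-Z]\)\([^\2]\|\2[^\1]\).*\2\1

  # Like the previous example, except that the pair looked for later in the
  #   line is swapped. Note that \1 and \2 inside [] are taken as literal
  #   characters, not as back-references.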

Find any words with two consecutive letters repeated anywhere in the line
in reverse order.

  \([a-zA-Z]\)\([a-zA-Z]\).*\2\1

  # Simpler than the previous example because a swapped pair immediately
  #   following the pair being matched is acceptable.
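
A small trial with invented words shows the swapped back-references. In abba the reversed pair follows immediately; in level one character sits in between.

```shell
# Keep only the words in which a letter pair reappears later in reverse order.
swapped=$(printf '%s\n' abba noon level hello | grep '\([a-zA-Z]\)\([a-zA-Z]\).*\2\1')
printf '%s\n' "$swapped"   # prints abba, noon, and level
```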

Find any words with letter pair ck or tch in them.

  \(ck\|tch\)

  # Example of the OR feature of the regular expression library.
  # Note: it is often possible to use the extended version of the regular
  #   expression library through a command option, such as -E for grep,
  #   which removes the need for most of the backslashes.
  #   However, some of the other regular expression features may not be
  #   available.

  grep -E '(ck|tch)' "$data1"
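
To compare the two notations side by side, the sketch below runs both forms over the same invented entries. It assumes GNU grep, since \| in a basic regular expression is a GNU extension.

```shell
# Invented sample entries.
words=$(printf '%s\n' back watch cat)
bre=$(printf '%s\n' "$words" | grep '\(ck\|tch\)')   # basic syntax
ere=$(printf '%s\n' "$words" | grep -E '(ck|tch)')   # extended syntax
printf '%s\n' "$bre"                                 # prints back and watch
[ "$bre" = "$ere" ] && echo 'same matches'
```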
 
Find any lines longer than 8 characters.

   .\{9\}

  # Look for any occurrence of 9 or more characters of any value on a line.

Find any words longer than 8 characters.

   [a-zA-Z0-9]\{9\}
   [[:alnum:]]\{9\}
   \<[a-zA-Z0-9]\{9,\}\>

  # Following the definition of a word, the word characters must occur
  #   consecutively. Because the acceptable characters are specifically
  #   defined, and a longer word is an acceptable match, the beginning and
  #   end of word markers are not needed. In fact, they would limit the
  #   possible matches unless the 3rd form, with the open-ended count
  #   \{9,\}, is used.
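
The difference between a long line and a long word shows up with a compound entry. In the sketch below (sample entries invented), pass-word is 9 characters long as a line, but neither of its words is longer than 4.

```shell
entries=$(printf '%s\n' dictionary pass-word password1 short)
long_lines=$(printf '%s\n' "$entries" | grep '.\{9\}')
long_words=$(printf '%s\n' "$entries" | grep '[[:alnum:]]\{9\}')
echo '-- lines longer than 8 characters'
printf '%s\n' "$long_lines"   # dictionary, pass-word, password1
echo '-- words longer than 8 characters'
printf '%s\n' "$long_words"   # dictionary, password1
```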

Find any words with exactly 8 characters.

   \<[a-zA-Z0-9]\{8\}\>

  # This does limit the match to an 8 character word.
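
The effect of the word markers can be seen by running the expression with and without them. The entries are invented, and \< and \> are assumed to be available, as they are in GNU grep.

```shell
entries=$(printf '%s\n' password passwords pass my-password)
exact=$(printf '%s\n' "$entries" | grep '\<[a-zA-Z0-9]\{8\}\>')
loose=$(printf '%s\n' "$entries" | grep '[a-zA-Z0-9]\{8\}')
echo '-- with word markers'
printf '%s\n' "$exact"   # password, my-password
echo '-- without word markers'
printf '%s\n' "$loose"   # password, passwords, my-password
```

The entry my-password qualifies in both cases because the hyphen is not a word character, so password counts as a word of its own.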

Find any words with 1st letter capitalized.

   \<[A-Z]

  # Note that it is possible that the word may be a single character.

Find any word with the 2nd letter capitalized and possibly others.

   \<[a-z0-9][A-Z]

  # Find the beginning of a word and look for a non-capitalized 1st word
  #   character, followed by the 2nd character being capitalized.
  #   Example: pH
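
A short trial with invented entries shows that \< anchors to the start of a word, which need not be the start of the line. LC_ALL=C is set here as a precaution so that the ranges a-z and A-Z are read in plain ASCII order.

```shell
# Keep entries containing a word whose 2nd character is a capital letter.
second_cap=$(printf '%s\n' pH iPhone Ph McDonald 'the eBook' | LC_ALL=C grep '\<[a-z0-9][A-Z]')
printf '%s\n' "$second_cap"   # prints pH, iPhone, and the eBook
```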
   
Find any word with two capital characters but separated by at least
one non-capital word character.

   [A-Z][a-z0-9]\+[A-Z]

  # Search the line until a capital is found, followed by one or more
  #   non-capital word characters, followed by a capital.

Find any line with two capitals but separated by at least one non-capital.

   [A-Z][^A-Z]\+[A-Z]

  # Find a capital letter, followed by one or more non-capital characters
  #   (these could be lower case letters, digits, punctuation, or spacing),
  #   followed by another capital letter.
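
The word version and the line version differ on entries that contain a space. The sketch below uses invented entries and assumes GNU grep for \+; LC_ALL=C keeps the ranges in plain ASCII order.

```shell
entries=$(printf '%s\n' McDonald 'New York' NASA iPhone)
in_word=$(printf '%s\n' "$entries" | LC_ALL=C grep '[A-Z][a-z0-9]\+[A-Z]')
in_line=$(printf '%s\n' "$entries" | LC_ALL=C grep '[A-Z][^A-Z]\+[A-Z]')
echo '-- two capitals within one word'
printf '%s\n' "$in_word"   # McDonald
echo '-- two capitals anywhere on the line'
printf '%s\n' "$in_line"   # McDonald, New York
```

NASA is rejected by both forms because nothing separates its capitals.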

Find any word ending with a d.

   d\>

Find any 'line' ending with an s.

   s$

Find any word ending with an s but the line ending with something else.

   s\>.*[^s]$

  # The \> guarantees that a word ending with s is found, followed by
  #   any number of additional characters as long as the last character on
  #   the line is not an s.
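
Compound entries are where this expression matters. In the sketch below (entries invented), a space or a hyphen ends the word that finishes with s, and the line itself ends with a different letter.

```shell
# Keep lines that contain a word ending in s but do not themselves end in s.
mixed=$(printf '%s\n' 'cats and dog' cats 'dogs and cats' bus-stop | grep 's\>.*[^s]$')
printf '%s\n' "$mixed"   # prints "cats and dog" and "bus-stop"
```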

Find any words containing an apostrophe followed by exactly 3 characters.

    '[[:alnum:]]\{3\}\>

  # Find the apostrophe, followed by exactly 3 word characters and then the
  #   end of the word.
  # If you are doing this on a command line, take care with quoting.
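
On the quoting point: because the pattern itself contains an apostrophe, one workable approach is to wrap it in double quotes instead of single quotes. The sketch below does that with the pattern '[[:alnum:]]\{3\}\> and a few invented entries.

```shell
# Inside double quotes the shell leaves the backslashes before { } and > alone,
# so grep still receives \{3\} and \> unchanged.
three=$(printf '%s\n' "y'all" "can't" "o'clock" | grep "'[[:alnum:]]\{3\}\>")
printf '%s\n' "$three"   # prints y'all
```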

Find any words with at least one non-alphanumeric character other than
' (apostrophe).

    [^[:alnum:]']

  # Find a character that is not a word character or an apostrophe. The not
  #   modifier (^) applies to all characters listed in [].
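
As a final check, the sketch below runs the negated bracket expression over a few invented entries. The pattern is double-quoted because it contains an apostrophe.

```shell
# Keep entries containing a character that is neither alphanumeric nor an apostrophe.
odd=$(printf '%s\n' pass-word "don't" hello a.b | grep "[^[:alnum:]']")
printf '%s\n' "$odd"   # prints pass-word and a.b
```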