Because of its genesis in the research community, the Unix system has a large collection of text scanning and editing utilities. Users employ these to write manuals, manipulate data files, format text and documents for various output systems (line printers, commercial offset printers, etc.), batch edit large text files or collections of files, and perform other chores.
To make these utilities simpler to learn, a common system for describing text in a file has been developed, called regular expression pattern matching. A regular expression combines literal characters with meta-characters to describe a string of text without having to state the exact composition of that string. For example, you may want to describe a specific word that could be either all lower case or capitalized. To describe the word dog, we would use the regular expression \<[dD]og\>. The escaped angle brackets \< and \> match the beginning and end of a word, [dD] matches an upper- or lower-case d, and og are literal characters.
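As a rough illustration, the same word match can be tried from Python. This is a sketch in a different dialect, not how the Unix utilities themselves work: Python's re module does not use \< and \>, it writes a word boundary as \b, so the pattern becomes \b[dD]og\b. The helper name find_dog is hypothetical.

```python
import re

# Python spells the word boundaries \< and \> as \b.
# [dD] accepts either case of the first letter; "og" must match literally.
DOG = re.compile(r"\b[dD]og\b")


def find_dog(text):
    """Hypothetical helper: return each whole-word 'dog' or 'Dog' in text."""
    return DOG.findall(text)


if __name__ == "__main__":
    # "hotdog" and "dogma" fail the boundary checks; "DOG" fails the literal "og".
    print(find_dog("Dog bites dog; the hotdog and dogma are ignored, DOG too."))
    # prints ['Dog', 'dog']
```

Because [dD] and g are both word characters, each \b here can only match at the start or end of a word, so it behaves like \< and \> for this particular pattern.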
Another example would be to describe any word that appears twice in a row, but only if the two copies are separated by a single space. Which word it is cannot be known until you encounter it. A person can spot such an occurrence with minimal effort. However, a computer must use an algorithm and a concise set of rules that consistently match the pattern to the correct target.
The regular expression for this problem is:
\<\([a-zA-Z][a-zA-Z]*\) \1\>
We will analyze this syntax in detail later.
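Before that analysis, here is a minimal sketch of the same idea in Python, assuming Python's re dialect: the grouping parentheses are written ( ) instead of \( \), the word boundaries \< and \> become \b, and [a-zA-Z]+ stands in for [a-zA-Z][a-zA-Z]* (one or more letters). The back-reference \1 keeps its meaning: it must match exactly the text captured by the first group. The function name repeated_words is hypothetical.

```python
import re

# (...) captures one or more letters as group 1; the single literal space
# requires exactly one space; \1 must repeat the captured word exactly.
REPEATED = re.compile(r"\b([a-zA-Z]+) \1\b")


def repeated_words(text):
    """Hypothetical helper: return each word immediately repeated after one space."""
    return [m.group(1) for m in REPEATED.finditer(text)]


if __name__ == "__main__":
    print(repeated_words("Paris in the the spring"))  # prints ['the']
    # Two spaces between the copies, so there is no match.
    print(repeated_words("the  the"))
    # The trailing \b rejects "the" followed by "ory".
    print(repeated_words("the theory"))
```

The final \b matters: without it, "the theory" would match, because "the" followed by a space and the first three letters of "theory" satisfies the rest of the pattern. The same reasoning applies to the trailing \> in the grep form.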
The regular expression pattern matching rule set is used in a variety of text manipulation utilities on the Linux/Unix system.
Among these are:
Basic regular expressions and grep