Suppose we use this command at the LINUX command line:
prompt: Program1 > Program1.out
What does it do?
The effect of this is to redirect the standard output of Program1 into a file called Program1.out.
The file may be new or it may already exist. If it already exists, it will be overwritten.
Suppose we used this command instead:
prompt: Program1 >> Program1.out
In this case, the output is appended to the end of the file, rather than replacing the contents of the file.
Suppose we used this command instead:
prompt: Program1 2> Program1.out
In this case, what is redirected is output written to stderr, not stdout. Likewise
prompt: Program1 2>> Program1.out
will redirect output written to stderr, appending it to the file.
Note: The syntax for redirection of this sort depends on which command shell we are using. The 2> and 2>> forms above work in Bourne-style shells such as sh and bash. In csh or tcsh they do not work; there, try using
prompt: Program1 >& Program1.out
to redirect output written to either stdout or stderr into the file instead, or
prompt: Program1 >>& Program1.out
to redirect output written to either stdout or stderr, appending it to the file instead.
How does it work?
The operating system will create the file if necessary and then modify a particular entry in the file descriptor table. One of the pieces of information in each entry of the file descriptor table is a pointer to something in the LINUX file system (which includes files, pipes, sockets, etc.). The entry for standard output (number 1) or for standard error (number 2) will be changed so its pointer leads to "Program1.out".
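As a rough illustration of the descriptor-table change described above, here is a minimal sketch in Python, whose os module wraps the same system calls a shell would use. The function names run_redirected and demo_output_redirection are made up for this example, and the echo command stands in for Program1; a real shell does more error handling than this.

```python
import os
import tempfile


def run_redirected(argv, path, target_fd=1, append=False):
    """Run argv in a child whose descriptor target_fd leads to the file at path."""
    # ">" truncates an existing file, ">>" appends; both create the file if needed.
    flags = os.O_WRONLY | os.O_CREAT | (os.O_APPEND if append else os.O_TRUNC)
    pid = os.fork()
    if pid == 0:                          # child process: will become Program1
        try:
            fd = os.open(path, flags, 0o644)
            os.dup2(fd, target_fd)        # entry 1 (or 2) now leads to the file
            os.close(fd)                  # the extra descriptor is not needed
            os.execvp(argv[0], argv)      # replace the child with the program
        finally:
            os._exit(127)                 # reached only if something above failed
    _, status = os.waitpid(pid, 0)        # the parent plays the shell and waits
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


def demo_output_redirection():
    with tempfile.TemporaryDirectory() as d:
        out = os.path.join(d, "Program1.out")
        run_redirected(["echo", "first"], out)               # echo first  > Program1.out
        run_redirected(["echo", "second"], out)              # overwrites the file
        run_redirected(["echo", "third"], out, append=True)  # echo third >> Program1.out
        with open(out) as f:
            return f.read()


if __name__ == "__main__":
    print(repr(demo_output_redirection()))   # 'second\nthird\n'
```

Passing target_fd=2 gives the effect of 2> or 2>>. The program being run is unaware of any of this: it writes to descriptor 1 or 2 as usual.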
Related command
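It may be worthwhile to look into the "tee" command as well. It copies its standard input both to its standard output and to a file, so output can be saved in a file and still be seen on the screen.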
Suppose we use this command at the LINUX command line:
prompt: Program1 < Datafile.txt
What does it do?
The effect of this is to redirect the standard input of Program1 to come from a file called Datafile.txt.
How does it work?
The operating system will open the file for reading (the file must already exist; it is not created) and then modify a particular entry in the file descriptor table. One of the pieces of information in each entry of the file descriptor table is a pointer to something in the LINUX file system (which includes files, pipes, sockets, etc.). The entry for standard input (number 0) will be changed so its pointer leads to "Datafile.txt".
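The same idea for standard input can be sketched as follows. This is only an illustration in Python: run_with_stdin_from and demo_input_redirection are hypothetical names, grep stands in for Program1, and the exit status 127 used when the file cannot be opened is an arbitrary choice for this sketch.

```python
import os
import tempfile


def run_with_stdin_from(argv, path):
    """Run argv in a child whose descriptor 0 leads to the file at path."""
    pid = os.fork()
    if pid == 0:                          # child process: will become Program1
        try:
            fd = os.open(path, os.O_RDONLY)   # no O_CREAT: the file must exist
            os.dup2(fd, 0)                # entry 0 now leads to the file
            os.close(fd)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)                 # open or exec failed (127 is arbitrary)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


def demo_input_redirection():
    with tempfile.TemporaryDirectory() as d:
        data = os.path.join(d, "Datafile.txt")
        with open(data, "w") as f:
            f.write("alpha\nneedle\n")
        # Like: grep -q needle < Datafile.txt  (exit status 0 means "found")
        found = run_with_stdin_from(["grep", "-q", "needle"], data)
        absent = run_with_stdin_from(["grep", "-q", "zebra"], data)
        missing = run_with_stdin_from(["grep", "-q", "needle"],
                                      os.path.join(d, "nofile.txt"))
        return found, absent, missing


if __name__ == "__main__":
    print(demo_input_redirection())       # (0, 1, 127)
```

Note that a missing input file is an error here, in contrast to output redirection, where the file is created on demand.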
Suppose we use the following command at the LINUX command line:
prompt: Program1 | Program2
This is a "command-line pipe". Let's explore what is going on here.
What does it do?
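The standard output of Program1 is connected to the standard input of Program2: whatever Program1 writes to stdout, Program2 reads from stdin. The two programs run at the same time. It is worth noticing, though, that Program1 may be doing all of its processing (such as sorting the contents of a file) before it writes out all of its output at one time, in which case Program2 has to wait.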
If these are programs that normally are interacting with a user, there can be disconcerting effects. Program2 may be printing messages which prompt its user to provide input, but the user cannot do so because all of Program2's standard input comes from Program1. If Program1 prints messages which prompt its user to provide input, the user will never see them, as those messages will be fed to Program2. Will Program2 know what to do when it finds these messages in its input?
It is therefore more appropriate to use the | pipe with programs that do not print such messages; they simply read from standard input, process it and write to standard output. Such programs are sometimes called "filters".
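To make the idea of a filter concrete, here is a small sketch of one in Python. The name number_lines and the line-numbering task are invented for this example; the point is only that the program reads, transforms and writes, without prompts or other chatter.

```python
import io


def number_lines(infile, outfile):
    """A filter: read lines from infile, write them numbered to outfile."""
    for n, line in enumerate(infile, start=1):
        outfile.write(f"{n}\t{line}")     # no prompts, no messages, only data


def demo_filter():
    # In a real pipeline this would be number_lines(sys.stdin, sys.stdout);
    # in-memory streams stand in for the two ends here.
    source = io.StringIO("alpha\nbeta\n")
    sink = io.StringIO()
    number_lines(source, sink)
    return sink.getvalue()


if __name__ == "__main__":
    print(repr(demo_filter()))            # '1\talpha\n2\tbeta\n'
```

Because such a program never assumes a person is at either end, it behaves the same whether its input comes from a keyboard, a file or another program.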
How does it work?
The system creates two processes and a buffer. The buffer is of some specific default size such as 64 KB. Data is added to the buffer at one end and removed from the other end, essentially a queue. The system is using the same mechanism as in the pipe() system call. The use of pipe() always involves a buffer.
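The queue-like behavior of the buffer can be seen even inside a single process. The following is a minimal sketch using Python's os.pipe(), which wraps the pipe() system call; the function name demo_pipe_is_a_queue is made up for this example.

```python
import os


def demo_pipe_is_a_queue():
    read_end, write_end = os.pipe()       # the kernel allocates the buffer
    os.write(write_end, b"abc")           # data goes in at one end...
    os.write(write_end, b"def")
    first = os.read(read_end, 4)          # ...and comes out of the other, in order
    second = os.read(read_end, 4)         # only 2 bytes are left, so we get 2
    os.close(write_end)
    os.close(read_end)
    return first, second


if __name__ == "__main__":
    print(demo_pipe_is_a_queue())         # (b'abcd', b'ef')
```

The boundaries between the two writes are not preserved: the pipe is a stream of bytes, and the reader takes as many as it asks for (or as many as are available).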
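The shell, through the operating system, is doing something like this: it calls pipe() to create the buffer, then calls fork() twice to create two child processes, P1 and P2. P1 makes its standard output (descriptor 1) refer to the write end of the pipe and then calls execlp() to run Program1. P2 makes its standard input (descriptor 0) refer to the read end of the pipe and then calls execlp() to run Program2.
(Instead of execlp(), this may be another member of the exec() family of functions.)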
At this point, the output from Program1 (in P1) is going into the pipe and the input for Program2 (in P2) is coming out of the pipe.
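The usual sequence for setting up such a pipe (create the pipe, fork two children, point descriptor 1 of the first and descriptor 0 of the second at the pipe, then exec the two programs) might look like the sketch below. It is written in Python, whose os.pipe(), os.fork(), os.dup2() and os.execlp() wrap the system calls of the same names. The names run_pipeline and demo_pipeline are hypothetical, and echo and grep stand in for Program1 and Program2; an actual shell is more careful about errors.

```python
import os


def run_pipeline(argv1, argv2):
    """Rough equivalent of: argv1 | argv2. Returns the exit status of argv2."""
    read_end, write_end = os.pipe()       # the kernel creates the buffer

    p1 = os.fork()
    if p1 == 0:                           # P1: will become Program1
        try:
            os.dup2(write_end, 1)         # stdout now leads into the pipe
            os.close(read_end)
            os.close(write_end)
            os.execlp(argv1[0], *argv1)
        finally:
            os._exit(127)                 # reached only if exec failed

    p2 = os.fork()
    if p2 == 0:                           # P2: will become Program2
        try:
            os.dup2(read_end, 0)          # stdin now comes out of the pipe
            os.close(read_end)
            os.close(write_end)
            os.execlp(argv2[0], *argv2)
        finally:
            os._exit(127)

    # The parent must close its copies, or P2 would never see end-of-file.
    os.close(read_end)
    os.close(write_end)
    os.waitpid(p1, 0)
    _, status = os.waitpid(p2, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


def demo_pipeline():
    hit = run_pipeline(["echo", "needle"], ["grep", "-q", "needle"])
    miss = run_pipeline(["echo", "hay"], ["grep", "-q", "needle"])
    return hit, miss                      # grep exits 0 on a match, 1 otherwise


if __name__ == "__main__":
    print(demo_pipeline())                # (0, 1)
```

Each process closes the pipe descriptors it does not use. This matters because a reader only sees end-of-file once every copy of the write end has been closed.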
Fake pipe
An alternative to a command-line pipe is to use a file: We interpret
prompt: Program1 | Program2
as three commands:
prompt: Program1 > tempfile.txt
prompt: Program2 < tempfile.txt
prompt: rm tempfile.txt
This is called a "fake pipe". It is actually how the MS-DOS operating system implemented the command-line pipe. It's easy enough to understand, but it has the disadvantage that as only one program is running at a time, the overall execution may be slower.
On the other hand, the operating system does not have to do anything very complex to make it work.
The temporary storage ("tempfile.txt" above) could be a fixed-size buffer in memory, in which case we have to worry about overfilling the buffer. If instead it is a disk file, as it was under MS-DOS, then we do not have that worry (short of filling the disk), but the name of the file needs to be unique (based on the time or date, etc.) so we do not accidentally overwrite an existing (valuable) file.
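A fake pipe with a disk file and a unique name might be sketched like this in Python. The function names fake_pipe and demo_fake_pipe are invented for the example, echo and tr stand in for Program1 and Program2, and tempfile.mkstemp() is used here as one way to get a name that cannot collide with an existing file; it is not a claim about how MS-DOS chose its names.

```python
import os
import subprocess
import tempfile


def fake_pipe(argv1, argv2):
    """Run argv1 to completion, then argv2, passing data through a temp file."""
    fd, path = tempfile.mkstemp(prefix="fakepipe-")   # unique, newly created file
    try:
        with os.fdopen(fd, "wb") as tmp:
            subprocess.run(argv1, stdout=tmp, check=True)     # Program1 > tempfile
        with open(path, "rb") as tmp:
            # Program2 < tempfile. Its output is captured only so that this
            # function can return it to the caller.
            result = subprocess.run(argv2, stdin=tmp,
                                    stdout=subprocess.PIPE, check=True)
    finally:
        os.remove(path)                                       # rm tempfile
    return result.stdout


def demo_fake_pipe():
    return fake_pipe(["echo", "hello"], ["tr", "a-z", "A-Z"])


if __name__ == "__main__":
    print(demo_fake_pipe())               # b'HELLO\n'
```

The second program does not start until the first has finished, which is exactly the sequential behavior that makes a fake pipe simple but potentially slow.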
Comment and Speculation
This is an example of a Producer-Consumer situation. There is a danger of Program1 trying to write into the buffer even though it is full at present, and there is a danger of Program2 trying to read from the buffer even though it is empty at present.
If we were writing the code to do this for ourselves, one way to do it would be to have an integer N counting bytes in the buffer and a semaphore S controlling access to the counter. Suppose P1 wants to write a byte to the pipe. It will want to increment the counter. So:
wait(S);
if (N < BUFFERSIZE) {
    write one byte
    increment N
}
post(S);
Likewise, if P2 wants to read a byte from the buffer, it will want to decrement the counter. So:
wait(S);
if (N > 0) {
    read one byte
    decrement N
}
post(S);
We are using the semaphore S to ensure that only one process has access to the counter N (and thus to the buffer) at a time.
However, do Program1 and Program2 include such code? No, so who is managing this?
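The system manages the buffer, and its implementation of pipes presumably includes some additional structure of this sort. What the programs see is simple: a write to a full pipe blocks until there is room, and a read from an empty pipe blocks until data arrives (or returns end-of-file once every writer has closed its end).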
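For readers who want to experiment, the counter-and-semaphore scheme speculated about above can be tried out as a toy. The sketch below is in Python and makes several assumptions that are not in the notes: it uses two threads rather than two processes, the class name ToyPipe and the tiny BUFFERSIZE of 4 are invented, and a thread that finds the buffer full or empty simply yields and tries again. It is not how the kernel implements pipes.

```python
import threading
import time
from collections import deque

BUFFERSIZE = 4                            # deliberately tiny, to force waiting


class ToyPipe:
    def __init__(self):
        self.buf = deque()                # the buffer, used as a queue
        self.n = 0                        # N: number of bytes in the buffer
        self.s = threading.Semaphore(1)   # S: guards N and the buffer

    def try_write(self, byte):
        self.s.acquire()                  # wait(S)
        try:
            if self.n < BUFFERSIZE:
                self.buf.append(byte)     # write one byte
                self.n += 1               # increment N
                return True
            return False                  # buffer full: nothing written
        finally:
            self.s.release()              # post(S)

    def try_read(self):
        self.s.acquire()                  # wait(S)
        try:
            if self.n > 0:
                self.n -= 1               # decrement N
                return self.buf.popleft() # read one byte
            return None                   # buffer empty: nothing read
        finally:
            self.s.release()              # post(S)


def demo_toy_pipe(message=b"hello, pipe"):
    pipe = ToyPipe()
    received = bytearray()

    def producer():                       # plays the part of P1
        for byte in message:
            while not pipe.try_write(byte):
                time.sleep(0)             # full: let the consumer run, retry

    def consumer():                       # plays the part of P2
        while len(received) < len(message):
            byte = pipe.try_read()
            if byte is None:
                time.sleep(0)             # empty: let the producer run, retry
            else:
                received.append(byte)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bytes(received)


if __name__ == "__main__":
    print(demo_toy_pipe())                # b'hello, pipe'
```

The retry loops are the weak point of this design: they waste processor time. A real implementation would put the waiting process to sleep and wake it when the buffer changes, which is what blocking reads and writes on a pipe provide.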