Parallel processing can be accomplished in a variety of ways, ranging from CPU pipelining to multiple systems tied together by a communication network.

In pipelining we saw a CPU that fetches a sequence of instructions and, based on the premise that identical instructions are not encountered sequentially in a program, attempts to put different parts of itself to use, running those instructions in parallel when possible. Some of the more advanced systems are capable of starting down both branches of a conditional branch instruction while the decision is still being made.

At the other extreme, there are a variety of projects that allow users on the Internet to volunteer their equipment, when it is not busy, to work on a major task. In this environment, each PC is independent but runs software that allows it to communicate with a master system to receive tasks and return their results.

On a "single box" system, parallelism can be implemented in a variety of ways. Two major categories are symmetrical and asymmetrical parallelism.

Asymmetrical

In asymmetrical parallelism, different control circuits perform different but complementary tasks. An example of this is a personal computer with a math co-processor or a video capture/display card. Both of these devices have their own ISA machine languages. It is often possible to issue a set of instructions to these co-processors and then allow the CPU to work on some other task while they perform their tasks. Additional circuitry is needed for communication and arbitration.

Another example is the I/O control on a mainframe computer. On the mainframe, each CPU has several I/O channels. Each of these channels can be connected to an I/O channel device that is essentially a programmable computer in its own right and that is programmed to access and control I/O devices.

Many pipelined CPUs use a technique called superscalar execution, which duplicates the bottlenecked portions of the CPU, such as the ALU. However, on most of these, the duplicate subsystems are not identical; one may deal only with integer math and the other with floating-point math.

Symmetrical

In symmetrical parallelism, the CPUs or processing elements are paralleled and identical. This parallelism may or may not extend to other parts of the system, such as memory and I/O access.

When analyzing how a parallel-architecture machine works, it is useful to look at how it works through problems. The level at which the parallelism occurs is called the granularity.

On one extreme is a system like a Unix system that may have a large number of users performing a variety of tasks. However, most of these tasks have nothing to do with each other. The parallelism needed is the ability to run a partially completed task on whatever CPU is available, without the task being aware of any change. This is very coarse grained.

Somewhere in the middle is a form of parallelism where a task can be broken into a set of related subtasks that must all be completed to arrive at a final solution. An example is a system that analyzes the consequences of the next move in a chess game. Each possibility for the next move creates a cascade of subsequent actions and reactions. While each possibility is independent, the goal of winning must be explored down each possible line.

At the other extreme is a system that attempts to solve a single step in a problem by combining the resources of the system. An example would be a system with 10 16-bit arithmetic logic units that can be used in parallel to process 160-bit integer values. This is a fine-grained system.

The grain of a system refers to the algorithms and software used on the system. The hardware analog is the coupling: how the various parallel parts communicate with each other. Symmetrical systems can be classified as tightly coupled, loosely coupled, or somewhere in between.

Loosely coupled systems have a small number of large, independent CPUs with low-speed interconnections. Communication is often done using a variation of I/O access and interrupts. Loosely coupled systems are most effective at running coarse-grained tasks.

Tightly coupled systems often have a large number of small, closely placed elements with high-throughput interconnections. Communication features are often incorporated into the ISA instructions and provided via shared memory. Tightly coupled systems are most effective at running fine-grained tasks.

Most systems fall somewhere between these two extremes.

Implementation

Fork = fork is a system call that duplicates parts of the current environment as a new process with a new process id.