Types of Hard Drive Interfaces

General

In the PC environment, the responsibility for specifying the track, sector, and head was still left to the system's ROM BIOS. This information is sent to the interface card, which communicates it to the drive.

The original BIOS program used 4 bits to specify the head, 6 bits for the sector, and 10 bits for the cylinder (track), giving a maximum of 16 heads, 63 sectors (sector numbers start at 1), and 1024 cylinders. This gives 1,032,192 sectors or, at 512 bytes per sector, 528 MB.
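The arithmetic behind the limit can be checked with a short sketch. The field widths come from the text above; the 512-byte sector size is an assumption (it is the usual PC value), and the variable names are only illustrative.

```python
# Field widths as described in the text; 512-byte sectors are assumed.
HEAD_BITS, SECTOR_BITS, CYLINDER_BITS = 4, 6, 10
BYTES_PER_SECTOR = 512

heads = 2 ** HEAD_BITS                      # 16
sectors_per_track = 2 ** SECTOR_BITS - 1    # 63, because sector numbers start at 1
cylinders = 2 ** CYLINDER_BITS              # 1024

total_sectors = heads * sectors_per_track * cylinders
total_bytes = total_sectors * BYTES_PER_SECTOR

print(total_sectors)                  # 1032192
print(round(total_bytes / 10**6))     # 528 (megabytes counted in powers of ten)
print(total_bytes // 2**20)           # 504 (megabytes counted in powers of two)
```

The last two lines show why the same limit is quoted as either 528 MB or 504 MB, depending on how a megabyte is counted.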

Because of the legacy of the Intel PC architecture, this configuration has not been allowed to change.

MFM & RLL

Early systems depended on the system support software and the controller card to handle much of this addressing and drive control.

IDE

In order to get around this, the IDE architecture moved most of the controller logic onto the drive itself, and the interface card was custom designed for each system bus architecture as a pass-through device. This allows the OS to be independent of how the actual drive functions. The on-board controllers could be designed to lie: they would translate the CHS data recognized by the BIOS into information that was true to the disk drive. However, this technique could cause problems with an OS that used its own library calls rather than the standard BIOS calls.

IDE architecture is designed to allow the control circuits on a single drive to also control a second "slave" drive. A simple set of jumpers allows this configuration to be implemented.

By creating a separate single standard for the hard drive and simplifying the interface card to be just that, it became very practical to build large numbers of these disks cheaply. And as the architecture of the personal computer advanced, only the interface card had to be replaced.

Although IDE is designed primarily for the PC (Intel) architecture, its low cost and the independence of the drive from the interface card mean it can also be used in Macs, and it has been adopted as a cheap alternative to SCSI in low-end workstations and other computers.

EIDE/ATA-2

Because IDE was designed with the IBM/Intel architecture in mind, it did conform to the CHS limits of the BIOS. To get around this, enhanced versions of the IDE design have been introduced recently.

EIDE used a second addressing scheme called LBA (Logical Block Address), which just numbers the sectors sequentially from 0 to 2^28-1 and leaves it up to the controller to re-map the logical id to a CHS address. This raised the capacity limit from 528 MB (504 MB in binary units) under IDE to 2.1 GB or beyond.
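The re-mapping between a logical block number and a CHS address can be sketched as follows. This is only an illustration: it assumes the 16-head, 63-sector geometry from the General section, and the function names are hypothetical. A real controller is free to use whatever internal geometry suits the drive.

```python
HEADS = 16               # heads per cylinder (assumed geometry)
SECTORS_PER_TRACK = 63   # sectors per track (assumed geometry)

def chs_to_lba(cylinder, head, sector):
    """Map a CHS address (sectors numbered from 1) to a logical block number."""
    return (cylinder * HEADS + head) * SECTORS_PER_TRACK + (sector - 1)

def lba_to_chs(lba):
    """Inverse mapping: logical block number back to (cylinder, head, sector)."""
    cylinder, rest = divmod(lba, HEADS * SECTORS_PER_TRACK)
    head, sector_index = divmod(rest, SECTORS_PER_TRACK)
    return cylinder, head, sector_index + 1

print(chs_to_lba(0, 0, 1))     # 0
print(chs_to_lba(0, 1, 1))     # 63
print(lba_to_chs(1032191))     # (1023, 15, 63)
```

The last call shows that the highest block number reachable through the old CHS fields is the last sector of the last head of cylinder 1023.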

The 2.1 GB figure is another limit created on the system side, more specifically by the Microsoft DOS definitions. The original DOS FAT (File Allocation Table) could only address up to 2.1 GB. The newer FAT32 allows the system to recognize larger disks.

The controller interface card itself allows you to plug in two master disks, each with its own slave, thus giving a total of 4 drives.

The protocols allow devices other than hard drives to be attached and functional, such as CDROMs and tape drives.

EIDE also recognizes that the system bus architecture is probably no longer 16-bit ISA and that faster data transfers are therefore available.

SCSI

SCSI - Small Computer System Interface. The SCSI interface is an alternative controller/peripheral sub-system. SCSI offers faster access. It is also isolated from the main data bus and allows devices attached to it to interact without interference from or with the CPU. The SCSI bus allows up to 7 (or 15 on the wide versions) devices to be attached in sequence to the controller.

SCSI can have a higher data transfer rate than IDE because it is both an interface and a defined bus architecture. In some versions, its bus runs at a faster speed than the standard PC bus. It can also have a wider data path than the standard PC bus.

Also because of its bus design, SCSI allows external devices to be easily attached and to be a fair distance, up to several feet, away.

RAID

Another aspect of disk storage is RAID - Redundant Array of Inexpensive (Independent) Disks. RAID is an architectural feature that does not deal directly with the design of the disk but rather with how to group a number of disks together to get one or more of the features found on expensive high-end storage devices.

The RAID controller makes several disks appear to be a single disk. Depending on how they are arranged, this can provide more storage, faster access to storage, and/or better protection against disk failure and data loss.

RAID is available in 6 levels, 0-5, with each level offering a different combination of these three features.

RAID Level 0

- consists of dividing the storage up into strips of one to many sectors each, with each succeeding strip placed on the next drive of the RAID device in a round robin fashion. This is called striping. When a strip is composed of more than one sector, the more common practice, all of the sectors must occur on the same track of the drive.
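The round robin placement can be sketched with simple integer arithmetic. The function names and the choice of 4 drives with 4 sectors per strip are assumptions made for illustration only.

```python
def locate_strip(logical_strip, num_drives):
    """Return (drive index, strip position on that drive) for round robin striping."""
    drive = logical_strip % num_drives
    position = logical_strip // num_drives
    return drive, position

def locate_sector(logical_sector, num_drives, sectors_per_strip):
    """Return (drive index, sector number on that drive) for a logical sector."""
    strip, offset = divmod(logical_sector, sectors_per_strip)
    drive, position = locate_strip(strip, num_drives)
    return drive, position * sectors_per_strip + offset

# Strips 0..7 on a 4-drive set: strip 4 wraps around to drive 0 again.
print([locate_strip(s, 4) for s in range(8)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]

print(locate_sector(17, 4, 4))   # (0, 5)
```

Consecutive strips land on different drives, which is what lets a controller fetch them concurrently.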

Advantages

If a single large block of data made of several strips distributed across multiple drives is requested, some systems are designed to read the multiple drives concurrently. Additionally, it is possible to read multiple files concurrently. This parallel access gives extremely high data throughput. This applies to both reads and writes.

Disadvantages

It works very poorly if a single sector or a small number of sectors is desired, because the whole strip is retrieved anyway.

If a disk has a mean time to failure of 20,000 hours, increasing the number of disks increases the chance that one of them will fail in less than 20,000 hours. Since the data is spread across several disks, the failure of one disk equates to a complete failure of the RAID system.
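The effect can be put into rough numbers. The sketch below assumes that disks fail independently and that lifetimes follow an exponential distribution; neither assumption is stated in the text, and real drives only approximate them. Under those assumptions, an array that is lost when any one disk fails has a mean time to failure equal to the single-disk figure divided by the number of disks.

```python
import math

def array_mttf(disk_mttf_hours, num_disks):
    """Mean time to the first failure among num_disks independent disks."""
    return disk_mttf_hours / num_disks

def prob_any_failure(disk_mttf_hours, num_disks, hours):
    """Probability that at least one disk fails within the given time."""
    return 1.0 - math.exp(-num_disks * hours / disk_mttf_hours)

print(array_mttf(20_000, 4))     # 5000.0

# Chance of losing the array within 5,000 hours as disks are added.
for n in (1, 2, 4, 8):
    print(n, round(prob_any_failure(20_000, n, 5_000), 3))
```

With four disks rated at 20,000 hours each, the expected time to the first failure drops to 5,000 hours.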

RAID Level 1

- consists of duplicate disks (only 2) with identical data on both. This is called mirroring. Striping occurs naturally: since the same data is on both disks, the system can read strip 1 from the first drive and strip 2 from the second drive concurrently. Note: the example in the book is actually called RAID 01 or 0+1 (RAID 0 with mirroring).

Advantages

The dual disks provide excellent reliability and, with advanced controllers, quick reads. Each disk can handle a separate read concurrently.

A failure of one disk has no effect on the availability of data, only on the speed of access. This system is sometimes designed with hot swap capabilities, meaning the failed disk can be replaced without shutting down the machine.
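The behavior described here can be modeled with a toy mirror. The class below is a hypothetical sketch, not a real controller: each disk is just a dictionary from sector number to data, and rebuilding a replaced disk is left out.

```python
class Mirror:
    """Toy two-disk mirror; each disk maps a sector number to its data."""

    def __init__(self):
        self.disks = [{}, {}]
        self.failed = [False, False]
        self.next_read = 0           # alternate reads between the two copies

    def write(self, sector, data):
        # Every working copy must be updated on each write.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[sector] = data

    def read(self, sector):
        # Try the preferred disk first, then fall back to the other copy.
        for i in (self.next_read, 1 - self.next_read):
            if not self.failed[i]:
                self.next_read = 1 - i
                return self.disks[i][sector]
        raise IOError("both copies have failed")

    def fail(self, index):
        self.failed[index] = True

m = Mirror()
m.write(7, b"payroll")
m.fail(0)
print(m.read(7))    # b'payroll'
```

After disk 0 fails, reads are still answered from disk 1, but the two drives can no longer share the read load.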

Disadvantages

Writing gains no advantage because both copies of the disk must be updated. It may even take a little longer: if one of the disks is still handling a previous request, the system must wait for that to finish before indicating that the write is successful.

There are two disks and they serve the function of one. The cost of disk storage doubles to gain the security of redundancy.

RAID Level 2

- discards striping and goes the opposite way. The data is broken up into very small units (word, byte, or even bit) and each piece is assigned to a different drive. Error checking data such as a Hamming code is generated and saved on additional disks. The spindles of all disk units (data and Hamming bits) are synchronized so that all bits can be accessed at the same time. The error code allows the data to be regenerated even if one of the drives fails.

Advantages

High speed reads are supported with this design. Hamming codes can be very efficiently calculated electronically with little time delay.

If a large number of data units are used, the Hamming code offers a more efficient error recovery system (less additional drive space) than dual disks.

Disadvantages

The downside of this scheme is that the use of error correcting data becomes cost effective only with a large number of disks. A 32-bit word with 6 bits of Hamming data and a word parity bit requires 39 drives, and the 6 Hamming drives alone still add about 19% on top of the 32 data drives. Additionally, the controller must be able to perform the Hamming code check for every bit transferred.
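The drive counts follow from the standard rule for a single-error-correcting Hamming code: r check bits can protect m data bits when 2^r >= m + r + 1. The sketch below applies that rule and adds one drive for the word parity bit mentioned above; the function name is only illustrative.

```python
def hamming_check_bits(data_bits):
    """Smallest r such that 2**r >= data_bits + r + 1."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r

# Columns: data bits, Hamming bits, total drives (with word parity), overhead.
for data_bits in (4, 8, 16, 32):
    r = hamming_check_bits(data_bits)
    drives = data_bits + r + 1
    print(data_bits, r, drives, f"{r / data_bits:.0%}")
```

For 32 data bits this gives 6 Hamming bits, 39 drives, and 19% overhead. For 4 data bits the overhead is 75%, which is why the scheme only pays off with many disks.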

Even though the protocol allows very small units of access, the standard hardware protocols use sector-size access. If a level 2 RAID system uses a 10-disk design, a 10-byte file would take up 1 sector on each of the 10 drives.

Because the bits for each byte of a file are spread across several devices and include error checking data:

Parallel or overlapping reads of different files are not possible.

Spindles of all drives have to be synchronized.

RAID Level 2 is seldom implemented. The other RAID levels offer similar protection at lower cost.

RAID Level 3

- distributes the data at the byte level across the drives. The Hamming bits are replaced by a single parity bit. If an error is detected, the read is re-issued, which costs time, but that cost is paid only when an error actually occurs.

Advantages

By using a single parity bit rather than a Hamming code, only one additional disk is needed.

If there is a catastrophic failure of one of the disks, the parity bit is sufficient to recover the data. Normally, a parity bit only determines that something is wrong and not where. However, it is easy to physically (electrically) determine if a particular drive fails and therefore the location of the missing data is known and its value can be derived from the associated data and parity bits.
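Recovery from a known failed drive can be sketched with exclusive OR, which computes even parity for every bit position at once. The byte values and the index of the failed drive are made up for the example.

```python
from functools import reduce

def parity(values):
    """XOR of all values; bit i of the result is the even-parity bit for bit i."""
    return reduce(lambda a, b: a ^ b, values, 0)

data = [0x4D, 0x61, 0x72, 0x6B]     # one byte per data drive (made-up values)
parity_byte = parity(data)          # stored on the one additional drive

failed_drive = 2                    # reported by the hardware, not found by parity
survivors = [b for i, b in enumerate(data) if i != failed_drive]
recovered = parity(survivors + [parity_byte])

print(hex(recovered))               # 0x72
```

The parity alone could not say which byte was wrong. The recovery works only because the position of the missing byte is already known.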

Disadvantages

Because data is distributed at the byte level, only one file may be accessed at a time. Although the throughput for a single file is high, the number of separate I/O requests that can be handled is not much better than with a single drive.

Synchronized spindles are required to allow optimal error checking time.

RAID Level 4

- implements access and the parity scheme at the sector or strip level. Level 4 works like level 0 (interleaving) with the addition of a form of parity stored on an additional drive. The corresponding strips on each drive of the set are exclusive ORed together and the resulting strip is written to the additional drive. A sector on a lost drive can be regenerated from the remaining drives. Buffering is used to hold the sectors being accessed, thus avoiding the need for synchronization.

Advantages

By working at the sector or strip level, overlapping reads can occur.

Disadvantages

The inefficiency of this design shows up with small changes to data. Not only does the data have to be written out, but all the associated sectors must be read in and a new parity calculated and written out. An alternative is to read in just the old data and the old parity before the change and adjust the parity accordingly. But that still takes four disk accesses in all: two reads and two writes.
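The alternative works because exclusive OR is its own inverse: XORing the old data out of the parity and the new data in gives the same result as recomputing the parity from every strip. A minimal sketch, with made-up strip contents and hypothetical helper names:

```python
def xor_bytes(a, b):
    """Byte-wise exclusive OR of two equal-length strips."""
    return bytes(x ^ y for x, y in zip(a, b))

def full_parity(strips):
    """Parity computed the long way, by reading every data strip."""
    result = bytes(len(strips[0]))
    for strip in strips:
        result = xor_bytes(result, strip)
    return result

def updated_parity(old_parity, old_data, new_data):
    """Parity adjusted from only the old data, the new data, and the old parity."""
    return xor_bytes(old_parity, xor_bytes(old_data, new_data))

strips = [b"AAAA", b"BBBB", b"CCCC"]
old_parity = full_parity(strips)

new_strip = b"BxyB"
shortcut = updated_parity(old_parity, strips[1], new_strip)
strips[1] = new_strip

print(shortcut == full_parity(strips))    # True
```

The shortcut touches only the changed data drive and the parity drive, however many drives are in the set.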

Additionally, the parity drive is constantly being accessed even if not all of the drives in a strip set are accessed.

RAID Level 5

- uses the concept of Level 4 but distributes the parity strip across all of the drives in the same round robin fashion as the data.

Advantages

Fast read access.

The demand on all disks is about equal.

A system with as few as 3 disks is practical.

Disadvantages

Because the parity strip is not on one specific disk, the system has to be able to quickly determine which strip is the parity strip and which is the data.
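One way to make that determination cheap is to compute the parity position from the row number. The rotation used below, where parity starts on the last drive and moves one drive to the left on each row, is an assumption chosen for illustration. Actual controllers use several different layouts, and the function names are hypothetical.

```python
def parity_drive(row, num_drives):
    """Drive holding the parity strip for a given row (assumed rotation)."""
    return (num_drives - 1 - row) % num_drives

def locate_data_strip(logical_strip, num_drives):
    """Return (drive, row) for a logical data strip, skipping the parity drive."""
    data_per_row = num_drives - 1
    row, index = divmod(logical_strip, data_per_row)
    p = parity_drive(row, num_drives)
    drive = index if index < p else index + 1
    return drive, row

# Layout of a 3-disk set: P marks the parity strip in each row.
for row in range(4):
    p = parity_drive(row, 3)
    print(row, ["P" if d == p else "D" for d in range(3)])
# 0 ['D', 'D', 'P']
# 1 ['D', 'P', 'D']
# 2 ['P', 'D', 'D']
# 3 ['D', 'D', 'P']
```

Because the position is a simple function of the row number, no lookup table is needed, and the parity load is spread evenly over the drives.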