Tuesday, 10 June 2014

OPERATING SYSTEM - IV

4.1 MEMORY MANAGEMENT
Memory is central to the operation of a modern computer system. Memory is a large array of words or bytes, each with its own address.
A program resides on a disk as a binary executable file. The program must be brought into memory and placed within a process for it to be executed. Depending on the memory management scheme in use, the process may be moved between disk and memory during its execution. The collection of processes on the disk that are waiting to be brought into memory for execution forms the input queue: the system selects one of the processes in the input queue and loads it into memory. We can provide protection by using two registers, usually a base and a limit. The base register holds the smallest legal physical memory address; the limit register specifies the size of the range. For example, if the base register holds 300040 and the limit register holds 120900, then the program can legally access all addresses from 300040 through 420939 (inclusive).

The binding of instructions and data to memory addresses can be done at any step along the way:
Compile time: If it is known at compile time where the process will reside in memory, then absolute code can be generated.
Load time: If it is not known at compile time where the process will reside in memory, then the compiler must generate re-locatable code.
Execution time: If the process can be moved during its execution from one memory segment to another, then binding must be delayed until run time.
4.1.1 Dynamic Loading
Better memory-space utilization can be achieved with dynamic loading. With dynamic loading, a routine is not loaded until it is called. All routines are kept on disk in a re-locatable load format. The main program is loaded into memory and is executed.
The advantage of dynamic loading is that an unused routine is never loaded.
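As a concrete illustration, on POSIX systems a routine can be loaded on demand through the dlopen/dlsym interface. The library name and symbol below are hypothetical; this is a minimal sketch of the idea, not a prescribed design:

    #include <stdio.h>
    #include <stdlib.h>
    #include <dlfcn.h>   /* dlopen/dlsym; link with -ldl */

    int main(void) {
        /* The routine is not brought into memory until this point. */
        void *handle = dlopen("libreport.so", RTLD_LAZY);  /* hypothetical library */
        if (!handle) {
            fprintf(stderr, "dlopen failed: %s\n", dlerror());
            return EXIT_FAILURE;
        }
        /* Locate the routine by name (hypothetical symbol). */
        void (*generate_report)(void);
        *(void **)&generate_report = dlsym(handle, "generate_report");
        if (generate_report)
            generate_report();   /* the routine is now memory resident */
        dlclose(handle);
        return EXIT_SUCCESS;
    }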
4.1.2 Dynamic Linking
Most operating systems support only static linking, in which system language libraries are treated like any other object module and are combined by the loader into the binary program image. The concept of dynamic linking is similar to that of dynamic loading; rather than loading being postponed until execution time, linking is postponed. This feature is usually used with system libraries, such as language subroutine libraries. With dynamic linking, a stub is included in the image for each library-routine reference. This stub is a small piece of code that indicates how to locate the appropriate memory-resident library routine.
The entire program and data of a process must be in physical memory for the process to execute, so the size of a process is limited to the size of physical memory. So that a process can be larger than the amount of memory allocated to it, a technique called overlays is sometimes used. The idea of overlays is to keep in memory only those instructions and data that are needed at any given time. When other instructions are needed, they are loaded into space that was occupied previously by instructions that are no longer needed.
For example, consider a two-pass assembler. During pass 1, it constructs a symbol table; then, during pass 2, it generates machine-language code. We may be able to partition such an assembler into pass 1 code, pass 2 code, the symbol table, and common support routines used by both passes, with the following sizes:

Pass 1               70K
Pass 2               80K
Symbol table         20K
Common routines      30K

To load everything at once, we would require 200K of memory. If only 150K is available, we cannot run our process. But pass 1 and pass 2 do not need to be in memory at the same time. We thus define two overlays: overlay A is the symbol table, common routines, and pass 1; overlay B is the symbol table, common routines, and pass 2. We add an overlay driver (10K) and start with overlay A in memory. When we finish pass 1, we jump to the overlay driver, which reads overlay B into memory, overwriting overlay A, and then transfers control to pass 2. Overlay A needs only 120K, whereas overlay B needs 130K. As in dynamic loading, overlays do not require any special support from the operating system.

4.1.3 Logical versus Physical Address Space
An  address  generated  by  the CPU  is  commonly  referred  to  as  a logical address, whereas an address seen by the memory unit is commonly referred to as a physical address.
The compile-time and load-time address-binding schemes result in an environment where the logical and physical addresses are the same. The execution-time address-binding scheme results in an environment where the logical and physical addresses differ. In this case, we usually refer to the logical address as a virtual address. The set of all logical addresses generated by a program is referred to as a logical address space; the set of all physical addresses corresponding to these logical addresses is referred to as a physical address space.
The run-time mapping from virtual to physical addresses is done by the memory management unit (MMU), which is a hardware device.
The base register is called a relocation register. The value in the relocation register is added to every address generated by a user process at the time it is sent to memory. For example, if the base is at 13000, then an attempt by the user to address location 0 is dynamically relocated to location 13000; an access to location 347 is mapped to location 13347. The MS-DOS operating system running on the Intel 80x86 family of processors uses four relocation registers when loading and running processes.
The user program never sees the real physical addresses. The program can create a pointer to location 347, store it in memory, manipulate it, and compare it to other addresses, all as the number 347.
The user program deals with logical addresses. The memory-mapping hardware converts logical addresses into physical addresses: logical addresses (in the range 0 to max) correspond to physical addresses (in the range R + 0 to R + max for a base value R). The user generates only logical addresses.
The concept of a logical address space that is bound to a separate physical address space is central to proper memory management.
4.2 SWAPPING
A process can be swapped temporarily out of memory to a backing store, and then brought back into memory for continued execution. Assume a multiprogramming environment with a round-robin CPU-scheduling algorithm.
When a quantum expires, the memory manager starts to swap out the process that just finished and to swap in another process to the memory space that has been freed. When each process finishes its quantum, it is swapped with another process.

A variant of this swapping policy is used for priority-based scheduling algorithms. If a higher-priority process arrives and wants service, the memory manager can swap out a lower-priority process so that it can load and execute the higher-priority process. When the higher-priority process finishes, the lower-priority process can be swapped back in and continued. This variant of swapping is sometimes called roll out, roll in. A process that is swapped out will normally be swapped back into the same memory space that it occupied previously. If binding is done at assembly or load time, then the process cannot be moved to a different location. If execution-time binding is being used, then it is possible to swap a process into a different memory space.
Swapping requires a backing store. The backing store is commonly a fast disk. It is large enough to accommodate copies of all memory images for all users. The system maintains a ready queue consisting of all processes whose memory images are on the backing store or in memory and are ready to run.
The context-switch time in such a swapping system is fairly high. Let us assume that the user process is of size 100K and the backing store is a standard hard disk with transfer rate of 1 megabyte per second. The actual transfer of the 100K process to or from memory takes
100K / 1000K per second = 1/10 second
 = 100 milliseconds
4.3 CONTIGUOUS ALLOCATION
The main memory must accommodate both the operating system and the various user processes. The memory is usually divided into two partitions, one for the resident operating system, and one for the user processes.
It is common to place the operating system in low memory; thus, we shall discuss only the situation where the operating system resides in low memory. The development of the other situation is similar.

4.3.1 Single-Partition Allocation
If the operating system is residing in low memory and the user processes are executing in high memory, the operating-system code and data must be protected from changes by the user processes, and the user processes must also be protected from one another. We can provide this protection by using relocation registers.
The relocation register contains the value of the smallest physical address; the limit register contains the range of logical addresses (for example, relocation = 100,040 and limit = 74,600). With relocation and limit registers, each logical address must be less than the limit register; the MMU maps the logical address dynamically by adding the value in the relocation register. This mapped address is sent to memory.
The relocation-register scheme provides an effective way to allow the operating system size to change dynamically.
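A minimal sketch of this translation in C, using the relocation and limit values from the example above; the translate function plays the role of the MMU:

    #include <stdio.h>
    #include <stdlib.h>

    #define RELOCATION 100040u   /* smallest physical address of the partition */
    #define LIMIT       74600u   /* size of the logical address range */

    /* Trap if the logical address is not below the limit; otherwise
       relocate it by adding the relocation-register value. */
    unsigned translate(unsigned logical) {
        if (logical >= LIMIT) {
            fprintf(stderr, "addressing error: trap to operating system\n");
            exit(EXIT_FAILURE);
        }
        return logical + RELOCATION;
    }

    int main(void) {
        printf("logical 0     -> physical %u\n", translate(0));
        printf("logical 347   -> physical %u\n", translate(347));
        printf("logical 74599 -> physical %u\n", translate(74599));
        return 0;
    }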
4.3.2 Multiple-Partition Allocation
One of the simplest schemes for memory allocation is to divide memory into a number of fixed-sized partitions. Each partition may contain exactly one process.
Thus, the degree of multiprogramming is bound by the number of partitions. When a partition is free, a process is selected from the input queue and is loaded into the free partition. When the process terminates, the partition becomes available for another process.
The operating system keeps a table indicating which parts of memory are available and which are occupied. Initially, all memory is available for user processes and is considered one large block of available memory, a hole. When a process arrives and needs memory, we search for a hole large enough for this process.
For example, assume that we have 2560K of memory available and a resident operating system of 400K. This situation leaves 2160K for user processes.
With FCFS job scheduling, we can immediately allocate memory to processes P1, P2, and P3, leaving a hole of 260K that cannot be used by any of the remaining processes in the input queue. Using round-robin CPU scheduling with a quantum of 1 time unit, a process will terminate at time 14, releasing its memory, and allocation proceeds in this round-robin fashion as partitions are freed.
When a process arrives and needs memory, we search this set for a hole that is large enough for this process. If the hole is too large, it is split into two: One part is allocated to the arriving process; the other is returned to the set of holes. When a process terminates, it releases its block of memory, which is then placed back in the set of holes. If the new hole is adjacent to other holes, we merge these adjacent holes to form one larger hole.
This procedure is a particular instance of the general dynamic storage-allocation problem, which is how to satisfy a request of size n from a list of free holes. There are many solutions to this problem. The set of holes is searched to determine which hole is best to allocate; first-fit, best-fit, and worst-fit are the most common strategies used to select a free hole from the set of available holes (a sketch of all three follows the list below).
First-fit: Allocate the first hole that is big enough. Searching can start either at the beginning of the set of holes or where the previous first-fit search ended. We can stop searching as soon as we find a free hole that is large enough.
Best-fit: Allocate the smallest hole that is big enough. We must search the entire list, unless the list is kept ordered by size. This strategy produces the smallest leftover hole.
Worst-fit: Allocate the largest hole. Again, we must search the entire list, unless it is sorted by size. This strategy produces the largest leftover hole, which may be more useful than the smaller leftover hole from a best-fit approach.
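The three strategies can be sketched compactly in C, assuming the holes are kept in a simple array and that splitting and coalescing are handled elsewhere:

    #include <stddef.h>

    typedef struct { size_t start; size_t size; } Hole;

    /* Return the index of the chosen hole, or -1 if no hole is big enough.
       strategy: 0 = first-fit, 1 = best-fit, 2 = worst-fit. */
    int choose_hole(const Hole *holes, int n, size_t request, int strategy) {
        int chosen = -1;
        for (int i = 0; i < n; i++) {
            if (holes[i].size < request)
                continue;                    /* hole too small */
            if (strategy == 0)
                return i;                    /* first-fit: stop at first match */
            if (chosen == -1 ||
                (strategy == 1 && holes[i].size < holes[chosen].size) ||
                (strategy == 2 && holes[i].size > holes[chosen].size))
                chosen = i;                  /* smallest (best) or largest (worst) fit */
        }
        return chosen;
    }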
4.3.3 External and Internal Fragmentation
As processes are loaded and removed from memory, the free memory space is broken into little pieces. External fragmentation exists when enough total memory space exists to satisfy a request but it is not contiguous; storage is fragmented into a large number of small holes.
Depending on the total amount of memory storage and the average process size, external fragmentation may be either a minor or a major problem.
Given N allocated blocks, another 0.5N blocks will be lost to fragmentation. That is, one-third of memory may be unusable (0.5N of every 1.5N blocks). This property is known as the 50-percent rule.
Internal fragmentation is memory that is internal to a partition but is not being used.
4.4 PAGING
External fragmentation is avoided by using paging. In paging, physical memory is broken into fixed-sized blocks called frames, and logical memory is broken into blocks of the same size called pages. When a process is to be executed, its pages are loaded into any available memory frames. Every address generated by the CPU is divided into two parts: a page number (p) and a page offset (d). The page number is used as an index into a page table. The page table contains the base address of each page in physical memory. This base address is combined with the page offset to define the physical memory address that is sent to the memory unit.

The page size is defined by the hardware. The size of a page is typically a power of 2, varying between 512 bytes and 8192 bytes per page, depending on the computer architecture. The selection of a power of 2 as a page size makes the translation of a logical address into a page number and page offset easy. If the size of the logical address space is 2^m, and the page size is 2^n addressing units (bytes or words), then the high-order m - n bits of a logical address designate the page number, and the n low-order bits designate the page offset. Thus, the logical address is as follows:
    page number | page offset
         p      |      d
     m - n bits |   n bits

where p is an index into the page table and d is the displacement within the page.
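A minimal sketch of this split and lookup in C, assuming n = 12 (4096-byte pages) and a small made-up page table:

    #include <stdio.h>
    #include <stdint.h>

    #define PAGE_BITS 12                          /* n = 12: 4096-byte pages */
    #define PAGE_SIZE (1u << PAGE_BITS)

    /* Hypothetical page table: page_table[p] is the frame holding page p. */
    static const uint32_t page_table[] = { 5, 6, 1, 2 };

    uint32_t translate(uint32_t logical) {
        uint32_t p = logical >> PAGE_BITS;        /* high-order m - n bits */
        uint32_t d = logical & (PAGE_SIZE - 1);   /* low-order n bits */
        return page_table[p] * PAGE_SIZE + d;     /* frame base + offset */
    }

    int main(void) {
        uint32_t logical = 2 * PAGE_SIZE + 100;   /* page 2, offset 100 */
        printf("logical %u -> physical %u\n", logical, translate(logical));
        return 0;
    }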
Paging is a form of dynamic relocation. Every logical address is bound by the paging hardware to some physical address.
When we use a paging scheme, we have no external fragmentation: Any free frame can be allocated to a process that needs it.
If process size is independent of page size, we can expect internal fragmentation to average one-half page per process.
When a process arrives in the system to be executed, its size, expressed in pages, is examined. Each page of the process needs one frame. Thus, if the process requires n pages, there must be at least n frames available in memory. If there are n frames available, they are allocated to this arriving process. The first page of the process is loaded into one of the allocated frames and the frame number is put in the page table for this process. The next page is loaded into another frame, and its frame number is put into the page table, and so on.
The user program views memory as one single contiguous space, containing only this one program. In fact, the user program is scattered throughout physical memory, and logical addresses are translated into physical addresses.
Since the operating system is managing physical memory, it must be aware of the allocation details of physical memory: which frames are allocated, which frames are available, how many total frames there are, and so on. This information is generally kept in a data structure called a frame table. The frame table has one entry for each physical page frame, indicating whether the frame is free or allocated and, if it is allocated, to which page of which process or processes.
The operating system maintains a copy of the page table for each process. Paging therefore increases the context-switch time.

4.5 SEGMENTATION
A user program can be subdivided using segmentation, in which the program and its associated data are divided into a number of segments. It is not required that all segments of all programs be of the same length, although there is a maximum segment length. As with paging, a logical address using segmentation consists of two parts: in this case, a segment number and an offset.
Because of the use of unequal-size segments, segmentation is similar to dynamic partitioning. In segmentation, a program may occupy more than one partition, and these partitions need not be contiguous. Segmentation eliminates internal fragmentation but, like dynamic partitioning, it suffers from external fragmentation. However, because a process is broken up into a number of smaller pieces, the external fragmentation should be less. Whereas paging is invisible to the programmer, segmentation is usually visible and is provided as a convenience for organizing programs and data.
Another consequence of unequal-size segments is that there is no simple relationship between logical addresses and physical addresses. A segmentation scheme makes use of a segment table for each process and a list of free blocks of main memory. Each segment table entry, as in paging, gives the starting address in main memory of the corresponding segment. The entry should also provide the length of the segment, to ensure that invalid addresses are not used. When a process enters the Running state, the address of its segment table is loaded into a special register used by the memory-management hardware.
Consider an address of n + m bits, where the leftmost n bits are the segment number and the rightmost m bits are the offset. The following steps are needed for address translation:
1. Extract the segment number as the leftmost n bits of the logical address.
2. Use the segment number as an index into the process segment table to find the starting physical address of the segment.
3. Compare the offset, expressed in the rightmost m bits, to the length of the segment. If the offset is greater than or equal to the length, the address is invalid.
4. The desired physical address is the sum of the starting physical address of the segment and the offset.
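The four steps, sketched in C with a hypothetical segment table and m = 12 offset bits:

    #include <stdio.h>
    #include <stdint.h>

    #define OFFSET_BITS 12   /* m: the rightmost bits are the offset */

    typedef struct { uint32_t base; uint32_t length; } SegEntry;

    /* Hypothetical per-process segment table. */
    static const SegEntry seg_table[] = {
        { 0x10000, 0x0800 },   /* segment 0 */
        { 0x24000, 0x0C00 },   /* segment 1 */
    };

    int translate(uint32_t logical, uint32_t *physical) {
        uint32_t s = logical >> OFFSET_BITS;               /* step 1: segment number */
        uint32_t d = logical & ((1u << OFFSET_BITS) - 1);  /* offset */
        const SegEntry *e = &seg_table[s];                 /* step 2: table lookup */
        if (d >= e->length)                                /* step 3: bounds check */
            return -1;                                     /* invalid address */
        *physical = e->base + d;                           /* step 4: base + offset */
        return 0;
    }

    int main(void) {
        uint32_t phys, logical = (1u << OFFSET_BITS) | 0x100;  /* segment 1, offset 0x100 */
        if (translate(logical, &phys) == 0)
            printf("logical 0x%X -> physical 0x%X\n", logical, phys);
        return 0;
    }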
Segmentation and paging can be combined to have a good result.
4.6 VIRTUAL MEMORY
Virtual memory is a technique that allows the execution of processes that may not be completely in memory. The main visible advantage of this scheme is that programs can be larger than physical memory.
Virtual memory is the separation of user logical memory from physical memory. This separation allows an extremely large virtual memory to be provided for programmers when only a smaller physical memory is available.
The following are situations in which the entire program is not required to be fully loaded:
1. User-written error-handling routines are used only when an error occurs in the data or computation.
2. Certain options and features of a program may be used rarely.
3. Many tables are assigned a fixed amount of address space even though only a small amount of the table is actually used.
The ability to execute a program that is only partially in memory would confer many benefits:
1. Less I/O would be needed to load or swap each user program into memory.
2. A program would no longer be constrained by the amount of physical memory that is available.
3. Each user program could take less physical memory, so more programs could be run at the same time, with a corresponding increase in CPU utilization and throughput.


Virtual memory is commonly implemented by demand paging. It can also be implemented in a segmentation system; demand segmentation can likewise be used to provide virtual memory.

4.7 DEMAND PAGING
Demand paging is similar to a paging system with swapping. When we want to execute a process, we swap it into memory; rather than swapping in the entire process, however, we bring pages into memory only as they are needed. When a process is to be swapped in, the pager guesses which pages will be used before the process is swapped out again. Instead of swapping in a whole process, the pager brings only those necessary pages into memory. Thus, it avoids reading into memory pages that will not be used anyway, decreasing the swap time and the amount of physical memory needed.
Hardware support is required to distinguish between the pages that are in memory and the pages that are on the disk; a valid-invalid bit scheme is used. When the bit is set to valid, the associated page is both legal and in memory; when the bit is set to invalid, the page is either not in the logical address space or currently on the disk. Marking a page invalid has no effect if the process never attempts to access that page. While the process executes and accesses pages that are memory resident, execution proceeds normally.

Access to a page marked invalid causes a page-fault trap. This trap is the result of the operating system's failure to bring the desired page into memory. The page fault is handled as follows:

 Steps in handling a page fault

1. We check an internal table for this process to determine whether the reference was a valid or an invalid memory access.
2. If the reference was invalid, we terminate the process. If it was valid, but we have not yet brought in that page, we now page it in.
3. We find a free frame.
4. We schedule a disk operation to read the desired page into the newly allocated frame.
5. When the disk read is complete, we modify the internal table kept with the process and the page table to indicate that the page is now in memory.
6. We restart the instruction that was interrupted by the illegal address trap. The process can now access the page as though it had always been in memory.
Therefore, the operating system reads the desired page into memory and restarts the process as though the page had always been in memory.
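The steps can be made concrete with a toy, runnable sketch in C: arrays stand in for the disk and physical memory, no page replacement is modeled, and every name is invented for illustration:

    #include <stdio.h>
    #include <string.h>

    #define NPAGES  8
    #define NFRAMES 4

    static char disk[NPAGES][16];      /* backing store */
    static char memory[NFRAMES][16];   /* physical frames */
    static int  page_table[NPAGES];    /* frame number, or -1 if not resident */
    static int  next_free = 0;         /* naive allocator: no replacement */

    static void handle_page_fault(int page) {
        /* steps 1-2: validity check (invalid references would terminate) */
        int frame = next_free++;                 /* step 3: find a free frame */
        memcpy(memory[frame], disk[page], 16);   /* step 4: "disk read" */
        page_table[page] = frame;                /* step 5: update tables */
        /* step 6: the faulting access is simply retried by the caller */
    }

    static char *access_page(int page) {
        if (page_table[page] < 0) {
            printf("page fault on page %d\n", page);
            handle_page_fault(page);
        }
        return memory[page_table[page]];
    }

    int main(void) {
        for (int i = 0; i < NPAGES; i++) {
            sprintf(disk[i], "page-%d", i);
            page_table[i] = -1;
        }
        printf("%s\n", access_page(3));   /* faults, then reads from "disk" */
        printf("%s\n", access_page(3));   /* resident: no fault */
        return 0;
    }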
Page replacement is used to free a frame when none is available: a frame whose page is not currently in use is reclaimed and given to the faulting process.
4.7.1 Advantages of Demand Paging:
1. Large virtual memory.
2. More efficient use of memory.
3. Unconstrained multiprogramming: there is no limit on the degree of multiprogramming.
4.7.2 Disadvantages of Demand Paging:
1. The number of tables and the amount of processor overhead for handling page interrupts are greater than in the case of simple paged management techniques.
2. There is no explicit constraint on the size of a job's address space.
4.8 PAGE REPLACEMENT ALGORITHM
There are many different page-replacement algorithms. We evaluate an algorithm by running it on a particular string of memory references and computing the number of page faults. The string of memory references is called a reference string. Reference strings are generated artificially or by tracing a given system and recording the address of each memory reference. The latter choice produces a large volume of data, which we can reduce by using two facts:

1. For a given page size, we need to consider only the page number, not the entire address.
2. If we have a reference to a page p, then any immediately following references to page p will never cause a page fault. Page p will be in memory after the first reference; the immediately following references will not fault.
For example, consider the address sequence
0100, 0432, 0101, 0612, 0102, 0103, 0104, 0101, 0611, 0102, 0103,
0104, 0101, 0610, 0102, 0103, 0104, 0104, 0101, 0609, 0102, 0105.
At 100 bytes per page, this sequence reduces to the reference string
1, 4, 1, 6, 1, 6, 1, 6, 1, 6, 1.
To determine the number of page faults for a particular reference string and
page  replacement  algorithm, we also  need  to  know  the number of page frames
available. As the number of frames available increase, the number of page faults
will decrease.
4.8.1 FIFO Algorithm
The simplest page-replacement algorithm is a FIFO algorithm. A FIFO replacement algorithm associates with each page the time when that page was brought into memory. When a page must be replaced, the oldest page is chosen. We can create a FIFO queue to hold all pages in memory.
For example, consider the reference string 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1 with three frames. The first three references (7, 0, 1) cause page faults and are brought into the three empty frames. The reference to page 2 replaces page 7, because page 7 was brought in first; the next reference, to page 0, causes no fault, since 0 is already in memory. The first reference to page 3 then replaces page 0, which is now the oldest page. This replacement means that the next reference to 0 will fault; page 1 is then replaced by page 0.
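A small runnable FIFO simulator in C, applied to the reference string above with three frames:

    #include <stdio.h>

    /* Count FIFO page faults for a reference string with nframes frames. */
    int fifo_faults(const int *refs, int n, int nframes) {
        int frames[16], used = 0, oldest = 0, faults = 0;
        for (int i = 0; i < n; i++) {
            int hit = 0;
            for (int j = 0; j < used; j++)
                if (frames[j] == refs[i]) { hit = 1; break; }
            if (hit) continue;
            faults++;
            if (used < nframes) {
                frames[used++] = refs[i];        /* fill an empty frame */
            } else {
                frames[oldest] = refs[i];        /* evict the oldest page */
                oldest = (oldest + 1) % nframes;
            }
        }
        return faults;
    }

    int main(void) {
        int refs[] = {7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1};
        int n = sizeof refs / sizeof refs[0];
        printf("FIFO faults with 3 frames: %d\n", fifo_faults(refs, n, 3));
        return 0;
    }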
4.8.2 Optimal Algorithm
An optimal page-replacement algorithm has the lowest page-fault rate of all algorithms. Such an algorithm exists and has been called OPT or MIN. It is simply: replace the page that will not be used for the longest period of time.

Now consider the same reference string with 3 empty frames. The reference to page 2 replaces page 7, because page 7 will not be used until reference 18, whereas page 0 will be used at 5 and page 1 at 14. The reference to page 3 replaces page 1, as page 1 will be the last of the three pages in memory to be referenced again. Optimal replacement is much better than FIFO.
The optimal page-replacement algorithm is difficult to implement, because it
requires future knowledge of the reference string.
4.8.3 LRU Algorithm
The FIFO algorithm uses the time when a page was brought into memory; the OPT algorithm uses the time when a page is to be used. In LRU, we replace the page that has not been used for the longest period of time.
LRU replacement associates with each page the time of that page's last use.
When a page must be replaced, LRU chooses that page that has not been used for the longest period of time.
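A matching LRU sketch in C, recording each resident page's time of last use with a logical clock:

    #include <stdio.h>

    /* Count LRU page faults: evict the page used least recently. */
    int lru_faults(const int *refs, int n, int nframes) {
        int pages[16], last_use[16], used = 0, faults = 0;
        for (int t = 0; t < n; t++) {
            int hit = -1;
            for (int j = 0; j < used; j++)
                if (pages[j] == refs[t]) { hit = j; break; }
            if (hit >= 0) { last_use[hit] = t; continue; }  /* refresh recency */
            faults++;
            if (used < nframes) {
                pages[used] = refs[t]; last_use[used] = t; used++;
            } else {
                int victim = 0;        /* frame whose page is least recently used */
                for (int j = 1; j < used; j++)
                    if (last_use[j] < last_use[victim]) victim = j;
                pages[victim] = refs[t]; last_use[victim] = t;
            }
        }
        return faults;
    }

    int main(void) {
        int refs[] = {7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1};
        int n = sizeof refs / sizeof refs[0];
        printf("LRU faults with 3 frames: %d\n", lru_faults(refs, n, 3));
        return 0;
    }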
Let SR be the reverse of a reference string S, then the page-fault rate for the
OPT algorithm on S is the same as the page-fault rate for the OPT algorithm on SR.
4.8.4 LRU Approximation Algorithms
Some systems provide no hardware support for true LRU replacement, and some other page-replacement algorithm must be used. Many systems provide some help, however, in the form of a reference bit. The reference bit for a page is set, by the hardware, whenever that page is referenced. Reference bits are associated with each entry in the page table. Initially, all bits are cleared (to 0) by the operating system. As a user process executes, the bit associated with each page referenced is set (to 1) by the hardware.
4.8.5 Additional-Reference-Bits Algorithm
At regular intervals, the operating system shifts the reference bit for each page into the high-order bit of its 8-bit byte, shifting the other bits right 1 bit and discarding the low-order bit. These 8-bit shift registers contain the history of page use for the last eight time periods. If the shift register contains 00000000, then the page has not been used for eight time periods; a page that is used at least once each period would have a shift register value of 11111111.
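The shifting step is easy to sketch in C; the arrays standing in for the hardware reference bits and per-page history bytes are hypothetical:

    #include <stdint.h>

    #define NPAGES 64

    static uint8_t history[NPAGES];   /* 8 bits of use history per page */
    static uint8_t ref_bit[NPAGES];   /* set to 1 by hardware on reference */

    /* Run at each timer interval: fold the reference bit into the history. */
    void age_pages(void) {
        for (int i = 0; i < NPAGES; i++) {
            history[i] = (uint8_t)((history[i] >> 1) | (ref_bit[i] << 7));
            ref_bit[i] = 0;           /* clear for the next period */
        }
    }
    /* The page with the smallest history value is the replacement victim:
       00000000 = unused for eight periods, 11111111 = used every period. */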

4.8.6 Second-Chance Algorithm
The basic algorithm of second-chance replacement is a FIFO replacement algorithm. When a page has been selected, however, we inspect its reference bit: if the bit is set, we give the page a second chance and move on to select the next FIFO page. When a page gets a second chance, its reference bit is cleared and its arrival time is reset to the current time.
4.8.7 Enhanced Second-Chance Algorithm
The second-chance algorithm described above can be enhanced by considering both the reference bit and the modify bit as an ordered pair:
1. (0,0): neither recently used nor modified; the best page to replace.
2. (0,1): not recently used but modified; not quite as good, because the page will need to be written out before replacement.
3. (1,0): recently used but clean; probably will be used again soon.
4. (1,1): recently used and modified; probably will be used again soon, and the page will need to be written out before it can be replaced.
4.9 COUNTING ALGORITHMS
There are many other algorithms that can be used for page replacement.
LFU Algorithm:
The least frequently used (LFU) page-replacement algorithm keeps a count of the number of references made to each page and requires that the page with the smallest count be replaced. This algorithm suffers from the situation in which a page is used heavily during the initial phase of a process but is then never used again.
MFU Algorithm:
The most frequently used (MFU) page-replacement algorithm is based on the argument that the page with the smallest count was probably just brought in and has yet to be used.
Some systems also keep a pool of free frames (page buffering): when a page must be replaced, the new page is read into a free frame from the pool before the victim is written out. This procedure allows the process to restart as soon as possible, without waiting for the victim page to be written out. When the victim is later written out, its frame is added to the free-frame pool.
When the FIFO replacement algorithm mistakenly replaces a page that is still in active use, that page can be quickly retrieved from the free-frame buffer, and no I/O is necessary. The free-frame buffer provides protection against the relatively poor, but simple, FIFO replacement algorithm.

4.10 PRINCIPLES OF I/O HARDWARE

Different people look at I/O hardware in different ways. Electrical engineers look at it in terms of chips, wires, power supplies, motors, and all the other physical components that make up the hardware. Programmers look at the interface presented to the software: the commands the hardware accepts, the functions it carries out, and the errors that can be reported back. Here we are concerned with programming I/O devices, not designing, building, or maintaining them, so our interest is restricted to how the hardware is programmed, not how it works inside. Nevertheless, the programming of many I/O devices is often intimately connected with their internal operation. In the next three subsections we will provide a little general background on I/O hardware as it relates to programming.

4.10.1 I/O Devices

I/O devices can be roughly divided into two categories: block devices and character devices. A block device is one that stores information in fixed-size blocks, each one with its own address. Common block sizes range from 512 bytes to 32,768 bytes. The essential property of a block device is that it is possible to read or write each block independently of all the other ones. Disks are the most common block devices.
If you look closely, the boundary between devices that are block addressable and those that are not is not well defined. Everyone agrees that a disk is a block addressable device because no matter where the arm currently is, it is always possible to seek to another cylinder and then wait for the required block to rotate under the head. Now consider a tape drive used for making disk backups. Tapes contain a sequence of blocks. If the tape drive is given a command to read block N, it can always rewind the tape and go forward until it comes to block N. This operation is analogous to a disk doing a seek, except that it takes much longer. Also, it may or may not be possible to rewrite one block in the middle of a tape. Even if it were possible to use tapes as random access block devices, that is stretching the point somewhat: they are not normally used that way.
The other type of I/O device is the character device. A character device delivers or accepts a stream of characters, without regard to any block structure. It is not addressable and does not have any seek operation. Printers, network interfaces, mice (for pointing), rats (for psychology lab experiments), and most other devices that are not disk-like can be seen as character devices.
This classification scheme is not perfect. Some devices just do not fit in. Clocks, for example, are not block addressable. Nor do they generate or accept character streams. All they do is cause interrupts at well-defined intervals. Still, the model of block and character devices is general enough that it can be used as a basis for making some of the operating system software dealing with I/O device independent. The file system, for example, deals only with abstract block devices and leaves the device-dependent part to lower-level software called device drivers.
I/O devices cover a huge range in speeds, which puts considerable pressure on the software to perform well over many orders of magnitude in data rates. Fig. 3-1 shows the data rates of some common devices. Most of these devices tend to get faster as time goes on.
Figure 3-1. Some typical device, network, and bus data rates.

Device                     Data rate
Keyboard                   10 bytes/sec
Mouse                      100 bytes/sec
56K modem                  7 KB/sec
Scanner                    400 KB/sec
Digital camcorder          4 MB/sec
52x CD-ROM                 8 MB/sec
FireWire (IEEE 1394)       50 MB/sec
USB 2.0                    60 MB/sec
XGA Monitor                60 MB/sec
SONET OC-12 network        78 MB/sec
Gigabit Ethernet           125 MB/sec
Serial ATA disk            200 MB/sec
SCSI Ultrawide 4 disk      320 MB/sec
PCI bus                    528 MB/sec

I/O units typically consist of a mechanical component and an electronic component. It is often possible to separate the two portions to provide a more modular and general design. The electronic component is called the device controller or adapter. On personal computers, it often takes the form of a printed circuit card that can be inserted into an expansion slot. The mechanical component is the device itself. 

The controller card usually has a connector on it, into which a cable leading to the device itself can be plugged. Many controllers can handle two, four, or even eight identical devices. If the interface between the controller and device is a standard interface, either an official ANSI, IEEE, or ISO standard or a de facto one, then companies can make controllers or devices that fit that interface. Many companies, for example, make disk drives that match the IDE (Integrated Drive Electronics) and SCSI (Small Computer System Interface) interfaces. The operating system nearly always deals with the controller, not the device. Most personal computers and servers use the bus model for communication between the CPU and the controllers. Large mainframes often use a different model, with specialized I/O computers called I/O channels taking some of the load off the main CPU.
The interface between the controller and the device is often low-level. A disk, for example, might be formatted with 1024 sectors of 512 bytes per track. What actually comes off the drive, however, is a serial bit stream, starting with a preamble, then the 4096 bits in a sector, and finally a checksum, also called an Error-Correcting Code (ECC). The preamble is written when the disk is formatted and contains the cylinder and sector number, the sector size, and similar data.
The controller's job is to convert the serial bit stream into a block of bytes and perform any error correction necessary. The block of bytes is typically first assembled, bit by bit, in a buffer inside the controller. After its checksum has been verified and the block declared to be free of errors, it can then be copied to main memory.
The controller for a monitor also works as a bit serial device at an equally low level. It reads bytes containing the characters to be displayed from memory and generates the signals used to modulate the CRT beam. The controller also generates the signals for making a CRT beam do a horizontal retrace after it has finished a scan line, as well as the signals for making it do a vertical retrace after the entire screen has been scanned. On an LCD screen these signals select individual pixels and control their brightness, simulating the effect of the electron beam in a CRT. If it were not for the video controller, the operating system programmer would have to program the scanning explicitly. With the controller, the operating system initializes the controller with a few parameters, such as the number of characters or pixels per line and number of lines per screen, and lets the controller take care of actually driving the display.
Controllers for some devices, especially disks, are becoming extremely sophisticated. For example, modern disk controllers often have many megabytes of memory inside the controller. As a result, when a read is being processed, as soon as the arm gets to the correct cylinder, the controller begins reading and storing data, even if it has not yet reached the sector it needs. This cached data may come in handy for satisfying subsequent requests. Furthermore, even after the requested data has been obtained, the controller may continue to cache data from subsequent sectors, since they are likely to be needed later. In this manner, many disk reads can be handled without any disk activity at all.

4.10.2 Memory-Mapped I/O

Each controller has a few registers that are used for communicating with the CPU. By writing into these registers, the operating system can command the device to deliver data, accept data, switch itself on or off, or otherwise perform some action. By reading from these registers, the operating system can learn what the device's state is, whether it is prepared to accept a new command, and so on.
In addition to the control registers, many devices have a data buffer that the operating system can read and write. For example, a common way for computers to display pixels on the screen is to have a video RAM, which is basically just a data buffer, available for programs or the operating system to write into.
The issue thus arises of how the CPU communicates with the control registers and the device data buffers. Two alternatives exist. In the first approach, each control register is assigned an I/O port number, an 8- or 16-bit integer. Using a special I/O instruction such as IN REG,PORT, the CPU can read in control register PORT and store the result in CPU register REG. Similarly, using OUT PORT,REG, the CPU can write the contents of REG to a control register. Most early computers, including nearly all mainframes, such as the IBM 360 and all of its successors, worked this way. In the second approach, memory-mapped I/O, each control register is instead assigned a unique memory address, so the registers can be read and written with ordinary load and store instructions.
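Both styles can be sketched in C on x86 Linux. The inb and ioperm calls are real glibc facilities from <sys/io.h> (the port path needs root privileges), but the port number and register address below are placeholders; a real driver would obtain the mapped address from the kernel rather than a hard-coded constant:

    #include <sys/io.h>    /* inb/outb/ioperm: x86 Linux only */
    #include <stdint.h>

    #define STATUS_PORT 0x3F8            /* placeholder port number */

    uint8_t read_status_via_port(void) {
        ioperm(STATUS_PORT, 1, 1);       /* ask the kernel for port access */
        return inb(STATUS_PORT);         /* special I/O instruction path (IN) */
    }

    /* Memory-mapped I/O: the control register appears at a memory address,
       so an ordinary load reaches the device (placeholder address). */
    static volatile uint32_t *status_reg = (volatile uint32_t *)0xFE000000u;

    uint32_t read_status_via_mmio(void) {
        return *status_reg;              /* ordinary memory reference */
    }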

4.10.3 Interrupts

Usually, controller registers have one or more status bits that can be tested to determine if an output operation is complete or if new data is available from an input device. A CPU can execute a loop, testing a status bit each time until a device is ready to accept or provide new data. This is called polling or busy waiting. In the realm of I/O, where you might have to wait a very long time for the outside world to accept or produce data, polling is not acceptable except for very small dedicated systems not running multiple processes.
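A runnable toy illustration of busy waiting in C: a second thread stands in for the device, and an atomic flag stands in for the status bit (compile with -pthread):

    #include <stdio.h>
    #include <pthread.h>
    #include <stdatomic.h>
    #include <unistd.h>

    static atomic_int data_ready = 0;   /* simulated "operation complete" bit */
    static int data_buffer;

    static void *device(void *arg) {
        (void)arg;
        sleep(1);                       /* the device takes time to respond */
        data_buffer = 42;
        atomic_store(&data_ready, 1);   /* set the status bit */
        return NULL;
    }

    int main(void) {
        pthread_t dev;
        pthread_create(&dev, NULL, device, NULL);
        while (!atomic_load(&data_ready))
            ;                           /* polling: burn CPU until ready */
        printf("polled data: %d\n", data_buffer);
        pthread_join(dev, NULL);
        return 0;
    }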
In addition to status bits, many controllers use interrupts to tell the CPU when they are ready to have their registers read or written. In the context of I/O, all you need to know is that most interface devices provide an output which is logically the same as the "operation complete" or "data ready" status bit of a register, but which is meant to be used to drive one of the IRQ (Interrupt ReQuest) lines of the system bus. Thus when an interrupt-enabled operation completes, it interrupts the CPU and starts the interrupt handler running. This piece of code informs the operating system that I/O is complete. The operating system may then check the status bits to verify that all went well, and either harvest the resulting data or initiate a retry.

The number of inputs to the interrupt controller may be limited; Pentium-class PCs have only 15 available for I/O devices. Some controllers are hard-wired onto the system parent board, for example, the disk and keyboard controllers of an IBM PC. On older systems, the IRQ used by the device was set by a switch or jumper associated with the controller. If a user bought a new plug-in board, he had to manually set the IRQ to avoid conflicts with existing IRQs. Few users could do this correctly, which led the industry to develop Plug 'n Play, in which the BIOS can automatically assign IRQs to devices at boot time to avoid conflicts.
