4.1 MEMORY MANAGEMENT
Memory is central to the operation of a
modern computer system. Memory is a large array of words or bytes, each with
its own address.
A program resides on a disk as a binary executable file. The program must be brought into memory and placed within a process for it to be executed. Depending on the memory management scheme in use, the process may be moved between disk and memory during its execution. The collection of processes on the disk that are waiting to be brought into memory for execution forms the input queue: one process is selected from the input queue and loaded into memory.
We can provide protection by using two registers, usually a base and a limit. The base register holds the smallest legal physical memory address; the limit register specifies the size of the range. For example, if the base register holds 300040 and the limit register holds 120900, then the program can legally access all addresses from 300040 through 420939 (inclusive).
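As a rough illustration, the protection check can be sketched in C; the function name is_legal and the use of ordinary integers are illustrative only, since the real check is done in hardware on every memory reference:

    /* Sketch of the base/limit protection check, using the values
       from the example above. Any address outside the range traps. */
    #include <stdio.h>

    int is_legal(unsigned addr, unsigned base, unsigned limit) {
        return addr >= base && addr < base + limit;
    }

    int main(void) {
        unsigned base = 300040, limit = 120900;
        printf("%d\n", is_legal(420939, base, limit)); /* 1: last legal address */
        printf("%d\n", is_legal(420940, base, limit)); /* 0: one past the range */
        return 0;
    }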
The binding of instructions and data to memory addresses can be
done at any step along the way:
Compile time: If it is
known at compile time where the process will reside in memory, then absolute
code can be generated.
Load time: If it is
not known at compile time where the process will reside in memory, then the
compiler must generate relocatable code.
Execution time: If the
process can be moved during its execution from one memory segment to another,
then binding must be delayed until run time.
4.1.1 Dynamic Loading
Better memory-space utilization can be
done by dynamic loading. With dynamic loading, a routine is not loaded until it
is called. All routines are kept on disk in a re-locatable load format. The main
program is loaded into memory and is executed.
The advantage of dynamic loading is that an unused routine is
never loaded.
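On POSIX systems, a program can request this kind of run-time loading explicitly through the dlopen/dlsym interface. A minimal sketch, assuming a Linux system (the choice of libm.so.6 and cos is arbitrary; link with -ldl):

    #include <stdio.h>
    #include <dlfcn.h>   /* POSIX dynamic-loading interface */

    int main(void) {
        /* Load the math library only now, at run time, not at link time. */
        void *handle = dlopen("libm.so.6", RTLD_LAZY);
        if (!handle) { fprintf(stderr, "%s\n", dlerror()); return 1; }
        double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
        if (cosine) printf("%f\n", cosine(0.0));   /* prints 1.000000 */
        dlclose(handle);
        return 0;
    }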
4.1.2 Dynamic Linking
Most operating systems support only
static linking, in which system language libraries are treated like any other
object module and are combined by the loader into the binary program image. The
concept of dynamic linking is similar to that of dynamic loading. Rather than
loading being postponed until execution time,
linking is postponed. This feature is usually used with system
libraries, such as language subroutine libraries. With dynamic linking, a stub
is included in the image for each library-routine reference. This stub is a small
piece of code that indicates how to locate the appropriate memory-resident library routine.
The entire program and data of a process
must be in physical memory for the process to execute. The size of a process is
limited to the size of physical memory. So
that a process can be
larger than the
amount of memory allocated
to it, a technique
called overlays is
sometimes used. The idea of overlays is to keep in
memory only those instructions and data that are needed at any
given time. When other instructions are needed, they are loaded into space that
was occupied previously by instructions that are no longer needed.
As an example, consider a two-pass assembler. During pass 1, it constructs a symbol table; then, during pass 2, it generates machine-language code. We may be able to partition such an assembler into pass 1 code, pass 2 code, the symbol table, and common support routines used by both pass 1 and pass 2. Assume the following sizes:
Pass 1: 70K
Pass 2: 80K
Symbol table: 20K
Common routines: 30K
To load everything at once, we would require 200K of memory. If only 150K is available, we cannot run our process. But pass 1 and pass 2 do not need to be in memory at the same time. We thus define two overlays:
Overlay A is the symbol table, common routines, and pass 1; overlay B is the symbol table, common routines, and pass 2. We add an overlay driver (10K) and start with overlay A in memory. When we finish pass 1, we jump to the overlay driver, which reads overlay B into memory, overwriting overlay A, and then transfers control to pass 2. Overlay A needs only 120K (20K + 30K + 70K), whereas overlay B needs 130K (20K + 30K + 80K), so either one fits, together with the 10K driver, in 150K. As in dynamic loading, overlays do not require any special support from the operating system.
4.1.3 Logical versus Physical Address Space
An
address generated by the
CPU is
commonly referred to
as a logical address, whereas an
address seen by the memory unit is commonly referred to as a physical address.
The compile-time and load-time address-binding
schemes result in an environment where the logical and physical addresses are
the same. The execution-time address-binding scheme results in an environment
where the logical and physical addresses differ. In this case, we usually refer
to the logical address as a virtual address. The set of all logical addresses
generated by a program is referred to
as a logical address
space; the set of all
physical addresses corresponding
to these logical addresses is referred to as a physical address space.
The run-time mapping from virtual to
physical addresses is done by the memory management unit (MMU), which is a
hardware device.
The base register is called a relocation
register. The value in the relocation register is added to every address
generated by a user process at the time it is sent to memory. For example, if the base is at 13000, then an attempt by the user to address location 0 is dynamically relocated to location 13000; an access to location 347 is mapped to location 13347. The MS-DOS operating system running on the Intel 80x86 family of processors uses four relocation registers when loading and running processes.
The user program never sees the real physical addresses. The program can create a pointer to location 347, store it in memory, manipulate it, and compare it to other addresses, all as the number 347. The user program deals with logical addresses. The memory-mapping hardware converts logical addresses into physical addresses: logical addresses are in the range 0 to max, and physical addresses are in the range R + 0 to R + max for a base value R. The user generates only logical addresses.
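A minimal sketch of this mapping, reusing the base value from the example above (in reality the MMU performs the addition in hardware on every reference):

    #include <stdio.h>

    /* physical address = relocation register + logical address */
    unsigned mmu_map(unsigned logical, unsigned relocation) {
        return logical + relocation;
    }

    int main(void) {
        unsigned R = 13000;                 /* relocation (base) register */
        printf("%u\n", mmu_map(0, R));      /* 13000 */
        printf("%u\n", mmu_map(347, R));    /* 13347 */
        return 0;
    }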
The concept of a logical address space
that is bound to a separate physical address space is central to proper memory
management.
4.2 SWAPPING
A process can be swapped temporarily out of memory to a backing store and then brought back into memory for continued execution. Assume a multiprogramming environment with a round-robin CPU-scheduling algorithm.
When a quantum expires, the memory manager starts to swap out the process that just finished and to swap in another process into the memory space that has been freed. When each process finishes its quantum, it will be swapped with another process.
A variant of this swapping policy is
used for priority-based scheduling algorithms.
If a higher-priority process arrives and wants service, the memory
manager can swap out the lower-priority process so that it can load and execute
the higher priority process. When the higher priority process finishes, the lower-priority
process can be swapped back in and continued. This variant of swapping is
sometimes called roll out, roll in. A process that is swapped out will be swapped back into the same memory space that it occupied previously. If binding is done at assembly or load time, then the process cannot be moved to a different location. If execution-time binding is being used, then it is possible to swap a process into a different memory space.
Swapping requires a backing store. The
backing store is commonly a fast disk. It is large enough to accommodate copies
of all memory images for all users. The system maintains a ready queue
consisting of all processes whose memory images are on the backing store or in
memory and are ready to run.
The context-switch time in such a
swapping system is fairly high. Let us assume that the user process is of size
100K and the backing store is a standard hard disk with transfer rate of 1
megabyte per second. The actual transfer of the 100K process to or from memory takes 100K / 1000K per second = 1/10 second = 100 milliseconds.
4.3 CONTIGUOUS ALLOCATION
The main memory must accommodate both
the operating system and the various user processes. The memory is usually
divided into two partitions, one for the resident operating system, and one for
the user processes.
It is common to place the operating system in low memory. Thus, we shall discuss only the situation where the operating system resides in low memory; the development of the other situation is similar.
4.3.1 Single-Partition Allocation
The operating system resides in low memory, and the user processes execute in high memory. Operating-system code and data must be protected from changes by the user processes, and we also need to protect the user processes from one another. We can provide this protection by using relocation registers.
The relocation register contains the value of
the smallest physical
address; the limit register contains the range of logical addresses (for
example, relocation = 100,040 and limit
= 74,600). With relocation
and limit registers, each logical address must be less than the limit register; the MMU maps the logical address dynamically by adding the value in the relocation register. This mapped address is sent to memory.
The relocation-register scheme provides
an effective way to allow the operating system size to change dynamically.
4.3.2 Multiple-Partition Allocation
One of the simplest schemes for memory
allocation is to divide memory into a number of fixed-sized partitions. Each
partition may contain exactly one process.
Thus, the degree of multiprogramming is bound by the number of
partitions. When a partition is free, a process is selected from the input
queue and is loaded into the free partition. When the process terminates, the
partition becomes available for another process.
The operating system
keeps a table indicating which
parts of memory are available and which
are occupied. Initially, all memory is available for user processes and is considered one large block of available memory, a hole. When a process arrives and needs memory, we search for a hole large enough for this process.
For example, assume that we have 2560K
of memory available and a resident operating system of 400K. This situation
leaves 2160K for user processes.
With FCFS job scheduling, we can immediately allocate memory to processes P1, P2, and P3, leaving a hole of 260K that cannot be used by any of the remaining processes in the input queue. Using round-robin CPU scheduling with a quantum of 1 time unit, one of these processes will terminate at time 14, releasing its memory for another process from the input queue.
When a process arrives and needs memory, we search this set for a
hole that is large enough for this process. If the hole is too large, it is
split into two: One part is allocated to the arriving process; the other is
returned to the set of holes. When a process terminates, it releases its block
of memory, which is then placed back in the set of holes. If the new hole is
adjacent to other holes, we merge these adjacent holes to form one larger hole.
This procedure is a particular instance
of the general dynamic storage-allocation problem, which is how to satisfy a
request of size n from a list of free holes.
There are many solutions to this problem. The set of holes is searched to determine which hole is best to allocate; first-fit, best-fit, and worst-fit are the most common strategies used to select a free hole from the set of available holes.
First-fit: Allocate the first hole that
is big enough. Searching can start either at the beginning of the set of holes
or where the previous first-fit search ended.
We can stop searching
as soon as we
find a free hole that
is large enough.
Best-fit: Allocate the smallest hole that is big enough. We must search the entire list, unless the list is kept ordered by size. This strategy produces the smallest leftover hole.
Worst-fit: Allocate the largest hole. Again, we must search the entire list unless it is sorted by size. This strategy produces the largest leftover hole, which may be more useful than the smaller leftover hole from a best-fit approach.
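The three strategies can be sketched as scans over a list of hole sizes. This is only an illustrative sketch over a plain array; a real allocator would keep a linked list of holes with their addresses:

    #include <stdio.h>

    /* Each function returns the index of the chosen hole, or -1 if none fits. */
    int first_fit(int holes[], int n, int req) {
        for (int i = 0; i < n; i++)
            if (holes[i] >= req) return i;   /* stop at the first fit */
        return -1;
    }

    int best_fit(int holes[], int n, int req) {
        int best = -1;
        for (int i = 0; i < n; i++)          /* must scan the whole list */
            if (holes[i] >= req && (best < 0 || holes[i] < holes[best]))
                best = i;                    /* smallest hole that fits */
        return best;
    }

    int worst_fit(int holes[], int n, int req) {
        int worst = -1;
        for (int i = 0; i < n; i++)
            if (holes[i] >= req && (worst < 0 || holes[i] > holes[worst]))
                worst = i;                   /* largest hole overall */
        return worst;
    }

    int main(void) {
        int holes[] = {100, 500, 200, 300, 600};   /* sizes in K, made up */
        printf("first %d, best %d, worst %d\n",
               first_fit(holes, 5, 212),    /* index 1 (500K) */
               best_fit(holes, 5, 212),     /* index 3 (300K) */
               worst_fit(holes, 5, 212));   /* index 4 (600K) */
        return 0;
    }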
4.3.3 External and Internal Fragmentation
As processes are loaded and removed from memory, the free memory space is broken into little pieces. External fragmentation exists when enough total memory space exists to satisfy a request, but it is not contiguous; storage is fragmented into a large number of small holes.
Depending on the total amount of memory
storage and the average process size, external fragmentation may be either a
minor or a major problem.
Given N allocated blocks, another 0.5N blocks will be lost to fragmentation; since 0.5N of every 1.5N blocks is then wasted, one-third of memory may be unusable. This property is known as the 50-percent rule.
Internal fragmentation is memory that is internal to a partition but is not being used.
4.4 PAGING
External fragmentation is avoided by using paging. In this scheme, physical memory is broken into fixed-size blocks called frames, and logical memory is broken into blocks of the same size called pages. When a process is to be executed, its pages are loaded into any available memory frames. Every address generated by the CPU is divided into two parts: a page number (p) and a page offset (d). The page number is used as an index into a page table. The page table contains the base address of each page in physical memory. This base address is combined with the page offset to define the physical memory address that is sent to the memory unit.
The page size is defined by the hardware. The size of a page is typically a power of 2, varying between 512 bytes and 8192 bytes per page, depending on the computer architecture. The selection of a power of 2 as a page size makes the translation of a logical address into a page number and page offset particularly easy. If the size of the logical address space is 2^m, and a page size is 2^n addressing units (bytes or words), then the high-order m - n bits of a logical address designate the page number, and the n low-order bits designate the page offset. Thus, the logical address is as follows:

    page number | page offset
         p      |      d
     m - n bits |   n bits

where p is an index into the page table and d is the displacement within the page.
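For example, with assumed sizes m = 16 and n = 10 (1024-byte pages), the split is a shift and a mask:

    #include <stdio.h>

    int main(void) {
        unsigned n = 10;                      /* page size 2^10 = 1024 bytes */
        unsigned addr = 0x2ABC;               /* an arbitrary logical address */
        unsigned p = addr >> n;               /* high-order m - n bits */
        unsigned d = addr & ((1u << n) - 1);  /* low-order n bits */
        printf("page %u, offset %u\n", p, d); /* page 10, offset 700 */
        return 0;
    }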
Paging is a form of dynamic relocation.
Every logical address is bound by the paging hardware to some physical address.
When we use a paging scheme, we have no
external fragmentation: Any free frame can be allocated to a process that needs
it.
If process size is independent of page size, we can expect internal fragmentation to average one-half page per process.
When a process arrives in the system to
be executed, its size, expressed in pages, is examined. Each page of the
process needs one frame. Thus, if the process requires n pages, there must be
at least n frames available in memory. If there are n frames available, they
are allocated to this arriving process. The first page of the process is loaded
into one of the allocated frames and the frame number is put in the page table
for this process. The next page is loaded into another frame, and its frame
number is put into the page table, and so on.
The user program views memory as one single contiguous space, containing only this one program. In fact, the user program is scattered throughout physical memory, and logical addresses are translated into physical addresses.
Since the operating system is managing physical memory, it must be aware of the allocation details of physical memory: which frames are allocated, which frames are available, how many total frames there are, and so on. This information is generally kept in a data structure called a frame table. The frame table has one entry for each physical page frame, indicating whether the frame is free or allocated and, if it is allocated, to which page of which process or processes.
The operating system maintains a copy of
the page table for each process. Paging therefore increases the context-switch
time.
4.5 SEGMENTATION
A user program can be subdivided using
segmentation, in which the program and its associated data are divided into a
number of segments. It is
not required that all
segments of all
programs be of the same length, although there
is a maximum segment
length. As with paging, a logical address using segmentation consists of two parts: a segment number and an offset.
Because of the use of unequal-size segments, segmentation is
similar to dynamic partitioning. In segmentation, a program may occupy more
than one partition, and these partitions need not be contiguous. Segmentation
eliminates internal fragmentation but, like dynamic partitioning, it suffers
from external fragmentation. However, because a process is broken up into a number
of smaller pieces, the external fragmentation should be less. Whereas paging is invisible to the programmer, segmentation is usually visible and is provided as a convenience for organizing programs and data.
Another consequence of unequal-size
segments is that
there is no
simple relationship between logical
addresses and physical
addresses. A segmentation scheme would make use of a segment table for each process and a list of free blocks of main memory. Each segment table entry would have to give, as in paging, the starting address in main memory of the corresponding segment. The entry should also provide the length of the segment, to ensure that invalid addresses are not used. When a process enters the Running state, the address of its segment table is loaded into a special register used by the memory-management hardware.
Consider an address of n + m bits, where
the leftmost n bits are the segment number and the rightmost m bits are the
offset. The following steps are needed for address translation:
Extract the segment number as the
leftmost n bits of the logical address.
Use the segment number as an index into the process segment table
to find the starting physical address of the segment.
Compare the offset, expressed in the
rightmost m bits, to the length of the segment. If the offset is greater than
or equal to the length, the address is invalid.
The desired physical address is the sum
of the starting physical address of the segment plus the offset.
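A sketch of these steps in C; the two table entries are made-up values for illustration:

    #include <stdio.h>

    /* One entry per segment: starting physical address and length. */
    struct segment { unsigned base, length; };

    /* Return the physical address, or -1 for an invalid offset. */
    long translate(struct segment table[], unsigned s, unsigned d) {
        if (d >= table[s].length) return -1;  /* offset past segment end */
        return (long)table[s].base + d;       /* segment base + offset */
    }

    int main(void) {
        struct segment table[] = { {1400, 1000}, {6300, 400} };
        printf("%ld\n", translate(table, 1, 53));   /* 6353 */
        printf("%ld\n", translate(table, 1, 500));  /* -1: invalid */
        return 0;
    }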
Segmentation and paging can be combined to obtain the advantages of both.
4.6 VIRTUAL MEMORY
Virtual memory is a technique that
allows the execution of processes that may not be completely in memory. The main
visible advantage of this scheme is that programs can be larger than physical
memory.
Virtual memory is the separation of user logical memory from physical memory. This separation allows an extremely large virtual memory to be provided for programmers when only a smaller physical memory is available.
The following are situations in which the entire program is not required to be fully loaded:
1. User-written error-handling routines are used only when an error occurs in the data or computation.
2. Certain options and features of a program may be used rarely.
3. Many tables are assigned a fixed amount of address space even though only a small amount of the table is actually used.
The ability to execute a program that is only partially in memory would confer many benefits:
1. Less I/O would be needed to load or swap each user program into memory.
2. A program would no longer be constrained by the amount of physical memory that is available.
3. Each user program could take less physical memory, so more programs could be run at the same time, with a corresponding increase in CPU utilization and throughput.
Virtual memory is commonly implemented
by demand paging. It can also be
implemented in a segmentation system. Demand segmentation can also
be used to
provide virtual memory.
4.7 DEMAND PAGING
A demand-paging system is similar to a paging system with swapping. When we want to execute a process, we swap it into memory; rather than swapping in the entire process, however, a pager is used. When a process is to be swapped in, the pager guesses which pages will be used before the process is swapped out again, and it brings only those necessary pages into memory. Thus, it avoids reading into memory pages that will not be used anyway, decreasing the swap time and the amount of physical memory needed.
Hardware support is required to distinguish between those pages that are in memory and those that are on the disk; the valid-invalid bit scheme is used for this. When the bit is set to valid, the page is legal and in memory; when it is set to invalid, the page either is not in the process's logical address space or is currently on the disk. Marking a page invalid has no effect if the process never attempts to access that page. While the process executes and accesses pages that are memory resident, execution proceeds normally. Access to a page marked invalid causes a page-fault trap. This trap is the result of the operating system's failure to bring the desired page into memory.
Steps in handling a page fault:
1. We check an internal table for this process to determine whether the reference was a valid or an invalid memory access.
2. If the reference was invalid, we terminate the process. If it was valid, but we have not yet brought in that page, we now page it in.
3. We find a free frame.
4. We schedule a disk operation to read the desired page into the newly allocated frame.
5. When the disk read is complete, we modify the internal table kept with the process and the page table to indicate that the page is now in memory.
6. We restart the instruction that was interrupted by the illegal address trap. The process can now access the page as though it had always been in memory.
Therefore, the operating system reads
the desired page into memory and restarts the process as though the page had
always been in memory.
If no frame is free when a page must be brought in, page replacement is used: a frame that is not currently being used is freed, and the desired page is read into it.
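A toy sketch of the valid-invalid check; the disk read is only simulated here, since a real handler runs inside the kernel:

    #include <stdio.h>

    #define PAGES 8

    int valid[PAGES];   /* 1 = page is memory resident */
    int faults = 0;

    void access_page(int page) {
        if (!valid[page]) {      /* invalid bit causes a page-fault trap */
            faults++;
            /* find a free frame, schedule the disk read, then: */
            valid[page] = 1;     /* mark the page as now in memory */
            /* ...and restart the faulting instruction */
        }
    }

    int main(void) {
        int refs[] = {1, 4, 1, 6, 1, 6};
        for (int i = 0; i < 6; i++) access_page(refs[i]);
        printf("%d page faults\n", faults);   /* 3: pages 1, 4 and 6 */
        return 0;
    }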
4.7.1 Advantages of Demand Paging:
1. Large virtual memory.
2. More efficient use of memory.
3. Unconstrained multiprogramming: there is no limit on the degree of multiprogramming.
4.7.2 Disadvantages of Demand Paging:
1. The number of tables and the amount of processor overhead for handling page interrupts are greater than in the case of simple paged-management techniques.
2. There is a lack of explicit constraints on a job's address-space size.
4.8 PAGE REPLACEMENT ALGORITHM
There are many different page-replacement algorithms. We evaluate an algorithm by running it on a particular string of memory references and computing the number of page faults. The string of memory references is called a reference string. Reference strings are generated artificially or by tracing a given system and recording the address of each memory reference. The latter choice produces a large amount of data, but we can reduce it with two observations:
1. For a given page size we need to consider only the page number,
not the
entire address.
2. If we have a reference to a page p, then any immediately following references to page p will never cause a page fault. Page p will be in memory after the first reference, so the immediately following references will not fault.
For example, consider the address sequence
0100, 0432, 0101, 0612, 0102, 0103, 0104, 0101, 0611, 0102, 0103, 0104, 0101, 0610, 0102, 0103, 0104, 0104, 0101, 0609, 0102, 0105
which, at 100 bytes per page, reduces to the reference string 1, 4, 1, 6, 1, 6, 1, 6, 1, 6, 1.
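A short sketch of this reduction in C, dividing each address by the 100-byte page size and dropping immediate repeats:

    #include <stdio.h>

    int main(void) {
        int addrs[] = {100, 432, 101, 612, 102, 103, 104, 101, 611, 102, 103,
                       104, 101, 610, 102, 103, 104, 104, 101, 609, 102, 105};
        int n = sizeof addrs / sizeof addrs[0], last = -1;
        for (int i = 0; i < n; i++) {
            int page = addrs[i] / 100;             /* page number */
            if (page != last) printf("%d ", page); /* skip repeats */
            last = page;
        }
        printf("\n");   /* prints: 1 4 1 6 1 6 1 6 1 6 1 */
        return 0;
    }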
To determine the number of page faults for a particular reference
string and
page replacement algorithm, we also need
to know the number of page frames
available. As the number of frames available increases, the number of page faults will decrease.
4.8.1 FIFO Algorithm
The simplest page-replacement algorithm
is a FIFO algorithm. A
FIFO
replacement algorithm associates
with each page
the time when that
page was
brought into memory. When
a page must be replaced, the oldest page
is chosen.
We can create a FIFO queue to hold all pages in memory; we replace the page at the head of the queue, and when a page is brought into memory, we insert it at the tail. For example, consider the reference string 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1 with 3 frames. The first three references (7, 0, 1) cause page faults and are brought into the empty frames. The reference to page 2 then replaces page 7, because page 7 was brought in first. The later reference to page 3 replaces page 0; this replacement means that the next reference, to 0, will fault, and page 1 is then replaced by page 0.
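A sketch of FIFO replacement on this reference string; a circular pointer over the frames is equivalent to a FIFO queue, since hits never change the arrival order:

    #include <stdio.h>

    int main(void) {
        int refs[] = {7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1};
        int n = sizeof refs / sizeof refs[0];
        int frames[3] = {-1, -1, -1}, next = 0, faults = 0;
        for (int i = 0; i < n; i++) {
            int hit = 0;
            for (int j = 0; j < 3; j++)
                if (frames[j] == refs[i]) hit = 1;
            if (!hit) {
                frames[next] = refs[i];   /* replace the oldest page */
                next = (next + 1) % 3;
                faults++;
            }
        }
        printf("%d faults\n", faults);    /* 12 faults for this string */
        return 0;
    }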
4.8.2 Optimal Algorithm
An optimal page-replacement algorithm has the lowest page-fault
rate of all
algorithms. An optimal page-replacement algorithm
exists, and has been
called
OPT or MIN. It is simply Replace the page that will not be used
for the longest period of time.
Now consider the same string with 3 empty frames. The reference to page 2 replaces page 7, because 7 is not used again in this string, whereas page 0 will be used at reference 5 and page 1 at reference 14. The reference to page 3 replaces page 1, as page 1 will be the last of the three pages in memory to be referenced again. Optimal replacement is much better than FIFO.
The optimal page-replacement algorithm is difficult to implement,
because it
requires future knowledge of the reference string.
4.8.3 LRU Algorithm
The FIFO algorithm uses the time when a
page was brought into memory;
the OPT algorithm uses the time when a page is to be used. In LRU, we replace the page that has not been used for the longest period of time. LRU replacement associates with each page the time of that page's last use. When a page must be replaced, LRU chooses the page that has not been used for the longest period of time.
Let S^R be the reverse of a reference string S; then the page-fault rate for the OPT algorithm on S is the same as the page-fault rate for the OPT algorithm on S^R. (The same property holds for the LRU algorithm.)
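A sketch of LRU on the same reference string, recording the time of each page's last use:

    #include <stdio.h>

    int main(void) {
        int refs[] = {7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1};
        int n = sizeof refs / sizeof refs[0];
        int frames[3] = {-1, -1, -1}, last_use[3] = {0, 0, 0}, faults = 0;
        for (int t = 0; t < n; t++) {
            int hit = -1, lru = 0;
            for (int j = 0; j < 3; j++) {
                if (frames[j] == refs[t]) hit = j;
                if (last_use[j] < last_use[lru]) lru = j; /* oldest use */
            }
            if (hit >= 0) {
                last_use[hit] = t + 1;    /* refresh on every reference */
            } else {
                frames[lru] = refs[t];    /* evict least recently used */
                last_use[lru] = t + 1;
                faults++;
            }
        }
        printf("%d faults\n", faults);    /* 11 faults for this string */
        return 0;
    }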
4.8.4 LRU Approximation Algorithms
Few systems provide sufficient hardware support for true LRU replacement, so some other page-replacement algorithm must be used. Many systems provide some help, however, in the form of a reference bit. The reference bit for a page is set, by the hardware, whenever that page is referenced. A reference bit is associated with each entry in the page table. Initially, all bits are cleared (to 0) by the operating system. As a user process executes, the bit associated with each page referenced is set (to 1) by the hardware.
4.8.5 Additional-Reference-Bits Algorithm
At regular intervals, a timer interrupt transfers control to the operating system, which shifts the reference bit for each page into the high-order bit of its 8-bit byte, shifting the other bits right 1 bit and discarding the low-order bit. These 8-bit shift registers contain the history of page use for the last eight time periods. If the shift register contains 00000000, then the page has not been used for eight time periods; a page that is used at least once each period would have a shift register value of 11111111.
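A sketch of one aging step in C, called once per time period:

    #include <stdio.h>

    /* Shift the current reference bit into the high-order position;
       the oldest of the eight history bits falls off the right end. */
    unsigned char age(unsigned char history, int referenced) {
        return (unsigned char)((referenced << 7) | (history >> 1));
    }

    int main(void) {
        unsigned char h = 0;
        h = age(h, 1);        /* 10000000 */
        h = age(h, 0);        /* 01000000 */
        h = age(h, 1);        /* 10100000 */
        printf("%02x\n", h);  /* a0; larger values mean more recent use */
        return 0;
    }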
4.8.6 Second-Chance Algorithm
The basic algorithm of second-chance replacement is a FIFO replacement algorithm. When a page's reference bit is set, however, the page gets a second chance: its reference bit is cleared, and its arrival time is reset to the current time.
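A sketch of the victim search as a circular (clock-style) scan; the frame contents are made-up values for illustration:

    #include <stdio.h>

    #define N 3
    int page[N]    = {3, 5, 7};
    int ref_bit[N] = {1, 0, 1};
    int hand = 0;                 /* next frame to examine, FIFO order */

    int choose_victim(void) {
        for (;;) {
            if (ref_bit[hand] == 0) {        /* no second chance left */
                int v = hand;
                hand = (hand + 1) % N;
                return v;
            }
            ref_bit[hand] = 0;               /* give a second chance */
            hand = (hand + 1) % N;
        }
    }

    int main(void) {
        int v = choose_victim();
        printf("victim: frame %d (page %d)\n", v, page[v]); /* frame 1, page 5 */
        return 0;
    }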
4.8.7 Enhanced Second-Chance Algorithm
The second-chance algorithm described above can be enhanced by considering both the reference bit and the modify bit as an ordered pair:
1. (0,0) neither recently used nor modified: best page to replace.
2. (0,1) not recently used but modified: not quite as good, because the page will need to be written out before replacement.
3. (1,0) recently used but clean: probably will be used again soon.
4. (1,1) recently used and modified: probably will be used again, and a write-out will be needed before replacing it.
4.9 COUNTING ALGORITHMS
There are many other algorithms that can
be used for page replacement.
• LFU
Algorithm:
The least frequently
used (LFU) page-replacement algorithm requires that the page with the
smallest count be replaced. This algorithm suffers from the situation
in which a page is
used heavily during
the initial phase of a process,
but then is never used again.
• MFU Algorithm:
The most frequently used (MFU)
page-replacement algorithm
is based on the argument that the page with the smallest count was
probably just
brought in and has yet to be used.
A further refinement is to keep a pool of free frames. This procedure allows the process to restart as soon as possible, without waiting for the victim page to be written out; when the victim is later written out, its frame is added to the free-frame pool. When the FIFO replacement algorithm mistakenly replaces a page that is still in active use, that page can be quickly retrieved from the free-frame pool, and no I/O is necessary. The free-frame buffer thus provides protection against the relatively poor, but simple, FIFO replacement algorithm.
4.10 PRINCIPLES OF I/O HARDWARE
Different
people look at I/O hardware in different ways. Electrical engineers look at it
in terms of chips, wires, power supplies, motors, and all the other physical
components that make up the hardware. Programmers look at the interface
presented to the software: the commands the hardware accepts, the functions it carries out, and the errors that can be reported back. Here we are concerned with programming I/O devices, not designing, building, or maintaining them, so our interest will be restricted to how the hardware is programmed, not
how it works inside. Nevertheless, the programming of many I/O devices is often
intimately connected with their internal operation. In the next three
subsections we will provide a little general background on I/O hardware as it
relates to programming.
4.10.1 I/O Devices
I/O devices can be roughly
divided into two categories: block devices and
character devices. A block device is one that
stores information in fixed-size blocks, each one with its own address. Common
block sizes range from 512 bytes to 32,768 bytes. The essential property of a
block device is that it is possible to read or write each block independently
of all the other ones. Disks are the most common block devices.
If you look closely, the
boundary between devices that are block addressable and those that are not is
not well defined. Everyone agrees that a disk is a block addressable device
because no matter where the arm currently is, it is always possible to seek to
another cylinder and then wait for the required block to rotate under the head.
Now consider a tape drive used for making disk backups. Tapes contain a
sequence of blocks. If the tape drive is given a command to read block N, it can always rewind the tape and go forward until
it comes to block N. This operation is analogous
to a disk doing a seek, except that it takes much longer. Also, it may or may
not be possible to rewrite one block in the middle of a tape. Even if it were
possible to use tapes as random access block devices, that is stretching the
point somewhat: they are not normally used that way.
The other type of I/O
device is the character device. A character device delivers or accepts a stream
of characters, without regard to any block structure. It is not addressable and
does not have any seek operation. Printers, network interfaces, mice (for
pointing), rats (for psychology lab experiments), and most other devices that
are not disk-like can be seen as character devices.
This classification scheme
is not perfect. Some devices just do not fit in. Clocks, for example, are not
block addressable. Nor do they generate or accept character streams. All they
do is cause interrupts at well-defined intervals. Still, the model of block and
character devices is general enough that it can be used as a basis for making
some of the operating system software dealing with I/O device independent. The file system, for example, deals only with
abstract block devices and leaves the device-dependent part to lower-level
software called device drivers.
I/O devices cover a huge range in speeds, which puts considerable pressure on the software to perform well over many orders of magnitude in data rates. Most devices tend to get faster as time goes on.
I/O units typically
consist of a mechanical component and an electronic component. It is often
possible to separate the two portions to provide a more modular and general
design. The electronic component is called the device
controller or adapter. On personal
computers, it often takes the form of a printed circuit card that can be
inserted into an expansion slot. The mechanical component is the device itself.
The controller card usually has a connector on it, into which a cable leading to the device itself can be plugged. Many controllers can handle two, four, or even eight identical devices. If the interface between the controller and device is a standard interface, either an official ANSI, IEEE, or ISO standard or a de facto one, then companies can make controllers or devices that fit that interface. Many companies, for example, make disk drives that match the IDE (Integrated Drive Electronics) and SCSI (Small Computer System Interface) interfaces.
The operating system nearly always deals with the controller, not the device. Most personal computers and servers use the bus model for communication between the CPU and the controllers. Large mainframes often use a different model, with specialized I/O computers called I/O channels taking some of the load off the main CPU.
The interface between the
controller and the device is often low-level. A disk, for example, might be
formatted with 1024 sectors of 512 bytes per track. What actually comes off the
drive, however, is a serial bit stream, starting with a preamble, then the 4096 bits in a sector, and
finally a checksum, also called an Error-Correcting
Code (ECC). The preamble is written
when the disk is formatted and contains the cylinder and sector number, the
sector size, and similar data.
The controller's job is to
convert the serial bit stream into a block of bytes and perform any error
correction necessary. The block of bytes is typically first assembled, bit by
bit, in a buffer inside the controller. After its checksum has been verified
and the block declared to be free of errors, it can then be copied to main
memory.
The controller for a
monitor also works as a bit serial device at an equally low level. It reads
bytes containing the characters to be displayed from memory and generates the
signals used to modulate the CRT beam. The controller also generates the
signals for making a CRT beam do a horizontal retrace after it has finished a
scan line, as well as the signals for making it do a vertical retrace after the
entire screen has been scanned. On an LCD screen these signals select
individual pixels and control their brightness, simulating the effect of the
electron beam in a CRT. If it were not for the video controller, the operating
system programmer would have to program the scanning
explicitly. With the controller, the operating system initializes the
controller with a few parameters, such as the number of characters or pixels
per line and number of lines per screen, and lets the controller take care of
actually driving the display.
Controllers for some devices, especially disks, are
becoming extremely sophisticated. For example, modern disk controllers often
have many megabytes of memory inside the controller. As a result, when a read
is being processed, as soon as the arm gets to the correct cylinder, the
controller begins reading and storing data, even if it has not yet reached the
sector it needs. This cached data may come in handy for satisfying subsequent
requests. Furthermore, even after the requested data has been obtained, the
controller may continue to cache data from subsequent sectors, since they are
likely to be needed later. In this manner, many disk reads can be handled
without any disk activity at all.
4.10.2 Memory-Mapped I/O
Each controller has a few
registers that are used for communicating with the CPU. By writing into these
registers, the operating system can command the device to deliver data, accept
data, switch itself on or off, or otherwise perform some action. By reading
from these registers, the operating system can learn what the device's state
is, whether it is prepared to accept a new command, and so on.
In addition to the control
registers, many devices have a data buffer that the operating system can read
and write. For example, a common way for computers to display pixels on the
screen is to have a video RAM, which is basically just a data buffer, available
for programs or the operating system to write into.
The issue thus arises of
how the CPU communicates with the control registers and the device data
buffers. Two alternatives exist. In the first approach, each control register
is assigned an I/O port number, an 8- or
16-bit integer. Using a special I/O instruction such as
IN REG, PORT
the CPU can read in control register PORT and store the result in CPU register REG. Similarly, using
OUT PORT, REG
the CPU can write the contents of REG to a control register. Most early computers, including nearly all mainframes, such as the IBM 360 and all of its successors, worked this way.
4.10.3 Interrupts
Usually, controller
registers have one or more status bits that
can be tested to determine if an output operation is complete or if new data is
available from an input device. A CPU can execute a loop, testing a status bit
each time until a device is ready to accept or provide new data. This is called
polling or busy
waiting. In the realm of I/O, where you might have to wait a very long
time for the outside world to accept or produce data, polling is not acceptable
except for very small dedicated systems not running multiple processes.
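A toy sketch of busy waiting in C; status_reg, READY_BIT, and the moment at which the "device" becomes ready are all simulated here, not a real device interface:

    #include <stdio.h>

    #define READY_BIT 0x01
    volatile unsigned status_reg = 0;   /* stands in for a device register */

    int main(void) {
        int polls = 0;
        while ((status_reg & READY_BIT) == 0) {  /* spin until ready */
            if (++polls == 1000000)              /* pretend the device */
                status_reg |= READY_BIT;         /* finishes eventually */
        }
        printf("device ready after %d polls\n", polls);
        return 0;
    }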
In addition to status
bits, many controllers use interrupts to tell the CPU when they are ready to
have their registers read or written. In the context of I/O, all you need to
know is that most interface devices provide an output which is logically the
same as the "operation complete" or "data ready" status bit
of a register, but which is meant to be used to drive one of the IRQ (Interrupt
ReQuest) lines of the system bus. Thus when an interrupt-enabled operation
completes, it interrupts the CPU and starts the interrupt handler running. This
piece of code informs the operating system that I/O is complete. The operating
system may then check the status bits to verify that all went well, and either
harvest the resulting data or initiate a retry.
The number of inputs to
the interrupt controller may be limited; Pentium-class PCs have only 15
available for I/O devices. Some controllers are hard-wired onto the system
parent board, for example, the disk and keyboard controllers of an IBM PC. On
older systems, the IRQ used by the device was set by a switch or jumper
associated with the controller. If a user bought a new plug-in board, he had to
manually set the IRQ to avoid conflicts with existing IRQs. Few users could do
this correctly, which led the industry to develop Plug
'n Play, in which the BIOS can automatically assign IRQs to devices at
boot time to avoid conflicts.