UNIT-V
5. FILE SYSTEMS
5.1 Files
A file is a collection of similar records. The file is treated as a single entity by users and applications and may be referred to by name. Files have unique file names and may be created and deleted. Access control restrictions usually apply at the file level.
A file is a container for a collection of information. The file manager provides a protection mechanism to allow users to administer how processes executing on behalf of different users can access the information in a file. File protection is a fundamental property of files because it allows different people to store their information on a shared computer.
Files represent programs and data. Data files may be numeric, alphabetic, alphanumeric, or binary. Files may be free form, such as text files. In general, a file is a sequence of bits, bytes, lines, or records.
A file has a certain defined structure according to its type:
1 Text File
2 Source File
3 Executable File
4 Object File
5.1.1 File Structure
Four terms are used for files:
• Field
• Record
• File
• Database
A field is the basic element of data. An individual field contains a single value. A record is a collection of related fields that can be treated as a unit by some application program.
A file is a collection of similar records. The file is treated as a single entity by users and applications and may be referenced by name. Files have file names and may be created and deleted. Access control restrictions usually apply at the file level.
A database is a collection of related data. A database is designed for use by a number of different applications. It may contain all of the information related to an organization or project, such as a business or a scientific study. The database itself consists of one or more types of files. Usually, there is a separate database management system that is independent of the operating system.
5.1.2 File Attributes
File attributes vary from one operating system to another. The common attributes are:
Name – the only information kept in human-readable form.
Identifier – a unique tag (number) that identifies the file within the file system.
Type – needed for systems that support different file types.
Location – a pointer to the file's location on the device.
Size – the current file size.
Protection – controls who can read, write, and execute the file.
Time, date, and user identification – data for protection, security, and usage monitoring.
Information about files is kept in the directory structure, which is maintained on the disk.
5.1.3 File Operations
Any file system provides not only a means to store data organized as files, but also a collection of functions that can be performed on files. Typical operations include the following:
Create: A new file is defined and positioned within the structure of files.
Delete: A file is removed from the file structure and destroyed.
Open: An existing file is declared to be "opened" by a process, allowing the process to perform functions on the file.
Close: The file is closed with respect to a process, so that the process may no longer perform functions on the file until it opens the file again.
Read: A process reads all or a portion of the data in a file.
Write: A process updates a file, either by adding new data that expands the size of the file or by changing the values of existing data items in the file.
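The operations listed above map closely onto the file API of most programming languages. The following is a minimal sketch in Python, assuming a scratch directory supplied by the caller; the file name example.txt and the function name demo_file_operations are made up for the illustration.

```python
import os
import tempfile


def demo_file_operations(directory):
    """Walk through create, open, write, read, close, and delete on one file."""
    path = os.path.join(directory, "example.txt")  # hypothetical file name

    # Create + open: mode "x" defines a new file and fails if it already exists.
    with open(path, "x") as f:
        f.write("first line\n")   # Write: adds data, growing the file
    # Leaving the "with" block closes the file for this process.

    with open(path, "a") as f:    # Open again, this time to append
        f.write("second line\n")

    with open(path, "r") as f:    # Read: all or a portion of the data
        contents = f.read()

    os.remove(path)               # Delete: remove the file from the file structure
    return contents, os.path.exists(path)


with tempfile.TemporaryDirectory() as scratch:
    contents, still_exists = demo_file_operations(scratch)
    print(repr(contents), still_exists)
```

After the final close, the process has to open the file again before it can read or write it, which is the behaviour described under Close.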
File Types – Name, Extension
A common technique for implementing file types is to include the type as part of the file name. The name is split into two parts: a name and an extension. The following table gives each file type with its usual extension and function.
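As a small illustration of the name/extension split, the sketch below uses Python's standard os.path.splitext. The helper name split_name and the sample file names are assumptions made for the example.

```python
import os.path


def split_name(filename):
    """Split a file name into (name, extension) at the last dot."""
    name, ext = os.path.splitext(filename)
    return name, ext.lstrip(".")  # drop the leading dot from the extension


for sample in ("report.txt", "archive.tar.gz", "README"):
    print(sample, "->", split_name(sample))
```

Only the last dot is treated as the separator, so a name without a dot simply has an empty extension.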
5.1.4 File Management Systems
A file management system is the set of system software that provides services to users and applications in the use of files. The following are objectives for a file management system:
• To meet the data management needs and requirements of the user, which include storage of data and the ability to perform the aforementioned operations.
• To guarantee, to the extent possible, that the data in the file are valid.
• To optimize performance, both from the system point of view in terms of overall throughput and from the user's point of view in terms of response time.
• To provide I/O support for a variety of storage device types.
• To minimize or eliminate the potential for lost or destroyed data.
• To provide a standardized set of I/O interface routines to user processes.
• To provide I/O support for multiple users, in the case of multiple-user systems.
File System Architecture
At the lowest level, device drivers communicate directly with peripheral devices or their controllers or channels. A device driver is responsible for starting I/O operations on a device and processing the completion of an I/O request. For file operations, the typical devices controlled are disk and tape drives. Device drivers are usually considered to be part of the operating system.
The I/O control level consists of device drivers and interrupt handlers that transfer information between memory and the disk system. A device driver can be thought of as a translator.
The basic file system needs only to issue generic commands to the appropriate device driver to read and write physical blocks on the disk.
The file-organization module knows about files and their logical blocks, as well as physical blocks. By knowing the type of file allocation used and the location of the file, the file-organization module can translate logical block addresses to physical block addresses for the basic file system to transfer. Each file's logical blocks are numbered from 0 (or 1) through N, whereas the physical blocks containing the data usually do not match the logical numbers, so a translation is needed to locate each block. The file-organization module also includes the free-space manager, which tracks unallocated blocks and provides these blocks to the file-organization module when requested.
The logical file system uses the directory structure to provide the file-organization module with the information the latter needs, given a symbolic file name. The logical file system is also responsible for protection and security.
To create a new file, an application program calls the logical file system. The logical file system knows the format of the directory structures. To create a new file, it reads the appropriate directory into memory, updates it with the new entry, and writes it back to the disk.
Once the file is found, the associated information, such as size, owner, access permissions, and data block locations, is generally copied into a table in memory, referred to as the open-file table, which holds information about all the currently opened files.
The first reference to a file (normally an open) causes the directory structure to be searched and the directory entry for this file to be copied into the table of opened files. The index into this table is returned to the user program, and all further references are made through the index rather than with a symbolic name. The name given to the index varies. Unix systems refer to it as a file descriptor, Windows NT as a file handle, and other systems as a file control block.
Consequently, as long as the file is not closed, all file operations are done on the open-file table. When the file is closed by all users that have opened it, the updated file information is copied back to the disk-based directory structure.
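The open-file table mechanism can be sketched in a few lines. This is a simplified model, not how any particular operating system implements it: the class name OpenFileTable, the dictionary standing in for the on-disk directory, and the single size attribute are all assumptions, and each table entry has exactly one opener (a real system would keep an open count and write back only on the last close).

```python
class OpenFileTable:
    """Toy open-file table: the index of an entry plays the role of a file descriptor."""

    def __init__(self, directory):
        self.directory = directory  # name -> attributes; stands in for the disk directory
        self.entries = []           # slot index = descriptor; None marks a free slot

    def open(self, name):
        # Search the directory once and copy the entry into memory.
        attrs = dict(self.directory[name])  # raises KeyError if the file does not exist
        entry = {"name": name, "attrs": attrs}
        for fd, slot in enumerate(self.entries):
            if slot is None:                # reuse a free slot if there is one
                self.entries[fd] = entry
                return fd
        self.entries.append(entry)
        return len(self.entries) - 1

    def write(self, fd, nbytes):
        # All operations between open and close touch only the in-memory copy.
        self.entries[fd]["attrs"]["size"] += nbytes

    def close(self, fd):
        # On close, the updated information is copied back to the directory.
        entry = self.entries[fd]
        self.directory[entry["name"]] = entry["attrs"]
        self.entries[fd] = None


directory = {"notes": {"size": 10}}
table = OpenFileTable(directory)
fd = table.open("notes")
table.write(fd, 5)
print(directory["notes"]["size"])  # directory copy is unchanged until close
table.close(fd)
print(directory["notes"]["size"])
```

The caller never passes the symbolic name again after open; everything goes through the returned index.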
File-System Mounting
Just as a file must be opened before it is used, a file system must be mounted before it can be available to processes on the system. The mount procedure is straightforward. The operating system is given the name of the device and the location within the file structure at which to attach the file system (called the mount point).
The operating system verifies that the device contains a valid file system. It does so by asking the device driver to read the device directory and verifying that the directory has the expected format. Finally, the operating system notes in its directory structure that a file system is mounted at the specified mount point. This scheme enables the operating system to traverse its directory structure, switching among file systems as appropriate.
Allocation Methods
The direct-access nature of disks allows us flexibility in the implementation of files. Three major methods of allocating disk space are in wide use: contiguous, linked, and indexed. Each method has its advantages and disadvantages.
Contiguous Allocation
The contiguous allocation method requires each file to occupy a set of contiguous blocks on the disk. Disk addresses define a linear ordering on the disk. Notice that with this ordering, assuming that only one job is accessing the disk, accessing block b + 1 after block b normally requires no head movement. When head movement is needed, it is only one track. Thus, the number of disk seeks required for accessing contiguously allocated files is minimal.
Contiguous allocation of a file is defined by the disk address of the first block and the length (in block units). If the file is n blocks long and starts at location b, then it occupies blocks b, b + 1, b + 2, ..., b + n – 1. The directory entry for each file indicates the address of the starting block and the length of the area allocated for this file.
Accessing a file that has been allocated contiguously is easy. For sequential access, the file system remembers the disk address of the last block referenced and, when necessary, reads the next block. For direct access to block i of a file that starts at block b, we can immediately access block b + i.
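The direct-access rule is a single addition. The sketch below assumes a directory entry reduced to a (start, length) pair; the function name contiguous_block and the sample numbers are made up for the example.

```python
def contiguous_block(start, length, i):
    """Return the disk block holding logical block i of a contiguously allocated file."""
    if not 0 <= i < length:
        raise IndexError(f"logical block {i} is outside a file of {length} blocks")
    return start + i  # block b + i, with no pointers to follow


# A hypothetical file that starts at block 14 and is 3 blocks long
# occupies blocks 14, 15, and 16.
print([contiguous_block(14, 3, i) for i in range(3)])
```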
The contiguous disk-space-allocation problem can be seen to be a particular application of the general dynamic storage-allocation problem. First fit and best fit are the most common strategies used to select a free hole from the set of available holes. Simulations have shown that both first fit and best fit are more efficient than worst fit in terms of both time and storage utilization. Neither first fit nor best fit is clearly best in terms of storage utilization, but first fit is generally faster.
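A minimal sketch of the two strategies, assuming the free holes are kept as a list of (start, size) pairs in disk order. The function names and the sample hole list are illustrative only.

```python
def first_fit(holes, n):
    """Return the start of the first hole with at least n blocks, or None."""
    for start, size in holes:
        if size >= n:
            return start
    return None


def best_fit(holes, n):
    """Return the start of the smallest hole with at least n blocks, or None."""
    candidates = [(size, start) for start, size in holes if size >= n]
    return min(candidates)[1] if candidates else None


holes = [(0, 5), (10, 2), (20, 3)]  # hypothetical free holes: (start block, size)
print(first_fit(holes, 3), best_fit(holes, 3))
```

First fit can stop at the first hole that is large enough, while best fit has to examine every hole, which is consistent with first fit generally being faster.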
These algorithms suffer from the problem of external fragmentation. As files are allocated and deleted, the free disk space is broken into little pieces. External fragmentation exists whenever free space is broken into chunks. It becomes a problem when the largest contiguous chunk is insufficient for a request; storage is fragmented into a number of holes, no one of which is large enough to store the data. Depending on the total amount of disk storage and the average file size, external fragmentation may be either a minor or a major problem.
To prevent loss of significant amounts of disk space to external fragmentation on older floppy-disk systems, the user had to run a repacking routine that copied the entire file system onto another floppy disk or onto a tape. The original floppy disk was then freed completely, creating one large contiguous free space. The routine then copied the files back onto the floppy disk by allocating contiguous space from this one large hole. This scheme effectively compacts all free space into one contiguous space, solving the fragmentation problem.
The cost of this compaction is time. The time cost is particularly severe for large hard disks that use contiguous allocation, where compacting all the space may take hours and may be necessary on a weekly basis. During this down time, normal system operation generally cannot be permitted, so such compaction is avoided at all costs on production machines.
Another major problem is determining how much space is needed for a file. When the file is created, the total amount of space it will need must be found and allocated. The user will normally overestimate the amount of space needed, resulting in considerable wasted space.
Linked Allocation
Linked allocation solves all problems of contiguous allocation. With linked allocation, each file is a linked list of disk blocks; the disk blocks may be scattered anywhere on the disk.
To create a new file, a new entry is made in the directory. Each directory entry has a pointer to the first disk block of the file. This pointer is initialized to nil (the end-of-list pointer value) to signify an empty file. The size field is also set to 0. A write to the file causes a free block to be found via the free-space management system; this new block is then written to and linked to the end of the file.
There is no external fragmentation with linked allocation, and any free block on the free-space list can be used to satisfy a request. Notice also that there is no need to declare the size of a file when that file is created. A file can continue to grow as long as there are free blocks. Consequently, it is never necessary to compact disk space.
The major problem with linked allocation is that it can be used effectively only for sequential-access files. To find the ith block of a file, we must start at the beginning of that file and follow the pointers until we get to the ith block. Each access to a pointer requires a disk read and sometimes a disk seek. Consequently, it is inefficient to support a direct-access capability for linked-allocation files.
Another disadvantage of linked allocation is the space required for the pointers. If a pointer requires 4 bytes out of a 512-byte block, then 0.78 percent of the disk is being used for pointers rather than for information.
The usual solution to this problem is to collect blocks into multiples, called clusters, and to allocate the clusters rather than blocks. For instance, the file system may define a cluster as 4 blocks and operate on the disk only in cluster units. Pointers then use a much smaller percentage of the file's disk space. This method allows the logical-to-physical block mapping to remain simple, but improves disk throughput (fewer disk head seeks) and decreases the space needed for block allocation and free-list management. The cost of this approach is an increase in internal fragmentation.
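The pointer-overhead figures can be checked with a short calculation. The sketch below uses the 4-byte pointer, 512-byte block, and 4-block cluster from the text; the function name pointer_overhead_percent is an assumption.

```python
def pointer_overhead_percent(pointer_bytes, unit_bytes):
    """Percentage of each allocation unit consumed by its next-unit pointer."""
    return 100.0 * pointer_bytes / unit_bytes


BLOCK = 512  # bytes per block, as in the text

per_block = pointer_overhead_percent(4, BLOCK)        # one pointer per block
per_cluster = pointer_overhead_percent(4, 4 * BLOCK)  # one pointer per 4-block cluster

print(f"{per_block:.2f}% per block, {per_cluster:.2f}% per cluster")
# prints "0.78% per block, 0.20% per cluster"
```

Grouping four blocks into a cluster cuts the pointer overhead by a factor of four, at the price of wasting up to a cluster's worth of space at the end of each file.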
Yet another problem is reliability. Since the files are linked together by pointers scattered all over the disk, consider what would happen if a pointer were lost or damaged. Partial solutions are to use doubly linked lists or to store the file name and relative block number in each block; however, these schemes require even more overhead for each file.
An important variation on the linked allocation method is the use of a file allocation table (FAT). This simple but efficient method of disk-space allocation is used by the MS-DOS and OS/2 operating systems. A section of disk at the beginning of each partition is set aside to contain the table. The table has one entry for each disk block and is indexed by block number. The FAT is used much as is a linked list. The directory entry contains the block number of the first block of the file. The table entry indexed by that block number then contains the block number of the next block in the file. This chain continues until the last block, which has a special end-of-file value as the table entry. Unused blocks are indicated by a 0 table value. Allocating a new block to a file is a simple matter of finding the first 0-valued table entry and replacing the previous end-of-file value with the address of the new block. The 0 is then replaced with the end-of-file value. An illustrative example is the FAT structure for a file consisting of disk blocks 217, 618, and 339.
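The example file with blocks 217, 618, and 339 can be traced through a toy table. This is only a sketch of the idea, not the on-disk MS-DOS format: the table size of 700 entries, the end-of-file sentinel of -1, and the choice to treat block 0 as reserved (so that a 0 entry can unambiguously mean "unused") are all assumptions.

```python
EOF = -1   # assumed end-of-file marker for this sketch
FREE = 0   # unused blocks carry a 0 table value, as in the text


def file_blocks(fat, first_block):
    """Follow the chain from the directory's first-block number to end-of-file."""
    blocks = []
    block = first_block
    while block != EOF:
        blocks.append(block)
        block = fat[block]  # the entry holds the number of the next block
    return blocks


def append_block(fat, last_block):
    """Allocate the first 0-valued entry and link it after last_block."""
    for candidate in range(1, len(fat)):  # block 0 is treated as reserved
        if fat[candidate] == FREE:
            fat[last_block] = candidate   # old end-of-file entry now points on
            fat[candidate] = EOF          # new block becomes the end of the file
            return candidate
    raise RuntimeError("no free blocks")


fat = [FREE] * 700
fat[217], fat[618], fat[339] = 618, 339, EOF  # directory entry would hold 217

print(file_blocks(fat, 217))
new_block = append_block(fat, 339)
print(new_block, file_blocks(fat, 217))
```

Because the whole chain lives in the table rather than in the data blocks, a block in the middle of the file can be located by reading only the FAT.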
Indexed Allocation
Linked allocation solves the external-fragmentation and size-declaration problems of contiguous allocation. However, in the absence of a FAT, linked allocation cannot support efficient direct access, since the pointers to the blocks are scattered with the blocks themselves all over the disk and need to be retrieved in order. Indexed allocation solves this problem by bringing all the pointers together into one location: the index block.
Each file has its own index block, which is an array of disk-block addresses. The ith entry in the index block points to the ith block of the file. The directory contains the address of the index block.
When the file is created, all pointers in the index block are set to nil. When the ith block is first written, a block is obtained from the free-space manager, and its address is put in the ith index-block entry.
Indexed allocation supports direct access without suffering from external fragmentation, because any free block on the disk may satisfy a request for more space.
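A minimal sketch of a single-level index block follows. The class name IndexedFile, the fixed index size, and the plain Python list standing in for the free-space manager are assumptions made for the illustration.

```python
class IndexedFile:
    """Toy file whose index block is an array of disk-block addresses."""

    def __init__(self, index_size):
        self.index = [None] * index_size  # all pointers start out as nil

    def write_block(self, i, free_blocks):
        # On the first write to logical block i, take any free block.
        if self.index[i] is None:
            self.index[i] = free_blocks.pop()
        return self.index[i]

    def lookup(self, i):
        # Direct access: one array lookup, no chain to follow.
        return self.index[i]


free_blocks = [9, 16, 1]      # hypothetical free-block numbers
f = IndexedFile(index_size=4)
f.write_block(2, free_blocks)  # logical block 2 is written first
f.write_block(0, free_blocks)
print(f.index, free_blocks)
```

Any free block can serve any logical position, so there is no external fragmentation, but a small file still pays for a whole index block.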
Indexed allocation does suffer from wasted space. The pointer overhead of the index block is generally greater than the pointer overhead of linked allocation. Files that need more pointers than one index block can hold are handled by schemes such as the following:
1. Linked scheme. An index block is normally one disk block. Thus, it can be read and written directly by itself. To allow for large files, several index blocks can be linked together.
2. Multilevel index. A variant of the linked representation is to use a first-level index block to point to a set of second-level index blocks, which in turn point to the file blocks. To access a block, the operating system uses the first-level index to find a second-level index block, and that block to find the desired data block.
Free-Space Management
Since there is only a limited amount of disk space, it is necessary to reuse the space from deleted files for new files, if possible.
Bit Vector
The free-space list can be implemented as a bit map or bit vector. Each block is represented by 1 bit. If the block is free, the bit is 1; if the block is allocated, the bit is 0.
For example, consider a disk where blocks 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, and 27 are free, and the rest of the blocks are allocated. The free-space bit map would be
001111001111110001100000011100000 …
The main advantage of this approach is that it is relatively simple and efficient to find the first free block or n consecutive free blocks on the disk. The calculation of the block number is
(number of bits per word) × (number of 0-value words) + offset of first 1 bit
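The block-number calculation can be sketched as a word-by-word scan. The 8-bit word size, the most-significant-bit-first ordering (bit 0 of the map is the leftmost bit of the first word), and the helper names are assumptions made for the example; real systems use the machine word size and hardware bit-scan instructions.

```python
BITS_PER_WORD = 8  # assumed word size for the sketch


def to_words(bits):
    """Pack a string of '0'/'1' characters into words, padding the tail with 0s."""
    width = -(-len(bits) // BITS_PER_WORD) * BITS_PER_WORD  # round up to a full word
    padded = bits.ljust(width, "0")
    return [int(padded[i:i + BITS_PER_WORD], 2)
            for i in range(0, width, BITS_PER_WORD)]


def first_free_block(words):
    """(bits per word) x (number of 0-value words) + offset of first 1 bit."""
    for zero_words, word in enumerate(words):
        if word != 0:
            offset = BITS_PER_WORD - word.bit_length()  # position of the first 1 bit
            return BITS_PER_WORD * zero_words + offset
    return None  # no free block at all


bitmap = "001111001111110001100000011100000"  # the example map from the text
print(first_free_block(to_words(bitmap)))       # block 2 is the first free block
```

Whole words equal to 0 are skipped with a single comparison, which is what makes the scan efficient.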
Linked List
Another approach is to link together all the free disk blocks, keeping a pointer to the first free block in a special location on the disk and caching it in memory. This first block contains a pointer to the next free disk block, and so on. In the example above, block 2 would contain a pointer to block 3, which would point to block 4, which would point to block 5, which would point to block 8, and so on. Usually, the operating system simply needs a free block so that it can allocate that block to a file, so the first block in the free list is used.
Grouping
A modification of the free-list approach is to store the addresses of n free blocks in the first free block. The first n-1 of these blocks are actually free. The last block contains the addresses of another n free blocks, and so on. The importance of this implementation is that the addresses of a large number of free blocks can be found quickly, unlike in the standard linked-list approach.
Counting
Several contiguous blocks may be allocated or freed simultaneously, particularly when space is allocated with the contiguous allocation algorithm or through clustering. Thus, rather than keeping a list of n free disk addresses, we can keep the address of the first free block and the number n of free contiguous blocks that follow the first block. Each entry in the free-space list then consists of a disk address and a count. Although each entry requires more space than would a simple disk address, the overall list will be shorter, as long as the count is generally greater than 1.
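Applied to the free blocks from the bit-vector example, counting collapses fifteen addresses into four entries. In this sketch the count is taken to be the length of the run including its first block, which is an assumption about the convention; the function name count_runs is also made up.

```python
def count_runs(free_blocks):
    """Compress free block numbers into (first block, run length) entries."""
    runs = []
    for block in sorted(free_blocks):
        if runs and block == runs[-1][0] + runs[-1][1]:
            first, count = runs[-1]
            runs[-1] = (first, count + 1)  # block extends the current run
        else:
            runs.append((block, 1))        # block starts a new run
    return runs


free = [2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, 27]
print(count_runs(free))
# prints [(2, 4), (8, 6), (17, 2), (25, 3)]
```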
5.2 DIRECTORIES
To keep track of files, file systems normally have directories or folders, which, in many systems, are themselves files. In this section we will discuss directories, their organization, their properties, and the operations that can be performed on them.
5.2.1 Single-Level Directory Systems
The simplest form of directory system is having one directory containing all the files. Sometimes it is called the root directory, but since it is the only one, the name does not matter much. On early personal computers, this system was common, in part because there was only one user. Interestingly enough, the world’s first supercomputer, the CDC 6600, also had only a single directory for all files, even though it was used by many users at once. This decision was no doubt made to keep the software design simple.
In the example figure, the directory contains four files. The file owners are shown in the figure, not the file names (because the owners are important to the point we are about to make). The advantages of this scheme are its simplicity and the ability to locate files quickly: there is only one place to look, after all.
A single-level directory system containing four files, owned by three different people, A, B, and C.
The problem with having only one directory in a system with multiple users is that different users may accidentally use the same names for their files. For example, if user A creates a file called mailbox, and then later user B also creates a file called mailbox, B’s file will overwrite A’s file. Consequently, this scheme is not used on multiuser systems any more, but could be used on a small embedded system, for example, a system in a car that was designed to store user profiles for a small number of drivers.
5.2.2 Two-Level Directory Systems
To avoid conflicts caused by different users choosing the same file name for their own files, the next step up is giving each user a private directory. In that way, names chosen by one user do not interfere with names chosen by a different user and there is no problem caused by the same name occurring in two or more directories. This design leads to the system of Fig. 6-8. This design could be used, for example, on a multiuser computer or on a simple network of personal computers that shared a common file server over a local area network.
A two-level directory system. The letters indicate the owners of the directories and files.
Implicit in this design is that when a user tries to open a file, the system knows which user it is in order to know which directory to search. As a consequence, some kind of login procedure is needed, in which the user specifies a login name or identification, something not required with a single-level directory system.
When this system is implemented in its most basic form, users can only access files in their own directories. However, a slight extension is to allow users to access other users’ files by providing some indication of whose file is to be opened. Thus, for example, open("x") might be the call to open a file called x in the user’s directory, and open("nancy/x") might be the call to open a file x in the directory of another user, Nancy.
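The lookup rule behind open("x") and open("nancy/x") can be sketched as follows. The nested dictionary standing in for the per-user directories, the user name alice, and the function name resolve are assumptions made for the illustration; access checks are left out.

```python
def resolve(path, current_user, user_directories):
    """Find a file in a two-level directory system.

    A bare name is looked up in the current user's directory;
    "owner/name" is looked up in the named owner's directory.
    """
    owner, separator, name = path.partition("/")
    if not separator:                 # no owner given: use the caller's directory
        owner, name = current_user, path
    return user_directories[owner][name]


user_directories = {
    "alice": {"x": "alice's copy of x"},
    "nancy": {"x": "nancy's copy of x"},
}

print(resolve("x", "alice", user_directories))
print(resolve("nancy/x", "alice", user_directories))
```

The same file name x exists in both directories without conflict, because the owner is always part of the lookup, either explicitly or taken from the login.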
One
situation in which users need to access files other than their own is to
execute system binary programs. Having copies of all the utility programs
present in each directory clearly is inefficient. At the very least, there is a
need for a system directory with the executable binary programs.
5.2.3 Hierarchical Directory Systems
The
two-level hierarchy eliminates name conflicts among users but is not
satisfactory for users with a large number of files. Even on a single-user
personal computer, it is inconvenient. It is quite common for users to want to
group their files together in logical ways. A professor for example, might have
a collection of files that together form a book that he is writing for one
course, a second collection of files containing student programs submitted for
another course, a third group of files containing the code of an advanced compiler-writing
system he is building, a fourth group of files containing grant proposals, as
well as other files for electronic mail, minutes of meetings, papers he is writing,
games, and so on. Some way is needed to group these files together in flexible ways
chosen by the user.
What is needed is a general hierarchy (i.e., a tree of directories). With this approach, each user can have as many directories as are needed so that files can be grouped together in natural ways. This approach is shown in Fig. 6-9. Here, the directories A, B, and C contained in the root directory each belong to a different user, two of whom have created sub-directories for projects they are working on.
A hierarchical directory system.
The ability for users to create an arbitrary number of sub-directories provides a powerful structuring tool for users to organize their work. For this reason, nearly all modern file systems are organized in this manner.
5.3 ANTIVIRUS AND ANTI-ANTIVIRUS TECHNIQUES
Viruses try to hide and users try to find them, which leads to a cat-and-mouse game. Let us now look at some of the issues here. To avoid showing up in directory listings, a companion virus, source code virus, or other file that should not be there can turn on the HIDDEN bit in Windows or use a file name beginning with the . character in UNIX. More sophisticated is to modify Windows’ explorer or UNIX’ ls to refrain from listing files whose names begin with Virgil-. Viruses can also hide in unusual and unsuspected places, such as the bad sector list on the disk or the Windows registry (an in-memory database available for programs to store uninterpreted strings). The flash ROM used to hold the BIOS and the CMOS memory are also possibilities, although the former is hard to write and the latter is quite small. And, of course, the main workhorse of the virus world is infecting executable files and documents on the hard disk.
Virus Scanners
Clearly,
the average garden-variety user is not going to find many viruses that do their
best to hide, so a market has developed for antivirus software. Below we will
discuss how this software works. Antivirus software companies have laboratories
in which dedicated scientists work long hours tracking down and understanding
new viruses. The first step is to have the virus infect a program that does
nothing, often called a goat file , to get a copy of the virus in its
purest form. The next step is to make an exact listing of the virus’ code and
enter it into the database of known viruses. Companies compete on the size of
their databases. Inventing new viruses just to pump up your database is not considered
sporting.
Once
an antivirus program is installed on a customer’s machine, the first thing it
does is scan every executable file on the disk looking for any of the viruses
in the database of known viruses. Most antivirus companies have a Web site from
which customers can download the descriptions of newly-discovered viruses into
their databases. If the user has 10,000 files and the database has 10,000
viruses, some clever programming is needed to make it go fast, of course.
Since minor variants of known viruses pop up all the time, a fuzzy search is needed, so a 3-byte change to a virus does not let it escape detection. However, fuzzy searches are not only slower than exact searches, but they may turn up false alarms, that is, warnings about legitimate files that happen to contain some code vaguely similar to a virus reported in Pakistan 7 years ago. What is the user supposed to do with the message:
WARNING! File xyz.exe may contain the lahore-9x virus. Delete?
The
more viruses in the database and the broader the criteria for declaring a hit,
the more false alarms there will be. If there are too many, the user will give
up in disgust. But if the virus scanner insists on a very close match, it may
miss some modified viruses.
Getting
it right is a delicate heuristic balance. Ideally, the lab should try to
identify some core code in the virus that is not likely to change and use this
as the virus signature to scan for.
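The trade-off between exact and fuzzy matching can be made concrete with a toy scanner. This is only a sketch of the idea, not how any real antivirus product works: the byte strings are harmless placeholders, and the function name find_signature and the mismatch-count criterion are assumptions.

```python
def find_signature(data, signature, max_mismatches=0):
    """Return the first offset where signature matches data with at most
    max_mismatches differing bytes, or None if there is no such offset."""
    n = len(signature)
    for i in range(len(data) - n + 1):
        window = data[i:i + n]
        mismatches = sum(1 for a, b in zip(window, signature) if a != b)
        if mismatches <= max_mismatches:
            return i
    return None


signature = b"ABCDEFGH"          # placeholder for a stored signature
original = b"..ABCDEFGH.."       # file containing the exact signature
variant = b"..ABzDEFGH.."        # same content with one byte changed

print(find_signature(original, signature))                   # exact match found
print(find_signature(variant, signature))                    # exact match misses the variant
print(find_signature(variant, signature, max_mismatches=1))  # fuzzy match finds it
```

Raising max_mismatches catches more variants but also makes it more likely that an unrelated file matches by chance, which is the false-alarm problem described above.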
Just because the disk was declared virus free last week does not mean that it still is, so the virus scanner has to be run frequently. Because scanning is slow, it is more efficient to check only those files that have been changed since the date of the last scan. The trouble is, a clever virus will reset the date of an infected file to its original date to avoid detection. The antivirus program’s response to that is to check the date the enclosing directory was last changed. The virus’ response to that is to reset the directory’s date as well. This is the start of the cat-and-mouse game alluded to above.
Another
way for the antivirus program to detect file infection is to record and store
on the disk the lengths of all files. If a file has grown since the last check,
it might be infected, as shown in Fig. 9-16(a-b). However, a clever virus can
avoid detection by compressing the program and padding out the file to its
original length. To make this scheme work, the virus must contain both
compression and decompression procedures, as shown in Fig. 9-16(c).
Another way for the virus to try to escape detection is to make sure its representation on the disk does not look at all like its representation in the antivirus software’s database. One way to achieve this goal is to encrypt itself with a different key for each file infected. Before making a new copy, the virus generates a random 32-bit encryption key, for example by XORing the current time with the contents of, say, memory words 72,008 and 319,992. It then XORs its code with this key, word by word, to produce the encrypted virus stored in the infected file. The key is stored in the file. For secrecy purposes, putting the key in the file is not ideal, but the goal here is to foil the virus scanner, not prevent the dedicated scientists at the antivirus lab from reverse engineering the code. Of course, to run, the virus has to first decrypt itself, so it needs a decrypting procedure in the file as well.
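From the scanner's point of view, the relevant property of word-by-word XOR is that the same bytes look different under every key, while applying the same key a second time restores them. The sketch below shows this on a harmless placeholder string; the function name xor_words, the two key values, and the big-endian byte order are assumptions.

```python
def xor_words(data, key):
    """XOR data with a 32-bit key, repeating the key every 4 bytes."""
    key_bytes = key.to_bytes(4, "big")
    return bytes(b ^ key_bytes[i % 4] for i, b in enumerate(data))


message = b"sixteen byte msg"  # harmless placeholder content
copy_1 = xor_words(message, 0x12345678)
copy_2 = xor_words(message, 0x9ABCDEF0)

print(copy_1 != copy_2)                            # same content, different bytes on disk
print(xor_words(copy_1, 0x12345678) == message)    # XOR with the same key undoes itself
```

Since the stored bytes differ from copy to copy, a scanner cannot use them as a signature and has to look for something that stays constant.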
This
scheme is still not perfect because the compression, decompression, encryption,
and decryption procedures are the same in all copies, so the antivirus program
can just use them as the virus signature to scan for. Hiding the compression,
decompression, and encryption procedures is easy: they are just encrypted along
with the rest of the virus, as shown in Fig. 9-16(e). The decryption code
cannot be encrypted, however. It has to actually execute on the hardware to
decrypt the rest of the virus so it must be present in plaintext form.
Antivirus programs know this, so they hunt for the decryption procedure.
However, Virgil enjoys having the last word, so he proceeds as follows. Suppose that the decryption procedure needs to perform the calculation X = (A + B + C – 4). The straightforward assembly code for this calculation for a generic two-address computer is shown in Fig. 9-17(a). The first address is the source; the second is the destination, so MOV A,R1 moves the variable A to the register R1. The code in Fig. 9-17(b) does the same thing, only less efficiently due to the NOP (no operation) instructions interspersed with the real code.
Fig. 9-17. Code fragments (a) through (e):
(a)            (b)            (c)            (d)            (e)
MOV A, R1      MOV A, R1      MOV A, R1      MOV A, R1      MOV A, R1
ADD B, R1      NOP            ADD #0, R1     OR R1, R1      TST R1
ADD C, R1      ADD B, R1      ADD B, R1      ADD B, R1      ADD C, R1
SUB #4, R1     NOP            OR R1, R1      MOV R1, R5     MOV R1, R5
MOV R1, X      ADD C, R1      ADD C, R1      ADD C, R1      ADD B, R1
               NOP            SHL #0, R1     SHL R1, 0      CMP R2, R5
               SUB #4, R1     SUB #4, R1     SUB #4, R1     SUB #4, R1
               NOP            JMP .+1        ADD R5, R5     JMP .+1
               MOV R1, X      MOV R1, X      MOV R1, X      MOV R1, X
                                             MOV R5, Y      MOV R5, Y
But we are not done yet. It is also possible to disguise the decryption code. There are many ways to represent NOP. For example, adding 0 to a register, ORing it with itself, shifting it left 0 bits, and jumping to the next instruction all do nothing. Thus the program of Fig. 9-17(c) is functionally the same as the one of Fig. 9-17(a). When copying itself, the virus could use Fig. 9-17(c) instead of Fig. 9-17(a) and still work later when executed. A virus that mutates on each copy is called a polymorphic virus.
Now suppose that R5 is not needed during this piece of the code. Then Fig. 9-17(d) is also equivalent to Fig. 9-17(a). Finally, in many cases it is possible to swap instructions without changing what the program does, so we end up with Fig. 9-17(e) as another code fragment that is logically equivalent to Fig. 9-17(a). A piece of code that can mutate a sequence of machine instructions without changing its functionality is called a mutation engine, and sophisticated viruses contain them to mutate the decryptor from copy to copy. The mutation engine itself can be hidden by encrypting it along with the body of the virus.
Asking the poor antivirus software to realize that Fig. 9-17(a) through Fig. 9-17(e) are all functionally equivalent is asking a lot, especially if the mutation engine has many tricks up its sleeve. The antivirus software can analyze the code to see what it does, and it can even try to simulate the operation of the code, but remember it may have thousands of viruses and thousands of files to analyze, so it does not have much time per test or it will run horribly slowly.
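Simulating the code, as mentioned above, is what shows that the fragments are equivalent even though their bytes differ. The sketch below is a toy interpreter for the generic two-address instructions used in Fig. 9-17(a) through (c) only; the function name run, the sample values for A, B, and C, and the treatment of JMP .+1 as a no-op are assumptions made for the illustration.

```python
def run(program, memory):
    """Interpret a tiny two-address program; operands are (source, destination)."""
    regs = {}
    mem = dict(memory)

    def read(operand):
        if operand.startswith("#"):        # immediate value such as #4
            return int(operand[1:])
        if operand.startswith("R"):        # register such as R1
            return regs.get(operand, 0)
        return mem[operand]                # named variable such as A

    def write(operand, value):
        if operand.startswith("R"):
            regs[operand] = value
        else:
            mem[operand] = value

    for line in program:
        op, *args = line.replace(",", " ").split()
        if op in ("NOP", "JMP"):           # JMP .+1 just falls through to the next line
            continue
        src, dst = args
        if op == "MOV":
            write(dst, read(src))
        elif op == "ADD":
            write(dst, read(dst) + read(src))
        elif op == "SUB":
            write(dst, read(dst) - read(src))
        elif op == "OR":
            write(dst, read(dst) | read(src))
        elif op == "SHL":
            write(dst, read(dst) << read(src))
        else:
            raise ValueError(f"unsupported instruction: {line}")
    return mem


FIG_A = ["MOV A,R1", "ADD B,R1", "ADD C,R1", "SUB #4,R1", "MOV R1,X"]
FIG_B = ["MOV A,R1", "NOP", "ADD B,R1", "NOP", "ADD C,R1", "NOP",
         "SUB #4,R1", "NOP", "MOV R1,X"]
FIG_C = ["MOV A,R1", "ADD #0,R1", "ADD B,R1", "OR R1,R1", "ADD C,R1",
         "SHL #0,R1", "SUB #4,R1", "JMP .+1", "MOV R1,X"]

values = {"A": 10, "B": 20, "C": 30}       # arbitrary sample inputs
print([run(fig, values)["X"] for fig in (FIG_A, FIG_B, FIG_C)])
```

All three fragments leave the same value in X, yet no two of them share the same instruction sequence, which is why matching on the literal bytes of the decryptor fails.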
As an aside, the store into the variable Y was thrown in just to make it
harder to detect the fact that the code related to R5 is dead code, that is,
does not do anything. If other code fragments read and write Y, the
code will look perfectly legitimate. A well-written mutation engine that
generates good polymorphic code can give antivirus software writers nightmares.
The only bright side is that such an engine is hard to write, so Virgil’s
friends all use his code, which means there are not so many different ones in circulation—yet.
So far we have talked about just trying to recognize viruses in infected
executable files. In addition, the antivirus scanner has to check the MBR, boot
sectors, bad sector list, flash ROM, CMOS memory, etc., but what if there is a
memory-resident virus currently running? That will not be detected. Worse yet,
suppose the running virus is monitoring all system calls. It can easily detect
that the antivirus program is reading the boot sector (to check for viruses).
To thwart the antivirus program, the virus does not make the system call.
Instead it just returns the true boot sector from its hiding place in the bad block
list. It also makes a mental note to reinfect all the files when the virus
scanner is finished.
To prevent being spoofed by a virus, the antivirus program could make hard reads
to the disk, bypassing the operating system. However, this requires having
built-in device drivers for IDE, SCSI, and other common disks, making the
antivirus program less portable and subject to failure on computers with
unusual disks. Furthermore, since bypassing the operating system to read the
boot sector is possible, but bypassing it to read all the executable files is
not, there is also some danger that the virus can produce fraudulent data about
executable files as well.
Integrity Checkers
A completely different approach to virus detection is integrity checking.
An antivirus program that works this way first scans the hard disk for viruses.
Once it is convinced that the disk is clean, it computes a checksum for each
executable file and writes the list of checksums for all the relevant files in
a directory to a file, checksum, in that directory.
The next time it runs, it recomputes all the checksums and sees if they match what
is in the file checksum. An infected file will show up immediately.
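The record-and-compare cycle can be sketched in a few lines of Python. This is
a minimal illustration, not how any particular antivirus product works: it
assumes SHA-256 as the checksum, stores the list as JSON, and covers every
regular file in the directory rather than only executables. The function names
are hypothetical.

```python
import hashlib
import json
import os

CHECKSUM_FILE = "checksum"  # file name taken from the text


def file_digest(path):
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(directory):
    """Map each regular file in the directory (except the checksum file) to its digest."""
    result = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name != CHECKSUM_FILE and os.path.isfile(path):
            result[name] = file_digest(path)
    return result


def record(directory):
    """First run: write the list of checksums into the directory."""
    with open(os.path.join(directory, CHECKSUM_FILE), "w") as handle:
        json.dump(snapshot(directory), handle, indent=2)


def changed_files(directory):
    """Later runs: names whose digest differs, plus files added or removed."""
    with open(os.path.join(directory, CHECKSUM_FILE)) as handle:
        expected = json.load(handle)
    current = snapshot(directory)
    return sorted(name for name in expected.keys() | current.keys()
                  if expected.get(name) != current.get(name))
```

Note that the checksum file here is unprotected, which is exactly the weakness
discussed next.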
The trouble is Virgil is not going to take this lying down. He can write a virus
that removes the checksum file. Worse yet, he can write a virus that computes
the checksum of the infected file and replaces the old entry in the checksum
file. To protect against this kind of behavior, the antivirus program can try
to hide the checksum file, but that is not likely to work since Virgil can
study the antivirus program carefully before writing the virus. A better idea
is to encrypt it to make tampering easier to detect. Ideally, the encryption
should involve use of a smart card with an externally stored key that programs
cannot get at.
Behavioral Checkers
A third strategy used by antivirus software is behavioral checking. With
this approach, the antivirus program lives in memory while the computer is
running and catches all system calls itself. The idea is that it can then
monitor all activity and try to catch anything that looks suspicious. For
example, no normal program should attempt to overwrite the boot sector, so an
attempt to do so is almost certainly due to a virus. Likewise, changing the
flash ROM is highly suspicious.
But there are also cases that are less clear cut. For example, overwriting an
executable file is a peculiar thing to do—unless you are a compiler. If the
antivirus software detects such a write and issues a warning, hopefully the
user knows whether overwriting an executable makes sense in the context of the
current work. Similarly, Word overwriting a .doc file with a new
document full of macros is not necessarily the work of a virus. In Windows,
programs can detach from their executable file and go memory resident using a
special system call. Again, this might be legitimate, but a warning might still
be useful.
Viruses do not have to passively lie around waiting for an antivirus program to kill
them, like cattle being led off to slaughter. They can fight back. A
particularly interesting battle can occur if a memory-resident virus and a
memory-resident antivirus meet up on the same computer. Years ago there was a
game called Core Wars in which two programmers faced off by each dropping a program
into an empty address space. The programs took turns probing memory, with the
object of the game being to locate and wipe out your opponent before he wiped
you out. The virus-antivirus confrontation looks a little like that, only the
battlefield is the machine of some poor user who does not really want it to
happen there. Worse yet, the virus has an advantage because its writer can find
out a lot about the antivirus program by just buying a copy of it. Of course,
once the virus is out there, the antivirus team can modify their program,
forcing Virgil to go buy a new copy.
Virus Avoidance
Every good story needs a moral. The moral of this one is: better safe than
sorry. Avoiding viruses in the first place is a lot easier than trying to track
them down once they have infected a computer. Below are a few guidelines for
individual users, but also some things that the industry as a whole can do to
reduce the problem considerably.
What can users do to avoid a virus infection? First, choose an operating system that
offers a high degree of security, with a strong kernel-user mode boundary and
separate login passwords for each user and the system administrator. Under
these conditions, a virus that somehow sneaks in cannot infect the system
binaries.
Second, install only shrink-wrapped software bought from a reliable manufacturer. Even this
is no guarantee since there have been cases where disgruntled employees have slipped
viruses onto a commercial software product, but it helps a lot. Downloading software
from Web sites and bulletin boards is risky behavior.
Third, buy a good antivirus software package and use it as directed. Be sure to get regular
updates from the manufacturer’s Web site.
Fourth, do not click on attachments to email and tell people not to send them to you. Email
sent as plain ASCII text is always safe, but attachments can start viruses when opened.
Fifth, make frequent backups of key files onto an external medium, such as floppy
disk, CD-recordable, or tape. Keep several generations of each file on a series
of backup media. That way, if you discover a virus, you may have a chance to
restore files as they were before they were infected. Restoring yesterday’s
infected file does not help, but restoring last week’s version might.
The industry should also take the virus threat seriously and change some dangerous practices.
First, make simple operating systems. The more bells and whistles there are, the
more security holes there are. That is a fact of life.
Second, forget active content. From a security point of view, it is a disaster.
Viewing a document someone sends you should not require your running their
program. JPEG files, for example, do not contain programs, and thus cannot
contain viruses. All documents should work like that.
Third, there should be a way to selectively write protect specified disk cylinders to prevent
viruses from infecting the programs on them. This protection could be implemented
by having a bitmap inside the controller listing the write protected cylinders.
The map should only be alterable when the user has flipped a mechanical toggle
switch on the computer’s front panel.
Fourth, flash ROM is a nice idea, but it should only be modifiable when an
external toggle switch has been flipped, something that will only happen when
the user is consciously installing a BIOS update. Of course, none of this will
be taken seriously until a really big virus hits, for example, one that hits
the financial world and resets all bank accounts to 0. Of course, by then it
would be too late.
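The write-protect bitmap proposed in the third point above is a simple data
structure: one bit per cylinder, consulted on every write and changeable only
while the physical switch is on. The Python sketch below models that idea. The
class name, the method names, and the switch_on attribute are hypothetical, and
a real controller would implement this in firmware rather than in software that
a virus could reach.

```python
class WriteProtectMap:
    """One bit per cylinder; a set bit means writes to that cylinder are refused."""

    def __init__(self, cylinders):
        self.cylinders = cylinders
        self.bits = bytearray((cylinders + 7) // 8)
        self.switch_on = False  # models the mechanical front-panel toggle switch

    def _check(self, cylinder):
        if not 0 <= cylinder < self.cylinders:
            raise IndexError("no such cylinder")

    def protect(self, cylinder, protected=True):
        """Change the map; allowed only while the toggle switch is on."""
        self._check(cylinder)
        if not self.switch_on:
            raise PermissionError("toggle switch is off; the map cannot be altered")
        mask = 1 << (cylinder % 8)
        if protected:
            self.bits[cylinder // 8] |= mask
        else:
            self.bits[cylinder // 8] &= ~mask & 0xFF

    def may_write(self, cylinder):
        """Checked by the controller before every write to the cylinder."""
        self._check(cylinder)
        return not self.bits[cylinder // 8] & (1 << (cylinder % 8))
```

With the switch off, software can still ask may_write but cannot clear a bit, so
a virus has no way to unprotect the cylinders holding the system programs.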
Recovery from a Virus Attack
When a virus is detected, the computer should be halted immediately since a
memory-resident virus may still be running. The computer should be rebooted from
a CD-ROM or floppy disk that has always been write protected, and which
contains the full operating system to bypass the boot sector, hard disk copy of
the operating system, and disk drivers, all of which may now be infected. Then
an antivirus program should be run from its original CD-ROM, since the hard
disk version may also be infected.
The antivirus program may detect some viruses and may even be able to eliminate
them, but there is no guarantee that it will get them all. Probably the safest
course of action at this point is to save all files that cannot contain viruses
(like ASCII and JPEG files). Those files that might contain viruses (like Word
files) should be converted to another format that cannot contain viruses, such
as flat ASCII text (or at least the macros should be removed). All the saved
files should be saved on an external medium. Then the hard disk should be
reformatted using a format program taken from a write-protected floppy disk or
a CD-ROM to ensure that it itself is not infected. It is especially important
that the MBR and boot
sectors are also fully erased. Then the operating system should be reinstalled
from the original CD-ROM. When dealing with virus infections, paranoia is your
best friend.
5.4 BASIC CRYPTOGRAPHY CONCEPTS
This topic provides a
basic understanding of cryptographic function and an overview of the
cryptographic services for the systems running the i5/OS® operating
system.
Cryptography
Cryptographic services
help ensure data privacy, maintain data integrity, authenticate communicating
parties, and prevent repudiation (when a party denies having sent a message).
Basic encryption allows
you to store information or to communicate with other parties while preventing
non-involved parties from understanding the stored information or understanding
the communication. Encryption transforms understandable text (plaintext) into
an unintelligible piece of data (ciphertext). Decryption restores the
understandable text from the unintelligible data. Both functions involve a
mathematical formula (the algorithm) and secret data (the key).
Cryptographic algorithms
There are two types of cryptographic
algorithms:
1. With a secret or symmetric key algorithm, the key is a shared secret between two
communicating parties. Encryption and decryption both use the same key. The
Data Encryption Standard (DES) and the Advanced Encryption Standard (AES) are
examples of symmetric key algorithms.
There are two types of symmetric key algorithms:
Block ciphers
In a block cipher, the actual encryption code works on a fixed-size block of data.
Normally, the user's interface to the encrypt/decrypt operation will handle
data longer than the block size by repeatedly calling the low-level encryption
function. If the length of data is not on a block size boundary, it must be
padded.
Stream ciphers
Stream ciphers do not work on a block basis, but convert 1 bit (or 1 byte) of
data at a time.
2. With a public key (PKA) or asymmetric key algorithm, a pair of keys is used. One of
the keys, the private key, is kept secret and not shared with anyone. The other
key, the public key, is not secret and can be shared with anyone. When data is
encrypted by one of the keys, it can only be decrypted and recovered by using
the other key. The two keys are mathematically related, but it is virtually
impossible to derive the private key from the public key. The RSA algorithm is
an example of a public key algorithm.
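The key-pair property can be seen in a toy RSA example. The Python sketch below
uses deliberately tiny primes so the numbers can be followed by hand; it is
insecure and omits the padding that real RSA requires. The function name
apply_key and the chosen values are for illustration only.

```python
# Textbook RSA with tiny primes: insecure, for illustration only.
p, q = 61, 53
n = p * q                    # modulus, part of both the public and the private key
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)


def apply_key(value, exponent, modulus=n):
    """Encrypt or decrypt: both are modular exponentiation with one of the keys."""
    return pow(value, exponent, modulus)


message = 65
ciphertext = apply_key(message, e)      # encrypt with the public key (e, n)
recovered = apply_key(ciphertext, d)    # only the private key (d, n) undoes it
```

The same function works in the other direction: data transformed with the
private key can be recovered only with the public key, which is the basis of
the digital signatures described later in this section.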
Public key algorithms are
slower than symmetric key algorithms. Applications typically use public key
algorithms to encrypt symmetric keys (for key distribution) and to encrypt
hashes (in digital signature generation).
Together, the key and the
cryptographic algorithm transform the data. All of the supported algorithms are
in the public domain. Therefore it is the key that controls access to the data.
You must safeguard the keys to protect the data.
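The padding requirement mentioned for block ciphers can be illustrated without
any cipher at all. The sketch below uses one common scheme, PKCS #7 padding,
with a 16-byte block (the AES block size); the text does not say which padding
scheme the system uses, so this choice is an assumption.

```python
BLOCK_SIZE = 16  # bytes; the AES block size


def pad(data, block_size=BLOCK_SIZE):
    """Append N bytes, each of value N, so the length is a multiple of the block size."""
    missing = block_size - len(data) % block_size   # always between 1 and block_size
    return data + bytes([missing]) * missing


def unpad(padded, block_size=BLOCK_SIZE):
    """Remove and validate the padding added by pad()."""
    if not padded or len(padded) % block_size:
        raise ValueError("length is not a multiple of the block size")
    count = padded[-1]
    if not 1 <= count <= block_size or padded[-count:] != bytes([count]) * count:
        raise ValueError("bad padding")
    return padded[:-count]


def blocks(padded, block_size=BLOCK_SIZE):
    """Split padded data into the fixed-size blocks a block cipher works on."""
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]
```

Data that already ends on a block boundary still receives a full block of
padding, so that unpad can always tell padding from data.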
Cryptographic operations
Different cryptographic operations
may use one or more algorithms. You choose the cryptographic operation
and algorithm(s) depending on your purpose. For example, for the purpose of
ensuring data integrity, you might want to use a MAC (message authentication
code) operation with the AES algorithm.
The system provides several API sets that support cryptographic operations.
Data privacy
Cryptographic operations
for the purpose of data privacy (confidentiality) prevent an unauthorized
person from reading a message. The following operations are included in data
privacy:
Encrypt and Decrypt
The encrypt operation changes plaintext data into
ciphertext through the use of a cipher algorithm and key. To restore the
plaintext data, the decrypt operation must employ the same algorithm and key.
Encryption and decryption may be employed at any level of the operating system.
There are three levels:
Field level encryption
With field level encryption, the user application explicitly requests
cryptographic services. The user application completely controls key
generation, selection, distribution, and what data to encrypt.
Session level encryption
With encryption at the session layer, the system requests cryptographic services
instead of an application. The application may or may not be aware that
encryption is happening.
Link level encryption
Link level encryption is performed at the lowest
level of the protocol stack, usually by specialized hardware.
The Cryptographic Coprocessors and the 2058
Cryptographic Accelerator may be used for both field level encryption and
Secure Sockets Layer (SSL) session establishment encryption. While VPN is
supported in i5/OS, it does not use either coprocessor or the accelerator.
Furthermore, the system does not support SNA session level encryption at all.
Translate
The translate operation decrypts data from
encryption under one key and encrypts the data under another key. This is done
in one step to avoid exposing the plaintext data within the application
program.
Data integrity, authenticity, and non-repudiation
Encrypting data does not mean the data cannot be manipulated (e.g., repeated,
deleted, or even altered).
To rely on data, you need to know that it comes from an authorized source and
is unchanged. Additional cryptographic operations are required for these
purposes.
Hash (Message Digest)
A cryptographic hash operation produces a
fixed-length output string (often called a digest) from a variable-length input
string. For all practical purposes, the following statements are true of a good
hash function:
Collision resistant: If any portion of the data is modified, a different hash will be
generated.
One-way: The function is irreversible. That is, given a digest, it is not possible to
find the data that produces it.
These properties make hash operations useful for authentication purposes. For
example, you can keep a copy of a digest for the purpose of comparing it with a
newly generated digest at a later date. If the digests are identical, the data
has not been altered.
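A short Python sketch of this compare-the-digest check follows. SHA-256 from
the standard hashlib module is used as the hash function; the text does not
name a specific algorithm, so that choice and the sample messages are
assumptions.

```python
import hashlib


def digest(data: bytes) -> str:
    """Fixed-length digest (64 hex characters) of variable-length input."""
    return hashlib.sha256(data).hexdigest()


original = b"Pay Alice 100 dollars"
stored = digest(original)                                # kept for later comparison

unchanged = digest(b"Pay Alice 100 dollars") == stored   # same data, same digest
tampered = digest(b"Pay Alice 900 dollars") == stored    # one character changed
```

Here unchanged is true and tampered is false. The check is only as trustworthy
as the stored digest itself: anyone who can replace both the data and the
digest defeats it, which is what the keyed MAC operations below address.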
MAC (Message Authentication Code)
A MAC operation uses a secret key and cipher
algorithm to produce a value (the MAC) which later can be used to ensure the
data has not been modified. Typically, a MAC is appended to the end of a
transmitted message. The receiver of the message uses the same MAC key and
algorithm as the sender to reproduce the MAC. If the receiver's MAC matches the
MAC sent with the message, the data has not been altered.
The MAC operation helps
authenticate messages, but does not prevent unauthorized reading because the
transmitted data remains as plaintext. You must use the MAC operation and then
encrypt the entire message to ensure both data privacy and integrity.
HMAC (Hash MAC)
An HMAC operation uses a cryptographic hash function
and a secret shared key to produce an authentication value. It is used in the
same way a MAC is used.
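As an illustration, Python's standard hmac module can produce and check such a
value. The sketch assumes HMAC with SHA-256 and a key already shared by sender
and receiver; the function names are hypothetical.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Sender side: compute the authentication value to append to the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Receiver side: recompute the value and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)
```

A receiver that holds the same key accepts the message only if verify returns
true. An attacker who alters the message cannot produce a matching value
without the key, but the message itself still travels as plaintext.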
Sign/Verify
A sign operation produces an authentication value called
a digital signature. A sign operation works as follows:
1. The data to be signed is hashed, to produce a digest.
2. The digest is encrypted using a PKA algorithm and a private key, to produce
the signature.
The verify operation works as follows:
1. The signature is decrypted using the sender's PKA public key, to produce
digest 1.
2. The data that was signed is hashed, to produce digest 2.
3. If the two digests are equal, the signature is valid.
Theoretically, this also verifies the sender because only the sender should
possess the private
key. However, how can the receiver verify that the public key actually belongs
to the sender? Certificates are used to help solve this problem.
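The sign and verify steps can be traced with a toy RSA key. This Python sketch
is insecure by construction: the key is tiny, and the SHA-256 digest is reduced
modulo N so that it fits, whereas real signature schemes pad the full digest.
The constants and function names are assumptions for illustration.

```python
import hashlib

# Toy RSA key pair (tiny primes, insecure): N and E are public, D is private.
N, E, D = 3233, 17, 2753


def digest_number(data: bytes) -> int:
    # Reduce the digest modulo N so it fits this toy key size (sketch-only shortcut).
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Hash the data, then transform the digest with the private key."""
    return pow(digest_number(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Recover digest 1 with the public key and compare it with digest 2."""
    digest_1 = pow(signature, E, N)
    digest_2 = digest_number(data)
    return digest_1 == digest_2
```

Anyone holding the public values N and E can run verify, but only the holder of
D can produce a signature that passes.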
Key and random number generation
Many security-related
functions rely on random number generation, for example, salting a password or
generating an initialization vector. An important use of random numbers is in
the generation of cryptographic key material. Key generation has been described
as the most sensitive of all computer security functions. If the random numbers
are not cryptographically strong, the function will be subject to attack.
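In Python, for example, the standard secrets module draws from the operating
system's cryptographically strong generator. The sketch below shows the three
uses named above; the sizes, the PBKDF2 iteration count, and the function names
are illustrative assumptions, not recommendations taken from the text.

```python
import hashlib
import secrets


def new_salt(size=16):
    """Random salt to store alongside a password hash."""
    return secrets.token_bytes(size)


def new_iv(size=16):
    """Random initialization vector, one per encrypted message."""
    return secrets.token_bytes(size)


def new_key(size=32):
    """Random key material, e.g. 32 bytes for a 256-bit symmetric key."""
    return secrets.token_bytes(size)


def hash_password(password: str, salt: bytes) -> bytes:
    """Salted password hash; the same password with a different salt hashes differently."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
```

A general-purpose generator such as Python's random module is designed for
simulation, not security, and should not be substituted here.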
The i5/OS operating system
contains a pseudorandom number generator (PRNG). The PRNG is used by many
system functions and is available for application use through the Cryptographic
Services API set.
The PRNG is composed of
two parts: pseudorandom number generation and seed management. Pseudorandom
number generation is performed using the FIPS 186-1 algorithm.
Cryptographically strong pseudorandom numbers rely on good seed. The FIPS 186-1
key and seed values are obtained from a system seed digest. The system
automatically generates seed using data collected from system information or by
using the random number generator function on a cryptographic coprocessor if
one is available. System-generated seed can never be truly unpredictable. If a
cryptographic coprocessor is not available, you should add your own random seed
to the system seed digest. This should be done as soon as possible any time the
Licensed Internal Code is installed.
Key management
Key management is the
secure handling and storage of cryptographic keys. This includes key storage
and retrieval, key encryption and conversions, and key distribution.
Key storage
Key storage on the system includes
the following:
· Cryptographic Services key store
· Digital certificate manager certificate store
· CCA key store (used with the Cryptographic Coprocessors)
· JCE key store
In addition, keys can also be stored
on the Cryptographic Coprocessors themselves.
Key Encryption and Conversions
Keys must be encrypted prior to sending or storing
them outside the secured system environment. In addition, keys should be
handled in encrypted form within the system as much as possible to reduce the
risk of exposure. The management of encrypted keys is often done via a
hierarchical key system.
· At the top is a master key (or keys). The master key is the only clear key
value and must be stored in a secure fashion.
· Key-encrypting keys (KEKs) are used to encrypt other keys. Typically, a KEK is
used to encrypt a stored key, or a key that is sent to another system. KEKs are
normally encrypted under a master key.
· Data keys are keys used directly on user data (such as to encrypt or MAC). A
data key may be encrypted under a KEK or under a master key.
Various uses of a key will require the key to be in different forms. For example, keys
received from other sources will normally be converted to an internal format.
Likewise, keys sent out of the system are converted to a standard external
format before sending. Certain key forms are standard, such as an ASN.1
BER-encoded form, and others are peculiar to a cryptographic service provider,
such as the Cryptographic Coprocessors.
Key Distribution
Typically, data encryption is performed using
symmetric key algorithms. The symmetric keys are distributed using asymmetric
key algorithms. Consider these examples:
· RSA - An RSA public key is used to encrypt a symmetric key which is then
distributed. The corresponding private key is used to decrypt it.
· Diffie-Hellman - The communicating parties generate and exchange D-H
parameters which are then used to generate key pairs. The public keys are
exchanged and each party is then able to compute the symmetric key
independently.
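The Diffie-Hellman case can be followed with small numbers. In this Python
sketch the parameters P = 23 and G = 5 are toy values chosen so the arithmetic
is easy to check; real exchanges use very large primes, and the shared value is
normally passed through a key derivation step before use. The function names
are hypothetical.

```python
import secrets

# Tiny public D-H parameters for illustration; real use needs a large prime.
P, G = 23, 5


def keypair(p=P, g=G):
    """Return (private, public), where public = g**private mod p."""
    private = secrets.randbelow(p - 2) + 1      # 1 .. p-2
    return private, pow(g, private, p)


def shared_secret(own_private, peer_public, p=P):
    """Each party combines its own private key with the other's public key."""
    return pow(peer_public, own_private, p)
```

If one party's private key is 4 (public key 4) and the other's is 3 (public key
10), both compute the shared value 18, even though neither private key was ever
transmitted.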
5.5 SECURITY
Many companies possess valuable information that they
guard closely. This information can be technical (e.g., a new chip design or
software), commercial (e.g., studies of the competition or marketing plans),
financial (e.g., plans for a stock offering), legal (e.g., documents about a
potential merger or takeover), among many other possibilities. Frequently this
information is protected by having a uniformed guard at the building entrance
who checks to see that all people entering the building are wearing a proper
badge. In addition, many offices may be locked and some file cabinets may be
locked as well to ensure that only authorized people have access to the
information.
As more and more of this information is stored in
computer systems, the need to protect it is becoming increasingly important.
Protecting this information against unauthorized usage is therefore a major
concern of all operating systems. Unfortunately, it is also becoming
increasingly difficult due to the widespread acceptance of system bloat as
being a normal and acceptable phenomenon. In the following sections we will
look at a variety of issues concerned with security and protection, some of
which have analogies to real-world protection of information on paper, but some
of which are unique to computer systems. In this chapter we will examine
computer security as it applies to operating systems.
5.6 THE SECURITY ENVIRONMENT
Some people use the terms “security” and “protection”
interchangeably. Nevertheless, it is frequently useful to make a distinction
between the general problems involved in making sure that files are not read or
modified by unauthorized persons, which include technical, administrative,
legal, and political issues on the one hand, and the specific operating system
mechanisms used to provide security, on the other. To avoid confusion, we will
use the term security to refer to the overall problem, and the term protection
mechanisms to refer to the specific operating system mechanisms used to
safeguard information in the computer. The boundary between them is not well
defined, however. First we will look at security to see what the nature of the
problem is. Later on in the chapter we will look at the protection mechanisms
and models available to help achieve security.
Security has many facets. Three of the more important
ones are the nature of the threats, the nature of intruders, and accidental
data loss. We will now look at these in turn.
5.6.1 Threats
From a security perspective, computer systems have
three general goals, with corresponding threats to them, as listed in Fig. 9-1.
The first one, data confidentiality, is concerned with having secret
data remain secret. More specifically, if the owner of some data has decided
that these data are only to be made available to certain people and no others,
the system should guarantee that release of the data to unauthorized people
does not occur. As a bare minimum, the owner should be able to specify who can
see what, and the system should enforce these specifications.
The second goal, data integrity, means that
unauthorized users should not be able to modify any data without the owner’s
permission. Data modification in this context includes not only changing the
data, but also removing data and adding false data as well. If a system cannot
guarantee that data deposited in it remain unchanged until the owner decides to
change them, it is not worth much as an information system.
The third goal, system availability, means
that nobody can disturb the system to make it unusable. Such denial-of-service
attacks are increasingly common. For example, if a computer is an
Internet server, sending a flood of requests to it may cripple it by eating up
all of its CPU time just examining and discarding incoming requests. If it
takes, say, 100 µsec to process an incoming request to read a Web page, then
anyone who manages to send 10,000 requests/sec can wipe it out, since
10,000 x 100 µsec is a full second of CPU time every second. Reasonable
models and technology for dealing with attacks on confidentiality and integrity
are available; foiling denial-of-service attacks is much harder.
Another aspect of the security problem is privacy:
protecting individuals from misuse of information about them. This quickly
gets into many legal and moral issues. Should the government compile dossiers
on everyone in order to catch X-cheaters, where X is “welfare” or
“tax,” depending on your politics? Should the police be able to look up
anything on anyone in order to stop organized crime? Do employers and insurance
companies have rights? What happens when these rights conflict with individual
rights? All of these issues are extremely important but are beyond the scope of
this book.
5.6.2 Intruders
Most people are pretty nice and obey the law, so why
worry about security? Because there are unfortunately a few people around who
are not so nice and want to cause trouble (possibly for their own commercial
gain). In the security literature, people who are nosing around places where
they have no business being are called intruders or sometimes adversaries.
Intruders act in two different ways. Passive intruders just want to read
files they are not authorized to read. Active intruders are more malicious;
they want to make unauthorized changes to data. When designing a system to be
secure against intruders, it is important to keep in mind the kind of intruder
one is trying to protect against. Some common categories are
1. Casual prying by nontechnical users. Many people have personal computers on their
desks that are connected to a shared file server, and human nature being what
it is, some of them will read other people’s electronic mail and other files if
no barriers are placed in the way. Most UNIX systems, for example, have the
default that all newly created files are publicly readable.
2. Snooping by insiders. Students, system programmers, operators, and other technical
personnel often consider it to be a personal challenge to break the security of
the local computer system. They often are highly skilled and are willing to
devote a substantial amount of time to the effort.
3. Determined attempts to make money. Some
bank programmers have attempted to steal from the bank they were working for.
Schemes have varied from changing the software to truncate rather than round
interest, keeping the fraction of a cent for themselves, to siphoning off
accounts not used in years, to blackmail (“Pay me or I will destroy all the
bank’s records.”).
4. Commercial or military espionage. Espionage refers to a serious and well-funded attempt by
a competitor or a foreign country to steal programs, trade secrets, patentable
ideas, technology, circuit designs, business plans, and so forth. Often this attempt
will involve wiretapping or even erecting antennas directed at the computer to
pick up its electromagnetic radiation.
It should be clear that trying to keep a hostile
foreign government from stealing military secrets is quite a different matter
from trying to keep students from inserting a funny message-of-the-day into the
system. The amount of effort needed for security and protection clearly depends on
who the enemy is thought to be.
Another category of security pest that has manifested
itself in recent years is the virus, which will be discussed at length below.
Basically a virus is a piece of code that replicates itself and (usually) does
some damage. In a sense, the writer of a virus is also an intruder, often with
high technical skills. The difference between a conventional intruder and a
virus is that the former refers to a person who is personally trying to break
into a system to cause damage whereas the latter is a program written by such a
person and then released into the world hoping it causes damage. Intruders try
to break into specific systems (e.g., one belonging to some bank or the
Pentagon) to steal or destroy particular data, whereas a virus usually causes
more general damage. In a sense, an intruder is like someone with a gun who tries
to kill a specific person; a virus writer is more like a terrorist bomber who
just wants to kill people in general, rather than some particular person.
5.6.3 Accidental Data Loss
In addition to threats caused by malicious intruders,
valuable data can be lost by accident. Some of the common causes of accidental
data loss are
Acts of God: fires, floods,
earthquakes, wars, riots, or rats gnawing tapes or floppy disks.
Hardware or software errors: CPU malfunctions, unreadable
disks or tapes, telecommunication errors, program bugs.
Human errors: incorrect data entry,
wrong tape or disk mounted, wrong program run, lost disk or tape, or some other
mistake.
Most of these can be dealt with by maintaining
adequate backups, preferably far away from the original data. While protecting
data against accidental loss may seem mundane compared to protecting against
clever intruders, in practice, probably more damage is caused by the former
than the latter.
5.7 ATTACKS FROM INSIDE THE SYSTEM
Once a cracker has logged into a computer, he can
start doing damage. If the computer has good security, it may only be possible
to harm the user whose account has been broken, but often this initial entry
can be leveraged to break into more accounts later. In the following sections,
we will look at some attacks that can be set up by someone already logged in,
either a cracker who has gotten in illicitly or possibly a legitimate user with
a grudge against someone.
5.7.1 Trojan Horses
One hoary insider attack is the Trojan horse,
in which a seemingly innocent program contains code to perform an unexpected
and undesirable function. This function might be modifying, deleting or
encrypting the user’s files, copying them to a place where the cracker can
retrieve them later, or even sending them to the cracker or a temporary safe
hiding place via email or FTP. To have the Trojan horse run, the person
planting it first has to get the program carrying it executed. One way is to
place the program on the Internet as a free, exciting new game, MP3 viewer,
“special” porno viewer, or something else likely to attract attention, and
encourage people to download it. When it runs, the Trojan horse procedure is
called and can do anything the user can do (e.g., delete files, open network
connections, etc.). Note that this ploy does not require the author of the
Trojan horse to break into the victim’s computer.
There are other ways to trick the victim into
executing the Trojan horse program as well. For example, many UNIX users have
an environment variable, $PATH, which controls which directories are
searched for a command. It can be viewed by typing the following command to the
shell:
echo $PATH
A potential setting for the user ast on a
particular system might consist of the following directories:
:/usr/ast/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/ucb:/usr/man\
:/usr/java/bin:/usr/java/lib:/usr/local/man:/usr/openwin/man
Other users are likely to have a different search
path. When the user types
prog
to the shell, the shell first takes a look to see if
there is a program named /usr/ast/bin/prog. If there is, it is
executed. If it is not there, the shell tries /usr/local/bin/prog, /usr/bin/prog,
/bin/prog, and so on, trying all 10 directories in turn before giving
up. Suppose that just one of these directories was left unprotected so a
cracker could put a program there. If this is the first occurrence of the
program in the list, it will be executed and the Trojan horse will run.
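The search order just described can be sketched in Python. This is only a minimal illustration, not the implementation of any particular shell. The function name resolve_command is made up for this example, and the sketch assumes a POSIX-style colon-separated path in which an empty entry stands for the current directory.

```python
import os

def resolve_command(name, search_path):
    """Return the first executable called `name` found along `search_path`,
    or None if no directory in the list contains one."""
    for directory in search_path.split(":"):
        # Assumption: an empty entry means the current directory.
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate  # first match wins; later directories are ignored
    return None
```

Calling resolve_command("ls", os.environ.get("PATH", "")) shows which file a command name would resolve to. Because the first match wins, whoever can write to an early directory in the list controls what runs.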
Most common programs are in /bin or /usr/bin, so putting a Trojan horse in
/usr/bin/X11/ls does not work for a common program because the real one will
be found first. However, suppose the cracker inserts la into /usr/bin/X11. If
a user types la instead of ls (the directory listing program), the Trojan
horse will run, do its dirty work, and then issue the correct message that la
does not exist. By inserting Trojan horses into complicated directories that
hardly anyone ever looks at and giving them names that could represent common
typing errors, the cracker has a fair chance that someone will invoke one of
them sooner or later. And that someone might be the superuser (even superusers
make typing errors), in which case the Trojan horse has the opportunity to
replace /bin/ls with a version containing a Trojan horse, so it will be
invoked all the time from then on.
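A defender can look for the weak spot this attack relies on, namely a directory on the search path that other users may write to. The following is a rough sketch under stated assumptions: the function name risky_path_entries is hypothetical, and the check looks only at the "other" write permission bit, so it ignores group permissions, ownership, and access control lists.

```python
import os
import stat

def risky_path_entries(search_path):
    """Return the directories in `search_path` that any user may write to."""
    risky = []
    for directory in search_path.split(":"):
        directory = directory or "."      # empty entry = current directory
        try:
            mode = os.stat(directory).st_mode
        except OSError:
            continue                      # entries that do not exist are skipped
        if stat.S_ISDIR(mode) and mode & stat.S_IWOTH:
            risky.append(directory)       # world-writable: anyone can plant a file
    return risky
```

Running risky_path_entries(os.environ.get("PATH", "")) on a well-configured account would be expected to return an empty list.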
A malicious but legal user, Mal, could also lay a trap
for the superuser as follows. He puts a version of ls containing a
Trojan horse in his own directory and then does something suspicious that is
sure to attract the superuser’s attention, such as starting up 100
compute-bound processes at once. Chances are the superuser will check that out
by typing
cd /usr/mal
ls -l
to see what Mal has in his home directory. Since some
shells try the local directory before working through $PATH, the
superuser may have just invoked Mal’s Trojan horse with superuser power. The
Trojan horse could make /usr/mal/bin/sh SETUID root. All it takes is two
system calls: chown to change the owner of /usr/mal/bin/sh to root and
chmod to set its SETUID bit. Now Mal can become superuser at will by just
running that shell.
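The result of such an attack, a file owned by root with its SETUID bit set, can be detected from the file's metadata. Below is a small sketch of such an audit, assuming a UNIX-like system where root has user ID 0. The names is_setuid_root and find_setuid_root are invented for this illustration and do not come from any standard tool.

```python
import os
import stat

def is_setuid_root(st):
    """True if the stat result describes a regular file that is owned by
    root (uid 0, an assumption about the system) and has its SETUID bit set."""
    return (stat.S_ISREG(st.st_mode)
            and st.st_uid == 0
            and bool(st.st_mode & stat.S_ISUID))

def find_setuid_root(top):
    """Walk the tree under `top` and yield paths of SETUID-root files."""
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)   # lstat: do not follow symbolic links
            except OSError:
                continue              # unreadable or vanished entries are skipped
            if is_setuid_root(st):
                yield path
```

An administrator might run list(find_setuid_root("/usr")) periodically and compare the result with a known-good list; a SETUID-root shell in a user's home directory would stand out.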
If Mal finds himself frequently short of cash, he
might use one of the following Trojan horse scams to help his liquidity
position. In the first one, the Trojan horse checks to see if the victim has an
online banking program, such as Quicken, installed. If so, the Trojan
horse directs the program to transfer some money from the victim’s account to a
dummy account (preferably in a far-away country) for collection in cash later.
In the second scam, the Trojan horse first turns off
the modem’s sound, then dials a 900 (pay) number, again, preferably in a
far-away country, such as Moldova (part of the former Soviet Union). If the
user was online when the Trojan horse was started, then the 900 phone number in
Moldova needs to be a (very expensive) Internet provider, so the user will not
notice and perhaps stay online for hours. Neither of these techniques is
hypothetical; both have happened and are reported in (Denning, 1999). In the
latter one, 800,000 minutes of connect time to Moldova were run up before the
U.S. Federal Trade Commission managed to get the plug pulled and filed suit
against three people on Long Island. They eventually agreed to return $2.74
million to 38,000 victims.
5.7.2 Login Spoofing
Somewhat related to Trojan
horses is login spoofing. It works as follows. Normally, when no one is
logged in on a UNIX terminal or workstation on a LAN, a screen such as
Fig. 9-9(a) is displayed. When a user sits down and types a login name, the system
asks for a password. If it is correct, the user is logged in and a shell is
started.
Now consider this scenario. Mal writes a program that displays a screen
looking amazingly like the real login screen of Fig. 9-9(a), except that this
is not the system login program running, but a phony one written by Mal. Mal
now walks away to watch the fun from a safe distance. When
a user sits down and types a login name, the program responds by asking for a
password and disabling echoing. After the login name and password have been
collected, they are written away to a file and the phony login program sends a
signal to kill its shell. This action logs Mal out and triggers the real login
program to start and display the prompt of Fig. 9-9(a). The user assumes that
she made a typing error and just logs in again. This time it works. But in the
meantime, Mal has acquired another (login name, password) pair. By logging in
at many terminals and starting the login spoofer on all of them, he can collect
many passwords.
The only real way to guard against this is to have the
login sequence start with a key combination that user programs cannot catch.
Windows 2000 uses CTRL-ALT-DEL for this purpose. If a user sits down at a
terminal and starts out by typing CTRL-ALT-DEL, the current user is logged out
and the system login program is started. There is no way to bypass this
mechanism.
5.12 ATTACKS FROM OUTSIDE THE SYSTEM
The threats discussed in the previous sections were
largely caused from the inside, that is, perpetrated by users already logged
in. However, for machines connected to the Internet or another network, there
is a growing external threat. A networked computer can be attacked from a
distant computer over the network. In nearly all cases, such an attack consists
of some code being transmitted over the network to the target machine and
executed there, doing damage. As more and more computers join the Internet, the
potential for damage keeps growing. In the following sections we will look at
some of the operating systems aspects of these external threats, primarily
focusing on viruses, worms, mobile code, and Java applets.
It is hard to open a
newspaper these days without reading about another computer virus or worm
attacking the world’s computers. They are clearly a major security problem for
individuals and companies alike. In the following sections we will examine how
they work and what can be done about them.
I was somewhat hesitant to
write this section in so much detail, lest it give some people bad ideas, but
existing books give far more detail and even include real code (e.g., Ludwig,
1998). Also the Internet is full of information about viruses so the genie is
already out of the bottle. In addition, it is hard for people to defend
themselves against viruses if they do not know how they work. Finally, there
are a lot of misconceptions about viruses floating around that need correction.
Unlike, say, game programmers, successful virus
writers tend not to seek publicity after their products have made their debut.
Based on the scanty evidence there is, it appears that most are high school or
college students or recent graduates who wrote the virus as a technical
challenge, not realizing (or caring) that a virus attack can cost the
collective victims as much as a hurricane or earthquake. Let us call our
antihero Virgil the virus writer. If Virgil is typical, his goals are to
produce a virus that spreads quickly, is difficult to detect, and is hard to
get rid of once detected.
What is a virus, anyway? To
make a long story short, a virus is a program that can reproduce itself
by attaching its code to another program, analogous to how biological viruses
reproduce. The virus can also do other things besides reproducing
itself. Worms are like viruses but spread on their own, without attaching
themselves to another program. That difference will
not concern us here, so we will use the term “virus” to cover both for the
moment.