Tuesday, 10 June 2014

OPERATING SYSTEM UNIT-V


5. FILE SYSTEMS
5.1 Files
A file is a collection of similar records. The file is treated as a single entity by users and applications and may be referred to by name. Files have unique file names and may be created and deleted. Restrictions on access control usually apply at the file level.
A file is a container for a collection of information. The file manager provides a protection mechanism that allows users to administer how processes executing on behalf of different users can access the information in a file. File protection is a fundamental property of files because it allows different people to store their information on a shared computer.
Files represent programs and data. Data files may be numeric, alphabetic, binary, or alphanumeric. Files may be free-form, such as text files. In general, a file is a sequence of bits, bytes, lines, or records.
A file has a certain defined structure according to its type.
1 Text File
2 Source File
3 Executable File
4 Object File

5.1.1 File Structure
Four terms are used for files:
• Field
• Record
• File
• Database
A field is the basic element of data. An individual field contains a single value. A record is a collection of related fields that can be treated as a unit by some application program.
A file is a collection of similar records. The file is treated as a single entity by users and applications and may be referenced by name. Files have file names and may be created and deleted. Access control restrictions usually apply at the file level.
A database is a collection of related data. A database is designed for use by a number of different applications. A database may contain all of the information related to an organization or project, such as a business or a scientific study. The database itself consists of one or more types of files. Usually, there is a separate database management system that is independent of the operating system.
5.1.2 File Attributes
File attributes vary from one operating system to another. The common attributes are:
Name – only information kept in human-readable form.
Identifier – unique tag (number) identifies file within file system
Type – needed for systems that support different types
Location – pointer to file location on device
Size – current file size
Protection – controls who can do reading, writing, executing
Time, date, and user identification – data for protection, security, and usage monitoring.
Information about files is kept in the directory structure, which is maintained on the disk.
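On a POSIX-style system most of these attributes can be read back with the stat() system call. The following is a minimal sketch using Python's os.stat to map the list above onto real fields; the dictionary keys are our own labels, not a standard API. Location is not reported by stat(), because block addresses stay internal to the file system, and on Windows some fields (such as st_uid) carry little meaning.

```python
import os
import stat
import tempfile

def describe(path):
    """Map the generic file attributes onto fields returned by os.stat()."""
    st = os.stat(path)
    if stat.S_ISDIR(st.st_mode):
        kind = "directory"
    elif stat.S_ISREG(st.st_mode):
        kind = "regular"
    else:
        kind = "other"
    return {
        "name": os.path.basename(path),           # human-readable name
        "identifier": st.st_ino,                  # inode number, unique within the file system
        "type": kind,                             # file type taken from the mode bits
        "size": st.st_size,                       # current size in bytes
        "protection": stat.filemode(st.st_mode),  # e.g. '-rw-------'
        "owner": st.st_uid,                       # user identification
        "modified": st.st_mtime,                  # time of last modification
    }

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
info = describe(f.name)
os.remove(f.name)
print(info["size"], info["type"], info["protection"])
```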
5.1.3 File Operations
Any file system provides not only a means to store data organized as files, but a collection of functions that can be performed on files. Typical operations include the following:
Create: A new file is defined and positioned within the structure of files.
Delete: A file is removed from the file structure and destroyed.
Open: An existing file is declared to be "opened" by a process, allowing the process to perform functions on the file.
Close: The file is closed with respect to a process, so that the process no longer may perform functions on the file, until the process opens the file again.
Read: A process reads all or a portion of the data in a file.
Write: A process updates a file, either by adding new data that expands the size of the file or by changing the values of existing data items in the file.
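Each of these operations corresponds roughly to a system call. Below is a minimal sketch that uses Python's low-level os wrappers around the POSIX calls; the file name demo.txt and its contents are arbitrary. Note that POSIX folds "create" into "open" through the O_CREAT flag.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Create + Open: O_CREAT defines the file in the directory; the returned
# descriptor is the process's handle for all further operations.
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)

# Write: adding new data expands the file.
os.write(fd, b"hello, file system")

# Write: changing existing data in place (overwrite the first 5 bytes).
os.lseek(fd, 0, os.SEEK_SET)
os.write(fd, b"HELLO")

# Read: all or a portion of the data.
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 100)

# Close: fd can no longer be used until the file is opened again.
os.close(fd)

# Delete: remove the file from the directory structure.
os.remove(path)
os.rmdir(os.path.dirname(path))
print(data)
```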
File Types – Name, Extension
A common technique for implementing file types is to include the type as part of the file name. The name is split into two parts: a name and an extension. The following table gives the file types with their usual extensions and functions.
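The name/extension split can be sketched with os.path.splitext. The extension table below is hypothetical and deliberately short; it only lists a few common conventions for the types named earlier (executable, object, source, text), not the full table referred to above.

```python
import os

# Hypothetical, abbreviated extension table.
EXTENSION_TYPES = {
    ".exe": "executable",
    ".com": "executable",
    ".obj": "object",
    ".o": "object",
    ".c": "source code",
    ".java": "source code",
    ".txt": "text",
    ".doc": "word processor",
}

def file_type(filename):
    """Split a name into (name, extension) and look the type up by extension."""
    base, ext = os.path.splitext(filename)
    return base, ext, EXTENSION_TYPES.get(ext.lower(), "unknown")

print(file_type("report.TXT"))  # ('report', '.TXT', 'text')
print(file_type("a.out"))       # ('a', '.out', 'unknown')
```

Since the type lives only in the name, renaming a file changes how this lookup classifies it; the operating system is trusting a naming convention.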
5.1.4 File Management Systems
A file management system is that set of system software that provides services to users and applications in the use of files. The objectives for a file management system include the following:
• To meet the data management needs and requirements of the user, which include storage of data and the ability to perform the aforementioned operations.
• To guarantee, to the extent possible, that the data in the file are valid.
• To optimize performance, both from the system point of view in terms of overall throughput and from the user's point of view in terms of response time.
• To provide I/O support for a variety of storage device types.
• To minimize or eliminate the potential for lost or destroyed data.
• To provide a standardized set of I/O interface routines to user processes.
• To provide I/O support for multiple users, in the case of multiple-user systems.

File System Architecture
At the lowest level, device drivers communicate directly with peripheral devices or their controllers or channels. A device driver is responsible for starting I/O operations on a device and processing the completion of an I/O request. For file operations, the typical devices controlled are disk and tape drives. Device drivers are usually considered to be part of the operating system.
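The I/O control level consists of device drivers and interrupt handlers that transfer information between main memory and the disk system. A device driver can be thought of as a translator: its input consists of high-level commands such as "retrieve block 123", and its output consists of the low-level, hardware-specific instructions used by the device controller.
The basic file system needs only to issue generic commands to the appropriate device driver to read and write physical blocks on the disk.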
The file-organization module knows about files and their logical blocks, as well as physical blocks. By knowing the type of file allocation used and the location of the file, the file-organization module can translate logical block addresses to physical block addresses for the basic file system to transfer. Each file's logical blocks are numbered from 0 (or 1) through N, whereas the physical blocks containing the data usually do not match the logical numbers, so a translation is needed to locate each block. The file-organization module also includes the free-space manager, which tracks unallocated blocks and provides these blocks to the file-organization module when requested.
The logical file system uses the directory structure to provide the file-organization module with the information the latter needs, given a symbolic file name. The logical file system is also responsible for protection and security.
To create a new file, an application program calls the logical file system. The logical file system knows the format of the directory structures. To create a new file, it reads the appropriate directory into memory, updates it with the new entry, and writes it back to the disk.
Once the file is found, the associated information such as size, owner, access permissions, and data block locations is generally copied into a table in memory, referred to as the open-file table, consisting of information about all the currently opened files.
The first reference to a file (normally an open) causes the directory structure to be searched and the directory entry for this file to be copied into the table of opened files. The index into this table is returned to the user program, and all further references are made through the index rather than with a symbolic name. The name given to the index varies. UNIX systems refer to it as a file descriptor, Windows NT as a file handle, and other systems as a file control block.
Consequently, as long as the file is not closed, all file operations are done on the open-file table. When the file is closed by all users that have opened it, the updated file information is copied back to the disk-based directory structure.
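The open-file table mechanism can be modelled in a few lines. This is a toy sketch under simplifying assumptions: the "disk" directory is a Python dictionary, one table serves all users (real systems such as UNIX split it into a per-process table and a system-wide table), and the class and method names are hypothetical.

```python
class ToyFileSystem:
    """Hypothetical model of a disk directory plus an in-memory open-file table."""

    def __init__(self):
        # "Disk-based" directory: symbolic name -> file control information.
        self.directory = {"notes.txt": {"size": 120, "owner": "alice", "blocks": [9, 16]}}
        self.open_table = []  # entries: {"name", "info", "count"}, or None once fully closed

    def open(self, name):
        # If the file is already open, share its entry and bump the open count.
        for index, entry in enumerate(self.open_table):
            if entry is not None and entry["name"] == name:
                entry["count"] += 1
                return index
        # First reference: search the directory and copy the entry into memory.
        info = dict(self.directory[name])
        self.open_table.append({"name": name, "info": info, "count": 1})
        return len(self.open_table) - 1  # the index returned to the user program

    def grow(self, fd, nbytes):
        # Later operations go through the index, not the symbolic name.
        self.open_table[fd]["info"]["size"] += nbytes

    def close(self, fd):
        entry = self.open_table[fd]
        entry["count"] -= 1
        if entry["count"] == 0:
            # Closed by all users: copy the updated information back to the directory.
            self.directory[entry["name"]] = entry["info"]
            self.open_table[fd] = None

fs = ToyFileSystem()
fd = fs.open("notes.txt")
fs.grow(fd, 30)
fs.close(fd)
print(fs.directory["notes.txt"]["size"])  # 150
```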

File-System Mounting
Just as a file must be opened before it is used, a file system must be mounted before it can be available to processes on the system. The mount procedure is straightforward. The operating system is given the name of the device and the location within the file structure at which to attach the file system (called the mount point).
The operating system verifies that the device contains a valid file system. It does so by asking the device driver to read the device directory and verifying that the directory has the expected format. Finally, the operating system notes in its directory structure that a file system is mounted at the specified mount point. This scheme enables the operating system to traverse its directory structure, switching among file systems as appropriate.
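One way to picture "switching among file systems" is a mount table consulted during path traversal. A minimal sketch, assuming a hypothetical MountTable class, made-up device names, and a has_valid_fs flag that stands in for the driver-level check described above; the longest matching mount point decides which file system a path belongs to.

```python
class MountTable:
    """Hypothetical mount table mapping mount points to device names."""

    def __init__(self, root_device):
        self.mounts = {"/": root_device}

    def mount(self, device, mount_point, has_valid_fs=True):
        # has_valid_fs stands in for asking the device driver to read and
        # check the device directory; a real kernel does this itself.
        if not has_valid_fs:
            raise ValueError(f"{device}: no valid file system")
        self.mounts[mount_point.rstrip("/") or "/"] = device

    def resolve(self, path):
        """Return (device, path within that device) using the longest matching mount point."""
        best = "/"
        for point in self.mounts:
            if point != "/" and (path == point or path.startswith(point + "/")):
                if len(point) > len(best):
                    best = point
        rest = path if best == "/" else path[len(best):]
        return self.mounts[best], rest or "/"

table = MountTable("disk0")
table.mount("usb1", "/mnt/usb")
print(table.resolve("/mnt/usb/photos/a.jpg"))  # ('usb1', '/photos/a.jpg')
print(table.resolve("/home/nancy/x"))          # ('disk0', '/home/nancy/x')
```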
Allocation Methods
The direct-access nature of disks allows us flexibility in the implementation of files. Three major methods of allocating disk space are in wide use: contiguous, linked, and indexed. Each method has its advantages and disadvantages.
Contiguous Allocation
The contiguous allocation method requires each file to occupy a set of contiguous blocks on the disk. Disk addresses define a linear ordering on the disk. Notice that with this ordering, assuming that only one job is accessing the disk, accessing block b + 1 after block b normally requires no head movement.
When head movement is needed (from the last sector of one cylinder to the first sector of the next cylinder), it is only one track. Thus, the number of disk seeks required for accessing contiguously allocated files is minimal.
Contiguous allocation of a file is defined by the disk address of the first block and the length of the file (in block units). If the file is n blocks long and starts at location b, then it occupies blocks b, b + 1, b + 2, ..., b + n – 1. The directory entry for each file indicates the address of the starting block and the length of the area allocated for this file.
Accessing a file that has been allocated contiguously is easy. For sequential access, the file system remembers the disk address of the last block referenced and, when necessary, reads the next block. For direct access to block i of a file that starts at block b, we can immediately access block b + i.
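Because a directory entry only needs (start, length), direct access is a single addition. A minimal sketch; the function names and the sample entry (start = 14, length = 3) are illustrative.

```python
def contiguous_blocks(start, length):
    """Blocks occupied by a file of `length` blocks starting at block `start`."""
    return list(range(start, start + length))

def direct_access(start, length, i):
    """Physical block holding logical block i: simply start + i."""
    if not 0 <= i < length:
        raise IndexError("logical block outside the file")
    return start + i

print(contiguous_blocks(14, 3))  # [14, 15, 16]
print(direct_access(14, 3, 2))   # 16
```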
The contiguous disk-space-allocation problem can be seen to be a particular application of the general dynamic storage-allocation problem, which is how to satisfy a request of size n from a list of free holes. First fit and best fit are the most common strategies used to select a free hole from the set of available holes. Simulations have shown that both first fit and best fit are more efficient than worst fit in terms of both time and storage utilization. Neither first fit nor best fit is clearly best in terms of storage utilization, but first fit is generally faster.
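The two strategies differ only in which hole they pick. A minimal sketch, assuming free space is kept as a list of (start block, size in blocks) holes; worst fit is not shown.

```python
def first_fit(holes, n):
    """Index of the first hole with at least n blocks, or None."""
    for i, (start, size) in enumerate(holes):
        if size >= n:
            return i
    return None

def best_fit(holes, n):
    """Index of the smallest hole with at least n blocks, or None."""
    candidates = [(size, i) for i, (start, size) in enumerate(holes) if size >= n]
    return min(candidates)[1] if candidates else None

def allocate(holes, n, strategy):
    """Carve n blocks out of the chosen hole and return its start block (or None)."""
    i = strategy(holes, n)
    if i is None:
        return None
    start, size = holes[i]
    if size == n:
        del holes[i]                       # hole used up exactly
    else:
        holes[i] = (start + n, size - n)   # leftover becomes a smaller hole
    return start

holes = [(0, 10), (20, 4), (40, 6)]
print(allocate(list(holes), 4, first_fit))  # 0
print(allocate(list(holes), 4, best_fit))   # 20
```

First fit stops at the first adequate hole, while best fit scans the whole list, which is why first fit is generally faster.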
These algorithms suffer from the problem of external fragmentation. As files are allocated and deleted, the free disk space is broken into little pieces. External fragmentation exists whenever free space is broken into chunks. It becomes a problem when the largest contiguous chunk is insufficient for a request; storage is fragmented into a number of holes, no one of which is large enough to store the data. Depending on the total amount of disk storage and the average file size, external fragmentation may be either a minor or a major problem.
Some older microcomputer systems used contiguous allocation on floppy disks. To prevent loss of significant amounts of disk space to external fragmentation, the user had to run a repacking routine that copied the entire file system onto another floppy disk or onto a tape. The original floppy disk was then freed completely, creating one large contiguous free space. The routine then copied the files back onto the floppy disk by allocating contiguous space from this one large hole. This scheme effectively compacts all free space into one contiguous space, solving the fragmentation problem. The cost of this compaction is time.
The time cost is particularly severe for large hard disks that use contiguous allocation, where compacting all the space may take hours and may be necessary on a weekly basis. During this down time, normal system operation generally cannot be permitted, so such compaction is avoided at all costs on production machines.
Another major problem is determining how much space is needed for a file. When the file is created, the total amount of space it will need must be found and allocated. The user will normally overestimate the amount of space needed, resulting in considerable wasted space.
Linked Allocation
Linked allocation solves all problems of contiguous allocation. With linked allocation, each file is a linked list of disk blocks; the disk blocks may be scattered anywhere on the disk. The directory contains a pointer to the first and last blocks of the file, and each block contains a pointer to the next block.
To create a new file, we simply create a new entry in the directory. Its pointer to the first disk block is initialized to nil (the end-of-list pointer value) to signify an empty file, and the size field is also set to 0. A write to the file causes a free block to be found via the free-space management system; this new block is then written to and is linked to the end of the file.
There is no external fragmentation with linked allocation, and any free block on the free-space list can be used to satisfy a request. Notice also that there is no need to declare the size of a file when that file is created. A file can continue to grow as long as there are free blocks. Consequently, it is never necessary to compact disk space.
The major problem is that linked allocation can be used effectively only for sequential-access files. To find the ith block of a file, we must start at the beginning of that file and follow the pointers until we get to the ith block. Each access to a pointer requires a disk read and sometimes a disk seek. Consequently, it is inefficient to support a direct-access capability for linked-allocation files.
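The cost of direct access under linked allocation can be made concrete by counting block reads. A minimal sketch, assuming a hypothetical disk dictionary in which each block stores (data, next block number or None for nil); the block numbers are made up.

```python
# Hypothetical disk: block number -> (data, next block number, or None for nil).
disk = {
    9: ("block 0", 16),
    16: ("block 1", 1),
    1: ("block 2", 10),
    10: ("block 3", 25),
    25: ("block 4", None),
}

def read_logical_block(first, i):
    """Return (data, disk reads) for logical block i of a file starting at `first`."""
    block, reads = first, 1
    for _ in range(i):
        _, nxt = disk[block]  # the block must be read to learn where the next one is
        if nxt is None:
            raise IndexError("past end of file")
        block, reads = nxt, reads + 1
    return disk[block][0], reads

print(read_logical_block(9, 3))  # ('block 3', 4)
```

Reaching logical block i always costs i + 1 reads, whereas contiguous allocation computes the address without reading anything.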
Another disadvantage of linked allocation is the space required for the pointers. If a pointer requires 4 bytes out of a 512-byte block, then 0.78 percent of the disk is being used for pointers rather than for information.
The usual solution to this problem is to collect blocks into multiples, called clusters, and to allocate clusters rather than blocks. For instance, the file system may define a cluster as 4 blocks and operate on the disk only in cluster units. With 4-block clusters of 2,048 bytes, the same 4-byte pointer takes only about 0.2 percent of the space.
Pointers then use a much smaller percentage of the file's disk space. This method allows the logical-to-physical block mapping to remain simple, but improves disk throughput (fewer disk head seeks) and decreases the space needed for block allocation and free-list management. The cost of this approach is an increase in internal fragmentation.
Yet another problem is reliability. Since the files are linked together by pointers scattered all over the disk, consider what would happen if a pointer were lost or damaged: a software bug or disk failure could cause the wrong pointer to be picked up, linking into the free-space list or into another file. Partial solutions are to use doubly linked lists or to store the file name and relative block number in each block; however, these schemes require even more overhead for each file.
An important variation on the linked allocation method is the use of a file allocation table (FAT). This simple but efficient method of disk-space allocation is used by the MS-DOS and OS/2 operating systems. A section of disk at the beginning of each partition is set aside to contain the table. The table has one entry for each disk block and is indexed by block number. The FAT is used much as a linked list is. The directory entry contains the block number of the first block of the file. The table entry indexed by that block number then contains the block number of the next block in the file. This chain continues until the last block, which has a special end-of-file value as the table entry. Unused blocks are indicated by a 0 table value. Allocating a new block to a file is a simple matter of finding the first 0-valued table entry and replacing the previous end-of-file value with the address of the new block. The 0 is then replaced with the end-of-file value. An illustrative example is the FAT structure for a file consisting of disk blocks 217, 618, and 339: the directory entry holds 217, entry 217 holds 618, entry 618 holds 339, and entry 339 holds the end-of-file value.
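The FAT chain for blocks 217, 618, and 339 can be sketched with a Python list indexed by block number. Assumptions: -1 stands in for the special end-of-file value (real FATs use reserved codes), the table has 1,024 entries, and block 0 is never handed out so that a "next block" value can never be confused with the free marker 0.

```python
EOF = -1    # stand-in for the special end-of-file value
FREE = 0    # unused blocks are marked with 0

def chain(fat, first):
    """Follow the FAT from the first block (taken from the directory entry)."""
    blocks = []
    b = first
    while b != EOF:
        blocks.append(b)
        b = fat[b]
    return blocks

def append_block(fat, first):
    """Link the first 0-valued entry onto the end of the file's chain."""
    new = fat.index(FREE, 1)   # search from 1 so block 0 is never allocated (our choice)
    last = chain(fat, first)[-1]
    fat[last] = new            # the old end-of-file entry now points to the new block
    fat[new] = EOF             # the 0 is replaced with the end-of-file value
    return new

fat = [FREE] * 1024
fat[217], fat[618], fat[339] = 618, 339, EOF   # directory entry: first block 217
print(chain(fat, 217))  # [217, 618, 339]
```

Because the whole table can be cached in memory, finding block i means following i table entries rather than reading i disk blocks.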
Indexed Allocation
Linked allocation solves the external-fragmentation and size-declaration problems of contiguous allocation. However, in the absence of a FAT, linked allocation cannot support efficient direct access, since the pointers to the blocks are scattered with the blocks themselves all over the disk and need to be retrieved in order. Indexed allocation solves this problem by bringing all the pointers together into one location: the index block.
Each file has its own index block, which is an array of disk-block addresses. The ith entry in the index block points to the ith block of the file. The directory contains the address of the index block. To read the ith block, we use the pointer in the ith index-block entry to find and read the desired block.
When the file is created, all pointers in the index block are set to nil. When the ith block is first written, a block is obtained from the free-space manager, and its address is put in the ith index-block entry.
Indexed allocation supports direct access without suffering from external fragmentation, because any free block on the disk may satisfy a request for more space.
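Indexed allocation does suffer from wasted space, however. The pointer overhead of the index block is generally greater than the pointer overhead of linked allocation. For a file of only one or two blocks, linked allocation loses the space of only one pointer per block, whereas indexed allocation must allocate an entire index block even if only one or two of its pointers are non-nil.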
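Every file needs an index block, so the index block should be as small as possible; but if it is too small, it cannot hold enough pointers for a large file. Mechanisms for dealing with this issue include the following:
1. Linked scheme. An index block is normally one disk block. Thus, it can be read and written directly by itself. To allow for large files, several index blocks can be linked together.
2. Multilevel index. A variant of the linked representation is to use a first-level index block to point to a set of second-level index blocks, which in turn point to the file blocks. To access a block, the operating system uses the first-level index to find a second-level index block, and that block to find the desired data block.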
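The two-level lookup amounts to a divmod on the logical block number. A minimal sketch, assuming tiny index blocks of 4 pointers (so the example stays readable), None for nil, and made-up block numbers.

```python
POINTERS_PER_BLOCK = 4  # tiny index blocks, an assumption for readability

def lookup(first_level, i):
    """Two-level index: the outer entry picks a second-level block, the inner entry a data block."""
    outer, inner = divmod(i, POINTERS_PER_BLOCK)
    if outer >= len(first_level):
        raise IndexError("beyond the maximum file size")
    second_level = first_level[outer]
    if second_level is None or second_level[inner] is None:
        raise IndexError("block not allocated")
    return second_level[inner]

# Hypothetical file of 6 data blocks spread over two second-level index blocks.
first_level = [
    [50, 51, 70, 71],      # logical blocks 0-3
    [90, 12, None, None],  # logical blocks 4-5; the rest are still nil
    None,                  # second-level index block not yet allocated
    None,
]
print(lookup(first_level, 5))  # 12
```

With realistic sizes the reach grows quickly: 4-byte pointers in 4,096-byte blocks give 1,024 pointers per index block, so two levels address 1,024 × 1,024 = 1,048,576 data blocks, or 4 GB.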
Free-Space Management
Since there is only a limited amount of disk space, it is necessary to reuse the space from deleted files for new files, if possible.
Bit Vector
The free-space list is frequently implemented as a bit map or bit vector. Each block is represented by 1 bit. If the block is free, the bit is 1; if the block is allocated, the bit is 0.
For example, consider a disk where blocks 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, and 27 are free, and the rest of the blocks are allocated. The free-space bit map would be
001111001111110001100000011100000...
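The main advantage of this approach is that it is relatively simple and efficient to find the first free block or n consecutive free blocks on the disk. To find the first free block, the system scans the bit vector one word at a time for the first non-0 word (a 0-valued word contains only allocated blocks) and then finds the first 1 bit in that word. The block number is
(number of bits per word) × (number of 0-value words) + offset of first 1 bit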
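The formula can be checked against the example bit map. A minimal sketch, assuming 8-bit words (real systems use 32 or 64) and block 0 stored in the most significant bit of word 0, so that the words read left to right like the bit string above.

```python
WORD_BITS = 8  # small words keep the example short

def bitmap_words(free_blocks, nblocks):
    """Pack the bit vector: 1 = free, block 0 in the most significant bit of word 0."""
    words = [0] * ((nblocks + WORD_BITS - 1) // WORD_BITS)
    for b in free_blocks:
        words[b // WORD_BITS] |= 1 << (WORD_BITS - 1 - b % WORD_BITS)
    return words

def first_free(words):
    """(bits per word) x (number of 0-valued words) + offset of first 1 bit."""
    for zero_words, w in enumerate(words):  # zero_words = 0-valued words skipped so far
        if w != 0:
            offset = WORD_BITS - w.bit_length()  # leading 0 bits in this word
            return WORD_BITS * zero_words + offset
    return None  # no free block

FREE = [2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, 27]
words = bitmap_words(FREE, 32)
print("".join(f"{w:08b}" for w in words))  # 00111100111111000110000001110000
print(first_free(words))                   # 2
```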
Linked List
Another approach is to link together all the free disk blocks, keeping a pointer to the first free block in a special location on the disk and caching it in memory. This first block contains a pointer to the next free disk block, and so on. In the example above, block 2 would contain a pointer to block 3, which would point to block 4, which would point to block 5, which would point to block 8, and so on. Usually, the operating system simply needs a free block so that it can allocate that block to a file, so the first block in the free list is used.
Grouping
A modification of the free-list approach is to store the addresses of n free blocks in the first free block. The first n-1 of these blocks are actually free. The last block contains the addresses of another n free blocks, and so on. The importance of this implementation is that the addresses of a large number of free blocks can be found quickly, unlike in the standard linked-list approach.
Counting
Several contiguous blocks may be allocated or freed simultaneously, particularly when space is allocated with the contiguous-allocation algorithm or through clustering. Rather than keeping a list of n free disk addresses, we can keep the address of the first free block and the number n of free contiguous blocks that follow the first block.
Each entry in the free-space list then consists of a disk address and a count. Although each entry requires more space than would a simple disk address, the overall list will be shorter, as long as the count is generally greater than 1.
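Applied to the example free blocks from the bit-vector discussion, counting shrinks 15 addresses to 4 entries. A minimal sketch; the function name is illustrative.

```python
def to_runs(free_blocks):
    """Counting: compress free block numbers into (first block, count) entries."""
    runs = []
    for b in sorted(free_blocks):
        if runs and runs[-1][0] + runs[-1][1] == b:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)  # extends the current run
        else:
            runs.append((b, 1))                        # starts a new run
    return runs

FREE = [2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, 27]
print(to_runs(FREE))  # [(2, 4), (8, 6), (17, 2), (25, 3)]
```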
5.2 DIRECTORIES
To keep track of files, file systems normally have directories or folders, which, in many systems, are themselves files. In this section we will discuss directories, their organization, their properties, and the operations that can be performed on them.
5.2.1 Single-Level Directory Systems
The simplest form of directory system is having one directory containing all the files. Sometimes it is called the root directory, but since it is the only one, the name does not matter much. On early personal computers, this system was common, in part because there was only one user. Interestingly enough, the world’s first supercomputer, the CDC 6600, also had only a single directory for all files, even though it was used by many users at once. This decision was no doubt made to keep the software design simple.
Here the directory contains four files. The file owners are shown in the figure, not the file names (because the owners are important to the point we are about to make). The advantages of this scheme are its simplicity and the ability to locate files quickly—there is only one place to look, after all.
A single-level directory system containing four files, owned by three different people, A, B, and C.
The problem with having only one directory in a system with multiple users is that different users may accidentally use the same names for their files. For example, if user A creates a file called mailbox, and then later user B also creates a file called mailbox, B’s file will overwrite A’s file. Consequently, this scheme is not used on multiuser systems any more, but could be used on a small embedded system, for example, a system in a car that was designed to store user profiles for a small number of drivers.
5.2.2 Two-level directory systems
To avoid conflicts caused by different users choosing the same file name for their own files, the next step up is giving each user a private directory. In that way, names chosen by one user do not interfere with names chosen by a different user and there is no problem caused by the same name occurring in two or more directories. This design leads to the system of Fig. 6-8. This design could be used, for example, on a multiuser computer or on a simple network of personal computers that shared a common file server over a local area network.
A two-level directory system. The letters indicate the owners of the directories and files.
Implicit in this design is that when a user tries to open a file, the system knows which user it is in order to know which directory to search. As a consequence, some kind of login procedure is needed, in which the user specifies a login name or identification, something not required with a single-level directory system.
When this system is implemented in its most basic form, users can only access files in their own directories. However, a slight extension is to allow users to access other users’ files by providing some indication of whose file is to be opened. Thus, for example, open("x") might be the call to open a file called x in the user’s directory, and open("nancy/x") might be the call to open a file x in the directory of another user, Nancy.
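The open("x") versus open("nancy/x") convention can be sketched as a two-step lookup. Assumptions: a hypothetical master dictionary maps user names to per-user directories, the logged-in user is passed in explicitly, and file contents are placeholder strings.

```python
# Hypothetical two-level directory: master directory -> user directory -> file.
master = {
    "nancy": {"x": "nancy's x"},
    "alan": {"x": "alan's x", "mailbox": "alan's mail"},
}

def open_file(current_user, name):
    """open("x") searches the caller's directory; open("nancy/x") names another user's."""
    if "/" in name:
        user, filename = name.split("/", 1)
    else:
        user, filename = current_user, name
    return master[user][filename]

print(open_file("alan", "x"))        # alan's x
print(open_file("alan", "nancy/x"))  # nancy's x
```

Both users can own a file called x, which is exactly the conflict the single-level scheme could not handle.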
One situation in which users need to access files other than their own is to execute system binary programs. Having copies of all the utility programs present in each directory clearly is inefficient. At the very least, there is a need for a system directory with the executable binary programs.


5.2.3 Hierarchical Directory Systems
The two-level hierarchy eliminates name conflicts among users but is not satisfactory for users with a large number of files. Even on a single-user personal computer, it is inconvenient. It is quite common for users to want to group their files together in logical ways. A professor, for example, might have a collection of files that together form a book that he is writing for one course, a second collection of files containing student programs submitted for another course, a third group of files containing the code of an advanced compiler-writing system he is building, a fourth group of files containing grant proposals, as well as other files for electronic mail, minutes of meetings, papers he is writing, games, and so on. Some way is needed to group these files together in flexible ways chosen by the user.
What is needed is a general hierarchy (i.e., a tree of directories). With this approach, each user can have as many directories as are needed so that files can be grouped together in natural ways. This approach is shown in Fig. 6-9. Here, the directories A, B, and C contained in the root directory each belong to a different user, two of whom have created sub-directories for projects they are working on.

A hierarchical directory system.
The ability for users to create an arbitrary number of sub-directories provides a powerful structuring tool for users to organize their work. For this reason, nearly all modern file systems are organized in this manner.

5.3 ANTIVIRUS AND ANTI-ANTIVIRUS TECHNIQUES
Viruses try to hide and users try to find them, which leads to a cat-and-mouse game. Let us now look at some of the issues here. To avoid showing up in directory listings, a companion virus, source code virus, or other file that should not be there can turn on the HIDDEN bit in Windows or use a file name beginning with the . character in UNIX. More sophisticated is to modify Windows’ Explorer or UNIX’ ls to refrain from listing files whose names begin with Virgil-. Viruses can also hide in unusual and unsuspected places, such as the bad sector list on the disk or the Windows registry (an in-memory database available for programs to store uninterpreted strings). The flash ROM used to hold the BIOS and the CMOS memory are also possibilities, although the former is hard to write and the latter is quite small. And, of course, the main workhorse of the virus world is infecting executable files and documents on the hard disk.
Virus Scanners
Clearly, the average garden-variety user is not going to find many viruses that do their best to hide, so a market has developed for antivirus software. Below we will discuss how this software works. Antivirus software companies have laboratories in which dedicated scientists work long hours tracking down and understanding new viruses. The first step is to have the virus infect a program that does nothing, often called a goat file , to get a copy of the virus in its purest form. The next step is to make an exact listing of the virus’ code and enter it into the database of known viruses. Companies compete on the size of their databases. Inventing new viruses just to pump up your database is not considered sporting.
Once an antivirus program is installed on a customer’s machine, the first thing it does is scan every executable file on the disk looking for any of the viruses in the database of known viruses. Most antivirus companies have a Web site from which customers can download the descriptions of newly-discovered viruses into their databases. If the user has 10,000 files and the database has 10,000 viruses, some clever programming is needed to make it go fast, of course.
Since minor variants of known viruses pop up all the time, a fuzzy search is needed, so that a 3-byte change to a virus does not let it escape detection. However, fuzzy searches are not only slower than exact searches, but they may turn up false alarms, that is, warnings about legitimate files that happen to contain some code vaguely similar to a virus reported in Pakistan 7 years ago. What is the user supposed to do with the message:
WARNING! File xyz.exe may contain the lahore-9x virus. Delete?
The more viruses in the database and the broader the criteria for declaring a hit, the more false alarms there will be. If there are too many, the user will give up in disgust. But if the virus scanner insists on a very close match, it may miss some modified viruses.
Getting it right is a delicate heuristic balance. Ideally, the lab should try to identify some core code in the virus that is not likely to change and use this as the virus signature to scan for.
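The trade-off between exact and fuzzy matching can be sketched as a byte-window comparison that tolerates a set number of mismatched bytes. Everything here is hypothetical: the 7-byte "signature" is made up, and real scanners use far longer signatures and much faster matching than this brute-force loop.

```python
# Hypothetical signature database; the bytes are invented for illustration.
SIGNATURES = {"lahore-9x": bytes.fromhex("de ad be ef 13 37 42")}

def scan(data, signatures, max_mismatches=0):
    """Return (name, offset, mismatches) for the first match of each signature.

    max_mismatches=0 is an exact search; a larger value is a crude fuzzy search
    that tolerates small modifications but can also raise false alarms.
    """
    hits = []
    for name, sig in signatures.items():
        for start in range(len(data) - len(sig) + 1):
            window = data[start:start + len(sig)]
            mismatches = sum(a != b for a, b in zip(window, sig))
            if mismatches <= max_mismatches:
                hits.append((name, start, mismatches))
                break
    return hits

clean = bytes(10) + b"\x90" * 20
infected = b"\x90" * 10 + SIGNATURES["lahore-9x"] + b"\x00" * 5
variant = bytearray(infected)
variant[12] ^= 0xFF  # a one-byte change to the virus body

print(scan(infected, SIGNATURES))    # exact match at offset 10
print(scan(variant, SIGNATURES))     # exact search misses the variant
print(scan(variant, SIGNATURES, 1))  # fuzzy search catches it
print(scan(clean, SIGNATURES, 7))    # too much tolerance: a false alarm
```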
Just because the disk was declared virus free last week does not mean that it still is, so the virus scanner has to be run frequently. Because scanning is slow, it is more efficient to check only those files that have been changed since the date of the last scan. The trouble is, a clever virus will reset the date of an infected file to its original date to avoid detection. The antivirus program’s response to that is to check the date the enclosing directory was last changed. The virus’ response to that is to reset the directory’s date as well. This is the start of the cat-and-mouse game alluded to above.
Another way for the antivirus program to detect file infection is to record and store on the disk the lengths of all files. If a file has grown since the last check, it might be infected, as shown in Fig. 9-16(a-b). However, a clever virus can avoid detection by compressing the program and padding out the file to its original length. To make this scheme work, the virus must contain both compression and decompression procedures, as shown in Fig. 9-16(c).
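A minimal sketch of the length check, combined with the date check from the previous paragraph. Assumptions: the baseline is an in-memory dictionary (a real scanner would store it on disk), prog.exe is a stand-in file, and the "infection" simply appends bytes and then restores the original timestamp with os.utime. The date check alone is fooled; the length check is not, until a virus pads or compresses as described above.

```python
import os
import tempfile

def snapshot(paths):
    """Record (length, modification time in ns) for each file."""
    result = {}
    for p in paths:
        st = os.stat(p)
        result[p] = (st.st_size, st.st_mtime_ns)
    return result

def changed(old, new):
    """Files whose length grew or whose timestamp changed since the old snapshot."""
    return sorted(p for p in new
                  if p in old and (new[p][0] > old[p][0] or new[p][1] != old[p][1]))

d = tempfile.mkdtemp()
prog = os.path.join(d, "prog.exe")
with open(prog, "wb") as f:
    f.write(b"A" * 100)
before = snapshot([prog])

# Simulated infection: append code, then reset the file's date.
st = os.stat(prog)
with open(prog, "ab") as f:
    f.write(b"V" * 20)
os.utime(prog, ns=(st.st_atime_ns, st.st_mtime_ns))

after = snapshot([prog])
print(changed(before, after))  # flagged: same date, but the file grew
```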
Another way for the virus to try to escape detection is to make sure its representation on the disk does not look at all like its representation in the antivirus software’s database. One way to achieve this goal is to encrypt itself with a different key for each file infected. Before making a new copy, the virus generates a random 32-bit encryption key, for example by XORing the current time with the contents of, say, memory words 72,008 and 319,992. It then XORs its code with this key, word by word, to produce the encrypted virus stored in the infected file. The key is stored in the file. For secrecy purposes, putting the key in the file is not ideal, but the goal here is to foil the virus scanner, not prevent the dedicated scientists at the antivirus lab from reverse engineering the code. Of course, to run, the virus has to first decrypt itself, so it needs a decrypting procedure in the file as well.
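Word-by-word XOR explains both halves of the argument: each copy's stored bytes differ, yet the routine that undoes them is identical in every copy, which is why scanners turn to the decryption procedure. A minimal sketch on a harmless 16-byte stand-in payload; it swaps the key source described in the text (time XOR memory words) for Python's secrets.randbits purely for illustration.

```python
import secrets
import struct

def xor_words(data, key):
    """XOR data with a 32-bit key one word at a time (len(data) must be a multiple of 4).

    XOR is its own inverse, so the same routine encrypts and decrypts.
    """
    count = len(data) // 4
    words = struct.unpack(f"<{count}I", data)
    return struct.pack(f"<{count}I", *(w ^ key for w in words))

body = b"SAME BODY, COPY!"  # 16-byte harmless stand-in payload
key1, key2 = secrets.randbits(32), secrets.randbits(32)
copy1, copy2 = xor_words(body, key1), xor_words(body, key2)
print(copy1.hex())
print(copy2.hex())  # almost certainly different bytes from copy1
```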
This scheme is still not perfect because the compression, decompression, encryption, and decryption procedures are the same in all copies, so the antivirus program can just use them as the virus signature to scan for. Hiding the compression, decompression, and encryption procedures is easy: they are just encrypted along with the rest of the virus, as shown in Fig. 9-16(e). The decryption code cannot be encrypted, however. It has to actually execute on the hardware to decrypt the rest of the virus so it must be present in plaintext form. Antivirus programs know this, so they hunt for the decryption procedure.
However, Virgil enjoys having the last word, so he proceeds as follows. Suppose that the decryption procedure needs to perform the calculation X = (A + B + C – 4) The straightforward assembly code for this calculation for a generic two-address computer is shown in Fig. 9-17(a). The first address is the source; the second is the destination, so MOV A,R1 moves the variable A to the register R1 .
only less efficiently due to the NOP (no operation) instructions interspersed with the real code.
Fig. 9-17. Functionally equivalent code sequences computing X = A + B + C - 4, labeled (a) through (e).

(a)
MOV A,R1
ADD B,R1
ADD C,R1
SUB #4,R1
MOV R1,X

(b)
MOV A,R1
NOP
ADD B,R1
NOP
ADD C,R1
NOP
SUB #4,R1
NOP
MOV R1,X

(c)
MOV A,R1
ADD #0,R1
ADD B,R1
OR R1,R1
ADD C,R1
SHL #0,R1
SUB #4,R1
JMP .+1
MOV R1,X

(d)
MOV A,R1
OR R1,R1
ADD B,R1
MOV R1,R5
ADD C,R1
SHL R1,0
SUB #4,R1
ADD R5,R5
MOV R1,X
MOV R5,Y

(e)
MOV A,R1
TST R1
ADD C,R1
MOV R1,R5
ADD B,R1
CMP R2,R5
SUB #4,R1
JMP .+1
MOV R1,X
MOV R5,Y
But we are not done yet. It is also possible to disguise the decryption code. There are many ways to represent NOP. For example, adding 0 to a register, ORing it with itself, shifting it left 0 bits, and jumping to the next instruction all do nothing. Thus the program of Fig. 9-17(c) is functionally the same as the one of Fig. 9-17(a). When copying itself, the virus could use Fig. 9-17(c) instead of Fig. 9-17(a) and still work when executed. A virus that mutates on each copy is called a polymorphic virus.
Now suppose that R5 is not needed during this piece of the code. Then Fig. 9-17(d) is also equivalent to Fig. 9-17(a). Finally, in many cases it is possible to swap instructions without changing what the program does, so Fig. 9-17(e) is another code fragment that is logically equivalent to Fig. 9-17(a). A piece of code that can mutate a sequence of machine instructions without changing its functionality is called a mutation engine, and sophisticated viruses contain them to mutate the decryptor from copy to copy. The mutation engine itself can be hidden by encrypting it along with the body of the virus.
Asking the poor antivirus software to realize that Fig. 9-17(a) through Fig. 9-17(e) are all functionally equivalent is asking a lot, especially if the mutation engine has many tricks up its sleeve. The antivirus software can analyze the code to see what it does, and it can even try to simulate the operation of the code, but remember it may have thousands of viruses and thousands of files to analyze, so it does not have much time per test or it will run horribly slowly.
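The text notes that antivirus software can try to simulate suspect code. As a rough illustration only, here is a minimal Python interpreter for the hypothetical two-address machine of Fig. 9-17. The semantics are assumptions drawn from the description above (operands are source then destination, #n is an immediate, flags are not modelled), not a real instruction set. It runs versions (a), (c) and (e) on the same inputs and checks that they store the same X.

```python
# Toy interpreter for the hypothetical two-address machine of Fig. 9-17.
# Assumed semantics: operands are "source,destination"; "#n" is an immediate,
# "Rn" a register, any other name a memory variable. Flags are not modelled,
# so TST and CMP have no effect, and only "JMP .+1" (fall through) is supported.

def run(program, memory):
    regs = {}
    mem = dict(memory)  # copy so the caller's inputs stay unchanged

    def load(op):
        if op.startswith("#"):
            return int(op[1:])
        return (regs if op.startswith("R") else mem).get(op, 0)

    def store(op, val):
        (regs if op.startswith("R") else mem)[op] = val

    for instr in program:
        opcode, _, rest = instr.partition(" ")
        ops = [o.strip() for o in rest.split(",")] if rest else []
        if opcode == "MOV":
            store(ops[1], load(ops[0]))
        elif opcode == "ADD":
            store(ops[1], load(ops[1]) + load(ops[0]))
        elif opcode == "SUB":
            store(ops[1], load(ops[1]) - load(ops[0]))
        elif opcode == "OR":
            store(ops[1], load(ops[1]) | load(ops[0]))
        elif opcode == "SHL":
            store(ops[1], load(ops[1]) << load(ops[0]))
        elif opcode == "JMP" and ops == [".+1"]:
            continue  # jumping to the next instruction does nothing
        elif opcode in ("NOP", "TST", "CMP"):
            continue  # no effect on registers or memory in this model
        else:
            raise ValueError(f"unsupported instruction: {instr}")
    return mem

FIG_A = ["MOV A,R1", "ADD B,R1", "ADD C,R1", "SUB #4,R1", "MOV R1,X"]
FIG_C = ["MOV A,R1", "ADD #0,R1", "ADD B,R1", "OR R1,R1", "ADD C,R1",
         "SHL #0,R1", "SUB #4,R1", "JMP .+1", "MOV R1,X"]
FIG_E = ["MOV A,R1", "TST R1", "ADD C,R1", "MOV R1,R5", "ADD B,R1",
         "CMP R2,R5", "SUB #4,R1", "JMP .+1", "MOV R1,X", "MOV R5,Y"]

inputs = {"A": 10, "B": 20, "C": 30}
for name, prog in (("a", FIG_A), ("c", FIG_C), ("e", FIG_E)):
    print(name, run(prog, inputs)["X"])  # each line prints 56
```

Agreement on one set of inputs is evidence, not proof, of equivalence, and doing this for every candidate fragment in thousands of files is exactly the cost problem the text describes.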
As an aside, the store into the variable Y was thrown in just to make it harder to detect the fact that the code related to R5 is dead code, that is, does not do anything. If other code fragments read and write Y , the code will look perfectly legitimate. A well-written mutation engine that generates good polymorphic code can give antivirus software writers nightmares. The only bright side is that such an engine is hard to write, so Virgil’s friends all use his code, which means there are not so many different ones in circulation—yet.
So far we have talked about just trying to recognize viruses in infected executable files. In addition, the antivirus scanner has to check the MBR, boot sectors, bad sector list, flash ROM, CMOS memory, etc., but what if there is a memory-resident virus currently running? That will not be detected. Worse yet, suppose the running virus is monitoring all system calls. It can easily detect that the antivirus program is reading the boot sector (to check for viruses). To thwart the antivirus program, the virus does not make the system call. Instead it just returns the true boot sector from its hiding place in the bad block list. It also makes a mental note to reinfect all the files when the virus scanner is finished.
To prevent being spoofed by a virus, the antivirus program could make hard reads to the disk, bypassing the operating system. However, this requires having built-in device drivers for IDE, SCSI, and other common disks, making the antivirus program less portable and subject to failure on computers with unusual disks. Furthermore, since bypassing the operating system to read the boot sector is possible, but bypassing it to read all the executable files is not, there is also some danger that the virus can produce fraudulent data about executable files as well.
Integrity Checkers
A completely different approach to virus detection is integrity checking. An antivirus program that works this way first scans the hard disk for viruses. Once it is convinced that the disk is clean, it computes a checksum for each executable file and writes the list of checksums for all the relevant files in a directory to a file, checksum, in that directory.
The next time it runs, it recomputes all the checksums and sees if they match what is in the file checksum. An infected file will show up immediately.
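The record-then-recompute cycle can be sketched in a few lines. This is a minimal illustration, assuming SHA-256 as the checksum and a JSON-formatted checksum file (neither is specified in the text); the helper names are hypothetical. As the next paragraph explains, a real checker must also protect the checksum file itself.

```python
import hashlib
import json
import tempfile
from pathlib import Path

CHECKSUM_FILE = "checksum"  # file name taken from the text

def file_digest(path):
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def snapshot(directory):
    # Checksum every regular file except the checksum file itself.
    return {p.name: file_digest(p) for p in sorted(directory.iterdir())
            if p.is_file() and p.name != CHECKSUM_FILE}

def record(directory):
    (directory / CHECKSUM_FILE).write_text(json.dumps(snapshot(directory)))

def verify(directory):
    # Names whose checksum changed, or that appeared or disappeared.
    saved = json.loads((directory / CHECKSUM_FILE).read_text())
    current = snapshot(directory)
    return sorted(n for n in saved.keys() | current.keys()
                  if saved.get(n) != current.get(n))

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "editor").write_bytes(b"original code")
        (d / "compiler").write_bytes(b"other code")
        record(d)                      # disk assumed clean at this point
        before = verify(d)
        (d / "editor").write_bytes(b"original code + appended bytes")  # simulated infection
        after = verify(d)
    return before, after

print(demo())  # ([], ['editor'])
```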
The trouble is that Virgil is not going to take this lying down. He can write a virus that removes the checksum file. Worse yet, he can write a virus that computes the checksum of the infected file and replaces the old entry in the checksum file. To protect against this kind of behavior, the antivirus program can try to hide the checksum file, but that is not likely to work since Virgil can study the antivirus program carefully before writing the virus. A better idea is to encrypt the checksum file to make tampering easier to detect. Ideally, the encryption should involve use of a smart card with an externally stored key that programs cannot get at.
Behavioral Checkers
A third strategy used by antivirus software is behavioral checking. With this approach, the antivirus program lives in memory while the computer is running and catches all system calls itself. The idea is that it can then monitor all activity and try to catch anything that looks suspicious. For example, no normal program should attempt to overwrite the boot sector, so an attempt to do so is almost certainly due to a virus.
Likewise, changing the flash ROM is highly suspicious.
But there are also cases that are less clear cut. For example, overwriting an executable file is a peculiar thing to do—unless you are a compiler. If the antivirus software detects such a write and issues a warning, hopefully the user knows whether overwriting an executable makes sense in the context of the current work. Similarly, Word overwriting a .doc file with a new document full of macros is not necessarily the work of a virus. In Windows, programs can detach from their executable file and go memory resident using a special system call. Again, this might be legitimate, but a warning might still be useful.
Viruses do not have to passively lie around waiting for an antivirus program to kill them, like cattle being led off to slaughter. They can fight back. A particularly interesting battle can occur if a memory-resident virus and a memory-resident antivirus meet up on the same computer. Years ago there was a game called Core Wars in which two programmers faced off by each dropping a program into an empty address space. The programs took turns probing memory, with the object of the game being to locate and wipe out your opponent before he wiped you out. The virus-antivirus confrontation looks a little like that, only the battlefield is the machine of some poor user who does not really want it to happen there. Worse yet, the virus has an advantage because its writer can find out a lot about the antivirus program by just buying a copy of it. Of course, once the virus is out there, the antivirus team can modify their program, forcing Virgil to go buy a new copy.
Virus Avoidance
Every good story needs a moral. The moral of this one is: Better safe than sorry. Avoiding viruses in the first place is a lot easier than trying to track them down once they have infected a computer. Below are a few guidelines for individual users, but also some things that the industry as a whole can do to reduce the problem considerably.
What can users do to avoid a virus infection? First, choose an operating system that offers a high degree of security, with a strong kernel-user mode boundary and separate login passwords for each user and the system administrator. Under these conditions, a virus that somehow sneaks in cannot infect the system binaries.
Second, install only shrink-wrapped software bought from a reliable manufacturer. Even this is no guarantee since there have been cases where disgruntled employees have slipped viruses onto a commercial software product, but it helps a lot. Downloading software from Web sites and bulletin boards is risky behavior.
Third, buy a good antivirus software package and use it as directed. Be sure to get regular updates from the manufacturer’s Web site.
Fourth, do not click on attachments to email and tell people not to send them to you. Email sent as plain ASCII text is always safe, but attachments can start viruses when opened.
Fifth, make frequent backups of key files onto an external medium, such as floppy disk, CD-recordable, or tape. Keep several generations of each file on a series of backup media. That way, if you discover a virus, you may have a chance to restore files as they were before they were infected. Restoring yesterday’s infected file does not help, but restoring last week’s version might.
The industry should also take the virus threat seriously and change some dangerous practices. First, make simple operating systems. The more bells and whistles there are, the more security holes there are. That is a fact of life.
Second, forget active content. From a security point of view, it is a disaster. Viewing a document someone sends you should not require your running their program. JPEG files, for example, do not contain programs, and thus cannot contain viruses. All documents should work like that.
Third, there should be a way to selectively write protect specified disk cylinders to prevent viruses from infecting the programs on them. This protection could be implemented by having a bitmap inside the controller listing the write protected cylinders. The map should only be alterable when the user has flipped a mechanical toggle switch on the computer’s front panel.
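The controller bitmap proposed above is a simple data structure. A minimal sketch follows; the class and method names are hypothetical, a real controller would do this in firmware rather than Python, and a boolean stands in for the mechanical front-panel switch.

```python
class CylinderGuard:
    """Hypothetical controller-side write-protect bitmap, one bit per cylinder."""

    def __init__(self, num_cylinders):
        self.bits = bytearray((num_cylinders + 7) // 8)  # all cylinders writable
        self.switch_on = False  # models the mechanical toggle switch

    def _require_switch(self):
        if not self.switch_on:
            raise PermissionError("bitmap can only change while the switch is on")

    def protect(self, cyl):
        self._require_switch()
        self.bits[cyl // 8] |= 1 << (cyl % 8)

    def unprotect(self, cyl):
        self._require_switch()
        self.bits[cyl // 8] &= ~(1 << (cyl % 8)) & 0xFF

    def is_protected(self, cyl):
        return bool(self.bits[cyl // 8] & (1 << (cyl % 8)))

    def write_allowed(self, cyl):
        # Checked by the controller on every write request.
        return not self.is_protected(cyl)
```

Because only the physical switch can enable changes to the bitmap, a virus running with full software privileges still cannot unprotect the cylinders holding the system programs.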
Fourth, flash ROM is a nice idea, but it should only be modifiable when an external toggle switch has been flipped, something that will only happen when the user is consciously installing a BIOS update. Of course, none of this will be taken seriously until a really big virus hits, for example, one that hits the financial world and resets all bank accounts to 0. Of course, by then it would be too late.
Recovery from a Virus Attack
When a virus is detected, the computer should be halted immediately since a memory-resident virus may still be running. The computer should be rebooted from a CD-ROM or floppy disk that has always been write protected and that contains the full operating system, so as to bypass the boot sector, the hard disk copy of the operating system, and the disk drivers, all of which may now be infected. Then an antivirus program should be run from its original CD-ROM, since the hard disk version may also be infected.
The antivirus program may detect some viruses and may even be able to eliminate them, but there is no guarantee that it will get them all. Probably the safest course of action at this point is to save all files that cannot contain viruses (like ASCII and JPEG files).
Those files that might contain viruses (like Word files) should be converted to another format that cannot contain viruses, such as ASCII text (or at least the macros should be removed). All the saved files should be saved on an external medium. Then the hard disk should be reformatted using a format program taken from a write-protected floppy disk or a CD-ROM to ensure that it itself is not infected. It is especially important that the MBR and boot sectors are also fully erased. Then the operating system should be reinstalled from the original CD-ROM. When dealing with virus infections, paranoia is your best friend.

5.4 BASIC CRYPTOGRAPHY CONCEPTS

This topic provides a basic understanding of cryptographic function and an overview of the cryptographic services for the systems running the i5/OS® operating system.

Cryptography

Cryptographic services help ensure data privacy, maintain data integrity, authenticate communicating parties, and prevent repudiation (when a party refutes having sent a message).
Basic encryption allows you to store information or to communicate with other parties while preventing non-involved parties from understanding the stored information or understanding the communication. Encryption transforms understandable text (plaintext) into an unintelligible piece of data (ciphertext). Decryption restores the understandable text from the unintelligible data. Both functions involve a mathematical formula (the algorithm) and secret data (the key).

Cryptographic algorithms

There are two types of cryptographic algorithms:
1.     With a secret or symmetric key algorithm, the key is a shared secret between two communicating parties. Encryption and decryption both use the same key. The Data Encryption Standard (DES) and the Advanced Encryption Standard (AES) are examples of symmetric key algorithms.
There are two types of symmetric key algorithms:
Block ciphers
In a block cipher, the actual encryption code works on a fixed-size block of data. Normally, the user's interface to the encrypt/decrypt operation will handle data longer than the block size by repeatedly calling the low-level encryption function. If the length of data is not on a block size boundary, it must be padded.
Stream ciphers
Stream ciphers do not work on a block basis, but convert 1 bit (or 1 byte) of data at a time.
2.     With a public key (PKA) or asymmetric key algorithm, a pair of keys is used. One of the keys, the private key, is kept secret and not shared with anyone. The other key, the public key, is not secret and can be shared with anyone. When data is encrypted by one of the keys, it can only be decrypted and recovered by using the other key. The two keys are mathematically related, but it is virtually impossible to derive the private key from the public key. The RSA algorithm is an example of a public key algorithm.
Public key algorithms are slower than symmetric key algorithms. Applications typically use public key algorithms to encrypt symmetric keys (for key distribution) and to encrypt hashes (in digital signature generation).
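To make the public key idea concrete, here is textbook RSA with deliberately tiny primes (p = 61, q = 53), encrypting a small number that stands in for a symmetric key. This is a toy sketch under those assumptions: real RSA uses moduli of 2048 bits or more and a padding scheme, and a real symmetric key would not fit in this tiny modulus.

```python
from math import gcd

def make_keypair(p, q, e=17):
    """Textbook RSA key pair from two primes p and q (toy sizes, no padding)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p - 1)(q - 1)")
    d = pow(e, -1, phi)        # modular inverse (Python 3.8+)
    return (e, n), (d, n)      # (public key, private key)

def rsa(key, m):
    """Apply a key: m**exp mod n. The same function encrypts and decrypts."""
    exp, n = key
    if not 0 <= m < n:
        raise ValueError("m must be an integer in [0, n)")
    return pow(m, exp, n)

public, private = make_keypair(61, 53)   # n = 3233
sym_key = 1234                           # stand-in for a symmetric key
wrapped = rsa(public, sym_key)           # sender uses the receiver's public key
unwrapped = rsa(private, wrapped)        # only the private-key holder recovers it
print(unwrapped)  # 1234
```

Note that applying the private key first and the public key second also round-trips, which is the property digital signatures rely on.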
Together, the key and the cryptographic algorithm transform the data. All of the supported algorithms are in the public domain. Therefore it is the key that controls access to the data. You must safeguard the keys to protect the data.
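For the block ciphers described earlier in this section, data that does not end on a block boundary must be padded before the low-level cipher is called block by block. A minimal sketch, assuming PKCS#7-style padding (one common scheme; the text does not say which one the system uses) and the 16-byte block size of AES:

```python
BLOCK_SIZE = 16  # bytes; AES works on 128-bit blocks

def pad(data, block_size=BLOCK_SIZE):
    # Always add 1..block_size bytes, each equal to the pad length,
    # so the padding can be removed unambiguously.
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n

def unpad(padded, block_size=BLOCK_SIZE):
    if not padded or len(padded) % block_size:
        raise ValueError("length is not a multiple of the block size")
    n = padded[-1]
    if not 1 <= n <= block_size or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]

def blocks(data, block_size=BLOCK_SIZE):
    # The low-level encryption function would be called once per block.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

print(len(pad(b"hello")), len(pad(b"x" * 16)))  # 16 32
```

Data that is already block-aligned still gets a full block of padding; otherwise the receiver could not tell padding from real data ending in a byte such as 0x01.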

Cryptographic operations

Different cryptographic operations may use one or more algorithms. You choose the cryptographic operation and algorithm(s) depending on your purpose. For example, for the purpose of ensuring data integrity, you might want to use a MAC (message authentication code) operation with the AES algorithm.
The system provides several API sets that support cryptographic operations.

Data privacy

Cryptographic operations for the purpose of data privacy (confidentiality) prevent an unauthorized person from reading a message. The following operations are included in data privacy:
Encrypt and Decrypt
The encrypt operation changes plaintext data into ciphertext through the use of a cipher algorithm and key. To restore the plaintext data, the decrypt operation must employ the same algorithm and key.
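A minimal sketch of the encrypt/decrypt pair, using a toy stream cipher of the kind described earlier (it converts one byte at a time). The keystream construction, SHA-256 over key, nonce and counter, is an assumption for illustration and not a standardized cipher; do not use it to protect real data. It shows that decryption needs the same algorithm and the same key.

```python
import hashlib

def keystream(key, nonce, length):
    # Toy keystream: SHA-256(key || nonce || counter) for counter = 0, 1, 2, ...
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_stream(key, nonce, data):
    # Encrypt and decrypt are the same operation: XOR each byte with the keystream.
    ks = keystream(key, nonce, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))

key, nonce = b"shared secret key", b"msg-0001"
ciphertext = xor_stream(key, nonce, b"attack at dawn")   # plaintext -> ciphertext
plaintext = xor_stream(key, nonce, ciphertext)           # same key restores it
print(plaintext)  # b'attack at dawn'
```

A nonce must never be reused with the same key: two messages encrypted with the same keystream leak the XOR of their plaintexts.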
Encryption and decryption may be employed at any level of the operating system. There are three levels:
Field level encryption
With field level encryption, the user application explicitly requests cryptographic services. The user application completely controls key generation, selection, distribution, and what data to encrypt.
Session level encryption
With encryption at the session layer, the system requests cryptographic services instead of an application. The application may or may not be aware that encryption is happening.
Link level encryption
Link level encryption is performed at the lowest level of the protocol stack, usually by specialized hardware.
The Cryptographic Coprocessors and the 2058 Cryptographic Accelerator may be used for both field level encryption and Secure Sockets Layer (SSL) session establishment encryption. While VPN is supported in i5/OS, it does not use either coprocessor or the accelerator. Furthermore, the system does not support SNA session level encryption at all.
Translate
The translate operation decrypts data from encryption under one key and encrypts the data under another key. This is done in one step to avoid exposing the plaintext data within the application program.

Data integrity, authenticity, and non-repudiation

Encrypting data does not mean the data cannot be manipulated (e.g., repeated, deleted, or even altered). To rely on data, you need to know that it comes from an authorized source and is unchanged. Additional cryptographic operations are required for these purposes.

Hash (Message Digest)
A cryptographic hash operation produces a fixed-length output string (often called a digest) from a variable-length input string. For all practical purposes, the following statements are true of a good hash function:
Collision resistant: If any portion of the data is modified, a different hash will be generated.
One-way: The function is irreversible. That is, given a digest, it is not possible to find the data that produces it.
These properties make hash operations useful for authentication purposes. For example, you can keep a copy of a digest for the purpose of comparing it with a newly generated digest at a later date. If the digests are identical, the data has not been altered.
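The keep-a-digest-and-compare-later use described above looks like this with Python's standard `hashlib`. SHA-256 is an assumption here; the text does not name a particular hash function.

```python
import hashlib

def digest(data):
    """SHA-256 digest as hex: always 64 hex characters, whatever the input size."""
    return hashlib.sha256(data).hexdigest()

original = b"Payroll file, June"
saved = digest(original)          # stored for comparison at a later date

print(digest(original) == saved)                 # True: unchanged data, identical digest
print(digest(b"Payroll file, June.") == saved)   # False: one added byte changes the digest
```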
MAC (Message Authentication Code)
A MAC operation uses a secret key and cipher algorithm to produce a value (the MAC) which later can be used to ensure the data has not been modified. Typically, a MAC is appended to the end of a transmitted message. The receiver of the message uses the same MAC key and algorithm as the sender to reproduce the MAC. If the receiver's MAC matches the MAC sent with the message, the data has not been altered.
The MAC operation helps authenticate messages, but does not prevent unauthorized reading because the transmitted data remains as plaintext. You must use the MAC operation and then encrypt the entire message to ensure both data privacy and integrity.
HMAC (Hash MAC)
An HMAC operation uses a cryptographic hash function and a secret shared key to produce an authentication value. It is used in the same way a MAC is used.
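Python's standard `hmac` module implements HMAC. The sketch below uses HMAC-SHA256 (the choice of SHA-256 is an assumption) to show the tag-and-recompute flow described for MACs above; the helper names are hypothetical. A cipher-based MAC, such as one using AES, is used the same way by the caller but is not part of the Python standard library.

```python
import hashlib
import hmac

def make_tag(key, message):
    # HMAC-SHA256 over the message with a shared secret key.
    return hmac.new(key, message, hashlib.sha256).digest()

def check_tag(key, message, tag):
    # The receiver recomputes the tag; compare_digest avoids timing leaks.
    return hmac.compare_digest(make_tag(key, message), tag)

key = b"shared MAC key"
msg = b"transfer 100 to account 42"
tag = make_tag(key, msg)    # sent along with msg, which itself stays plaintext

print(check_tag(key, msg, tag))                              # True
print(check_tag(key, b"transfer 900 to account 42", tag))    # False
```

As the MAC paragraph points out, the message itself is still readable; encrypt it as well if privacy is needed.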
Sign/Verify
A sign operation produces an authentication value called a digital signature. A sign operation works as follows:
1. The data to be signed is hashed, to produce a digest.
2. The digest is encrypted using a PKA algorithm and a private key, to produce the signature.
The verify operation works as follows:
1. The signature is decrypted using the sender's PKA public key, to produce digest 1.
2. The data that was signed is hashed, to produce digest 2.
3. If the two digests are equal, the signature is valid.
Theoretically, this also verifies the sender because only the sender should possess the private key. However, how can the receiver verify that the public key actually belongs to the sender? Certificates are used to help solve this problem.
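The sign and verify steps can be traced with a toy example. This sketch assumes textbook RSA built from two publicly known Mersenne primes (2^127 - 1 and 2^521 - 1) and SHA-256 as the hash; both choices are for illustration only. Real systems use secret random primes of 2048 bits or more and a signature padding scheme such as PSS.

```python
import hashlib

# Toy RSA key from two known Mersenne primes: NOT secure, illustration only.
P, Q, E = 2**127 - 1, 2**521 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (Python 3.8+)

def h(data):
    # Hash the data to an integer digest (256 bits, smaller than N).
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def sign(data):
    # Sign steps 1-2: hash the data, then "encrypt" the digest with the private key.
    return pow(h(data), D, N)

def verify(data, signature):
    digest1 = pow(signature, E, N)   # decrypt the signature with the public key
    digest2 = h(data)                # hash the data that was signed
    return digest1 == digest2        # equal digests: signature is valid

sig = sign(b"pay Alice 10")
print(verify(b"pay Alice 10", sig))     # True
print(verify(b"pay Alice 1000", sig))   # False
```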

Key and random number generation

Many security-related functions rely on random number generation, for example, salting a password or generating an initialization vector. An important use of random numbers is in the generation of cryptographic key material. Key generation has been described as the most sensitive of all computer security functions. If the random numbers are not cryptographically strong, the function will be subject to attack.
The i5/OS operating system contains a pseudorandom number generator (PRNG). The PRNG is used by many system functions and is available for application use through the Cryptographic Services API set.
The PRNG is composed of two parts: pseudorandom number generation and seed management. Pseudorandom number generation is performed using the FIPS 186-1 algorithm. Cryptographically strong pseudorandom numbers rely on good seed. The FIPS 186-1 key and seed values are obtained from a system seed digest. The system automatically generates seed using data collected from system information or by using the random number generator function on a cryptographic coprocessor if one is available. System-generated seed can never be truly unpredictable. If a cryptographic coprocessor is not available, you should add your own random seed to the system seed digest. This should be done as soon as possible any time the Licensed Internal Code is installed.
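For comparison, Python's standard `secrets` module draws on the operating system's cryptographically strong random source. It does not implement the FIPS 186-1 generator described above, so this is an analogy rather than a model of the i5/OS PRNG; the helper names are hypothetical. The key point carries over: salts, initialization vectors and keys must come from a cryptographically strong generator, never from a general-purpose one such as the `random` module.

```python
import secrets

def new_salt(nbytes=16):
    # Salt for password hashing.
    return secrets.token_bytes(nbytes)

def new_iv(block_size=16):
    # Initialization vector for a block cipher mode.
    return secrets.token_bytes(block_size)

def new_aes_key(bits=256):
    # Key material for a symmetric algorithm.
    if bits not in (128, 192, 256):
        raise ValueError("AES key sizes are 128, 192 or 256 bits")
    return secrets.token_bytes(bits // 8)

print(len(new_salt()), len(new_iv()), len(new_aes_key(128)))  # 16 16 16
```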

Key management

Key management is the secure handling and storage of cryptographic keys. This includes key storage and retrieval, key encryption and conversions, and key distribution.
Key storage
Key storage on the system includes the following:
·         Cryptographic Services key store
·         Digital certificate manager certificate store
·         CCA key store (used with the Cryptographic Coprocessors)
·         JCE key store
In addition, keys can also be stored on the Cryptographic Coprocessors themselves.
Key Encryption and Conversions
Keys must be encrypted prior to sending or storing them outside the secured system environment. In addition, keys should be handled in encrypted form within the system as much as possible to reduce the risk of exposure. The management of encrypted keys is often done via a hierarchical key system.
·         At the top is a master key (or keys). The master key is the only clear key value and must be stored in a secure fashion.
·         Key-encrypting keys (KEKs) are used to encrypt other keys. Typically, a KEK is used to encrypt a stored key, or a key that is sent to another system. KEKs are normally encrypted under a master key.
·         Data keys are keys used directly on user data (such as to encrypt or MAC). A data key may be encrypted under a KEK or under a master key.
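The three-level hierarchy can be sketched as follows. The wrapping function here is a hypothetical toy (XOR with a SHA-256-derived pad) chosen only to keep the example self-contained; it provides no integrity protection, and real systems use a dedicated key-wrapping algorithm such as AES Key Wrap. What the sketch shows is the structure: only the master key is in the clear, and everything below it is stored wrapped.

```python
import hashlib
import secrets

def _wrap(kek, key, nonce):
    # Toy key wrap (illustration only): XOR the key with SHA-256(kek || nonce).
    pad = hashlib.sha256(kek + nonce).digest()
    if len(key) > len(pad):
        raise ValueError("toy wrap handles keys of up to 32 bytes")
    return bytes(a ^ b for a, b in zip(key, pad))

_unwrap = _wrap  # XOR is its own inverse

master_key = secrets.token_bytes(32)   # the only clear key; kept in secure storage
kek = secrets.token_bytes(32)          # key-encrypting key
data_key = secrets.token_bytes(32)     # used directly on user data

# Only these wrapped forms are stored or sent outside the secure environment.
wrapped_kek = _wrap(master_key, kek, b"kek-1")
wrapped_data_key = _wrap(kek, data_key, b"dk-1")

def recover_data_key():
    # Unwrap top-down: master key -> KEK -> data key.
    k = _unwrap(master_key, wrapped_kek, b"kek-1")
    return _unwrap(k, wrapped_data_key, b"dk-1")
```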
Various uses of a key will require the key to be in different forms. For example, keys received from other sources will normally be converted to an internal format. Likewise, keys sent out of the system are converted to a standard external format before sending. Certain key forms are standard, such as an ASN.1 BER-encoded form, and others are peculiar to a cryptographic service provider, such as the Cryptographic Coprocessors.
Key Distribution
Typically, data encryption is performed using symmetric key algorithms. The symmetric keys are distributed using asymmetric key algorithms. Consider these examples:
·         RSA - An RSA public key is used to encrypt a symmetric key which is then distributed. The corresponding private key is used to decrypt it.
·         Diffie-Hellman - The communicating parties generate and exchange D-H parameters which are then used to generate key pairs. The public keys are exchanged and each party is then able to compute the symmetric key independently.
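The Diffie-Hellman exchange can be sketched with Python's built-in modular exponentiation. The parameters (the prime 2^127 - 1 and base 3) are toy values assumed for illustration; real deployments use standardized groups of 2048 bits or more, or elliptic curves. Hashing the shared secret down to a 256-bit symmetric key with SHA-256 is likewise an assumption.

```python
import hashlib
import secrets

# Toy public parameters, agreed in advance (illustration only).
P = 2**127 - 1
G = 3

def keypair():
    private = secrets.randbelow(P - 3) + 2   # private exponent in [2, P - 2]
    public = pow(G, private, P)
    return private, public

def shared_key(my_private, their_public):
    secret = pow(their_public, my_private, P)          # same value on both sides
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()

a_priv, a_pub = keypair()   # party A
b_priv, b_pub = keypair()   # party B
# Only a_pub and b_pub are exchanged; each side computes the key independently.
print(shared_key(a_priv, b_pub) == shared_key(b_priv, a_pub))  # True
```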
5.5 SECURITY
Many companies possess valuable information that they guard closely. This information can be technical (e.g., a new chip design or software), commercial (e.g., studies of the competition or marketing plans), financial (e.g., plans for a stock offering), legal (e.g., documents about a potential merger or takeover), among many other possibilities. Frequently this information is protected by having a uniformed guard at the building entrance who checks to see that all people entering the building are wearing a proper badge. In addition, many offices may be locked and some file cabinets may be locked as well to ensure that only authorized people have access to the information.
As more and more of this information is stored in computer systems, the need to protect it is becoming increasingly important. Protecting this information against unauthorized usage is therefore a major concern of all operating systems. Unfortunately, it is also becoming increasingly difficult due to the widespread acceptance of system bloat as being a normal and acceptable phenomenon. In the following sections we will look at a variety of issues concerned with security and protection, some of which have analogies to real-world protection of information on paper, but some of which are unique to computer systems. In this chapter we will examine computer security as it applies to operating systems.
5.6 THE SECURITY ENVIRONMENT
Some people use the terms “security” and “protection” interchangeably. Nevertheless, it is frequently useful to make a distinction between the general problems involved in making sure that files are not read or modified by unauthorized persons, which include technical, administrative, legal, and political issues on the one hand, and the specific operating system mechanisms used to provide security, on the other. To avoid confusion, we will use the term security to refer to the overall problem, and the term protection mechanisms to refer to the specific operating system mechanisms used to safeguard information in the computer. The boundary between them is not well defined, however. First we will look at security to see what the nature of the problem is. Later on in the chapter we will look at the protection mechanisms and models available to help achieve security.
Security has many facets. Three of the more important ones are the nature of the threats, the nature of intruders, and accidental data loss. We will now look at these in turn.
5.6.1 Threats
From a security perspective, computer systems have three general goals, with corresponding threats to them, as listed in Fig. 9-1. The first one, data confidentiality, is concerned with having secret data remain secret. More specifically, if the owner of some data has decided that these data are only to be made available to certain people and no others, the system should guarantee that release of the data to unauthorized people does not occur. As a bare minimum, the owner should be able to specify who can see what, and the system should enforce these specifications.

The second goal, data integrity, means that unauthorized users should not be able to modify any data without the owner's permission. Data modification in this context includes not only changing the data, but also removing data and adding false data as well. If a system cannot guarantee that data deposited in it remain unchanged until the owner decides to change them, it is not worth much as an information system.
The third goal, system availability, means that nobody can disturb the system to make it unusable. Such denial-of-service attacks are increasingly common. For example, if a computer is an Internet server, sending a flood of requests to it may cripple it by eating up all of its CPU time just examining and discarding incoming requests. If it takes, say, 100 μsec to process an incoming request to read a Web page, then anyone who manages to send 10,000 requests/sec can wipe it out, since 10,000 × 100 μsec is a full second of CPU time every second. Reasonable models and technology for dealing with attacks on confidentiality and integrity are available; foiling denial-of-service attacks is much harder.
Another aspect of the security problem is privacy: protecting individuals from misuse of information about them. This quickly gets into many legal and moral issues. Should the government compile dossiers on everyone in order to catch X-cheaters, where X is "welfare" or "tax," depending on your politics? Should the police be able to look up anything on anyone in order to stop organized crime? Do employers and insurance companies have rights? What happens when these rights conflict with individual rights? All of these issues are extremely important but are beyond the scope of this book.
5.6.2 Intruders
Most people are pretty nice and obey the law, so why worry about security? Because there are unfortunately a few people around who are not so nice and want to cause trouble (possibly for their own commercial gain). In the security literature, people who are nosing around places where they have no business being are called intruders or sometimes adversaries. Intruders act in two different ways. Passive intruders just want to read files they are not authorized to read. Active intruders are more malicious; they want to make unauthorized changes to data. When designing a system to be secure against intruders, it is important to keep in mind the kind of intruder one is trying to protect against. Some common categories are
1. Casual prying by nontechnical users. Many people have personal computers on their desks that are connected to a shared file server, and human nature being what it is, some of them will read other people's electronic mail and other files if no barriers are placed in the way. Most UNIX systems, for example, have the default that all newly created files are publicly readable.
2. Snooping by insiders. Students, system programmers, operators, and other technical personnel often consider it to be a personal challenge to break the security of the local computer system. They often are highly skilled and are willing to devote a substantial amount of time to the effort.
3. Determined attempts to make money. Some bank programmers have attempted to steal from the bank they were working for. Schemes have varied from changing the software to truncate rather than round interest, keeping the fraction of a cent for themselves, to siphoning off accounts not used in years, to blackmail ("Pay me or I will destroy all the bank's records.").
4. Commercial or military espionage. Espionage refers to a serious and well-funded attempt by a competitor or a foreign country to steal programs, trade secrets, patentable ideas, technology, circuit designs, business plans, and so forth. Often this attempt will involve wiretapping or even erecting antennas directed at the computer to pick up its electromagnetic radiation.

It should be clear that trying to keep a hostile foreign government from stealing military secrets is quite a different matter from trying to keep students from inserting a funny message-of-the-day into the system. The amount of effort that should be devoted to security and protection clearly depends on who the enemy is thought to be.
Another category of security pest that has manifested itself in recent years is the virus, which will be discussed at length below. Basically a virus is a piece of code that replicates itself and (usually) does some damage. In a sense, the writer of a virus is also an intruder, often with high technical skills. The difference between a conventional intruder and a virus is that the former refers to a person who is personally trying to break into a system to cause damage whereas the latter is a program written by such a person and then released into the world hoping it causes damage. Intruders try to break into specific systems (e.g., one belonging to some bank or the Pentagon) to steal or destroy particular data, whereas a virus usually causes more general damage. In a sense, an intruder is like someone with a gun who tries to kill a specific person; a virus writer is more like a terrorist bomber who just wants to kill people in general, rather than some particular person.
5.6.3 Accidental Data Loss
In addition to threats caused by malicious intruders, valuable data can be lost by accident. Some of the common causes of accidental data loss are
Acts of God: fires, floods, earthquakes, wars, riots, or rats gnawing tapes or floppy disks.
Hardware or software errors: CPU malfunctions, unreadable disks or tapes, telecommunication errors, program bugs.
Human errors: incorrect data entry, wrong tape or disk mounted, wrong program run, lost disk or tape, or some other mistake.

Most of these can be dealt with by maintaining adequate backups, preferably far away from the original data. While protecting data against accidental loss may seem mundane compared to protecting against clever intruders, in practice, probably more damage is caused by the former than the latter.


5.7 ATTACKS FROM INSIDE THE SYSTEM
Once a cracker has logged into a computer, he can start doing damage. If the computer has good security, it may only be possible to harm the user whose account has been broken, but often this initial entry can be leveraged to break into more accounts later. In the following sections, we will look at some attacks that can be set up by someone already logged in, either a cracker who has gotten in illicitly or possibly a legitimate user with a grudge against someone.
5.7.1 Trojan Horses
One hoary insider attack is the Trojan horse , in which a seemingly innocent program contains code to perform an unexpected and undesirable function. This function might be modifying, deleting or encrypting the user’s files, copying them to a place where the cracker can retrieve them later, or even sending them to the cracker or a temporary safe hiding place via email or FTP. To have the Trojan horse run, the person planting it first has to get the program carrying it executed. One way is to place the program on the Internet as a free, exciting new game, MP3 viewer, “special” porno viewer, or something else likely to attract attention, and encourage people to download it. When it runs, the Trojan horse procedure is called and can do anything the user can do (e.g., delete files, open network connections, etc.). Note that this ploy does not require the author of the Trojan horse to break into the victim’s computer.
There are other ways to trick the victim into executing the Trojan horse program as well. For example, many UNIX users have an environment variable, $PATH, which controls which directories are searched for a command. It can be viewed by typing the following command to the shell:
echo $PATH
A potential setting for the user ast on a particular system might consist of the following directories:
:/usr/ast/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/ucb:/usr/man\
:/usr/java/bin:/usr/java/lib:/usr/local/man:/usr/openwin/man
Other users are likely to have a different search path. When the user types
prog
to the shell, the shell first takes a look to see if there is a program named /usr/ast/bin/prog. If there is, it is executed. If it is not there, the shell tries /usr/local/bin/prog, /usr/bin/prog, /bin/prog, and so on, trying each directory in the list in turn before giving up. Suppose that just one of these directories was left unprotected so that a cracker could put a program there. If this is the first occurrence of the program in the list, it will be executed and the Trojan horse will run.
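The search just described can be sketched in a few lines of Python. This is a minimal illustration, not the code of any real shell: resolve_command and is_executable are hypothetical names, and the usage example simulates the file system with a set of paths so it runs anywhere. It follows the POSIX convention that an empty PATH entry stands for the current directory.

```python
import os
import posixpath

def default_is_executable(path):
    # A real shell checks that the file exists and has execute permission
    return os.path.isfile(path) and os.access(path, os.X_OK)

def resolve_command(name, path_value, is_executable=default_is_executable):
    """Return the first candidate for `name` along a PATH-style string, or None."""
    if "/" in name:
        # A name containing a slash is used as-is; PATH is not searched
        return name if is_executable(name) else None
    for directory in path_value.split(":"):
        # Per POSIX, an empty entry stands for the current directory
        candidate = posixpath.join(directory or ".", name)
        if is_executable(candidate):
            return candidate  # first match wins; later directories are never examined
    return None

# Usage with a simulated file system (hypothetical files, for illustration only)
PATH = "/usr/ast/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11"
files = {"/usr/local/bin/prog", "/usr/bin/X11/prog"}
print(resolve_command("prog", PATH, files.__contains__))  # /usr/local/bin/prog
```

Because the loop returns on the first hit, a program planted in an earlier directory completely hides any program of the same name further down the list, which is exactly what the Trojan horse relies on.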
Most common programs are in /bin or /usr/bin, so putting a Trojan horse in /usr/bin/X11/ls does not work for a common program because the real one will be found first. However, suppose the cracker inserts la into /usr/bin/X11. If a user mistypes la instead of ls (the directory listing program), the Trojan horse will run, do its dirty work, and then issue the correct message that la does not exist. By inserting Trojan horses into complicated directories that hardly anyone ever looks at and giving them names that match common typing errors, the cracker has a fair chance that someone will invoke one of them sooner or later. And that someone might be the superuser (even superusers make typing errors), in which case the Trojan horse now has the opportunity to replace /bin/ls with a version containing a Trojan horse, so that it will be invoked every time from then on.
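An unprotected directory on the search path is easy to check for mechanically, as is a search path that includes the current directory (POSIX shells treat an empty entry, such as the leading colon in the example setting above, as the current directory). The sketch below is one assumed way to do it in Python; audit_path is a hypothetical helper, and it treats only world-writable directories as "unprotected" (it does not look at group permissions or at who owns each directory).

```python
import os
import stat

def audit_path(path_value):
    """Return (entry, reason) pairs for PATH entries a cracker could exploit."""
    findings = []
    for entry in path_value.split(":"):
        if entry in ("", "."):
            # An empty entry or "." makes the shell search the current directory
            findings.append((entry, "searches current directory"))
            continue
        try:
            mode = os.stat(entry).st_mode
        except OSError:
            continue  # entry does not exist or cannot be examined; not reported here
        if stat.S_ISDIR(mode) and mode & stat.S_IWOTH:
            # Any user may create files here, e.g. a program named like a typo
            findings.append((entry, "world-writable"))
    return findings

if __name__ == "__main__":
    for entry, reason in audit_path(os.environ.get("PATH", "")):
        print(f"{entry or '(empty)'}: {reason}")
```

Run as a script, it prints any findings for the current user's own PATH; an empty result means only that these two particular weaknesses were not found.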
A malicious but legal user, Mal, could also lay a trap for the superuser as follows. He puts a version of ls containing a Trojan horse in his own directory and then does something suspicious that is sure to attract the superuser’s attention, such as starting up 100 compute-bound processes at once. Chances are the superuser will check that out by typing
cd /usr/mal
ls -l

to see what Mal has in his home directory. Since some shells try the local directory before working through $PATH, the superuser may have just invoked Mal’s Trojan horse with superuser power. The Trojan horse could make /usr/mal/bin/sh SETUID root. All it takes is two system calls: chown to change the owner of /usr/mal/bin/sh to root and chmod to set its SETUID bit. Now Mal can become superuser at will by just running that shell.
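The two calls the Trojan horse needs leave a recognizable mark in the file's metadata: the owner's user ID becomes 0 (root) and the SETUID bit (octal 04000, stat.S_ISUID) is set in the mode. The following is a minimal Python sketch of how an administrator might look for such files; is_setuid_root and find_setuid_root are hypothetical names, not part of any standard tool.

```python
import os
import stat

def is_setuid_root(mode, uid):
    """True if a file with this st_mode/st_uid is a regular file that runs as root."""
    return stat.S_ISREG(mode) and bool(mode & stat.S_ISUID) and uid == 0

def find_setuid_root(top):
    """Yield paths under `top` that are regular files owned by root with SETUID set."""
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symbolic links
            except OSError:
                continue  # removed or unreadable since the directory was listed
            if is_setuid_root(st.st_mode, st.st_uid):
                yield path
```

For example, iterating over find_setuid_root("/usr") lists candidates to review. Many legitimate programs, such as passwd, are SETUID root, so the output is something to compare against a known-good list rather than proof of a Trojan horse.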
If Mal finds himself frequently short of cash, he might use one of the following Trojan horse scams to help his liquidity position. In the first one, the Trojan horse checks to see if the victim has an online banking program, such as Quicken, installed. If so, the Trojan horse directs the program to transfer some money from the victim’s account to a dummy account (preferably in a far-away country) for collection in cash later.
In the second scam, the Trojan horse first turns off the modem’s sound, then dials a 900 (pay) number, again preferably in a far-away country, such as Moldova (part of the former Soviet Union). If the user was online when the Trojan horse was started, then the 900 phone number in Moldova needs to be a (very expensive) Internet provider, so the user will not notice and may stay online for hours. Neither of these techniques is hypothetical; both have happened and are reported by Denning (1999). In the latter case, 800,000 minutes of connect time to Moldova were run up before the U.S. Federal Trade Commission managed to get the plug pulled and filed suit against three people on Long Island. They eventually agreed to return $2.74 million to 38,000 victims.
5.7.2 Login Spoofing
Somewhat related to Trojan horses is login spoofing. It works as follows. Normally, when no one is logged in on a UNIX terminal or workstation on a LAN, a screen such as Fig. 9-9(a) is displayed. When a user sits down and types a login name, the system asks for a password. If it is correct, the user is logged in and a shell is started.
Now consider this scenario. Mal writes a program that displays a screen looking amazingly like the real login screen, except that it is not the system login program running, but a phony one written by Mal. Mal now walks away to watch the fun from a safe distance. When a user sits down and types a login name, the program responds by asking for a password and disabling echoing. After the login name and password have been collected, they are written to a file and the phony login program sends a signal to kill its shell. This action logs Mal out and triggers the real login program to start and display the prompt of Fig. 9-9(a). The user assumes that she made a typing error and just logs in again. This time it works. But in the meantime, Mal has acquired another (login name, password) pair. By logging in at many terminals and starting the login spoofer on all of them, he can collect many passwords.
The only real way to guard against this is to have the login sequence start with a key combination that user programs cannot catch. Windows 2000 uses CTRL-ALT-DEL for this purpose. If a user sits down at a terminal and starts out by typing CTRL-ALT-DEL, the current user is logged out and the system login program is started. There is no way to bypass this mechanism.

5.12 ATTACKS FROM OUTSIDE THE SYSTEM
The threats discussed in the previous sections were largely internal, that is, perpetrated by users already logged in. However, for machines connected to the Internet or another network, there is a growing external threat. A networked computer can be attacked from a distant computer over the network. In nearly all cases, such an attack consists of some code being transmitted over the network to the target machine and executed there, where it does damage. As more and more computers join the Internet, the potential for damage keeps growing. In the following sections we will look at some of the operating system aspects of these external threats, primarily focusing on viruses, worms, mobile code, and Java applets.
It is hard to open a newspaper these days without reading about another computer virus or worm attacking the world’s computers. They are clearly a major security problem for individuals and companies alike. In the following sections we will examine how they work and what can be done about them.
I was somewhat hesitant to write this section in so much detail, lest it give some people bad ideas, but existing books give far more detail and even include real code (e.g., Ludwig, 1998). Also the Internet is full of information about viruses so the genie is already out of the bottle. In addition, it is hard for people to defend themselves against viruses if they do not know how they work. Finally, there are a lot of misconceptions about viruses floating around that need correction.
Unlike, say, game programmers, successful virus writers tend not to seek publicity after their products have made their debut. Based on the scanty evidence there is, it appears that most are high school or college students or recent graduates who wrote the virus as a technical challenge, not realizing (or caring) that a virus attack can cost the collective victims as much as a hurricane or earthquake. Let us call our antihero Virgil the virus writer. If Virgil is typical, his goals are to produce a virus that spreads quickly, is difficult to detect, and is hard to get rid of once detected.
What is a virus, anyway? To make a long story short, a virus is a program that can reproduce itself by attaching its code to another program, analogous to how biological viruses reproduce. A virus can also do other things besides reproducing itself. Worms are like viruses, but they are self-contained: a worm replicates on its own as a separate program rather than attaching itself to a host program. That difference will not concern us here, so we will use the term “virus” to cover both for the moment.
