DIGITAL EVIDENCE RECOVERY BASICS
A hard drive, sometimes referred to as the “C:” drive, contains several hard round platters coated on both sides with a magnetic material designed to store information as binary numbers, magnetic patterns of 0’s and 1’s. The platters are mounted on a spindle that rotates at high speed, generally 5,000 to 10,000 rpm.
Electromagnetic read/write devices, known as ‘heads,’ mounted on sliders and connected to an actuator arm, are positioned over the surface of the disk. A logic board controls the motion of the heads, the process for reading and writing data, and the protocol for communicating with the rest of the computer.
The inside of a hard drive is similar to the inside of a jukebox, with the record being the platter, the jukebox tone arm being the hard drive’s actuator arm, and the needle being the read/write heads.
How Data is Stored
The surface of each platter can hold tens of billions of individual ‘bits’ of data, each either a 0 or a 1. Eight bits form a ‘byte,’ which can represent an alphabetical character or a number; computers also work with larger groups of sixteen or thirty-two bits. Most desktop and server hard drives are 3.5 inches in diameter, and notebook PCs have 2.5-inch and 1.8-inch drives. A large capacity drive, measured in gigabytes (GB), can store hundreds of billions of individual bits of data and is commonly available for under $200.
Each platter has two surfaces capable of holding data (the top and bottom), and each surface has a read/write head. Thus on a hard drive with three platters, there are a total of six ‘surfaces’ with information being read by six heads.
The recording surface of each platter is divided into concentric tracks (circles), and the vertical area of similar tracks on multiple platters is referred to as a “cylinder.” These tracks are further subdivided into sectors and clusters, which are groups of sectors. The logical organization of information on the platter is similar to slices of a pie. Data is stored in all sectors of each track, except parts of the outside track, which is generally reserved for the file allocation table (FAT) directory. The FAT contains the file names and the locations of active files on the disk. The file allocation table tells the computer’s operating system which sectors (the “geographic location”) contain data. A sector typically will hold 512 bytes of data (about the length of this sentence), plus “address” information used by the drive controller circuitry. There can be over 40 million sectors on a 20 GB hard drive. “Formatting” a hard drive is the process by which the disk surface is organized into tracks and sectors.
Sectors are also grouped sequentially into clusters, and generally there are 32 sectors per cluster. More often than not, data is stored sequentially in sectors within the clusters.
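The sector and cluster figures above can be checked with a short Python sketch. It uses the values given in the text (512-byte sectors, 32 sectors per cluster, a 20 GB drive) and assumes a gigabyte is counted as 2**30 bytes; the function and constant names are illustrative only.

```python
SECTOR_BYTES = 512         # typical sector size, per the text
SECTORS_PER_CLUSTER = 32   # typical cluster size, per the text
GIB = 2**30                # assumption: 1 GB counted as 2**30 bytes


def sector_count(capacity_gb, sector_bytes=SECTOR_BYTES):
    """Number of whole sectors on a drive of the given capacity."""
    return (capacity_gb * GIB) // sector_bytes


def cluster_bytes(sectors_per_cluster=SECTORS_PER_CLUSTER,
                  sector_bytes=SECTOR_BYTES):
    """Size of one cluster in bytes."""
    return sectors_per_cluster * sector_bytes


print(sector_count(20))                         # 41943040
print(cluster_bytes())                          # 16384
print(sector_count(20) // SECTORS_PER_CLUSTER)  # 1310720
```

Under these assumptions a 20 GB drive has just under 42 million sectors, consistent with the "over 40 million" figure, grouped into about 1.3 million clusters of 16 KB each.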
Reading and Writing Digital Data
When a user clicks on a file to open it, the application being used passes the file name to the computer operating system, which consults the FAT to determine the address (platter track and sector) where the first portion of the file is located. The operating system transmits this information to the disk controller, which positions the heads on the actuator arm over the correct physical location. The initial cluster will contain the address of subsequent sectors from which the controller must retrieve data. The controller retrieves the packets of data and reassembles them in the correct order before sending the ‘file’ to the central processing unit (CPU) for display on the screen.
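The lookup-and-reassemble sequence described above can be sketched in Python as a toy model. The dictionaries below stand in for the directory, the file allocation table, and the disk surface; all names and values are hypothetical. In this sketch the link to the next cluster is recorded in the table, as FAT file systems do, rather than inside the cluster itself.

```python
END_OF_CHAIN = -1  # hypothetical marker meaning "no further clusters"


def read_file(name, directory, fat, clusters):
    """Reassemble a file by following its cluster chain in order."""
    cluster = directory[name]      # address of the file's first cluster
    parts = []
    seen = set()
    while cluster != END_OF_CHAIN:
        if cluster in seen:        # guard against a corrupted, looping chain
            raise ValueError("cluster chain loops back on itself")
        seen.add(cluster)
        parts.append(clusters[cluster])
        cluster = fat[cluster]     # the table gives the next cluster
    return b"".join(parts)


directory = {"memo.txt": 7}                    # file name -> first cluster
fat = {7: 2, 2: 9, 9: END_OF_CHAIN}            # cluster -> next cluster
clusters = {7: b"Meet ", 2: b"me at ", 9: b"noon."}

print(read_file("memo.txt", directory, fat, clusters))  # b'Meet me at noon.'
```

The clusters are visited in the order 7, 2, 9, which shows that the pieces of a file need not sit next to each other on the disk.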
Disk systems, unlike tape, do not store records together physically. With tape, each time a change is made to a block of data, such as an insertion in a text file, the entire block of data is rewritten onto the tape with the new data incorporated. When a similar change is made to text stored on disk, the original file usually remains intact. The disk controller checks the file allocation table for the location of an unallocated cluster (a group of sectors available to store data), and inserts the data there.
Thus the various parts of a file, such as this article, can be scattered randomly among hundreds of sectors and clusters on various tracks. Hence the term “random access device,” meaning a drive that can retrieve or store data in any order at any location on the disk. Sequential access devices, such as backup tapes, store data in sequential order and are unable to retrieve data as quickly.
Allocated clusters contain data that is “active” according to the file allocation table. Unallocated clusters may contain data, but in storage space that the computer is no longer using for active files (see Deleted Files below). Thus, although unallocated clusters frequently contain “residual” data, this space will be randomly used (overwritten) to store new active data.
WHY DELETED IS NOT ALWAYS DELETED
When a user deletes a file, the operating system deletes only the first letter of the file name from the file allocation table, and reports the sectors containing the “deleted” data as “empty,” or available for the storage of new data.
For example, files called:
Assignment1.doc Exercise1.xls MyPage.htm (located in this graphic with the corresponding color)
would look like:
_ssignment1.doc _xercise1.xls _yPage.htm
to the operating system. However, the data remains unchanged and “intact” until new data is written to the specific sector and cluster containing the “residual” data. Only when new data overwrites those sectors is the residual data truly deleted.
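A minimal Python sketch of this behavior follows, using a toy directory and disk whose names and contents are hypothetical. The underscore follows the illustration above; on FAT file systems the first byte of the name is actually replaced with the value 0xE5.

```python
def delete_file(name, directory, free_clusters):
    """Mark a file deleted without touching its data clusters."""
    entry = directory.pop(name)
    marked = "_" + name[1:]          # overwrite only the first letter
    directory[marked] = entry
    free_clusters.update(entry)      # clusters are reported as available
    return marked


disk = {4: b"Q3 budget ", 5: b"draft"}    # cluster number -> contents
directory = {"Assignment1.doc": [4, 5]}   # file name -> cluster list
free_clusters = set()

marked = delete_file("Assignment1.doc", directory, free_clusters)
recovered = b"".join(disk[c] for c in directory[marked])
print(marked, recovered)  # _ssignment1.doc b'Q3 budget draft'
```

Nothing in `disk` changes when the file is deleted, so the contents stay readable until some later write reuses clusters 4 or 5.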
However, since data is randomly stored into the millions of potentially available sectors, it’s unusual for all sectors containing a file to be overwritten with new data. This provides an opportunity for portions of deleted files to be recovered from “unallocated” clusters long after the user has deleted the file from the computer.
THE PROCESS OF RECOVERING ELECTRONIC EVIDENCE
There are two primary steps for recovering electronic data: “acquisition” of the target media, and a forensic byte-by-byte analysis of the data.
Utilizing special computer forensic tools, the target media is acquired through a non-invasive, complete sector-by-sector bit-stream image procedure. During the imaging process, it is critical that the mirror image be acquired in a DOS environment. Turning on the computer and booting into its operating system (usually Windows) will subtly modify the file system, potentially destroying some recoverable evidence.
The resulting image becomes the “evidence file,” which is mounted as a read-only or “virtual” file, on which the forensic examiner will perform the analysis. The forensics software used by CFI creates an evidence file that is continually verified by a Cyclic Redundancy Check (“CRC”) for every block of 64 sectors and by a 128-bit MD5 hash for the entire image. Both checks verify the integrity of the evidence file and confirm that the image has remained unaltered and forensically intact. With the MD5 hash, changing even one bit of data will result in a notification that the evidence file data has been changed and is no longer forensically intact.
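The two-level verification can be illustrated with Python's standard library. This is a sketch under stated assumptions, not the forensic tool's actual method: the text does not say which CRC variant is used, so CRC-32 from `zlib` is assumed, and the function names and the in-memory "image" are hypothetical.

```python
import hashlib
import zlib

SECTOR_BYTES = 512
BLOCK_SECTORS = 64                         # block size given in the text
BLOCK_BYTES = SECTOR_BYTES * BLOCK_SECTORS


def fingerprint(image: bytes):
    """Return (per-block CRC-32 values, MD5 hex digest of the whole image)."""
    crcs = [zlib.crc32(image[i:i + BLOCK_BYTES])
            for i in range(0, len(image), BLOCK_BYTES)]
    return crcs, hashlib.md5(image).hexdigest()


def verify(image: bytes, expected_crcs, expected_md5):
    """Return (intact, bad_blocks) for an image against stored values."""
    crcs, md5 = fingerprint(image)
    bad_blocks = [i for i, (got, want) in enumerate(zip(crcs, expected_crcs))
                  if got != want]
    intact = (md5 == expected_md5 and not bad_blocks
              and len(crcs) == len(expected_crcs))
    return intact, bad_blocks


original = bytes(range(256)) * 512         # 131072 bytes = 4 blocks
crcs, md5 = fingerprint(original)

altered = bytearray(original)
altered[2 * BLOCK_BYTES + 10] ^= 0x01      # flip a single bit in block 2

print(verify(original, crcs, md5))         # (True, [])
print(verify(bytes(altered), crcs, md5))   # (False, [2])
```

The per-block values show where a change occurred, while the whole-image hash gives a single value for confirming that nothing at all has changed.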
SEARCHING DIGITAL EVIDENCE
Specialized forensic software provides several methodologies for searching the evidence file. Multiple pieces of media evidence (for example, two hard drives, a floppy, and a multiple-session CD-ROM) can be searched, sorted, and analyzed simultaneously.
A Windows Explorer view displays the files and folders of the target media in an easy-to-browse format. Each file is displayed in a spreadsheet format where the files can be sorted and filtered under numerous fields. The examiner can designate which files to include in this view, such as files from a single folder or a single volume. A preview pane using a hex/text viewer displays the contents of a highlighted file, with the file slack – portions of unallocated clusters – shown in red. All search hits are highlighted automatically.
Keyword search utilities are used to find words relevant to target documents or messages. These searches will locate any “bytes” of data matching the search term, so the development of effective search terms is critical to recovering digital evidence and is a major factor in the success of any forensic examination. For example, searching for the word “info” may locate tens of thousands of hits where the letters “info” were used in a file or line of code. Refining the search to “email@example.com” would help narrow the number of responses. Reviewing the hits from every keyword search consumes a major portion of the examiner’s time while combing through digital evidence. Narrowing the search to terms or phrases unique to the case situation will enhance the results and reduce the cost.
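The effect of a broad versus a narrow search term can be seen in a small Python sketch of a byte-level keyword search. The function name, the sample data, and the choice of case-insensitive ASCII matching are all assumptions made for illustration.

```python
def find_hits(image: bytes, keyword: str, context: int = 8):
    """Return (offset, surrounding bytes) for each case-insensitive match."""
    needle = keyword.lower().encode("ascii")
    haystack = image.lower()               # bytes.lower() folds ASCII letters
    hits = []
    pos = haystack.find(needle)
    while pos != -1:
        start = max(0, pos - context)
        hits.append((pos, image[start:pos + len(needle) + context]))
        pos = haystack.find(needle, pos + 1)
    return hits


# Hypothetical fragment of raw disk data, not tied to any file boundaries.
image = b"\x00INFO.TXT\x00misinformed\x00info: mail email@example.com\x00"

print(len(find_hits(image, "info")))               # 3
print(len(find_hits(image, "email@example.com")))  # 1
```

The short term matches a file name, part of an unrelated word, and a label, while the longer, more specific term matches only once.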
Forensic software also locates drafts of documents, back-up files (.bak, .wbk), temporary files (.tmp), cache files, autosaves, registry data and residual data. Wild card searches can be conducted for “general formats” such as all telephone numbers of a specific area code, network IDs or email domains, even when the specific ID is not known.
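A wild card search for a "general format" can be approximated with a regular expression. The sketch below looks for North American style telephone numbers with a given area code in raw bytes; the pattern, function name, and sample numbers are assumptions for illustration and do not reflect any particular forensic product's search syntax.

```python
import re


def phone_numbers(data: bytes, area_code: str):
    """Find phone numbers with the given area code in raw bytes."""
    code = re.escape(area_code.encode("ascii"))
    pattern = re.compile(
        rb"(?<!\d)\(?" + code + rb"\)?[ .-]?\d{3}[ .-]?\d{4}(?!\d)"
    )
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


data = b"call (415) 555-0132 or 415.555.0199, not 212-555-0147"
print(phone_numbers(data, "415"))  # ['(415) 555-0132', '415.555.0199']
```

Only the area code is fixed; the remaining seven digits and the separators are left open, which is what allows numbers that were not known in advance to be found.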
Time and date stamps, access logs and recycle bin activity are often a critical focus of examination and can be recovered. Files (but not residual data) can be sorted by creation date, last accessed, or last saved. Swap files and file slack, which are locations on the disk where deleted residual data often resides, can be recovered. Print spooler files, with their original time stamps, can be recovered and reviewed. Files that were recently accessed can be determined and a list of all Internet sites (URLs) accessed, and the time and date of access, can be compiled. Also, a forensic picture gallery automatically identifies all graphic files and displays them as thumbnails that can easily be copied onto a CD-ROM.
Forensic examiners will also be able to identify any attempts to hide a file by merely changing its name. Each file’s extension (e.g., .jpg, .gif, .doc) is matched against the file’s actual “signature” to determine if an attempt has been made to “hide” the file. If a file was created in Word (.doc) and the extension was changed to .jpg, the forensic examiner is able to identify and flag that file.
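Signature matching of this kind can be sketched in Python by comparing the first bytes of a file against a small table of known signatures. The table below is deliberately tiny and illustrative, and the function names are hypothetical. The legacy Word signature shown is the OLE2 compound file header, which other older Office formats share, so a real tool would need a fuller table and further checks.

```python
import os

# Leading bytes ("signatures") for a few formats; illustrative only.
SIGNATURES = {
    ".jpg": [b"\xff\xd8\xff"],
    ".gif": [b"GIF87a", b"GIF89a"],
    ".doc": [b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"],  # OLE2 compound file
}


def detect_type(header: bytes):
    """Return the extension whose signature matches, or None if unknown."""
    for ext, magics in SIGNATURES.items():
        if any(header.startswith(m) for m in magics):
            return ext
    return None


def extension_mismatch(filename: str, header: bytes) -> bool:
    """True when a recognized signature contradicts the file's extension."""
    ext = os.path.splitext(filename)[1].lower()
    actual = detect_type(header)
    return actual is not None and actual != ext


doc_header = bytes.fromhex("d0cf11e0a1b11ae1") + b"\x00" * 8

print(extension_mismatch("holiday.jpg", doc_header))  # True
print(extension_mismatch("report.doc", doc_header))   # False
```

A Word document renamed to holiday.jpg is flagged because its leading bytes still identify it as a compound file, regardless of the name it carries.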