
Archive Definition

An archive is a single file that contains any number of individual files plus
information to allow them to be restored to their original form by one or more
extraction programs.

Archives are convenient for storing files. For example, they make it easy to locate all
files in a particular category or related to a particular project at a later date. They also
allow the directory tree structure (i.e., hierarchy of directories) in which the files were
originally contained to be maintained.

Archives are also convenient for transmitting data and distributing programs. In fact,
most software that is distributed over the Internet is distributed as an archive that
contains all related files as well as documentation.

Moreover, archives are very easy to work with, often much easier than dealing with
large numbers of individual files. For example, it is a simple matter to compress an
archive in order to reduce its size and transmission time and then decompress it after
receipt. Also, it is possible to read, extract or insert individual files into an archive
without first restoring all of the contents of the archive to their original form.
Moreover, large archives can easily be split into segments for such purposes as
sending as e-mail attachments or storing on floppy disks, even if they have been
compressed, and they can later be reassembled without compromising the integrity of
the data.
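
The splitting and reassembly described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the 4096-byte segment size are hypothetical, and the byte string stands in for the contents of a real archive file. A SHA-256 checksum is used here as one possible way to confirm that the reassembled data matches the original.

```python
import hashlib


def split_bytes(data: bytes, segment_size: int) -> list:
    """Cut data into consecutive segments of at most segment_size bytes."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


def reassemble(segments: list) -> bytes:
    """Join the segments back together in their original order."""
    return b"".join(segments)


def sha256_hex(data: bytes) -> str:
    """Checksum used to compare the original and the reassembled data."""
    return hashlib.sha256(data).hexdigest()


# Stand-in for the bytes of a (possibly compressed) archive file: 10,240 bytes.
archive_bytes = bytes(range(256)) * 40

segments = split_bytes(archive_bytes, 4096)   # segments of 4096, 4096 and 2048 bytes
restored = reassemble(segments)

print(len(segments), sha256_hex(restored) == sha256_hex(archive_bytes))
```

Because the segments are plain byte ranges, it makes no difference whether the archive was compressed before it was split.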

The tar (i.e., tape archive) command, originally designed for tape backups, is used
extensively for archiving and unarchiving files and directories on Unix-like operating
systems. Several other common commands can also be used to create archives,
including zip and cpio. Some of these programs, such as zip, combine archiving with
compression. tar itself does not compress, but it is easy to compress archives
created with tar using compression utilities such as gzip and bzip2.
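
As a rough sketch of the same idea, Python's standard tarfile module can build a tar archive from a small directory tree, build a gzip-compressed version of it, and read one member back without extracting the rest. The directory and file names below are made up for the example, and the module is used as a stand-in for the tar command itself.

```python
import os
import tarfile
import tempfile


def build_sample_tree(root: str) -> None:
    """Create a tiny tree: root/docs/readme.txt and root/src/main.c (hypothetical names)."""
    os.makedirs(os.path.join(root, "docs"))
    os.makedirs(os.path.join(root, "src"))
    with open(os.path.join(root, "docs", "readme.txt"), "wb") as f:
        f.write(b"hello archive\n" * 200)
    with open(os.path.join(root, "src", "main.c"), "wb") as f:
        f.write(b"int main(void) { return 0; }\n")


with tempfile.TemporaryDirectory() as tmp:
    tree = os.path.join(tmp, "project")
    build_sample_tree(tree)

    plain = os.path.join(tmp, "project.tar")
    gzipped = os.path.join(tmp, "project.tar.gz")

    # Archive only: the directory hierarchy is stored, nothing is compressed.
    with tarfile.open(plain, "w") as tar:
        tar.add(tree, arcname="project")

    # Archive plus gzip compression in one step.
    with tarfile.open(gzipped, "w:gz") as tar:
        tar.add(tree, arcname="project")

    plain_size = os.path.getsize(plain)
    gz_size = os.path.getsize(gzipped)

    # Read a single member without extracting the whole archive to disk.
    with tarfile.open(gzipped, "r:gz") as tar:
        names = sorted(tar.getnames())
        main_c = tar.extractfile("project/src/main.c").read()

print(plain_size, gz_size)
print(names)
```

The member names keep their directory prefixes (for example project/docs/readme.txt), which is how the original tree structure is preserved.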

WinZip is one of the most popular archiving programs for use on Microsoft Windows
operating systems. It also has compression and encryption capabilities.

Introduction to File Compression and Archiving

Table of Contents

1. File compressors and archives?
2. When to use an archival program or file compressor
3. Types of archives and file compressors
4. Conclusion

File compressors and archives?

This tutorial will focus on explaining what file compression and archives are and how to
use them. This technology will not only enable you to more efficiently send attachments
via email, but also save space on your hard drive and allow you to more easily back up
files. To begin, let's discuss some of the terminology that will be used in this tutorial.

File Compression
File compression is the act of taking a file on your hard drive and making its
size smaller, so that it takes up less storage space and is faster to
upload or download on a network or the Internet.

File Compressor
A compressor is a program that compresses files. Compress, Gzip,
WinRar, and WinZip, among many others, are examples of this type of
program.

File Archival Program
An archival program takes many separate files and archives them into one file. For
example, an archival program would allow you to take a directory of files and
archive them into one file that you can then send as an email with a single
attachment for all those individual files.

Archive
An archive is a single file that contains many separate files. These individual files
can be extracted from the main archive.

Compressed File
A file that has been compressed into a smaller size than it originally had.

It is important to note that many programs can both archive and compress files. For
example, WinZip will take many separate files, compress them and then store them in an
archive file. Thus you are left with a single archive that contains many compressed files.
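
A minimal sketch of a program that both archives and compresses, using Python's standard zipfile module as a stand-in for a tool such as WinZip. The file names and contents are invented for the example, and the archive is kept in memory rather than written to disk.

```python
import io
import zipfile

# Hypothetical files to bundle: name -> contents.
files = {
    "notes.txt": b"meeting notes\n" * 300,
    "todo.txt": b"buy milk\n" * 100,
}

buffer = io.BytesIO()  # in-memory stand-in for a .zip file on disk

# Each file is compressed individually (DEFLATE) and stored in one archive.
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    for name, content in files.items():
        archive.writestr(name, content)

# Reopen the archive, inspect the members and extract them again.
with zipfile.ZipFile(buffer) as archive:
    sizes = {info.filename: (info.file_size, info.compress_size)
             for info in archive.infolist()}
    restored = {name: archive.read(name) for name in archive.namelist()}

for name, (original, compressed) in sizes.items():
    print(name, original, "->", compressed)
```

The result is one archive whose members are each stored in compressed form, and every member can be read back individually.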

When to use an archive or file compression

Now that you know what a compressed file or an archive is, you may be asking yourself
why you would want to use them. The four most common reasons to use archives and
compressed files are:

File compression saves storage space

By using a compressor to make a file smaller, you use less space on your
hard drive to store it. For example, a Word document that is 89 kilobytes on my hard
drive is only 8 kilobytes when zipped. That is a 90% saving in storage space! Take a
look at the table below to see some more examples of the storage space you can
save using file compression:

Type of file     Size before     Size after      Percentage
                 compression     compression     compressed
                 (bytes)         (bytes)
Word Document    89,600          8,959           90%
TXT File         29,978          10,476          66%
Excel Document   109,056         89,816          18%
EXE File         66,048          34,757          48%
DLL File         260,096         121,155         53%
JPEG Image       1,093,504       1,092,040       0%
Bitmap Image     4,854           4,854           0%
GIF Image        583,413         583,413         1%
MP3 File         5,234,688       5,134,985       2%

As you can see, some file formats compress a great deal more than others. This is
because certain file types are already compressed, and therefore cannot be compressed
much further. Looking at the table above, it does not make sense to compress
MP3, GIF, JPEG, or other compressed file formats, as you will gain little or no benefit. On the
other hand, Word, Excel, text, and program files compress quite well.
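
The effect can be reproduced with a short Python sketch. It first recomputes the saving for the Word document row of the table above, then compresses two made-up inputs with the standard zlib module: highly repetitive text, and random bytes. The random bytes are only an assumption-laden stand-in for data that is already compressed (such as JPEG or MP3 content), since both look patternless to a compressor.

```python
import os
import zlib


def percent_saved(size_before: int, size_after: int) -> float:
    """Percentage of the original size removed by compression."""
    return 100.0 * (size_before - size_after) / size_before


# Figures from the Word document row of the table above.
word_saving = percent_saved(89_600, 8_959)

# Repetitive text compresses very well.
text = b"The quick brown fox jumps over the lazy dog.\n" * 500
text_saving = percent_saved(len(text), len(zlib.compress(text)))

# Random bytes have no patterns left to exploit, much like already-compressed data.
random_like = os.urandom(len(text))
random_saving = percent_saved(len(random_like), len(zlib.compress(random_like)))

print(f"Word row:    {word_saving:.0f}%")
print(f"Plain text:  {text_saving:.1f}%")
print(f"Random data: {random_saving:.1f}%")
```

The saving for the random input comes out at or slightly below zero, because the compressor adds a small amount of overhead without finding anything to shrink.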

Transmission Speeds

How fast a file is transmitted over a network or the Internet is dependent upon how big
this file is. For example, a file with the size of 1,337,344 bytes took approximately 28
seconds to upload to a remote server. Yet this same file compressed to a size of 554,809
bytes only took 12 seconds. That is a savings in time of over 50%. Now imagine you were
sending files that would normally take an hour to send, and after compressing the files, it
now only takes 30 minutes. The savings in time and potentially money is incredible.
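
The arithmetic behind this example can be checked with a few lines of Python. The function names are hypothetical; the sizes and times are the ones quoted above.

```python
def percent_time_saved(seconds_before: float, seconds_after: float) -> float:
    """Percentage reduction in transfer time."""
    return 100.0 * (seconds_before - seconds_after) / seconds_before


def throughput(size_bytes: int, seconds: float) -> float:
    """Average transfer rate in bytes per second."""
    return size_bytes / seconds


saved = percent_time_saved(28, 12)              # about 57%
rate_uncompressed = throughput(1_337_344, 28)   # about 47,762 bytes per second
rate_compressed = throughput(554_809, 12)       # about 46,234 bytes per second

print(f"time saved: {saved:.0f}%")
print(f"rates: {rate_uncompressed:,.0f} B/s vs {rate_compressed:,.0f} B/s")
```

The two transfer rates are nearly the same, which is the point: the connection did not get faster, there were simply fewer bytes to send.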

Sending only 1 file

There are times when you need to send many attachments in one email message. This can
be difficult and confusing, so instead you can use an archival program to combine, say,
20 files into a single file. This is much more organized and easier to manipulate.

Backing up data

Archival programs are often used to back up data. You would use archives to back up a
folder or a number of files into a single file and compress them as well. This allows you to
save space and then store that single file on a floppy or other removable media.
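
A backup of this kind can be sketched with Python's standard shutil module. The folder name, file contents and the back_up_folder helper are assumptions made for the example; a temporary directory is used so that nothing outside it is touched.

```python
import os
import shutil
import tempfile


def back_up_folder(folder: str, destination_base: str) -> str:
    """Archive and compress `folder` into `<destination_base>.zip`; return the archive path."""
    return shutil.make_archive(destination_base, "zip", root_dir=folder)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical folder to back up, containing one subfolder and one file.
    folder = os.path.join(tmp, "reports")
    os.makedirs(os.path.join(folder, "2004"))
    with open(os.path.join(folder, "2004", "q1.txt"), "wb") as f:
        f.write(b"quarterly figures\n" * 50)

    backup_path = back_up_folder(folder, os.path.join(tmp, "reports-backup"))
    backup_name = os.path.basename(backup_path)

    # Restoring: the archive has to be extracted before the files can be used again.
    restore_dir = os.path.join(tmp, "restored")
    shutil.unpack_archive(backup_path, restore_dir)
    restored_ok = os.path.isfile(os.path.join(restore_dir, "2004", "q1.txt"))

print(backup_name, restored_ok)
```

The single .zip file is what would be copied to removable media; the subfolder structure comes back when it is unpacked.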

Note: With all formats, whether a compressed file or an archive, you must always
uncompress and/or extract the file before you will be able to use it.

Types of archives and file compressors

There are many types of archival programs and compressors. The table below lists
the more common programs in use today, along with the file extension
that each one uses:

Program*           File Extension   Type               Operating System**
WinZip             .zip             Archive/Compress   DOS/Windows
WinRar             .rar             Archive/Compress   DOS/Windows
Arj                .arj             Archive/Compress   DOS/Windows
Gzip               .gz              Compress           Unix/Linux
Compress           .Z               Compress           Unix/Linux
TAR                .tar             Archive            Unix/Linux
Stuffit Expander   .hqx             Archive/Compress   Apple

* Many of these programs can handle more than one format. The format listed is the
native format for that program.

** The operating system listed is the native operating system for these formats. These
formats may be able to be used on other operating systems as well.

Conclusion

Now that you understand file compression and archiving, download some of the above
programs and play around with them. They are very easy to use and understand.

Program Definition

A program is a sequence of instructions understandable by a computer's central
processing unit (CPU) that indicates which operations the computer should perform
on a set of data.

Ready-to-run programs are stored as executable files that reside in storage, but they
are copied into memory when they are launched so that their machine code will be
immediately available to the CPU, which is the main logic unit of a computer.
Storage refers to devices or media that can retain data for relatively long periods of
time (e.g., years or even decades), such as hard disk drives (HDDs), floppy disks,
optical disks (e.g., CD-ROMs and DVDs) and magnetic tape. This contrasts with
memory, whose contents are retained only temporarily (i.e., while in use or only as
long as the power supply remains on) but which can be accessed (i.e., read and
written to) at extremely high speeds.

An executable file is a file that has been converted from source code into machine
code, which is directly understandable by the CPU, by a specialized program called a
compiler. Source code is the version of software as it is originally written (i.e., typed
into a computer) by a human in plain text (i.e., human readable alphanumeric
characters). Source code can be written in any of the hundreds of programming
languages that have been developed. Some of the most popular of these are C, C++,
COBOL, Fortran, Java, Perl, PHP, Python and Tcl/Tk.

Software is usually used as a generic term for programs. However, in its broadest
sense it can refer to all information (i.e., both programs and data) in electronic form
and can provide a distinction from hardware, which refers to computers or other
electronic systems on which software can exist and be used.

Programs are often divided into two broad categories: systems programs and
application programs. The former refers to operating systems and utility programs
that manage computer resources at a low level, that is, that enable a computer to
function. Examples of such utilities include compilers, device drivers and the X
Window System (which provides basic graphic capabilities on Unix-like operating
systems). Examples of major categories of application programs are word processors,
graphic image processing programs, database management systems and games. Many
computer users who are familiar only with application programs may not even realize that
systems programs exist.

A program remains just a passive file (or collection of files) until it is launched, at
which time it spawns (i.e., gives birth to) one or more processes. A process can be
thought of as a running instance of a program. On multitasking operating systems
(i.e., systems that allow multiple processes to run seemingly simultaneously),
multiple instances of a program can run concurrently, each of which is a different
process (or set of processes).
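
To make the distinction concrete, the following Python sketch launches the same program twice and shows that each launch is a separate process with its own process ID. The program used is simply the Python interpreter that runs the example, which is an arbitrary choice for illustration.

```python
import subprocess
import sys

# One program (the Python interpreter), launched twice.
# Each instance prints the ID of the process it is running in.
code = "import os; print(os.getpid())"

first = subprocess.Popen([sys.executable, "-c", code],
                         stdout=subprocess.PIPE, text=True)
second = subprocess.Popen([sys.executable, "-c", code],
                          stdout=subprocess.PIPE, text=True)

pid_first = int(first.communicate()[0])
pid_second = int(second.communicate()[0])

print(pid_first, pid_second, pid_first != pid_second)
```

The executable file on disk is the same in both cases; only the running instances differ.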

Programs are commonly launched directly by a user by clicking on an icon (i.e., a
small image representing the program) in a GUI (graphical user interface) or by
typing in a command at the command line (i.e., all-text user interface) and then
pressing the ENTER key. Some programs are launched automatically by the system
as it boots up (i.e., starts up) or in response to certain events.

A command is an instruction telling a computer to do something, such as launch a
program. It consists of the name of one or more programs and any options and
arguments for each of them. An option is a (usually) single-letter code that modifies
the behavior of a command line program (i.e., a program that operates in the all-text
mode) in some specific way. An argument is a file name or other input data for such a
program. The ability of command line programs to use options and arguments adds
greatly to their flexibility.
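
The parts of a command can be picked apart with a short Python sketch. This is a deliberate simplification: it assumes that every word starting with a hyphen is an option and everything else after the program name is an argument, whereas real programs decide this for themselves and some options take values of their own. The parse_command helper is hypothetical.

```python
import shlex


def parse_command(line: str):
    """Split a command line into (program, options, arguments) using a naive rule."""
    tokens = shlex.split(line)          # shell-style splitting, honours quotes
    program, rest = tokens[0], tokens[1:]
    options = [t for t in rest if t.startswith("-")]
    arguments = [t for t in rest if not t.startswith("-")]
    return program, options, arguments


program, options, arguments = parse_command("ls -l -a /usr/bin")
# program == "ls", options == ["-l", "-a"], arguments == ["/usr/bin"]
print(program, options, arguments)
```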

Executable programs are usually stored in one of several standard directories on
Unix-like operating systems, including /bin, /sbin, /usr/bin, /usr/sbin and
/usr/local/bin. Although it is not necessary for them to be in these locations in order
to be operable, it is often more convenient.

Programs can range all the way in size and complexity from just a few lines of code
that merely display a simple phrase such as Hello World on the monitor screen to a
massive operating system that contains hundreds of megabytes of code.

Many of the most widely used computer programs are proprietary (i.e., commercial)
software, which is usually not available free of charge and for which there are
generally severe restrictions on its use. However, the past several years have seen a
rapid growth in the development and use of free software, which refers to programs
that can be obtained at no monetary cost and that can be used for any desired purpose,
including modifying, copying, using on as many computers as desired, giving away
and even selling.

In contrast to many types of products, the quality and value of programs should
definitely not be judged on the basis of their price. This is because many free
programs have features and performance as good as or superior to their non-free
counterparts, plus they also have the additional advantages of not having any onerous
EULAs (end user licensing agreements) and allowing users to modify them in any
way desired. In fact, some of the most popular programs are free software (e.g., the
Apache Web Server, which hosts more than 70 percent of all web sites on the
Internet).

Likewise, programs should not be judged on the basis of their size. An important
tenet of the Unix philosophy is that programs should be purposely made as small as
possible and that they should be designed to do only one thing but do it well. This is
because large programs that attempt to do numerous things can be too complex for
even the brightest of human minds to comprehend in their entirety, and thus it is
extremely difficult to remove all bugs (i.e., errors) and make them as efficient and
secure as possible.

Moreover, small, specialized programs have the advantage that they are easy to
design to work well with each other when they are connected via pipes to form
pipelines of commands. This makes it possible (and easy) to perform highly specific
tasks that would be very tedious or impractical by any other means.
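
A pipeline of two small programs can be sketched from Python with the standard subprocess module. The two one-line programs below are hypothetical stand-ins for single-purpose utilities (one keeps only lines containing "tar", the other counts lines); each runs as its own process, and the output of the first is fed to the input of the second through a pipe.

```python
import subprocess
import sys

# Two tiny single-purpose programs, written inline for the example.
KEEP_TAR_LINES = "import sys; sys.stdout.writelines(l for l in sys.stdin if 'tar' in l)"
COUNT_LINES = "import sys; print(sum(1 for _ in sys.stdin))"


def run_pipeline(text: str) -> str:
    """Feed text through: keep-lines-containing-'tar' | count-lines."""
    first = subprocess.Popen([sys.executable, "-c", KEEP_TAR_LINES],
                             stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    second = subprocess.Popen([sys.executable, "-c", COUNT_LINES],
                              stdin=first.stdout, stdout=subprocess.PIPE)
    first.stdout.close()            # the second process now owns the read end of the pipe
    first.stdin.write(text.encode())
    first.stdin.close()             # signal end of input to the first process
    output = second.communicate()[0]
    first.wait()
    return output.decode().strip()


matching = run_pipeline("tar\ngzip\ntar.gz\nzip\n")
print(matching)
```

Neither program knows anything about the other; they cooperate only because one writes plain text and the other reads it.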

Programming is the creation of programs. It is performed by programmers, also
frequently referred to as developers, who write the source code for a program using
one or more programming languages and a text editor and often with a variety of
more sophisticated tools as well, such as integrated development environments
(IDEs), which operate in GUIs. Creating simple programs can be easy, interesting and
educational, even for people with relatively little computer experience.

Directory Tree Definition

A directory tree is a hierarchy of directories that consists of a single directory, called
the parent directory or top level directory, and all levels of its subdirectories (i.e.,
directories within it).

A directory in a Unix-like operating system is a special type of file that contains a list
of names and corresponding inodes for each filesystem object (i.e., directory, file or
link) that appears to the user to be in it. An inode is a data structure that stores all the
information about a file except its name and its actual data. A data structure is a way
of storing data so that it can be used efficiently.
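
The pairing of names and inodes can be observed from Python, as a rough sketch. The list_directory helper and the file names are made up for the example; os.scandir and os.stat are standard library calls.

```python
import os
import tempfile


def list_directory(path: str) -> dict:
    """Map each name stored in the directory to the inode number it refers to."""
    with os.scandir(path) as entries:
        return {entry.name: entry.inode() for entry in entries}


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "notes.txt"), "w").close()   # an empty file
    os.mkdir(os.path.join(tmp, "subdir"))               # a subdirectory

    listing = list_directory(tmp)
    names = sorted(listing)

    # The directory supplies the name; the inode number leads to everything else
    # (size, timestamps, permissions), which is what os.stat reports.
    same_inode = listing["notes.txt"] == os.stat(os.path.join(tmp, "notes.txt")).st_ino

print(names, same_inode)
```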

Any directory can be regarded as being the start of its own directory tree, at least if it
contains subdirectories. Thus, a typical computer contains a large number of directory
trees.

The term directory tree takes its name from the fact that a diagram of it resembles an
inverted tree, or a branch thereof, usually with a series of directories branching off
from a single directory, more directories branching off from some or all of them, etc.

Virtually all modern computer operating systems use directory trees for organizing
files. Unix-like operating systems feature a single root directory from which all other
directory trees emanate. Microsoft operating systems can have multiple independent
root directories, with names such as C:, D: and E:.

The du (i.e., disk usage) command is a convenient tool for obtaining information
about directory trees, including total disk space consumption of the tree (inclusive of
all of the files in it) and the names and sizes of each branch and file.
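
A much simplified version of what du reports can be sketched in Python by walking a directory tree and adding up file sizes. This is an approximation under stated assumptions: it sums the apparent size of regular files in bytes, whereas du reports the disk space actually allocated and also counts the directories themselves, so the numbers will usually differ. The tree_size helper and the file names are hypothetical.

```python
import os
import tempfile


def tree_size(root: str) -> int:
    """Total apparent size, in bytes, of all regular files under root."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):        # skip symbolic links
                total += os.path.getsize(path)
    return total


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b"))
    with open(os.path.join(tmp, "a", "one.bin"), "wb") as f:
        f.write(b"x" * 1000)
    with open(os.path.join(tmp, "a", "b", "two.bin"), "wb") as f:
        f.write(b"y" * 234)

    total = tree_size(tmp)      # 1000 + 234 bytes

print(total)
```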
