Operations on Files
11.0 Introduction
In a computer, a variety of data and program code is stored on hard disks
in the form of files. You have seen a paper file, where several pages containing records of
a similar category of information are kept together inside a cover
with a filename marked on it.
In a school, class-wise student files are maintained, where information about each
student is recorded and all those records are placed together within a file cover with a name
on it. Different student files are normally maintained for different classes. Again, all such
files can be stored together in the rack of a file cabinet (like a folder in a computer). To
retrieve information about a particular student of a specified class, the relevant class
file (identified by its filename) can be taken out and the records within that file can be
searched to get the details of the student. If the records in the file are sorted by
roll number, then the roll number of the student can act as a search key.
A computer works in the same fashion. Instead of paper files, a computer
stores electronic files on a floppy disk or a hard disk. A disk file of this kind is simply
a collection of records. Each record can contain details about a student. The records can be
typed on a keyboard and saved as a file on a hard disk.
A computer program may use two buffers: one for input operations and another for
output operations. Input can be thought of as a stream of characters coming from a
keyboard and being placed in a temporary storage location called a buffer. Similarly, to read
some data from a stored disk file, that file must be opened first; then the
desired portion of data is read by calling the read () method, which puts the retrieved data in a
buffer from which further processing or data transfer can take place. The transferred or
processed data is then placed in an output buffer connected to some output device such as a
VDU screen or a printer.
Java uses the java.io package to handle both standard I/O and file streams.
There are two types of I/O stream classes: byte streams and character streams. Byte-oriented
I/O operations are needed to handle binary files, which are read and written as 8-bit bytes.
Character-oriented I/O operations are used to handle text files, for which Java uses 16-bit Unicode characters.
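The difference between the two stream families can be seen in a small sketch. The class name ByteVsCharDemo and the sample text are assumptions made here for illustration, and the data is held in memory (encoded as UTF-8) rather than read from a disk file.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class ByteVsCharDemo {
    // Counts the units delivered by a byte stream: one per 8-bit byte.
    static int countBytes(byte[] data) throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        int n = 0;
        while (in.read() != -1) n++;
        in.close();
        return n;
    }

    // Counts the units delivered by a character stream: one per 16-bit char.
    static int countChars(byte[] data) throws IOException {
        Reader rd = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        int n = 0;
        while (rd.read() != -1) n++;
        rd.close();
        return n;
    }

    public static void main(String[] args) throws IOException {
        // "\u00e9" is the letter e with an acute accent; UTF-8 stores it in two bytes.
        byte[] data = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes read = " + countBytes(data));  // 5
        System.out.println("chars read = " + countChars(data));  // 4
    }
}
```

The same four-letter word is seen as five units by the byte stream and as four units by the character stream, because the character stream decodes the two bytes of the accented letter into a single char.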
To understand Java's I/O system, the stream concept must be clear to you.
Java's I/O system makes use of I/O class methods that are encapsulated in the
java.io package. Therefore, for any I/O operation with files or devices, you have to begin your
program with the statement
import java.io.*;
Byte streams are defined by two abstract classes, InputStream and OutputStream.
Java defines many useful concrete subclasses of these two abstract classes, such as
BufferedInputStream (for buffered byte input).
This is only one example of the concrete subclasses of the byte stream
classes.
As mentioned before, character streams are also defined by two abstract classes,
Reader and Writer. Some of their most useful concrete subclasses, all used later in this
chapter, are BufferedReader, InputStreamReader, FileReader, BufferedWriter, FileWriter and PrintWriter.
The PC that you are using has a keyboard, which is taken as the standard input device
for feeding in data and instructions. The input part of the console is the keyboard and the
output part of the console is the VDU screen (monitor). Using a byte stream to read
console input is possible but normally not preferred. Current versions of Java
prefer to handle console I/O using character-oriented streams.
Java does not have a generalized console input function (like scanf () of the C language)
but takes input by reading from the System.in stream. To obtain a character-oriented
stream from console input, System.in is wrapped in a BufferedReader object. System.in
is an object of the InputStream type, that is, a byte stream. It is therefore first passed to an
InputStreamReader, which converts bytes into characters, and the result is then wrapped
in the BufferedReader. To logically connect a BufferedReader with
the console keyboard, we have to make use of two Java statements as shown below –
The above two statements can be written as a single statement as shown below –
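Both forms can be sketched as follows. The class name ConsoleWrapDemo, the two helper method names and the intermediate variable isr are assumptions made for illustration; only conin is a name used in this chapter. The helpers accept any InputStream so that they can be tried without a keyboard.

```java
import java.io.*;

public class ConsoleWrapDemo {
    // Two-statement form: first the byte-to-character converter, then the buffer.
    static BufferedReader wrapInTwoSteps(InputStream in) {
        InputStreamReader isr = new InputStreamReader(in);
        BufferedReader conin = new BufferedReader(isr);
        return conin;
    }

    // Single-statement form: the converter is created inside the constructor call.
    static BufferedReader wrapInOneStep(InputStream in) {
        return new BufferedReader(new InputStreamReader(in));
    }

    public static void main(String[] args) throws IOException {
        BufferedReader conin = wrapInOneStep(System.in);
        System.out.println("Type a line and press Enter:");
        System.out.println("You typed: " + conin.readLine());
    }
}
```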
When a key is pressed on the keyboard, its hardware generates a key code, which
reaches the program as bytes on System.in. To convert those bytes into character codes, an
InputStreamReader object has to be created. Each character keyed in is then made available through the
conin (console-in) BufferedReader for reading.
The version of read () that will be used here has the form –
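In the Java API, BufferedReader declares this method as int read() throws IOException; it returns the next character as an int, or -1 when the end of the stream is reached. A minimal sketch of a program built on it is given here. The class name BRReadDemo, the helper method and the choice of 'q' as the quit character are assumptions made for illustration.

```java
import java.io.*;

public class BRReadDemo {
    // Reads characters one at a time until 'q' or end of input; returns what was read.
    static String readUntilQ(BufferedReader conin) throws IOException {
        StringBuilder seen = new StringBuilder();
        int code;
        while ((code = conin.read()) != -1) {   // -1 signals end of stream
            char c = (char) code;               // read() returns an int, so cast it
            seen.append(c);
            if (c == 'q') break;
        }
        return seen.toString();
    }

    public static void main(String[] args) throws IOException {
        BufferedReader conin = new BufferedReader(new InputStreamReader(System.in));
        System.out.println("Enter characters, 'q' to quit.");
        System.out.println(readUntilQ(conin));
    }
}
```

Because console input is line buffered, nothing reaches the program until the return key is pressed.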
Run this program and see what output you get by typing characters on the keyboard
[read through the conin object of the BufferedReader class].
import java.io.*;
public class BRLin {
    public static void main(String[] args) throws IOException
    {
        BufferedReader conin = new BufferedReader (new
                InputStreamReader (System.in));
        String str1 = " start";
        System.out.println ("Enter several text lines & end with return");
        System.out.println (" Type 'stop' to end further entry.");
        do {
            str1 = conin.readLine();   // returns null when the input ends
        } while (str1 != null && !str1.equals("stop"));
        System.out.println (str1);
    }
}
Run this program to see what happens when you type stop.
You have already seen console output using the print () and println () methods.
Those methods are defined in the PrintStream class, of which System.out
is an object. Although System.out is a byte stream, its use in simple programs is still
acceptable. PrintStream also provides the low-level write ()
method. The program below uses the write () method to output the character 'G' followed by a
new line (\n). The write () method writes only the byte value (the lower 8 bits of the 16-bit
Unicode value) of the character G.
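A minimal sketch of such a program is given here; the class name WriteDemo and the variable name ch are assumptions made for illustration.

```java
public class WriteDemo {
    public static void main(String[] args) {
        int ch = 'G';               // the char is widened to the int value 71
        System.out.write(ch);       // only the low-order 8 bits are written
        System.out.write('\n');     // the new line character
        System.out.flush();         // make sure the bytes reach the console
    }
}
```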
It will be much easier to write the above program using print () or println () methods.
Please try that as an exercise.
Although the use of System.out for console output is permissible, for real-life
programs the use of a PrintWriter stream is recommended. PrintWriter supports both
the print () and println () methods. A commonly used PrintWriter constructor is:
PrintWriter (OutputStream <OSname>, boolean flushOnNewline)
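A short sketch of this constructor in use follows. The class name PrintWriterDemo, the variable name conout and the sample values are assumptions made for illustration. The second argument is passed as true so that every println () call flushes the output.

```java
import java.io.PrintWriter;

public class PrintWriterDemo {
    public static void main(String[] args) {
        // true = flush the output automatically on every println() call
        PrintWriter conout = new PrintWriter(System.out, true);
        conout.println("A string value");
        int i = 7;
        conout.println(i);          // primitive values are converted to text
        double d = 4.5e-7;
        conout.println(d);
    }
}
```

If the second argument were false, the text could stay in the buffer until flush () or close () is called.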
For example, a student record can have a structure (field_name + data types) like –
Computer files may be of different varieties: a plain text file, a binary file (like an
executable file) or a data file (a collection of structured data items called records). In Java, all
such files are byte-oriented, but a programmer can wrap a byte-oriented file stream
within a character-oriented object.
Here fileName is the name of the file opened for I/O operations. If the input file is not
found, a FileNotFoundException is thrown. Similarly, when the output file cannot be created,
a FileNotFoundException is also thrown.
After opening and performing I/O operations on an external file (i.e. a file stored
on secondary storage such as a floppy disk or hard disk), that file has to be explicitly closed
by calling the close () method.
// Create first a text file, which is to be read and displayed by this program
import java.io.*;
public class TypeFile
{
    public static void main(String args[]) //file name passed as argument
            throws IOException
    {
        int i;
        FileInputStream fin;
        try {
            fin = new FileInputStream(args[0]);
        } catch (FileNotFoundException e) {
            System.out.println (" File not found:");
            return;
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println (" Type File filename to show");
            return;
        }
        //read the file characters until EOF (-1) is reached
        do {
            //To read from a file, the read () method of FileInputStream is used.
            i = fin.read();
            if (i != -1) System.out.print((char) i);
        } while (i != -1);
        fin.close(); // file closed
    }
}
To run this program, you first have to create a file with some filename in any drive under any
directory. In this program the file name is passed as an argument, args[0], whose
value you have to supply as a String while running it [Picture 11.1(a)]. However, you could
specify the file name directly, with its drive and directory, as "d:\\mytestfile.txt"
when the file mytestfile.txt is stored in the root directory of drive d:
Another point you have to note is that two backslashes (\\) are needed instead of one (\) to
inform the Java compiler that \ is not to be taken as the start of an escape sequence (like \t, \n, etc.) but
should be treated as the \ that separates a directory or sub-directory.
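The effect of the doubled backslash can be checked with a few lines of code. This is only an illustrative sketch; the class name PathEscapeDemo and the second, deliberately wrong, string are assumptions.

```java
public class PathEscapeDemo {
    public static void main(String[] args) {
        String path = "d:\\mytestfile.txt";  // two backslashes in the source code
        System.out.println(path);            // prints d:\mytestfile.txt
        System.out.println(path.length());   // 17: the pair counts as one character
        String wrong = "d:\testfile.txt";    // here \t is read as a TAB character
        System.out.println(wrong.charAt(2) == '\t');  // true
    }
}
```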
In example-11.5 you have seen the words 'throws IOException' added
to the main () declaration, followed by the use of try {....} and catch {....} blocks.
These try/catch blocks are used to catch any errors that occur during I/O operations and
handle them with code supplied by the programmer. The try block contains the normal processing code.
The catch block contains the code for exception handling.
There are many exception classes, such as IOException (defined in the java.io package) and
NumberFormatException, ArithmeticException and RuntimeException (defined in the java.lang package).
The keyword throws is used in a method declaration to state that the method may pass
an exception of the named class on to its caller instead of handling it itself. Java treats
exceptions as objects, which are passed to an exception
handler written by the programmer to enable recovery or to report the nature of the error. The
catch {...} block is responsible for handling exceptions.
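The division of work between throws, try and catch can be sketched as follows. The class name ThrowsDemo, the two helper methods and the file name used in main () are hypothetical and chosen only for illustration.

```java
import java.io.*;

public class ThrowsDemo {
    // "throws" only declares that this method may pass an IOException to its caller.
    static int firstByte(String fileName) throws IOException {
        FileInputStream fin = new FileInputStream(fileName);
        int b = fin.read();         // -1 if the file is empty
        fin.close();
        return b;
    }

    // The caller decides how to handle the exception object in a catch block.
    static String describe(String fileName) {
        try {
            return "first byte = " + firstByte(fileName);
        } catch (FileNotFoundException e) {   // the more specific class comes first
            return "file not found";
        } catch (IOException e) {
            return "read error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The file name is a hypothetical one that should not exist.
        System.out.println(describe("no_such_file_for_demo.txt"));
    }
}
```

FileNotFoundException is a subclass of IOException, so its catch block has to be written before the more general one.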
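// input – 4 records typed on the console keyboard; output goes to a disk file
import java.io.*;
public class KBtoFileIO
{
    public static void main(String[] args) throws IOException
    {
        int k, size = 4;
        // The three declarations marked "assumed" are not shown in the listing as printed.
        String fileName = "c:\\kbinput.txt";                 // assumed: file named in the text
        BufferedReader KBin = new BufferedReader (new
                InputStreamReader (System.in));              // assumed: keyboard input
        try {
            BufferedWriter bufwr = new BufferedWriter (new FileWriter
                    (fileName));
            PrintWriter fileOut = new PrintWriter (bufwr);   // assumed: wraps bufwr
            k = 0;
            do {
                String record = KBin.readLine ();
                fileOut.println (record);
                k = k + 1;
            } while (k != size);
            fileOut.close ();   // flushes the buffer and closes the file
        } catch (IOException e) {
            System.out.println ("File write error: " + e.getMessage ());
        }
    }
}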
Run this program to see how records entered through a keyboard get stored as a file named
kbinput.txt (a notepad file) in the C: drive at root [Picture 11.2 (a) & (b)].
Example 11.6 can be regarded as an entry-recording program (the value of
size can be increased as required) that stores keyboard entries in the form of a disk file. A
stored file can be copied to a new location with a new name, if required. Example-11.7 is an
example of such a file copy program.
For copying byte by byte, we make use of the write () method of the output stream instead of the
print () method, which produces formatted text such as that shown on the VDU screen. Let us now examine the Java code of such a
copy program. In the program, the source and destination file names have been specified
directly within the code. Of course, those names could have been collected interactively
using appropriate keyboard entry operations. As an exercise, you can try that.
Example-11.7 File Copy with a Different Name at a new Location
        try {
            do {
                i = fin.read();
                if (i != -1) fout.write(i);
            } while (i != -1);
            // reported only when the whole loop finished without an exception
            System.out.println ("Copy operation successful.");
        } catch (IOException e) {
            System.out.println ("copy error");
        }
        fin.close();
        fout.close();
    }
}
Try also to copy a different file from one drive to another drive, keeping the same
name. Of course, do not forget to make changes in the program wherever necessary.
In this program three try blocks are used: one for opening the input file for
reading, another for opening the output file for writing and the third to perform the
copy operation. The first try has one catch block, the second try has two catch blocks and
the third try has only one catch block. Try to understand why it is so.
Remember, if several different exceptions can arise from a try block, then a separate
catch block is needed for each exception that is to be handled. That is why, for the second try block, two
catch blocks, one for FileNotFoundException and another for
ArrayIndexOutOfBoundsException, have been used in this program.
Also remember that all opened files must be closed before ending a program
involved in file IO operations.
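The points above can be brought together in one self-contained sketch. It is not the listing of Example-11.7: the class name CopyFileSketch and the helper method copy () are assumptions, the file names are taken from the command line instead of being written into the code, and a finally block (not covered in this chapter) is used so that both files are closed even when the copy fails.

```java
import java.io.*;

public class CopyFileSketch {
    // Copies src to dst one byte at a time; returns the number of bytes copied.
    static long copy(String src, String dst) throws IOException {
        FileInputStream fin = new FileInputStream(src);
        FileOutputStream fout;
        try {
            fout = new FileOutputStream(dst);
        } catch (FileNotFoundException e) {
            fin.close();            // do not leave the input file open
            throw e;
        }
        long count = 0;
        int i;
        try {
            while ((i = fin.read()) != -1) {
                fout.write(i);
                count++;
            }
        } finally {                 // runs whether or not the copy succeeded
            fin.close();
            fout.close();
        }
        return count;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("Usage: java CopyFileSketch source destination");
            return;
        }
        try {
            System.out.println(copy(args[0], args[1]) + " bytes copied.");
        } catch (FileNotFoundException e) {
            System.out.println("Cannot open file: " + e.getMessage());
        } catch (IOException e) {
            System.out.println("Copy error: " + e.getMessage());
        }
    }
}
```

Checking args.length before use removes the need for an ArrayIndexOutOfBoundsException catch block.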
The File class deals directly with files and the file system. It is worth saying
a few words about the file system, the component that helps manage all the files
stored in a computer system. To work with a file, an operating system must know
some details about that file. The details are: the name of the file with the path to reach it,
date and time of creation, starting point of its physical storage location, length of the file in
KB or MB, access permissions, date of last access or modification, etc. Such details help
quick and correct retrieval of files. File objects that give access to such file-specific details can be
created from the class File.
New File objects can be initialized with any one of the following constructors: --
The File class has many methods, such as getName (), getParent (), exists (), list (), etc.
In Java, a directory is also treated as a File; the names of the files and other
directories it contains can be examined with the list () method of the class File.
Now study example-11.8 to see the use of the class File.
Try to run the program using a different file already stored in your computer.
This example gives you an idea how the class File methods can be used to know the details
about a stored file.
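A minimal sketch of how these File methods can be called is given here. It is not the listing of example-11.8; the class name FileInfoSketch, the helper method report () and the layout of the printed text are assumptions made for illustration.

```java
import java.io.File;

public class FileInfoSketch {
    // Builds a short report from the File methods named in the text.
    static String report(File f) {
        StringBuilder sb = new StringBuilder();
        sb.append("Name   : ").append(f.getName()).append('\n');
        sb.append("Parent : ").append(f.getParent()).append('\n');
        sb.append("Exists : ").append(f.exists()).append('\n');
        if (f.isDirectory()) {
            String[] entries = f.list();   // names of files and sub-directories
            int n = (entries == null) ? 0 : entries.length;  // null if unreadable
            sb.append("Entries: ").append(n).append('\n');
        } else if (f.exists()) {
            sb.append("Length : ").append(f.length()).append(" bytes\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // With no argument, the current directory "." is examined.
        File f = new File(args.length > 0 ? args[0] : ".");
        System.out.print(report(f));
    }
}
```

Creating a File object does not create or open anything on the disk; it only holds a path that the methods then examine.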
Tokens
In a sentence of any language, several words (nouns, verbs, adjectives, etc.) are
placed side by side with blank spaces between them. To help reading and
understanding, commas or semicolons are used within a sentence, which always ends
with a full stop. These blank or white spaces, punctuation symbols, etc. help to identify
individual words distinctly and properly.
In a programming language, words can be regarded as “tokens”, and they are
separated by delimiters such as white space, punctuation symbols and arithmetical
symbols like +, -, *, %, ^, etc. Therefore, an expression in Java or any other
programming language (equivalent to a sentence of a natural language) can be scanned
and analyzed to separate out the tokens used in it. This process of dividing an
expression string into a set of tokens is known as tokenizing, and it is the first step of parsing.
The StringTokenizer class provides the necessary support for this process. That
means, given a programming statement, you can separate out all the individual tokens
contained in it using the StringTokenizer class methods.
Delimiters are special characters that separate tokens. The default set of delimiters
consists of the white space characters: space, tab, newline, carriage return and form feed. The
constructors used to initialize StringTokenizer objects are as follows: --
import java.util.StringTokenizer;
public class StringTokenDemo
{
    static String exp = " BitSet bits = new BitSet(16);";
    public static void main(String[] args) {
        // space, '=' and ';' are the delimiters
        StringTokenizer st = new StringTokenizer (exp, " = ; ");
        while (st.hasMoreTokens()) {
            String val = st.nextToken();
            System.out.println (val);
        }
    }
}
By running this program, you will see the output as shown below: --
BitSet
bits
new
BitSet(16)
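The delimiters themselves can also be returned as tokens, by passing true as a third constructor argument. The sketch below shows the difference; the class name TokenListSketch, the helper method tokens () and the sample expression are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenListSketch {
    // Collects the tokens of s; if returnDelims is true the delimiters are kept as tokens.
    static List<String> tokens(String s, String delims, boolean returnDelims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, delims, returnDelims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String exp = "a=b*12;";
        System.out.println(tokens(exp, "=*;", false));  // [a, b, 12]
        System.out.println(tokens(exp, "=*;", true));   // [a, =, b, *, 12, ;]
    }
}
```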
It is often required not only to identify the tokens but also to classify them according
to their types while scanning an input stream. StreamTokenizer can be used not only
for identifying token types but also for detecting the end of file (EOF) and end of line
(EOL), and the tokens can be counted as they are read. The StreamTokenizer
class defines the following instance variables and methods ---
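import java.io.*;
public class StreamTokenDemo
{
    public static void main(String[] args) throws IOException
    {
        FileReader infile = new FileReader ("c:\\copy.txt");
        StreamTokenizer inpstream = new StreamTokenizer (infile);
        int toktype = 0;
        int noOftok = 0;
        do {
            toktype = inpstream.nextToken();
            outTType(toktype, inpstream);
            noOftok++;
        } while (toktype != StreamTokenizer.TT_EOF);
        infile.close();
        System.out.println (" Number of tokens = " + noOftok);
    }
    // Prints the type of the current token.
    // Assumed helper: its body is not shown in the listing as printed.
    static void outTType(int ttype, StreamTokenizer inpstream)
    {
        switch (ttype) {
            case StreamTokenizer.TT_EOF:
                System.out.println ("TT_EOF"); break;
            case StreamTokenizer.TT_EOL:
                System.out.println ("TT_EOL"); break;
            case StreamTokenizer.TT_NUMBER:
                System.out.println ("TT_NUMBER: " + inpstream.nval); break;
            case StreamTokenizer.TT_WORD:
                System.out.println ("TT_WORD: " + inpstream.sval); break;
            default:
                System.out.println ("other: " + (char) ttype);
        }
    }
}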
If you run this program, the following output will appear on the terminal window --
Please remember one important point: with its default settings, StreamTokenizer begins a word at a
letter and continues it through letters, digits, '.' and '-', so a full stop does not end a word. For this reason, you find
“A.M.Ghosh” and “program.” identified as TT_WORDs in this program.
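This behaviour can be observed without a disk file by tokenizing a string in memory. The sketch below is only illustrative: the class name WordTokenSketch, the helper method scan (), the labels WORD, NUMBER and CHAR, and the sample sentence are assumptions.

```java
import java.io.*;

public class WordTokenSketch {
    // Returns one description per token found in the text.
    static String scan(String text) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        StringBuilder sb = new StringBuilder();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_WORD) {
                sb.append("WORD(").append(st.sval).append(") ");
            } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                sb.append("NUMBER(").append(st.nval).append(") ");
            } else {
                // any other character is returned as its own token
                sb.append("CHAR(").append((char) st.ttype).append(") ");
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(scan("A.M.Ghosh wrote 2 programs, then stopped."));
    }
}
```

Here the full stops stay inside the words, while the comma, which is not a word character, is returned as a separate token.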
Program translators use tokenizers of this kind during compilation. No further
discussion of this advanced topic will be made in this introductory book.
11.7 Conclusions
This chapter has discussed input-output operations in general and file
operations in particular. Java being an object-oriented programming language, all I/O
operations are performed using the methods of Java's I/O classes. These methods
handle input and output streams, which may be of either byte or character type. Modern
versions of Java prefer character streams, and converter classes that turn
byte streams into character streams are used very often.
The I/O streams are buffered to take care of speed mismatches between I/O
devices and the processor, which controls the entire program execution. However, a
programmer need not know the inner details. She or he
only needs to know which methods the I/O classes support, and
when and how to use them. A good number of examples have been given here for study and use
by beginners.
The class File has also been discussed here. Finally, the concept of
tokens, and how to identify and extract them from a source code file or a text file using the
StringTokenizer and StreamTokenizer classes, has been explained and demonstrated.