Figure: the optical character recognition (OCR) mechanism, in four stages: input image, preprocessing, segmentation, comparison.
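The four stages named above (input image, preprocessing, segmentation, comparison) can be illustrated with a toy sketch. It is not how MODI or any other OCR engine is implemented. It assumes a tiny grayscale grid as input, a fixed threshold for preprocessing, blank columns as glyph separators, and a hypothetical set of pixel templates for the comparison step.

```python
from typing import Dict, List

Grid = List[List[int]]  # rows of pixels; after thresholding, 1 = ink, 0 = background


def preprocess(gray: Grid, threshold: int = 128) -> Grid:
    """Binarize a grayscale image: pixels darker than the threshold become ink."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment(binary: Grid) -> List[Grid]:
    """Split the image into glyphs at columns that contain no ink."""
    glyphs: List[Grid] = []
    start = None
    width = len(binary[0]) if binary else 0
    for x in range(width + 1):  # one extra step closes a glyph at the right edge
        has_ink = x < width and any(row[x] for row in binary)
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            glyphs.append([row[start:x] for row in binary])
            start = None
    return glyphs


def compare(glyph: Grid, templates: Dict[str, Grid]) -> str:
    """Return the label of the same-sized template with the fewest differing pixels."""
    best_label, best_score = "?", None
    for label, template in templates.items():
        if len(template) != len(glyph) or len(template[0]) != len(glyph[0]):
            continue  # this sketch only compares glyphs of identical size
        mismatches = sum(
            a != b
            for g_row, t_row in zip(glyph, template)
            for a, b in zip(g_row, t_row)
        )
        if best_score is None or mismatches < best_score:
            best_label, best_score = label, mismatches
    return best_label


def recognize(gray: Grid, templates: Dict[str, Grid]) -> str:
    """Run the three processing stages on an input image."""
    return "".join(compare(g, templates) for g in segment(preprocess(gray)))


# Hypothetical 3-pixel-high input image and templates, for illustration only.
INK, PAPER = 0, 255
SAMPLE = [
    [INK, PAPER, PAPER, INK, PAPER, INK, INK, INK],
    [INK, PAPER, PAPER, INK, PAPER, PAPER, INK, PAPER],
    [INK, INK, PAPER, INK, PAPER, PAPER, INK, PAPER],
]
TEMPLATES = {
    "L": [[1, 0], [1, 0], [1, 1]],
    "I": [[1], [1], [1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}
print(recognize(SAMPLE, TEMPLATES))
```

Real engines use far more robust preprocessing (deskewing, noise removal) and classifiers than pixel-by-pixel template comparison. The sketch only shows how the stages hand data to one another.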
Office Document Imaging (MODI) enables editing of Document and Image (page)
objects, so that you can display and read a document as easily as a paper
document, perform optical character recognition (OCR), search for text
within scanned documents, copy and export text and images, combine multiple
pages into a single compressed file, and reorganize document pages as easily as
rearranging papers in a folder.
The MODI object model consists of the following objects, their members, and
dependent objects:
- The Document object represents an ordered collection of pages (images).
- The Image object represents a single page of a document.
- The Layout object exposes the results of optical character recognition
  (OCR) on a page.
- The MiDocSearch object exposes document search functionality.
- The viewer control (the MiDocView object) is an ActiveX control that
  displays the pages of a document.
The MODI Document object represents an ordered collection of document images
saved as a single file. You can use the Create method to load an existing MDI or
TIFF file, or to create an empty document that you can populate with images from
other documents. The OCR method performs OCR on all pages in the document, and
the OnOCRProgress event reports the status of the operation and allows the user
to cancel it. The Dirty property tells you whether the document has unsaved
OCR results or changes. The SaveAs method allows you to specify an image file
format and a compression level. You can also use the PrintOut method to print
the document to a printer or a file.
The MODI Layout object provides summary information (such as the number of
words) about the recognized text on the page and gives access to the recognized
text itself and to each individual word in the text. The Word object exposes
additional information about each word's font, its location on the page, and even
the OCR engine's RecognitionConfidence factor, which estimates the likelihood
of a recognition error.
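A per-word confidence value is typically used to flag words for manual review. The sketch below is not the MODI API. RecognizedWord, its confidence field, and the assumption that a higher number means a more certain recognition are all hypothetical stand-ins for the Word object described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecognizedWord:
    """Hypothetical stand-in for one recognized word and its confidence."""
    text: str
    confidence: int  # assumed scale: higher means the engine is more certain


def words_needing_review(words: List[RecognizedWord], min_confidence: int) -> List[str]:
    """Return the text of every word whose confidence is below the cutoff."""
    return [w.text for w in words if w.confidence < min_confidence]


# Illustrative values only; they are not taken from a real OCR run.
page_words = [
    RecognizedWord("Invoice", 950),
    RecognizedWord("t0tal", 310),
    RecognizedWord("amount", 880),
]
print(words_needing_review(page_words, min_confidence=500))
```

The cutoff is a design choice: a stricter threshold sends more words to a human reviewer but lets fewer recognition errors through.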
The MiDocView object represents the MODI viewer control, an ActiveX control that
you can use to display and scroll through a MODI document. You can manipulate
the scaling of the document in the window, scroll the image programmatically,
retrieve the user's selection as text or as an image, and return information about the
contents of the viewer window and its coordinates.
The MODI object model makes it possible to automate many types of document
management tasks. Here are just a few examples:
- Automating the rollup of multiple single-page scanned image files into a
  single compressed multiple-page document file.
- Automating OCR operations on entire folders of documents.
- Automating the searching of scanned documents, such as resumes, for certain
  words.
MODULE 1:
EXTRACT TEXT FROM IMAGE USING OCR:
This technical tip shows how to extract text from part of an image inside .NET
applications. Aspose.OCR for .NET provides the OcrEngine class to extract text
from a specific part of an image document. The OcrEngine class requires a source
image, a language and a resource file for character recognition.
The source image is the document on which OCR will be performed. The image can
be a BMP, TIFF, JPEG, GIF or PNG file. The OcrEngine.Image property holds the
source image.
One or more languages must be specified before performing OCR, because the
OcrEngine tries to recognize characters of the specified languages in the image.
The OcrEngine recognizes text word by word. Each recognized word has a specific
language, which might be different from the language of the other words.
Aspose.OCR for .NET also maintains the priority of each language: the language
added first has the highest priority, each language added afterwards has a lower
priority, and the last added language has the lowest priority. The language
priority matters when OCR is performed. Aspose.OCR for .NET first attempts to
read characters as the highest-priority language. If it doesn't recognize them,
it tries to read them in the next language. If a word is identical in two or
more languages, the OcrEngine assigns the highest-priority language to the
recognized word.
The resource file is a ZIP archive that contains the data necessary to perform
OCR. The OcrEngine.Resource property must be set and point to the resource file
before starting an OCR process.
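The language-priority rule described above can be sketched in a few lines. This is an illustration of the rule only, not Aspose.OCR code. The assign_language function and the small word sets are hypothetical, and a set lookup stands in for the engine's actual recognition.

```python
from typing import Dict, List, Optional, Set


def assign_language(
    word: str,
    languages: List[str],
    vocabulary: Dict[str, Set[str]],
) -> Optional[str]:
    """Return the first (highest-priority) language that recognizes the word.

    `languages` is ordered by priority: the language added first comes first.
    Returns None if no language recognizes the word.
    """
    for language in languages:
        if word.lower() in vocabulary.get(language, set()):
            return language
    return None


# Hypothetical word lists; a real engine recognizes characters, not dictionary entries.
VOCABULARY = {
    "english": {"hotel", "the", "house"},
    "spanish": {"hotel", "casa", "el"},
}
LANGUAGES = ["english", "spanish"]  # English was added first, so it has priority

print(assign_language("casa", LANGUAGES, VOCABULARY))   # only Spanish knows it
print(assign_language("hotel", LANGUAGES, VOCABULARY))  # identical in both languages
```

Because the loop returns on the first match, a word that is identical in several languages is always assigned to the language that was added earliest, which matches the tie-breaking behavior described in the text.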
To run OCR on an image using the OcrEngine class:
1. Create an instance of OcrEngine and initialize it using the default
   constructor.
2. Set the image file using the OcrEngine.Image property.
3. Add language(s) using the OcrEngine.Languages.AddLanguage method.
4. Set the start point, width and height of the recognition block using the
   OcrConfiguration.AddRecognitionBlock method.
5. Set the resource file using the OcrEngine.Resource property.
6. Call the OcrEngine.Process method to perform OCR on the whole image.
7. If OcrEngine.Process returns true, get the recognized text with the
   RecognitionBlock.Text property.
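A recognition block is defined by a start point, a width and a height. The sketch below shows what restricting recognition to such a block amounts to. It does not call Aspose.OCR: the Block class and crop function are hypothetical, and a grid of characters stands in for the pixels of a scanned page.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    """Hypothetical recognition block: start point (x, y) plus width and height."""
    x: int
    y: int
    width: int
    height: int


def crop(page: List[str], block: Block) -> List[str]:
    """Return only the part of the page that lies inside the block."""
    if block.width <= 0 or block.height <= 0:
        raise ValueError("block width and height must be positive")
    if block.x < 0 or block.y < 0:
        raise ValueError("block start point must not be negative")
    if block.y + block.height > len(page) or block.x + block.width > len(page[0]):
        raise ValueError("block extends beyond the page")
    rows = page[block.y : block.y + block.height]
    return [row[block.x : block.x + block.width] for row in rows]


# A character grid standing in for a scanned page (every row has the same width).
PAGE = [
    "INVOICE 0042",
    "Total: 19.99",
    "Thank you!  ",
]
amount_block = Block(x=7, y=1, width=5, height=1)
print(crop(PAGE, amount_block))
```

Validating the block against the page size up front gives a clear error instead of a silently truncated region, which is useful when block coordinates come from a form template.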
MODULE 2:
CONVERT FROM TEXT TO VOICE USING SPEECH RECOGNITION:
A speech-to-text (or voice recognition) application translates spoken words
into text. Speech recognition can be used in many fields, including healthcare,
in-car systems, the military, telephony, education and computer gaming.
Let's see some examples:
HEALTHCARE:
IN-CAR SYSTEMS:
Some of the most recent car models make voice control possible. For
example, simple voice commands can be used to initiate phone calls, select radio
stations or play music from a compatible smartphone, MP3 player or music-loaded
flash drive.
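Once a recognizer has turned speech into text, simple commands like these can be matched against a small set of patterns. The sketch below assumes the transcript is already available as a string. The command names and phrasings are hypothetical and do not come from any particular in-car system.

```python
import re
from typing import Optional, Tuple

# Hypothetical command grammar: (command name, pattern with one named argument).
COMMAND_PATTERNS = [
    ("call", re.compile(r"^(?:call|dial|phone)\s+(?P<arg>.+)$")),
    ("radio", re.compile(r"^(?:tune to|select station)\s+(?P<arg>.+)$")),
    ("play", re.compile(r"^play\s+(?P<arg>.+)$")),
]


def parse_command(transcript: str) -> Optional[Tuple[str, str]]:
    """Map a recognized utterance to (command, argument), or None if nothing matches."""
    text = " ".join(transcript.lower().split())  # normalize case and whitespace
    for name, pattern in COMMAND_PATTERNS:
        match = pattern.match(text)
        if match:
            return name, match.group("arg")
    return None


print(parse_command("Call  Alice"))
print(parse_command("Tune to 101.5"))
print(parse_command("open the window"))
```

Keeping the grammar small and explicit makes misrecognitions fail safely: an utterance that matches no pattern is ignored rather than guessed at.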
MILITARY:
Speech recognition can be used, for example, in fighter aircraft, with applications
such as setting radio frequencies, commanding an autopilot system, setting steerpoint coordinates and weapons release parameters, and controlling flight displays.
Furthermore, the acoustic noise problem can be eliminated in the helicopter
environment by using a speech-to-text application. And last but not least, training
for air traffic controllers (ATC) represents an excellent application for speech
recognition systems, too.
TELEPHONY:
In the field of telecommunications, speech recognition is used mostly as part of a
user interface for creating predefined or custom speech commands. It can also be
useful in call centers, where a lot of data has to be entered in a relatively short
period of time. Call center agents can use voice recognition or speaker
identification to establish the identity of the other party. This functionality can
be easily combined with call assistant features such as voice dialing, call routing,
simple data entry, etc.
EDUCATION:
Speech-to-text conversion can facilitate learning for people with disabilities (for
example for blind people) and it can help in fields where listening comprehension
is particularly important. For instance, for language learning, speech recognition
can be used to teach proper pronunciation, in addition to helping a person develop
fluency with their speaking skills.
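One simple way to give a learner feedback is to compare the sentence they were asked to read with what the recognizer heard. The sketch below uses word error rate (word-level edit distance divided by the length of the expected sentence), a common speech recognition metric, as a rough proxy. It assumes that recognition errors reflect pronunciation problems, which is only approximately true, and the function names are illustrative.

```python
def word_errors(expected: str, recognized: str) -> int:
    """Minimum number of word insertions, deletions and substitutions."""
    ref = expected.lower().split()
    hyp = recognized.lower().split()
    previous = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # word missing from the recognized text
                current[j - 1] + 1,      # extra word in the recognized text
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_error_rate(expected: str, recognized: str) -> float:
    """Word errors divided by the number of words in the expected sentence."""
    expected_length = len(expected.split())
    if expected_length == 0:
        raise ValueError("expected sentence must contain at least one word")
    return word_errors(expected, recognized) / expected_length


print(word_error_rate("the quick brown fox", "the quick crown fox"))
```

A score of 0 means the recognizer heard exactly the expected words. Higher scores point to words worth practicing, although background noise and recognizer mistakes can also raise the score.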
COMPUTER GAMING:
Automated speech recognition is becoming more widespread in the field of
computer gaming and simulation as well.