
UNIVERSITY OF SCIENCE

FACULTY OF INFORMATION TECHNOLOGY

DEPARTMENT OF KNOWLEDGE ENGINEERING

TRAN VAN TRI - 0812543
NGUYEN MINH TRI - 0812548

ENGLISH-VIETNAMESE DICTIONARY LOOKUP VIA CAMERA
ON ANDROID MOBILE PHONES

BACHELOR'S GRADUATION THESIS IN INFORMATION TECHNOLOGY

ADVISORS

BUI TAN LOC, M.Sc.
Assoc. Prof. Dr. DINH DIEN

CLASS OF 2008-2012

ADVISOR'S COMMENTS

Ho Chi Minh City, date ..........

Advisor

REVIEWER'S COMMENTS

The thesis meets the requirements of a Bachelor's thesis in Information Technology.

Ho Chi Minh City, date ..........
Reviewer

ACKNOWLEDGMENTS
We would like to express our deep gratitude to Mr. Dinh Dien and Mr. Bui Tan Loc, who supervised us directly, created favorable conditions for our work, and gave expert feedback on the thesis. Thanks to them we were able to complete the thesis within the time allowed.
We also thank our parents and families, the people closest to us, who raised us, encouraged us, and supported us throughout.
We thank the staff of the Kim Tu Dien company for their help and for the support that allowed us to complete this thesis.
We also sincerely thank the lecturers of the Faculty and our friends near and far, whose care and attention gave us the motivation to finish the thesis.
We respectfully welcome our lecturers' guidance on any shortcomings in this thesis.
Ho Chi Minh City, ... 2012
The student team

Tran Van Tri          Nguyen Minh Tri

Faculty of Information Technology
Department of Knowledge Engineering

DETAILED PROPOSAL

Thesis title: English-Vietnamese dictionary lookup via camera on Android phones
Advisors: Assoc. Prof. Dr. Dinh Dien, Bui Tan Loc, M.Sc.
Duration: from the date the topic was assigned to 25 June 2012
Students:
Tran Van Tri - 0812543
Nguyen Minh Tri - 0812548
Thesis type: Application development
Thesis content: Build an English-Vietnamese dictionary application that looks up words directly through the camera of a mobile phone running the Android operating system. Study the Tesseract optical character recognition library and how to port its code to the Android platform. Study the Android programming environment, the techniques for processing and capturing images through the phone's camera, and the use of the NDK to run C/C++ source code. Study and implement dictionary lookup algorithms, restructure the dictionary data file, and implement stemming and approximate word search. The finished program will support lookup directly through the camera as well as lookup of words typed on the keyboard.
Work plan:
1 Feb 2012 to 29 Feb 2012:
Tran Van Tri: Study the Android programming environment and the algorithms for building the dictionary data structure.
Nguyen Minh Tri: Study the Android programming environment and optical character recognition (OCR) libraries.
1 Mar 2012 to 31 Mar 2012:
Tran Van Tri: Implement the dictionary data structure and word lookup on Android.
Nguyen Minh Tri: Study the Tesseract OCR library, port the Tesseract code, and test it on Android.
1 Apr 2012 to 29 Apr 2012:
Tran Van Tri: Study the stemming algorithm and implement it on Android.
Nguyen Minh Tri: Design the application architecture and integrate the implemented features: dictionary lookup, the Tesseract OCR recognizer, and stemming.
2 May 2012 to 31 May 2012:
Tran Van Tri: Design the user interface and controls; fix bugs in the program.
Nguyen Minh Tri: Implement image capture and improve the quality of images taken on the phone.
1 Jun 2012 to 24 Jun 2012:
Tran Van Tri: Encrypt the dictionary data, fix bugs, write the report.
Nguyen Minh Tri: Integrate approximate word search, fix bugs, write the report.

Advisor's confirmation                    Date ..........

Principal advisor                         Students
Bui Tan Loc, M.Sc.                        Tran Van Tri          Nguyen Minh Tri

Co-advisor
Assoc. Prof. Dr. Dinh Dien

TABLE OF CONTENTS
ACKNOWLEDGMENTS
DETAILED PROPOSAL
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ABBREVIATIONS
Chapter 1 : OVERVIEW
1.1. Context and practical needs
1.2. Related work
1.3. Objectives
1.4. Thesis contents
Chapter 2 : BASIC TECHNIQUES ON ANDROID
2.1. A brief introduction to Android
2.1.1. Overview
2.1.2. Android versions
2.1.3. Architecture and design
2.1.4. The Dalvik virtual machine
2.1.5. Android software development kit (SDK)
2.2. Native development kit (NDK)
2.2.1. General introduction
2.2.2. What the NDK provides
2.2.3. Using the NDK
2.2.4. Contents of the NDK
2.2.5. Introduction to JNI (Java Native Interface)
Chapter 3 : OPTICAL CHARACTER RECOGNITION
3.1. General introduction
3.1.1. A brief introduction to optical character recognition (OCR)
3.1.2. Approaches to applying OCR in this thesis
3.1.3. Comparison of OCR libraries and tools
3.1.4. Conclusion
3.2. Introduction to the Tesseract OCR engine
3.2.1. History
3.2.2. Architecture
3.2.3. Installing and using the Tesseract library on Android
3.2.4. Training data for Tesseract
3.2.5. Training process for a new language and font
Chapter 4 : ENGLISH-VIETNAMESE DICTIONARY LOOKUP
4.1. Overview
4.2. Stemming
4.3. Approximate word search
4.3.1. Levenshtein distance
4.3.2. Substituting similar characters
4.4. Dictionary data structure
4.4.1. Organizing entries of the same fixed size
4.4.2. Organizing entries of variable size
4.4.3. Organizing dictionary data for fast lookup
Chapter 5 : IMPLEMENTATION AND EXPERIMENTS
5.1. Drawing the frame and controls on the camera screen
5.2. Capturing images from the phone camera
5.3. Displaying Vietnamese and formatting text on screen
5.3.1. Displaying Vietnamese on Android
5.3.2. Formatting dictionary definitions
5.4. Encrypting the dictionary data
5.5. Storing the application's settings
5.6. Pronouncing English words with the Android API
5.7. Development environment
5.8. Installation and user guide
5.8.1. Installing the program
5.8.2. User guide
5.9. Experimental results
5.9.1. Testing the character recognition block
5.9.2. Testing the language processing block
5.9.3. Evaluation of results
5.9.4. Comparison with existing applications on the market
CONCLUSION
Results achieved
Limitations
Future work
REFERENCES

LIST OF FIGURES

Figure 1.1 General block diagram of the program
Figure 2.1 A phone running the Android operating system
Figure 2.2 Overall architecture of Android [1]
Figure 2.3 How the Dalvik and Java virtual machines work
Figure 2.4 The Android phone emulator
Figure 2.5 JNI as the intermediary between C/C++ and Java
Figure 2.6 Contents of the build configuration file in JNI
Figure 3.1 The OCR process
Figure 3.2 Block diagram of optical character recognition in the program
Figure 3.3 Overall architecture of Tesseract [2]
Figure 3.4 Structure of the tesseract-android-tools project
Figure 3.5 Part of the build directives for the C/C++ library source in Android.mk
Figure 3.6 Using the NDK to build a C/C++ library on Android
Figure 3.7 Successful build of the Tesseract library source on Android
Figure 3.8 Structure of a box file
Figure 3.9 The data training process for Tesseract
Figure 4.1 Flowchart of dictionary lookup and natural language processing
Figure 4.2 Flowchart of the stemming algorithm
Figure 4.3 Organization of the dictionary file
Figure 5.1 The camera screen
Figure 5.2 The Droid font family
Figure 5.3 Text displayed in different styles
Figure 5.4 Link formatting
Figure 5.5 ScreenPreference
Figure 5.6 ListPreference
Figure 5.7 The program icon after installation
Figure 5.8 The program screen at startup
Figure 5.9 Screen showing the meaning of a word
Figure 5.10 Screen with the settings menu at the bottom
Figure 5.11 The conventional dictionary lookup screen
Figure 5.12 The settings screen
Figure 5.13 Result before lookup
Figure 5.14 Screen showing the meaning of a word after stemming
Figure 5.15 Screen listing words close to the recognition result

LIST OF TABLES

Table 3.1 Comparison of commercial software and Tesseract
Table 3.2 Accuracy of Tesseract on several languages
Table 4.1 Example result matrix after computing the Levenshtein distance
Table 4.2 Description of the data fields
Table 5.1 Test results for the recognizer in the program
Table 5.2 Some misrecognized results
Table 5.3 Evaluation of the program's execution speed
Table 5.4 Comparison of the application with Camera Dictionary
Table 5.5 Main features of the program

ABBREVIATIONS
OCR     Optical Character Recognition
JNI     Java Native Interface
SDK     Software Development Kit
NDK     Native Development Kit
API     Application Programming Interface
DES     Data Encryption Standard
TTS     Text To Speech
CMD     Command Line
UNLV    University of Nevada, Las Vegas

Chapter 1 : OVERVIEW

1.1. Context and practical needs
Information technology is advancing rapidly, and electronic devices such as desktop computers, laptops, and mobile phones have become widespread, more powerful, and more compact, serving people's need to communicate and exchange information. The mobile phone in particular has become indispensable in daily life. This has led to smartphones, which integrate many functions in an ever smaller form factor. Smartphones with powerful hardware and many useful features are steadily taking over the market.
At the same time, dictionaries are increasingly needed for study and communication, and many dictionary programs have been released for mobile platforms. Most of them, however, require the user to type the word on the phone's keyboard before looking it up. For languages written in the Latin alphabet this is fairly easy. For other languages, such as Chinese or Russian, typing the word requires the user to know the script well, which is very hard for people who do not know the language or have only just begun to learn it. A tourist visiting a country whose language they barely know, for example, will struggle to type a word in order to look up its meaning. A dictionary application that does not force the user to type, and instead lets them look up a word through the phone's camera, is clearly far more convenient. Because most smartphones today have a camera, such an application is both needed and practical.
Smartphones currently run on several platforms, and two dominate the mobile market: Apple's iOS and Google's Android. Android competes with iOS and runs on a larger number of devices, from many manufacturers and in a wide variety of models.
Given the practical need for camera-based dictionary lookup on phones and the popularity of the Android platform, our team decided to build a program that looks up English-Vietnamese dictionary entries directly through the camera of an Android phone.

1.2. Related work

This thesis focuses on developing an application for English-Vietnamese dictionary lookup through the camera of an Android phone. It builds on earlier research on camera-based dictionary lookup on phones [2], the master's thesis of Nguyen Hoang Giang. In addition, several commercial applications can recognize words through the phone camera. On Android, the most representative one is CamDictionary.
• The thesis "Building a dictionary lookup application using the camera on mobile phones" has much in common with our work. Both address a new way of looking up words: the camera recognizes the word, which is then looked up in a dictionary. Both use Tesseract as the main optical character recognition component. Tesseract is described in detail in Chapter 3. The only difference between the two is the development platform: the earlier thesis targets Nokia's Symbian operating system, whereas this thesis uses Google's Android as its main platform.
• The CamDictionary application: this Android application translates words through the phone camera and is the program closest to this thesis. It is developed by IntSig (footnote 1), a US company that provides optical character recognition applications for many environments, such as scanners, phones, and tablets. CamDictionary uses the phone camera to recognize a word or a sentence fragment and then calls Google Translate over the network to translate it. Its recognizer is fairly accurate, but lookup and translation depend on a network connection and on the accuracy of Google's dictionary data. A detailed feature comparison between this thesis and CamDictionary is presented at the end of the report (Section 5.9.4).

1.3. Objectives

The objective of this thesis is to build a mobile phone application that uses the camera to scan an image and an optical character recognition (OCR) engine to extract the words in that image. The extracted words then serve as the input for dictionary lookup.
To build this camera-based dictionary application, the thesis addresses the following tasks:
• Study the programming environment of the Android platform.
• Study in depth how to capture images from the phone camera.
• Study the optical character recognition problem and the use of the Tesseract OCR library, and learn how to compile the Tesseract source code to run on Android.
• Design and build the data structure used for word lookup.
• Study language processing algorithms that improve lookup accuracy, such as stemming and approximate word search, and apply them in the program.
• Build a complete application with all the planned functions and add further new features.
Footnote 1: http://www.intsig.com/en/index.html

The program consists of three main parts: capturing an image of the text from the phone camera, optical character recognition, and dictionary lookup with language processing.

Figure 1.1 General block diagram of the program
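
The three parts can be pictured as a small pipeline. The following Java sketch is illustrative only and is not the implementation used in this thesis: the names `LookupPipelineDemo` and `TextRecognizer` and the sample entries are hypothetical, the OCR stage is a stub rather than a real Tesseract call, and a one-edit Levenshtein fallback stands in for the language processing step.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the flow: captured image -> OCR -> dictionary lookup. */
final class LookupPipelineDemo {

    /** Stage 2 stand-in; a real build would wrap an OCR engine here. */
    interface TextRecognizer {
        String recognize(byte[] image);
    }

    private final TextRecognizer recognizer;
    // TreeMap keeps headwords sorted, so the fallback scan is deterministic.
    private final Map<String, String> dictionary = new TreeMap<>();

    LookupPipelineDemo(TextRecognizer recognizer) {
        this.recognizer = recognizer;
    }

    void addEntry(String word, String meaning) {
        dictionary.put(word.toLowerCase(Locale.ROOT), meaning);
    }

    /** Runs stages 2 and 3 on an image that stage 1 has already captured. */
    String translate(byte[] capturedImage) {
        String word = recognizer.recognize(capturedImage).trim().toLowerCase(Locale.ROOT);
        String exact = dictionary.get(word);
        if (exact != null) {
            return exact;
        }
        // Fallback: accept the first headword that is exactly one edit away.
        String best = null;
        int bestDistance = 2;
        for (String headword : dictionary.keySet()) {
            int d = levenshtein(word, headword);
            if (d < bestDistance) {
                bestDistance = d;
                best = headword;
            }
        }
        return best == null ? null : dictionary.get(best);
    }

    /** Classic dynamic-programming edit distance using two rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // The lambda fakes an OCR result that confuses 'o' with '0'.
        LookupPipelineDemo demo = new LookupPipelineDemo(image -> "c0mputer");
        demo.addEntry("computer", "may tinh");
        demo.addEntry("camera", "may anh");
        System.out.println(demo.translate(new byte[0])); // prints "may tinh"
    }
}
```

Keeping the recognizer behind a small interface means the lookup stage can be tested with fake OCR output, without a camera or a native library.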

1.4. Thesis contents

The thesis consists of five chapters and a conclusion:
Chapter 1. Introduction: the context and need for the thesis, its objectives, and its contents.
Chapter 2. Basic programming techniques on Android: a brief introduction to Android, camera access programming, and use of the NDK to compile source code on Android.
Chapter 3. Optical character recognition (OCR): a general introduction to optical character recognition, the Tesseract OCR engine, and how to train its data.
Chapter 4. English-Vietnamese dictionary lookup: the dictionary data structure, stemming, and approximate word search.
Chapter 5. Implementation and experiments: test results and evaluation of the program.
Conclusion: limitations of the thesis and directions for future development.

Chapter 2 : BASIC TECHNIQUES ON ANDROID

2.1. A brief introduction to Android
2.1.1. Overview
Android is an open operating system based on Linux for mobile devices, including smartphones, tablets, and laptops. It was originally developed by Android Inc., which Google acquired in 2005 and whose product Google turned into an open operating system for mobile devices.
Android was officially announced on 5 November 2007 together with the founding of the Open Handset Alliance (OHA). The OHA is an organization of more than 78 telecommunications, mobile, and hardware companies, including Google, Sony Ericsson, Samsung, Nvidia, and Qualcomm. Its goal is to develop common open standards for future mobile devices, and Android is its flagship product. The Android source code is open source and is published under the Apache license.

Figure 2.1 A phone running the Android operating system

2.1.2. Android versions

Since its launch, Android has been released in many versions, each bringing upgrades and improvements. The versions available at the time of writing are:
• Version 1.5 (Cupcake): an early official Android release for phones (preceded by versions 1.0 and 1.1).
• Version 1.6 (Donut).
• Version 2.0 / 2.1 (Eclair).
• Version 2.2 (Froyo).
• Version 2.3 (Gingerbread).
• Version 3.0 / 3.1 (Honeycomb): a version dedicated to tablets.
• Version 4.0 / 4.0.1 / 4.0.3 (Ice Cream Sandwich): the newest Android version at the time of writing, used on both smartphones and tablets.

2.1.3. Architecture and design

Figure 2.2 Overall architecture of Android [1]

As the figure shows, the Android operating system is divided into the following layers: Applications, Application Framework, Libraries, Android Runtime, and Linux Kernel. The Applications and Application Framework layers are written in Java. The layers from Libraries down to the Linux Kernel are written in C/C++, also called native code.
• Applications layer: the top layer of the Android operating system. It contains the applications that are written and preinstalled, such as the calendar, web browser, contacts, and camera. All applications in this layer are written in Java.
• Application Framework layer: beneath the applications is a set of services and systems that developers can call through the application programming interface (API):
o An extensible set of Views used to build the user interface, such as buttons, lists, dialogs, text boxes, and their events.
o Content Provider: lets applications access and share data with one another.
o Resource Manager: manages files that are not source code and gives access to application resources such as strings, graphics files, and layout files.
o Notification Manager: manages and displays notifications in the status bar.
o Activity Manager: manages the lifecycle and operating cycle of applications.
• Libraries layer: Android includes a set of C/C++ libraries used by many components of the operating system. The main ones are:
o System C Library: a BSD-derived (Berkeley Software Distribution) implementation of the standard C library, tuned for Linux-based devices.
o Media Library: supports audio recording, playback of music and video formats, and display of images, including the following formats: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG.
o Surface Manager: manages access to the display subsystem.
o LibWebCore: the web browser engine.
o SGL: the basic 2D graphics functions.
o 3D Library: 3D graphics.
o FreeType: bitmap and vector font rendering.
o SQLite: the database engine.
• Android Runtime layer: consists of a set of core Java libraries and the Dalvik virtual machine. Dalvik executes files in the dex format. Each application runs in its own process with its own Dalvik instance, and a single device can run many Dalvik instances efficiently.
• Linux Kernel layer: the lowest layer of the Android operating system. It is built on the Linux 2.6 kernel and contains device drivers (keypad, Wi-Fi, audio), power management, and system services such as security, memory management, process management, and networking. This layer acts as the intermediary between the hardware and the software stack in the layers above.
2.1.4. The Dalvik virtual machine
Dalvik is the virtual machine that runs Android applications, most of which are written in Java, in the form of .dex files. Dalvik resembles the Java virtual machine on the desktop, with one difference. As with any Java program, the application source code is first compiled to Java bytecode. On Android, a built-in tool called dx then converts that bytecode into a .dex file (short for Dalvik Executable), which the Dalvik virtual machine executes to run the application.

Figure 2.3 How the Dalvik and Java virtual machines work


2.1.5. Android software development kit (SDK)
The Android software development kit, or Android SDK, lets developers write, debug, and test applications for Android. The SDK includes:
• Android API libraries: the core of the kit. Google built the preinstalled applications on top of these same API libraries.
• Development tools: tools for writing, compiling, and debugging application source code.
• Documentation: describes how to use the libraries, classes, and functions available in the Android programming environment, and explains how Android applications work.
• Sample applications: the SDK also ships source code for ready-made programs.
• Android emulator: for developers' convenience, the SDK provides an emulator that reproduces the working environment of a real device, so that applications can be run and debugged on the emulated device without real hardware.

Figure 2.4 The Android phone emulator

2.2. Native development kit (NDK)

2.2.1. General introduction
An Android application written in Java at the upper layer sometimes needs to call functions or libraries in the layers below, which are usually written in C/C++. For Java to understand and call C/C++ code, the two languages need a common interface. That interface is the Java Native Interface (JNI).
The Native Development Kit (NDK) is a toolset that accompanies the Android SDK and lets developers write or embed C/C++ source code in their programs. Android applications run on the Dalvik virtual machine, and it is the NDK that allows them to call the native code used in the program.
2.2.2. What the NDK provides
The NDK provides the following:
• A set of tools and build files for generating native code libraries from C/C++ sources.
• A way to embed the generated native libraries in the application package file (.apk) so that they run on Android devices.
• A set of headers and libraries that are supported on all Android versions from 1.5 onward. From version 2.3, writing a native activity is also supported.
• Documentation, sample source code, and tutorials.
2.2.3. Using the NDK
Using the NDK does not always benefit a program. Native code does not automatically improve execution performance, but it always adds complexity to the application. Native code should therefore be used only where it is genuinely needed, to keep the program's complexity down.
The Android framework offers two ways to use native code in a program:
• Write the application with the Android framework and use JNI to access the APIs provided by the Android NDK. The advantage of this technique is that we keep the benefits of the Android framework while still being able to use native code when necessary.
• Write a native activity, which implements the application's lifecycle in native code. The Android SDK provides the NativeActivity class, a helper class that implements the application lifecycle through callbacks such as onCreate() and onPause().
2.2.4. Contents of the NDK
The NDK includes the following tools and directories:
• Development tools: a set of toolchains (compiler, linker, and so on) that generate binaries for ARM processors and run on Linux, OS X, and Windows (with Cygwin). They also provide a set of system headers for stable native APIs that are guaranteed to be supported in all later versions of the Android platform:
o libc (C library) headers.
o libm (math library) headers.
o JNI interface headers.
o libz (compression and decompression) headers.
o liblog (Android logging) headers.
o OpenGL ES 1.1 and OpenGL ES 2.0 (3D graphics library) headers.
o libjnigraphics (pixel buffer access) header.
o Headers supporting C++.
o OpenSL ES (native audio library).
o APIs supporting native applications on Android.
The NDK also provides a build system that compiles source code efficiently without requiring detailed control over the toolchain, platform, CPU, or ABI. The user only writes small build files that describe which sources the program uses. The NDK uses these build files to compile the sources into a shared library and places the library directly in the project.
• Documentation: the NDK contains documents describing its features, how to use it, how to write build files, how to create shared libraries, and so on. See the <ndk>/docs directory for details.
• Sample applications: ready-made applications provided for reference.
2.2.5. Introduction to JNI (Java Native Interface)
As noted above, running C/C++ code on Android requires either JNI or a NativeActivity. JNI is the more common choice because it is compatible with more devices. In this thesis we also use JNI to run C/C++ source code on Android.

Figure 2.5 JNI as the intermediary between C/C++ and Java
JNI works as follows, and a JNI method is written as follows [4]:
• Methods implemented through JNI are declared with the native keyword in front of the method.
• These methods are implemented in C/C++ and placed in the project's jni directory. The NDK compiles the C/C++ functions into a shared library (.so) file.
• To load this shared library, the Java program calls System.loadLibrary().
• Afterwards, whenever the program calls one of these methods, the virtual machine looks the function up in the shared library and executes the C/C++ implementation.

Example: the hello-jni program in Android:

• Write the file hello-jni.c, which implements a function that returns a string. The function name follows the pattern Java_<package name>_<class name>_<method name>.

    #include <jni.h>

    /* Called from Java as com.example.hellojni.HelloJni.stringFromJNI(). */
    jstring
    Java_com_example_hellojni_HelloJni_stringFromJNI(JNIEnv* env, jobject thiz)
    {
        return (*env)->NewStringUTF(env, "Hello from JNI!");
    }

• In the jni directory, create the file Android.mk, which holds the build directives and configuration for the C/C++ sources.

Figure 2.6 Contents of the build configuration file in JNI

• The two important lines in Android.mk are LOCAL_MODULE and LOCAL_SRC_FILES. The first names the library that the NDK build command will produce, and the second lists the source files to be compiled into that library.
• From the command line, run ndk-build to compile the shared library libhello-jni.so.
• In the main Java program, load the library and declare the native method that the library implements:

    // Load libhello-jni.so once, when the class is initialized.
    static {
        System.loadLibrary("hello-jni");
    }

    // Implemented in hello-jni.c.
    public native String stringFromJNI();

• Finally, use the method by calling it like any ordinary method.
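
The naming rule used for the C function above can be sketched in Java. This is a minimal illustration, assuming the standard JNI mangling for class and method names (package dots become `_`, a literal underscore becomes `_1`, other non-alphanumeric characters become `_0xxxx`). The class `JniNameDemo` and its methods are hypothetical helpers, not part of the NDK, and the sketch does not cover the signature suffix that JNI appends for overloaded native methods.

```java
/** Hypothetical helper that derives the C symbol name for a native method. */
final class JniNameDemo {

    /** Escapes one name following the JNI rules for class and method names. */
    static String escape(String name) {
        StringBuilder out = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == '.' || c == '/') {
                out.append('_');                               // package separator
            } else if (c == '_') {
                out.append("_1");                              // literal underscore
            } else if (c < 128 && Character.isLetterOrDigit(c)) {
                out.append(c);                                 // ASCII letters and digits pass through
            } else {
                out.append(String.format("_0%04x", (int) c));  // any other character
            }
        }
        return out.toString();
    }

    /** Builds Java_<package and class>_<method> for a non-overloaded native method. */
    static String nativeSymbol(String fullyQualifiedClass, String method) {
        return "Java_" + escape(fullyQualifiedClass) + "_" + escape(method);
    }

    public static void main(String[] args) {
        // prints Java_com_example_hellojni_HelloJni_stringFromJNI
        System.out.println(nativeSymbol("com.example.hellojni.HelloJni", "stringFromJNI"));
    }
}
```

If the C function name does not match the derived symbol, the first call to the native method typically fails at run time with an UnsatisfiedLinkError, so checking the name this way can save a debugging round trip.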


Chng 3 : NHN DANG KY T QUANG HOC


3.1. Gii thi u chung
3.1.1. S lc v nhn dng k t quang hc OCR
Optical character recognition (OCR) is the process of converting images of printed text or symbols into document text or information that can be edited on a computer. The input of the process is an image file and the output is a text file containing the text found in that image. OCR grew out of research in pattern recognition, artificial intelligence and computer vision. Today OCR is widely used in practice, alongside continuing theoretical research to improve recognition results.
OCR systems are usually delivered as computer software or built into printers and scanners. The most common example is scanning images of documents into text documents stored on a computer.

Figure 3.8 The OCR process


3.1.2. Approaches to applying OCR in this thesis
This thesis focuses on applying OCR on Android phones to look up dictionary entries through the phone camera. The possible approaches to OCR are as follows:
o Research and build an OCR engine from scratch: This is the hard way, because many research directions in this field already exist worldwide and have produced many OCR methods. Rewriting an OCR engine would take a great deal of time and is unlikely to be effective, while this thesis mainly concentrates on using OCR to look up words. Studying OCR techniques is a large topic in its own right, so this approach is not feasible and we did not choose it.
o Use online OCR engines over the network: The phone sends the image to a web server; the server runs its installed OCR algorithms to process and analyse the image and sends the recognized result back to the phone. This approach is easy to implement and can be highly accurate, but the user must enable a mobile data or Wi-Fi connection to the Internet to exchange data with the server. In addition, processing on the server and waiting for the result can take a long time, which is inconvenient for the user. Without a network connection the word recognition feature cannot be used at all.
o Use an existing OCR library: Compared with the two approaches above, this one is not too hard to carry out: the library only needs to be ported and compiled to run on Android. It also removes the dependence on the network: the application runs standalone and saves time, because recognition is performed entirely on the phone.

Figure 3.9 Block diagram of optical character recognition in the application


3.1.3. Comparison of OCR libraries and tools
Many OCR libraries with fairly high accuracy are available today. Using one of them saves a great deal of effort. The following free OCR engines and programs are widely used:
- Tesseract OCR(2): originally a proprietary OCR engine developed at HP (Hewlett-Packard) between about 1984 and 1994. It placed among the top three most accurate OCR engines at the annual test of UNLV (University of Nevada, Las Vegas). The engine was later released as open source, hosted by Google, and continues to be developed with contributions from many professional programmers. The current project lead is Ray Smith.
- GOCR(3): an OCR program developed under the GNU General Public License, started by Joerg Schulenburg in 2000.
- FreeOCR(4): regarded as one of the most accurate OCR programs because it uses HP's Tesseract engine. FreeOCR also offers an online OCR service on the web.
- JavaOCR(5): an OCR program written entirely in Java for image processing and character recognition. Its advantage is a small memory footprint, which makes it easy to run on mobile environments that have limited memory and can only use Java.

(2) http://code.google.com/p/tesseract-ocr/
3.1.4. Conclusion
Among the OCR libraries above, Tesseract OCR stands out with the following advantages:
- A long development history and high accuracy from its first release.
- It is highly extensible and customizable, is sponsored by Google, and has a large community of contributing developers.
- It is updated frequently, supports more and more languages, and can be trained on new languages and fonts.
- Several current OCR programs use this engine for character recognition, which has made Tesseract more popular. It also runs on many environments and platforms, from desktop computers to mobile devices.
For these reasons, our group uses Tesseract for character recognition in the application.

(3) http://jocr.sourceforge.net/
(4) http://www.free-ocr.com/
(5) http://sourceforge.net/projects/javaocr/

3.2. Introduction to the Tesseract OCR engine


3.2.1. History
Tesseract [5] is open-source software that was originally researched and developed at Hewlett-Packard (HP) from 1984 to 1994. In 1995 it was among the three most accurate OCR engines at the annual UNLV test.
Tesseract began as a PhD research project at HP Labs Bristol and was intended as a hardware or software add-on for HP's flatbed scanners. In practice the project failed early on because it only worked well on good-quality printed documents.
Later, in cooperation with HP's scanner division in Colorado, the project made significant progress in accuracy and overtook many OCR engines of the time, but it could not become a finished product because it was bulky and complex. The project then returned to HP Labs for research on compression and code optimization, concentrating on improving Tesseract's efficiency on the basis of the accuracy already achieved. This work was completed at the end of 1994, and in 1995 Tesseract was sent to the annual UNLV test of OCR accuracy, where it clearly outperformed the OCR software of the time. Nevertheless Tesseract never became a complete commercial product, and in 2005 HP released it as open source, with Google sponsoring it. Many developers continue to contribute to and refine Tesseract. The latest version at the time of writing is 3.01.
Table 3.1 Comparison of commercial software and Tesseract
Commercial software | Tesseract engine
Supports more than 100 languages | Supports over 40 languages, and growing
Has a graphical interface | No graphical interface (commands are typed on the command line)
Mostly supported on Windows only | Supported on Windows, Linux and Mac OS
High accuracy only recently | High accuracy since 1995
Fairly expensive, $130 to $500 | Completely free (open source)

Because Tesseract is now a completely free open-source library, many OCR programs have been built on it with interfaces and features that are easier to use than Tesseract's own simple interface, for example VietOCR(6) for Vietnamese recognition, Tessnet2(7), a Tesseract wrapper for Microsoft .NET, and a Java GUI frontend for Tesseract.
Table 3.2 Accuracy of Tesseract on several languages
Language | Total characters (millions) | Total words (millions) | Character error (%) | Word error (%)
English | 39 | | 0.5 | 3.72
Russian | 213 | 26 | 0.75 | 5.78
Simplified Chinese | 0.25 | undefined | 3.77 | undefined
Hindi | 1.4 | 0.33 | 15.41 | 69.44

3.2.2. Architecture


Tesseract first takes a color or grayscale image as input. The image is passed to the adaptive thresholding stage, which produces a binary image. Because HP had already developed its own page layout analysis component, Tesseract did not need one and relied on HP's. Tesseract therefore assumes that its input is a binary image with optional polygonal text regions already defined.
Tesseract was originally designed to work on binary images and was later improved to accept color and grayscale images. This is why the adaptive thresholding stage is needed: it converts color or grayscale images into binary images.

(6) http://vietocr.sourceforge.net/usage_vi.html
(7) http://www.pixel-technology.com/freeware/tessnet2/
Recognition then proceeds step by step:
- The first step is connected component analysis. Its result is the set of outlines around the characters.
- The second step finds text lines and words. Like the previous step, it produces regions that enclose the lines of text and the characters in the text region.
- The next step is word recognition, which is done in two passes. In the first pass the words are recognized in turn. Each word that is recognized satisfactorily is passed to the adaptive classifier [5] as training data, so the adaptive classifier can recognize the later part of the page more accurately. Once the adaptive classifier has learned useful information from the first pass over the upper part of the page, the second pass is run. It goes over the whole page again, and the words that were not recognized accurately in the first pass are recognized once more. Finally the engine combines this information and outputs the complete recognition result.

Figure 3.10 Overall architecture of Tesseract [2]


3.2.3. Building and using the Tesseract library on Android
The Tesseract source code is written in standard C/C++ and runs on Windows and Linux, so to run the library on Android the source code must be ported to that platform. There are two ways to do this:
- The first is to rewrite the Tesseract source code in Java, since Android applications are written in that language. If it can be done, the program fits the platform well, because Java is the native application language of Android. For a programmer, however, this is very hard: the whole program has to be rewritten from scratch, which takes a great deal of time and is unlikely to pay off.
- The second is to use JNI on Android to reuse the C/C++ library functions of Tesseract. This is easier for the programmer, saves time and effort, and the program works as required. Its only possible drawback is the overhead of calling the C/C++ code through the JNI layer.
Weighing the advantages and drawbacks of both, our group chose the second approach to use the Tesseract library on Android.
Porting the Tesseract source code to run on Android:
- First download the NDK, extract it, and set the environment variable to point to the extracted NDK directory.
- Download the tesseract-android-tools(8) project, an open-source tool hosted by Google and developed as an Eclipse project. It provides a set of API functions and the build configuration files for compiling the Tesseract library and the Leptonica image processing library.
- The Eclipse IDE is used to write the source code. It has the Android plug-in installed, including the Android SDK with suitable API versions, so that the program can be developed and built directly in Eclipse.
After preparing the tools needed to build the Tesseract source code on Android, we import the tesseract-android-tools project into Eclipse, which gives the following:

Figure 3.11 Structure of the tesseract-android-tools project

(8) http://code.google.com/p/tesseract-android-tools/

This project is essentially an Android project that uses JNI and the NDK to compile and run C/C++ source code on Android. The JNI directory contains the C/C++ source code and the JNI glue code. The file Android.mk holds the build directives for the C/C++ functions and libraries. Our task is to adjust this file so that the libraries can be built.
Before running the NDK command to build the C/C++ libraries, the source code of the following libraries must be prepared:
- The Tesseract source code, version 3.01 (the most important).
- The two image processing libraries leptonica-1.68 and libjpeg.
Create a directory named external inside the tesseract-android-tools project to hold the source code of these three libraries. The Android.mk files in the subdirectories of the JNI directory contain the build directives for the source files of these libraries.

Figure 3.12 Part of the build directives for the C/C++ library sources in Android.mk
The three libraries can be built by typing the NDK command on Windows, Linux or Mac OS; the process is the same on all of them. In this thesis we use the Windows command line (CMD) to build the Tesseract source code.
On the Windows command line, change the current directory to the tesseract-android-tools project directory in Eclipse and run ndk-build to compile the source code of the libraries.

Figure 3.13 Using the NDK to build the C/C++ libraries on Android
The build reports many errors at first, so the configuration files have to be corrected. Because this took a considerable part of the time available for the thesis, we cannot list every fix in detail. Once the build succeeds it produces three files: libjpeg.so, liblept.so and libtess.so. These are the shared libraries for Android, and through them the C/C++ functions of Tesseract can be called directly.

Figure 3.14 Successful build of the Tesseract library sources on Android

To call the functions available in the Tesseract library, look at the TessBaseAPI class in the src directory of the project. In essence this class wraps the Tesseract functions: its methods are implemented in Java, and their bodies call the C/C++ functions in the shared libraries underneath.
The main methods of TessBaseAPI are:
public boolean init(String datapath, String language)
public void setImage(Bitmap bmp)
public String getUTF8Text()

The init method initializes Tesseract and takes two parameters: datapath is the path to the Tesseract OCR data files stored on the phone's SD card, and language specifies the language to be recognized.
The setImage method takes a bitmap image. The image is then recognized by calling getUTF8Text, which returns the text found in the image as a string.
3.2.4. Training data for Tesseract
Tesseract was originally designed to recognize English words in the Latin script. Later, thanks to the efforts of many developers, Tesseract became able to recognize languages outside the Latin script, such as Chinese and Japanese, and to handle characters encoded in UTF-8. New languages are supported through training. From version 3.0 on, Tesseract supports many more languages and extends training to fonts. The engine was initially trained for the best accuracy on a few default fonts; with other fonts the result may be less accurate than with the fonts included in the training data. Training is done with the tools that ship with Tesseract. In this thesis the Tesseract 3.01 tools are used to train new languages and fonts.
To train Tesseract (for example on a new language), a set of data files in the tessdata directory is needed; these files are then combined into a single file. Files in tessdata are named languagecode.filename. For example, the files needed to train English are:

- tessdata/eng.config
- tessdata/eng.unicharset: the character set of the trained language.
- tessdata/eng.unicharambigs
- tessdata/eng.inttemp: the catalogue of character shape prototypes.
- tessdata/eng.pffmtable: the number of expected features for each character in the training data.
- tessdata/eng.normproto: the character normalization prototypes.
- tessdata/eng.punc-dawg
- tessdata/eng.number-dawg
- tessdata/eng.freq-dawg: the list of frequent words.
- tessdata/eng.word-dawg: the list of ordinary words.
- tessdata/eng.user-words: the user's word list (optional).
The last step combines the data from the previous steps into a single data file:
tessdata/eng.traineddata
The files needed for training are generated by the bundled tools during the training process.
3.2.5. Training a new language or font
Training a new language or font for Tesseract goes through the following stages:
Generating the training images:
This first step determines the character set that will be used in training. First prepare a text file containing the training data (in practice, a passage of text). The training text should follow these rules:
- Each character should occur at least about 5 to 10 times in the sample.
- Items that occur frequently should have more samples, at least 20.
- Training data should be split by font: each training file should contain a single font, although several fonts can be trained by using several files. Do not mix fonts within one training file.
After preparing the sample text, generate an image from it. Either convert the text file to an image with software, or print it and scan it into a .tif image at a resolution of 300 dpi. The final input to training is the .tif image file.
Creating the .box files:
Tesseract trains from the image data through box files. A box file is a text file that lists the characters of the image in order from first to last, one character per line, together with the coordinates of the bounding box around that character in the image.
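The box file layout just described can be read with a few lines of Java. This is a minimal sketch, assuming each line has the form "symbol left bottom right top page" with coordinates measured from the bottom-left corner of the image, which is the layout written by Tesseract 3.0x. The BoxEntry class and its method names are hypothetical and are not part of Tesseract.

```java
/** One line of a Tesseract .box file: a symbol and its bounding box (hypothetical helper). */
final class BoxEntry {
    final String symbol;
    final int left, bottom, right, top, page;

    BoxEntry(String symbol, int left, int bottom, int right, int top, int page) {
        this.symbol = symbol;
        this.left = left;
        this.bottom = bottom;
        this.right = right;
        this.top = top;
        this.page = page;
    }

    /** Parses "symbol left bottom right top [page]"; the page field is treated as optional. */
    static BoxEntry parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 5) {
            throw new IllegalArgumentException("Malformed box line: " + line);
        }
        int page = f.length > 5 ? Integer.parseInt(f[5]) : 0;
        return new BoxEntry(f[0], Integer.parseInt(f[1]), Integer.parseInt(f[2]),
                Integer.parseInt(f[3]), Integer.parseInt(f[4]), page);
    }

    int width()  { return right - left; }
    int height() { return top - bottom; }
}
```

A box editor performs the same parsing before it lets the user correct a symbol or its coordinates.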
Box files are created from the command line (CMD on Windows, Terminal on Linux) with the following command; Tesseract must be installed for it to run:
tesseract [lang].[fontname].exp[num].tif [lang].[fontname].exp[num] batch.nochop makebox

This command produces the .box files.

Figure 3.15 Structure of a box file


Running Tesseract on the computer for training: once the .box file exists, a box editor is needed to check and correct the values of each character so that they match the text in the training image. We use jTessBoxEditor to edit the box file directly. After checking and correcting the characters in the box file, run the next command:
tesseract [lang].[fontname].exp[num].tif [lang].[fontname].exp[num] nobatch box.train.stderr

If it succeeds, Tesseract generates a .tr file at this stage.
Computing the character set of the language to be trained: Tesseract needs to know every character that can occur in the data. Use the following command:
unicharset_extractor *.box

This creates the unicharset file.


Specifying the font properties (from version 3.01):
This feature is new in Tesseract 3.01. It lets the user train with many different fonts instead of only the default fonts of earlier versions. A font_properties file must be created to describe the fonts used in the training samples.
Each line of font_properties holds the name of one training font and its properties: <fontname> <italic> <bold> <fixed> <serif> <fraktur> (1 if the font has the property, 0 if not).
Example of a font_properties file for English training data:
Arial 0 0 0 0 0
arialbd 0 1 0 0 0
arialbi 1 1 0 0 0
ariali 1 0 0 0 0
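To make the meaning of each column concrete, one font_properties line can be decoded as in the sketch below. It assumes the five flags are, in order, italic, bold, fixed, serif and fraktur, as documented for the Tesseract 3.01 training tools; the FontProperties class itself is hypothetical and only illustrates the file layout.

```java
/** One line of a font_properties file (hypothetical helper, not part of Tesseract). */
final class FontProperties {
    final String name;
    final boolean italic, bold, fixed, serif, fraktur;

    FontProperties(String name, boolean italic, boolean bold,
                   boolean fixed, boolean serif, boolean fraktur) {
        this.name = name;
        this.italic = italic;
        this.bold = bold;
        this.fixed = fixed;
        this.serif = serif;
        this.fraktur = fraktur;
    }

    /** Parses "fontname italic bold fixed serif fraktur", where each flag is 0 or 1. */
    static FontProperties parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 6) {
            throw new IllegalArgumentException("Expected 6 fields: " + line);
        }
        return new FontProperties(f[0], flag(f[1]), flag(f[2]),
                flag(f[3]), flag(f[4]), flag(f[5]));
    }

    private static boolean flag(String s) {
        if (s.equals("1")) return true;
        if (s.equals("0")) return false;
        throw new IllegalArgumentException("Flag must be 0 or 1: " + s);
    }
}
```

Validating the file this way before running the training tools catches lines with a missing flag early.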
Clustering:
At this stage the outline features of the characters have been extracted, and the data must be clustered to create prototypes. The shapes and outlines of the characters are clustered with the mftraining and cntraining programs that ship with Tesseract:
mftraining -F font_properties -U unicharset -O lang.unicharset *.tr

The mftraining command creates the data files inttemp (the shape prototypes), pffmtable, and Microfeat (which is rarely used).
Finally, the cntraining tool creates the normproto data file.
Creating the unicharambigs file.
Combining the files into the trained data file: once the required training files (inttemp, pffmtable, normproto, Microfeat) exist, rename them with the prefix lang. in front of the file name, where lang is the three-letter ISO 639-2(9) code of the trained language. Then run:
combine_tessdata lang.

The result is the file lang.traineddata. Copy it into Tesseract's tessdata directory and Tesseract can (in theory) recognize the new language or font.

(9) http://en.wikipedia.org/wiki/List_of_ISO_639-2_codes

Figure 3.16 The Tesseract training process

Chapter 4: ENGLISH-VIETNAMESE DICTIONARY LOOKUP


4.1. Overview
The input of this block is the output of the OCR block, and its output is the dictionary lookup result (Figure 4.17).
If the word is found in the dictionary data, the result is returned immediately. Otherwise the word may be wrong for one of two reasons, and it goes through a natural language processing step. Case 1: the captured word is inaccurate because of a recognition error; here the application uses near-match search to list the words related to the captured word. Case 2: the captured word is an inflected form with added prefixes or suffixes, so the dictionary has no entry for it; here the application uses stemming to restore the base form. If the word still cannot be found in the dictionary, the application reports that there is no result and offers similar words for the user to choose from.

Figure 4.17 Flowchart of dictionary lookup and natural language processing


Two difficulties need attention when building a dictionary application on a phone: processing speed and memory. They are closely linked on a mobile device: faster processing costs more memory, and conversely an application that needs a lot of memory slows processing down. Mobile environments are usually limited in both. We therefore have to handle both problems to meet the requirements of the application.
- Memory: this is solved by using a larger memory card in the device. Card capacity is no longer a real obstacle, so this problem is manageable.
- Processing speed: the processor of a mobile device is hard to upgrade, so we must organize the dictionary data structure to make lookup faster.
The application therefore has to handle natural language processing and also organize the dictionary data structure to support fast search.

4.2. Stemming


English is an inflectional (fusional) language. Morphemes in such languages usually do not stand alone but come with affixes; one affix can carry several meanings at once, and conversely one meaning can be expressed by several affixes. In English, affixes can create different derivations or inflections.
A word in an English text can appear in many grammatical forms that all share the same core meaning, so they are treated as one word, for example look, looks, looking, looked. These forms are typically plural nouns, third-person singular verbs, -ing forms, past tense forms or past participles. The application therefore has to restore the stem of such forms. The stem is the part of a word that remains after the affixes are removed. An affix can be a prefix or a suffix, for example the prefixes dis-, un-, multi- and the suffixes -ly, -ment, -tion, -logy. Each affix creates a different derivation or inflection and needs its own handling.
A prefix creates a derived word with a different meaning, so no stemming is needed. For example, like and unlike are different words.
Suffixes fall into two cases: derivation and inflection. Derivational suffixes change the meaning or the part of speech, for example apply, appliance, applicability, applicably, applicant, application; stemming is not used in this case. Inflectional suffixes are reduced to the stem, for example books and booked are both reduced to book.
In short, stemming is applied only to inflectional suffixes, because those forms share the same meaning. For this the application uses the Porter stemming algorithm [4].
The Porter stemming algorithm [4] was introduced by Martin Porter in 1980 and has since been developed further and widely used. The algorithm can handle all cases of reduction to a stem. Within this application it is used only for the following cases:
- Plural nouns: remove -s or -es to restore the base form.
- Third-person singular verbs: remove -s or -es to restore the base form.
- Words ending in -ing or -ed are reduced to the base form.
- Convert i to y when the stem contains a vowel, for example companies -> compani -> company.
Figure 4.18 illustrates the stemming algorithm used in the application.
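The four cases listed above can be sketched in Java as follows. This is a simplified illustration under our own assumptions, not the full Porter algorithm and not necessarily the code used in the application: the SimpleStemmer class, the length thresholds and the list of -es endings are all hypothetical choices. Because such rules over-stem some words (for example, speed would lose its -ed), the sketch is meant to run only after a direct dictionary lookup has failed, and its output should itself be checked against the dictionary.

```java
/** Restricted suffix stripping for inflected English words (illustrative sketch only). */
final class SimpleStemmer {

    static String stem(String word) {
        String w = word.toLowerCase();
        int n = w.length();
        // companies -> company (the intermediate form "compani" gets its i turned back into y)
        if (n > 4 && w.endsWith("ies")) {
            return w.substring(0, n - 3) + "y";
        }
        // classes -> class, boxes -> box, watches -> watch, wishes -> wish
        if (n > 4 && (w.endsWith("sses") || w.endsWith("xes") || w.endsWith("zes")
                || w.endsWith("ches") || w.endsWith("shes"))) {
            return w.substring(0, n - 2);
        }
        // books -> book, looks -> look; words ending in -ss are left alone
        if (n > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            return w.substring(0, n - 1);
        }
        // looking -> look, only if the remaining stem still contains a vowel
        if (n > 5 && w.endsWith("ing") && hasVowel(w.substring(0, n - 3))) {
            return w.substring(0, n - 3);
        }
        // looked -> look, booked -> book
        if (n > 4 && w.endsWith("ed") && hasVowel(w.substring(0, n - 2))) {
            return w.substring(0, n - 2);
        }
        return w;
    }

    private static boolean hasVowel(String s) {
        for (int i = 0; i < s.length(); i++) {
            if ("aeiouy".indexOf(s.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }
}
```

For example, SimpleStemmer.stem("looking") and SimpleStemmer.stem("looked") both give look, which can then be looked up in the dictionary.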

Figure 4.18 Flowchart of the stemming algorithm


For convenience, the application lets the user configure stemming: no stemming, stemming for the cases above (the default), or stemming all the way to the root.

4.3. Near-match word search


Tesseract's word recognition accuracy is fairly high, but some words are still misrecognized, depending on the quality of the photographed text. The user then has to photograph the text again or correct the recognized result by hand. Near-match search is therefore added to make the application more convenient and to compensate for part of Tesseract's recognition errors. The following methods can be applied to near-match search in this thesis.
4.3.1. Levenshtein distance
In computer science, the Levenshtein distance [7] measures the difference between two strings, a source string s and a target string t. It is the minimum number of single-character edits needed to transform s into t. There are three edit operations: inserting, deleting and substituting a single character in s.
Example: the Levenshtein distance between kitten and sitting is 3, because three edits are needed to turn kitten into sitting:
kitten -> sitten (substitute s for k)
sitten -> sittin (substitute i for e)
sittin -> sitting (append g at the end)
The following listing illustrates the computation of the Levenshtein distance between a string s of length m and a string t of length n:
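/* Returns the Levenshtein distance between s (length m) and t (length n). */
int LevenshteinDistance(const char *s, int m, const char *t, int n)
{
    /* d[i][j] is the distance between the first i characters of s
       and the first j characters of t; d holds (m+1)*(n+1) values. */
    int d[m + 1][n + 1];

    for (int i = 0; i <= m; i++)
        d[i][0] = i;                            /* delete i characters */
    for (int j = 0; j <= n; j++)
        d[0][j] = j;                            /* insert j characters */

    for (int j = 1; j <= n; j++) {
        for (int i = 1; i <= m; i++) {
            if (s[i - 1] == t[j - 1]) {
                d[i][j] = d[i - 1][j - 1];      /* characters match */
            } else {
                int del = d[i - 1][j] + 1;      /* delete a character */
                int ins = d[i][j - 1] + 1;      /* insert a character */
                int sub = d[i - 1][j - 1] + 1;  /* substitute a character */
                int best = del < ins ? del : ins;
                d[i][j] = best < sub ? best : sub;
            }
        }
    }
    return d[m][n];
}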
Table 4.3 Matrix produced when computing the Levenshtein distance between kitten and sitting

          k  i  t  t  e  n
       0  1  2  3  4  5  6
    s  1  1  2  3  4  5  6
    i  2  2  1  2  3  4  5
    t  3  3  2  1  2  3  4
    t  4  4  3  2  1  2  3
    i  5  5  4  3  2  2  3
    n  6  6  5  4  3  3  2
    g  7  7  6  5  4  4  3


The complexity of computing the Levenshtein distance between two strings is O(m*n), where m and n are the lengths of the two strings. The algorithm is applied to near-match search as follows:
- Let the recognized word be the string s, whose length is known.
- Compare s in turn with the words in the dictionary, take the words that have the same length as s, and put them into a result array.
- Compute the Levenshtein distance between s and each word in the result array. Words at distance 1 are added to the list of near matches.
- Display the words in the list of near matches.
This is how the Levenshtein distance is used to find near matches in the dictionary.
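The procedure above can be sketched in Java as follows. This is a minimal illustration, assuming the dictionary headwords are available as an in-memory list; the NearMatch class and the suggest method are hypothetical names and not the application's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/** Near-match search by Levenshtein distance (illustrative sketch). */
final class NearMatch {

    /** Classic dynamic-programming Levenshtein distance, O(m*n) time and space. */
    static int levenshtein(String s, String t) {
        int m = s.length();
        int n = t.length();
        int[][] d = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) d[i][0] = i;   // delete i characters
        for (int j = 0; j <= n; j++) d[0][j] = j;   // insert j characters
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,      // deletion
                                            d[i][j - 1] + 1),     // insertion
                                   d[i - 1][j - 1] + cost);       // match or substitution
            }
        }
        return d[m][n];
    }

    /** Keeps the dictionary words of the same length as the input that are at distance 1. */
    static List<String> suggest(String recognized, List<String> dictionary) {
        List<String> result = new ArrayList<>();
        for (String word : dictionary) {
            if (word.length() == recognized.length() && levenshtein(recognized, word) == 1) {
                result.add(word);
            }
        }
        return result;
    }
}
```

Every call to suggest scans the whole word list and fills one distance matrix per candidate of matching length, so the cost grows with the size of the dictionary.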
Advantage: the method solves the near-match problem and lists the matching words.
Drawback: it is slow, because it has to loop over every word in the dictionary, which contains a very large number of words, and only then computes the Levenshtein distance.
4.3.2. Substituting characters
The Levenshtein distance is widely used to measure the difference between two strings. Applied to near-match search in the application, however, it has the drawback described above: the search is very slow. We therefore use a simpler algorithm to find near matches in the dictionary, based on the idea of substituting, in turn, each of the 26 letters a-z of the Latin alphabet. By default the algorithm substitutes one character at a time, from the start to the end of the string. The details follow. The input is a string and the result is the list of near matches of that string:
- Initialize the array of the 26 lowercase letters a-z (by default the algorithm works only on lowercase characters).
- Loop over each character of the input string. For each position, substitute the 26 letters of the array in turn.
- Each substitution produces a new word; check whether this word exists in the dictionary data. If it does, add it to the result list.
- Display the words in the result list of near matches for the user to choose from.
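The substitution procedure can be sketched in Java as follows. This is a minimal illustration, assuming the dictionary headwords can be tested for membership through a Set<String>; the SubstitutionSuggester class and its suggest method are hypothetical names. The sketch skips the substitution that would reproduce the original letter, since the unchanged input is already known to be missing from the dictionary.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Near-match search by single-character substitution (illustrative sketch). */
final class SubstitutionSuggester {

    static List<String> suggest(String input, Set<String> dictionary) {
        char[] chars = input.toLowerCase().toCharArray();
        // LinkedHashSet keeps the order of discovery and drops duplicates.
        Set<String> found = new LinkedHashSet<>();
        for (int i = 0; i < chars.length; i++) {
            char original = chars[i];
            for (char c = 'a'; c <= 'z'; c++) {
                if (c == original) {
                    continue;               // would reproduce the input itself
                }
                chars[i] = c;
                String candidate = new String(chars);
                if (dictionary.contains(candidate)) {
                    found.add(candidate);
                }
            }
            chars[i] = original;            // restore before moving to the next position
        }
        return new ArrayList<>(found);
    }
}
```

For a word of length L this performs at most 25 * L dictionary checks, regardless of how many words the dictionary holds, which is why the running time depends on the word length and not on the dictionary size.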
Advantage: it does what is needed, finding and listing the dictionary words that are close to the recognized result. It is fairly fast and practical to implement in the application.
Drawback: performance is uneven across words and depends heavily on word length. Short words are searched very quickly, while long words take more time. Overall, comparing the two methods, character substitution has more advantages, above all its practicality for near-match search in the application, because it runs much faster.

4.4. Dictionary data structure
Each dictionary entry consists of a headword and its meaning (including the pronunciation and the part of speech). The headword and the meaning of each entry have different storage sizes. Table 4.4 describes these data fields.
Table 4.4 Description of the data fields

No. | Field name | Notes
1 | Headword | The key in the dictionary data. Word lengths differ, so its size varies.
2 | Content | The pronunciation, part of speech and meanings of the word. Its size varies.

With these data fields, the data structure must be organized so that the entries can be accessed easily. There are several ways to organize the entries:
- Entries of fixed size.
- Entries of variable size.
Each method is examined below.
4.4.1. Fixed-size entries
To use fixed-size entries we must know the largest size an entry can have, so that all the data can be stored.
Advantage: the data of a word can be accessed quickly and easily once its starting position is known.
Drawback: memory is wasted, because very small entries still occupy a large data field. This matters a great deal on mobile devices with limited memory.
A fixed entry size also restricts storing common entries whose headword or content is longer than the limit, which makes storage feel unnatural. It is also hard to add entries when new dictionary data is generated.
4.4.2. Variable-size entries
Each entry is given as much memory as its data needs. Each entry therefore carries its starting position and the size of its data field.
Advantage: maximum memory savings. The entry size is not restricted, entries can be as large or as small as needed, and they do not affect one another, which is more natural.
Drawback: because entry sizes differ, an extra field is needed to manage the size, and lookup is harder, so the data must be organized to support fast lookup.

On mobile devices memory must come first, so fixed-size entries show more drawbacks, while variable-size entries avoid wasting memory. Variable-size entries are therefore used in the application. To solve the problem of access speed, the dictionary data must be organized sensibly.
4.4.3. Organizing the dictionary data for fast lookup
The problem of variable-size entries is solved. Now we organize the files to support searching. There are several ways to structure files for fast search:
- Sequential file: the entries are stored one after another. A key is searched by comparing it with the headwords in the file. The best case is a match on the first comparison; the worst case requires scanning nearly all the headwords in the file. This method is easy to implement but slow, because the data contains a great many words, especially on mobile devices.
- Index file: to search large files more efficiently, an index file is used. It contains the keys and the position of the corresponding data in the meaning file. Searching for a key becomes easier, because only a file of keys is searched and the position information is then used to retrieve the meaning. Binary search can be used to speed up the search in the index file.
- Hashed file: instead of searching all the keys, a hash file groups the keys that share some property into the same cluster, which limits the search range when the number of records is large.
- Binary search tree: all the index records can be read to build a search tree, which is then saved to a file. A binary search tree gives fast search but needs a lot of memory to store the tree.
Considering these methods in terms of speed and memory, combining indexing with hashing is the most suitable for fast search. The details follow.

4.4.3.1. Index file combined with hashing


Hashing the index file splits the search range to support fast search. It must satisfy the following criteria:
- The hash function must be fast to compute.
- The keys are evenly distributed in the hash table.
- The order of the original dictionary data is preserved.
The index file is sorted alphabetically, so we hash on the leading letters of the word. This hash function satisfies two of the criteria. To satisfy the remaining one, the index file can be hashed at one or more levels, corresponding to the number of leading characters. More levels partition the data more finely, so the number of levels must be chosen carefully.
- One-level hash: words with the same first character form a cluster. The data is split into about 30 clusters, but some clusters are large and some are small. The amount of data per cluster is still large, which makes the application feel slow, so a finer partition is needed.
- Two-level hash: words with the same first two characters form a cluster, so the search range shrinks considerably. The partitions are smaller, so the search time within each partition is much shorter.
- Three-level hash: words with the same first three characters form a cluster. The files become fragmented, and memory size and word load time increase.
One level feels slow, while three or more levels fragment the files and increase memory use unnecessarily. Since the processing speed of current mobile devices has also increased considerably, two-level hashing balances load time against memory use.

4.4.3.2. Meaning file


The index gives the start position and the length of the meaning in the meaning file. The meaning file therefore contains only the data fields, stored sequentially in one file. Each data field contains the international phonetic transcription, the part of speech and the meanings of the word.

4.4.3.3. Dictionary lookup
From the analysis above we obtain the dictionary data layout (Figure 4.19) and, based on it, a lookup algorithm suited to this organization.

47

Hnh 4.19 S t chc tp tin t in


The lookup algorithm on the diagram above works as follows. Suppose we want to look up the word "table".
Step 1: Read the level-1 hash file and use binary search to find the key "t" in it. The entry for "t" gives the start position (pos) and length (length) of the corresponding data block in the level-2 hash file.
Step 2: Read the level-2 hash file at position pos for length bytes. The result is a block of two-character keys that begin with "t", ranging from "ta" to "tw". Binary search for the key "ta" (the first two characters of "table") in this block gives the start position (pos) and length (length) of the corresponding data block in the index file.
Step 3: Read the index file at position pos for length bytes. The result is a block containing all keywords that begin with the two characters "ta". Binary search for the keyword "table" in this block gives the start position (pos) and length (length) of the corresponding data block in the meaning file.
Step 4: Read the meaning file at position pos for length bytes. The result is the block holding the meaning of the word "table", which is returned to the application.
With a two-level hash table, the index entry can thus be located quickly by binary search over small partitions, and the index entry then leads directly to the meaning.
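The four steps above can be sketched in Java as follows. This is a minimal illustration, not the thesis implementation: the class `TwoLevelLookupSketch`, the `Table` type and the sample data are hypothetical, and the three files are modelled as in-memory sorted arrays whose `pos`/`len` values count records rather than bytes.

```java
import java.util.Arrays;

final class TwoLevelLookupSketch {

    /** A sorted key table; entry i points at records [pos[i], pos[i] + len[i]) of the next table. */
    static final class Table {
        final String[] keys;
        final int[] pos;
        final int[] len;

        Table(String[] keys, int[] pos, int[] len) {
            this.keys = keys;
            this.pos = pos;
            this.len = len;
        }

        /** Binary search for key among entries [from, from + count); returns its index or -1. */
        int find(String key, int from, int count) {
            int i = Arrays.binarySearch(keys, from, from + count, key);
            return i >= 0 ? i : -1;
        }
    }

    /** Steps 1 to 4 of the lookup; returns null when the word is absent. */
    static String lookup(Table level1, Table level2, Table index, String[] meanings, String word) {
        if (word.length() < 2) return null; // this sketch ignores one-letter words
        int a = level1.find(word.substring(0, 1), 0, level1.keys.length);   // step 1
        if (a < 0) return null;
        int b = level2.find(word.substring(0, 2), level1.pos[a], level1.len[a]); // step 2
        if (b < 0) return null;
        int c = index.find(word, level2.pos[b], level2.len[b]);             // step 3
        if (c < 0) return null;
        return meanings[index.pos[c]];                                      // step 4
    }

    /** Runs the lookup against a tiny hand-made dictionary (hypothetical data). */
    static String demo(String word) {
        Table level1 = new Table(new String[] {"c", "t"}, new int[] {0, 1}, new int[] {1, 2});
        Table level2 = new Table(new String[] {"ca", "ta", "te"},
                new int[] {0, 1, 3}, new int[] {1, 2, 1});
        Table index = new Table(new String[] {"cat", "table", "take", "tea"},
                new int[] {0, 1, 2, 3}, new int[] {1, 1, 1, 1});
        String[] meanings = {"meaning of cat", "meaning of table", "meaning of take", "meaning of tea"};
        return lookup(level1, level2, index, meanings, word);
    }

    public static void main(String[] args) {
        System.out.println(demo("table"));
        System.out.println(demo("tax"));
    }
}
```

Each binary search runs only over the small block selected by the previous level, which is the point of the two-level scheme.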

Chapter 5: IMPLEMENTATION AND EXPERIMENTAL EVALUATION OF THE APPLICATION

Besides algorithms and data structures, a camera-based dictionary application on Android phones has its own specific concerns: screen display, camera handling, configuration storage, audio and so on. The related technical problems therefore have to be solved when programming the application.

5.1. Drawing the frame and the controls on the camera screen.


The camera screen is the main screen of the application. Its design is somewhat unusual in that it consists of two layers. The lower layer displays the camera image. The upper layer holds the controls: the capture frame, the buttons, the text input and the text view. Two problems have to be solved for these two layers: display and the handling of interaction events. Figure 5.1 shows the main camera screen.

Figure 5.20 Camera screen interface
As for displaying two overlapping layers, Android allows such a layout to be built with FrameLayout. The controls are divided into two groups, one per layer. Group 1 is the SurfaceView that displays the camera image. Group 2 contains the frame that bounds the capture region, the capture button, the auto focus button, the flash on/off button, the two camera zoom buttons, the lookup button and the EditText that shows the captured word. On phone models that do not support auto focus or a flash, the corresponding buttons are not displayed.
As for interaction events, the application has to draw a frame that bounds the capture region, and the user can resize this frame freely. A graphics layer (drawn with the Paint class) is therefore placed on top to draw the frame. This layer covers the entire screen, so the controls underneath can no longer be touched directly: clicking on them produces no event. To solve this, we simulate the button click event. The touch event is captured by its screen coordinates, and if the touch position lies on a control, that control's click event is invoked. If the user touches the frame itself and drags, the frame changes position. This handling is more work to implement, but it guarantees that every control remains usable even though the layers overlap.
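The coordinate-based dispatch described above can be sketched in plain Java as follows. This is only an illustration of the idea: the class `HitTestSketch`, the `Control` type and the control names and bounds are hypothetical, and a real implementation would take the coordinates from an Android touch event and invoke the matching view's click handler.

```java
import java.util.ArrayList;
import java.util.List;

final class HitTestSketch {

    /** A control identified by name, with its bounds in screen coordinates. */
    static final class Control {
        final String name;
        final int left, top, right, bottom;

        Control(String name, int left, int top, int right, int bottom) {
            this.name = name;
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
        }

        /** True if (x, y) lies inside the bounds; right and bottom edges are exclusive. */
        boolean contains(int x, int y) {
            return x >= left && x < right && y >= top && y < bottom;
        }
    }

    /** Returns the name of the first control under the touch point, or null if none is hit. */
    static String controlAt(List<Control> controls, int x, int y) {
        for (Control c : controls) {
            if (c.contains(x, y)) {
                return c.name;
            }
        }
        return null;
    }

    /** Two hypothetical buttons laid out side by side. */
    static List<Control> demoControls() {
        List<Control> controls = new ArrayList<>();
        controls.add(new Control("capture", 0, 0, 100, 50));
        controls.add(new Control("flash", 120, 0, 220, 50));
        return controls;
    }

    public static void main(String[] args) {
        System.out.println(controlAt(demoControls(), 10, 10));
        System.out.println(controlAt(demoControls(), 110, 10));
    }
}
```

A touch that hits no control can then be treated as a drag or resize of the capture frame.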
The rectangular frame on the main screen bounds the capture region and can be resized to match the actual size of the text, which helps recognition accuracy. In the application, the RectView class was created to manage this task. Drawing the rectangle on the screen requires a Paint object; the necessary objects are declared and initialized as follows:
private Paint paint = new Paint(Paint.ANTI_ALIAS_FLAG);
private static float top;
private static float left;
private static float right;
private static float bottom;

Paint is the class that holds the style and color information used when drawing geometric shapes, text and bitmaps (the drawing itself is done through a Canvas). The parameters top, left, right and bottom define the top, left, right and bottom edges of the rectangle.
The parameters are initialized by the following function:
private void Init() {
    // Initial position and size of the capture frame
    left = (MAX_WIDTH / 2) - 100;
    top = (MAX_HEIGHT / 2) - 40;
    right = left + 120;
    bottom = top + 60;
    // White outline, 3 pixels wide, not filled
    paint.setColor(Color.WHITE);
    paint.setStrokeWidth(3);
    paint.setStyle(Style.STROKE);
    // Ask the framework to redraw the view
    invalidate();
}

The rectangle is then drawn with these coordinates in onDraw(), which the framework calls after invalidate():
@Override
protected void onDraw(Canvas canvas) {
    // Draw the capture frame at its current coordinates
    canvas.drawRect(left, top, right, bottom, paint);
}

5.2. Capturing images from the phone camera.


The input to the Tesseract optical character recognition engine is a bitmap image, so the camera on the Android phone has to be programmed to capture images of printed text.
Most smartphones today have built-in camera hardware, and camera support has become an indispensable part of mobile operating systems. Android is no exception: the API library in the Android SDK provides utility classes for accessing and controlling the camera on devices that have one. To do this, we use the Camera class together with the SurfaceView class.
The Camera class provides methods for changing the camera settings, previewing the image and, most importantly, capturing the image from the phone's camera lens. Before the camera can be used, the application must request permission to use the hardware, in this case the camera. Permissions are declared in the AndroidManifest.xml file:
<uses-permission android:name="android.permission.CAMERA" />

Next we create a SurfaceView to display the live image from the phone's camera lens. This class holds the display surface and is responsible for redrawing it on the screen. The SurfaceHolder interface gives access to the surface underneath the SurfaceView. We implement the methods of the SurfaceHolder.Callback interface and register these callbacks with the SurfaceHolder. The SurfaceHolder.Callback interface has the following three methods:
public void surfaceChanged(SurfaceHolder arg0, int arg1, int arg2, int arg3);
public void surfaceCreated(SurfaceHolder arg0);
public void surfaceDestroyed(SurfaceHolder arg0);

surfaceCreated is called right after a surface has been created, surfaceChanged is called after any change to the surface, and surfaceDestroyed is called when the surface is destroyed.
Once a surface exists to show the image on the screen, we start using the Camera class. Camera.open() opens the phone's camera, and setPreviewDisplay() attaches the live preview to the surface. These calls are first made in the surfaceCreated() callback, where the camera is initialized.
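public void surfaceCreated(SurfaceHolder arg0) {
    mCamera = Camera.open(); // assign the field so the other callbacks can reach the camera
    try {
        mCamera.setPreviewDisplay(mySurface_holder);
    } catch (IOException e) { // setPreviewDisplay() declares IOException
        mCamera.release();
    }
}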
The camera settings are then customized through the Camera.Parameters object of the Camera class. Calling camera.getParameters() returns the settings currently in effect. Individual settings are changed by calling the set...() methods of the Parameters object, and the changes are committed with setParameters(). The camera settings need to be applied again whenever the surfaceChanged() callback is invoked. After the settings have been applied, startPreview() starts the live preview from the phone's camera lens.
public void surfaceChanged(SurfaceHolder arg0, int arg1, int arg2, int arg3) {
    Camera.Parameters params = mCamera.getParameters();
    .....
    // Change the parameters here
    .....
    // Commit the parameter changes
    mCamera.setParameters(params);
    mCamera.startPreview();
}

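When the camera is no longer needed, we stop the preview with stopPreview() and then free the camera with camera.release(). Both calls are made when surfaceDestroyed() is invoked. The preview has to be stopped first, because a Camera object can no longer be used once it has been released:
public void surfaceDestroyed(SurfaceHolder arg0) {
    // Stop the preview first, then release the camera for other applications
    mCamera.stopPreview();
    mCamera.release();
}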

The Camera class also defines callback interfaces for the different events that can occur. The following interfaces deliver notifications for events of the Camera class:
- Camera.AutoFocusCallback
- Camera.ErrorCallback
- Camera.PictureCallback
- Camera.PreviewCallback
- Camera.ShutterCallback
The Camera.AutoFocusCallback interface delivers a notification when auto focus completes. Auto focus is commonly used to improve image quality and make the subject sharper, but not every device with a camera supports it. The interface declares the abstract method onAutoFocus(), which the program must implement; it is called when the auto focus operation finishes. To start auto focus, call camera.autoFocus().
The Camera.ErrorCallback interface signals that an error has occurred. Its main method, onError(), is called when an error occurs while working with the camera.
The Camera.PictureCallback interface supplies the image data after a picture has been taken. Its main method, onPictureTaken(), is called when the data is ready. The image format depends on the camera's picture format, which can be set through the Camera.Parameters object.
The Camera.PreviewCallback interface supplies the data of each preview frame. Its main method, onPreviewFrame(), is called when a preview frame has data available. The data format likewise depends on the camera's current format.
Finally, the Camera.ShutterCallback interface signals the moment the picture has been captured by the phone camera. Its main method is onShutter().

5.3. Displaying Vietnamese and formatting text on the screen.


A dictionary entry consists of the keyword, the international phonetic transcription, the part of speech, the Vietnamese meanings, usage examples, and synonyms or linked words. The content has to be as easy to read and understand as possible, so with this many parts in an entry the typeface style, color and size must be formatted appropriately.

5.3.1. Displaying Vietnamese on Android


With the arrival of Unicode fonts (in which a character was originally encoded in two bytes), whose extended ranges include the Vietnamese characters, Vietnamese can be displayed in computer software as well as any other language. Personal digital devices such as smartphones and mobile phones use the same kind of font. When developing a dictionary on these devices, it is therefore enough to use a Unicode font on the display controls.
Current versions of Android already use Unicode fonts for display. Every Android device ships with the standard Droid font family: Droid Sans, Droid Sans Mono and Droid Serif. When no font is specified, text is displayed in Droid Sans (see the illustration).

Figure 5.21 The Droid font family


The Droid fonts can display Unicode text. To show all Vietnamese diacritics correctly on Android, the content must therefore be encoded in Unicode; specifically, the application uses UTF-8.
UTF-8 is an encoding that serves the same purpose as UTF-16 and UTF-32 and can represent every character in the Unicode character set. The most important difference is that UTF-8 was designed to be compatible with ASCII. UTF-8 uses from one byte (for ASCII characters) up to four bytes per character in the current standard (the original design allowed up to six).

Because it is compatible with ASCII, UTF-8 is a very convenient way to add Unicode support to existing software. Developers can still use the existing C library functions for comparison and sorting. By contrast, supporting the 16-bit or 32-bit encodings mentioned above forces a large amount of software to be rewritten, at considerable cost. Another strength of UTF-8 is that for texts with only a few non-ASCII characters, and even for languages written in the Latin alphabet such as Vietnamese, French or Spanish, it is very economical in storage space. UTF-8 is also designed so that the byte sequence of one character never occurs inside the longer byte sequence of another character, which makes byte-wise searching for a character in a text very easy.
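The storage argument can be checked with a small Java sketch. It is an illustration only: the class `Utf8LengthSketch` and the sample strings are hypothetical, and the Vietnamese sample ("từ điển", written with Unicode escapes) is assumed to use precomposed characters.

```java
import java.nio.charset.StandardCharsets;

final class Utf8LengthSketch {

    /** Number of bytes needed to store s in UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of bytes needed to store s in UTF-16 (big endian, no byte order mark). */
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String english = "table";
        String vietnamese = "t\u1eeb \u0111i\u1ec3n"; // "tu dien" with diacritics, precomposed
        System.out.println(utf8Length(english) + " vs " + utf16Length(english));
        System.out.println(utf8Length(vietnamese) + " vs " + utf16Length(vietnamese));
    }
}
```

ASCII letters take one byte each in UTF-8, while the accented Vietnamese letters in this sample take two or three, so the UTF-8 form is still shorter than the UTF-16 form here.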
To display the text on a control, it is then enough to call setText(myText), and the Vietnamese text is shown correctly.
5.3.2. Formatting the dictionary entry.
As introduced above, a dictionary entry has many parts: the keyword, the international phonetic transcription, the part of speech, the Vietnamese meanings, usage examples, synonyms and so on. The typeface style, color and size must therefore be formatted so that these parts are easy to tell apart.

In Android, a TextView does not make it easy to change the formatting of part of a string. For example, there is no built-in call such as textView.setTextColor(Color.RED, 10, 20); that would color the text from the 10th to the 20th character red. Other techniques are needed to format displayed text, not only for color but for every other style as well.

Figure 5.22 Text displayed in different formatting styles
The first method uses HTML tags to format the string. The dictionary content must then already contain the tags, much like formatting text on a web page. HTML tags such as <b>, <u> and <i> give bold, underlined and italic text. However, this method is awkward for a large data source that was not designed with tags in place. For example, content formatted as follows:
<string name="text1">This text uses <b>bold</b> and
<i>italics</i> by using inline tags such as &lt;b&gt; within the
string file.</string>

is displayed with "bold" in bold type and "italics" in italic type:
This text uses bold and italics by using inline tags such as
<b> within the string file.
The second method uses CharSequence. In Android, a TextView can display not only a plain String but any CharSequence. CharSequence is an interface that is more abstract than String, and String is one of its implementations. A CharSequence is a sequence of characters like a String, but some implementations, such as SpannableString, can carry formatting for ranges of characters inside them. What we need to do is change the character sequence held by the TextView so that it carries a formatted range. More precisely, we attach CharacterStyle objects to the CharSequence of the TextView, which is then a SpannableString.
Comparing the two methods, the second is simpler because no complex HTML tagging is needed. The data already contains marker symbols that separate the parts of an entry, so it is enough to locate the substring to be formatted in order to format it exactly as desired. For this reason the application uses the CharSequence method.
This method formats text dynamically: only the start and end positions of the range to be formatted have to be determined, and they are found from the marker symbols. For example, suppose a TextView contains "Hello #World#!". To display the word World in red, we find the positions of the two # characters and then call setSpan() on a SpannableStringBuilder to format that range.
public CharSequence FormatText() {
    CharSequence myText = "Hello #World#!";
    // Position of the opening marker, then of the closing marker after it
    int start = myText.toString().indexOf("#");
    int end = myText.toString().indexOf("#", start + 1);
    // Copy the string to a mutable spannable string
    SpannableStringBuilder ssb = new SpannableStringBuilder(myText);
    // Color the text between the two markers
    ssb.setSpan(new ForegroundColorSpan(Color.RED), start + 1, end,
            Spannable.SPAN_EXCLUSIVE_EXCLUSIVE);
    // Delete the markers, the closing one first so that start stays valid
    ssb.delete(end, end + 1);
    ssb.delete(start, start + 1);
    myText = ssb;
    return myText;
}
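The index arithmetic behind the marker approach can be tried outside Android with the following sketch. It is an illustration under assumptions: the class `MarkerSpanSketch` and the `Marked` type are hypothetical, and only the first pair of markers is handled. It returns the text without the markers together with the range that a span would cover.

```java
final class MarkerSpanSketch {

    /** Text with the markers removed, plus the [start, end) range that was marked. */
    static final class Marked {
        final String text;
        final int start;
        final int end;

        Marked(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Finds the first pair of markers; start and end are -1 when no complete pair exists. */
    static Marked parse(String raw, char marker) {
        int open = raw.indexOf(marker);
        int close = open < 0 ? -1 : raw.indexOf(marker, open + 1);
        if (close < 0) {
            return new Marked(raw, -1, -1);
        }
        String text = raw.substring(0, open)
                + raw.substring(open + 1, close)
                + raw.substring(close + 1);
        // Removing the opening marker shifts the closing position left by one.
        return new Marked(text, open, close - 1);
    }

    public static void main(String[] args) {
        Marked m = parse("Hello #World#!", '#');
        System.out.println(m.text + " [" + m.start + ", " + m.end + ")");
    }
}
```

Searching for the closing marker from `open + 1` is what keeps the two positions distinct.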

In the example above, ForegroundColorSpan(color) sets the text color. Other available styles are bold (Typeface.BOLD), italic (Typeface.ITALIC) and bold italic (Typeface.BOLD_ITALIC), which are applied through a StyleSpan, as well as underline (UnderlineSpan) and hyperlink (ClickableSpan).
The hyperlink style (ClickableSpan) links related words (synonyms, antonyms, abbreviations and so on) inside an entry. Clicking such a word displays its meaning in turn. For example, "pub" is short for "public house", so the entry for "pub" contains a link to the meaning of "public house".

Figure 5.23 Link formatting


To use ClickableSpan, a LinkMovementMethod must first be registered on the TextView:
MovementMethod m = myText.getMovementMethod();
if ((m == null) || !(m instanceof LinkMovementMethod)) {
myText.setMovementMethod(LinkMovementMethod.getInstance());
}

We then select the text to be linked and handle the click event on it:
final String textClicked = myText.substring(start, end);
ssb.setSpan(new ClickableSpan() {
    @Override
    public void onClick(View widget) {
        // Show the meaning of the clicked word
        MeanWordActivity.restartActivity(textClicked, widget);
    }
}, start, end, Spannable.SPAN_EXCLUSIVE_EXCLUSIVE);

5.4. Encrypting the dictionary data.


Information security and encryption are increasingly important and necessary today, with wide application in many fields. The data in a dictionary application is valuable as well: it has to be encrypted to prevent users from copying it in violation of the data's copyright. In particular, if someone were to alter the content inside the dictionary, the consequences could be serious, because users would receive distorted information. The dictionary data therefore has to be encrypted to improve its safety and confidentiality.
In practice, DES [5] (Data Encryption Standard) has been one of the most widely used encryption methods. It is commonly used to encrypt data streams on networks and data stored on disk. DES was selected as an official FIPS standard (Federal Information Processing Standard of the United States) in 1976 and was subsequently adopted worldwide.
DES is a block cipher: it takes a block of plaintext of fixed length and transforms it through a series of complex operations into a block of ciphertext of the same length. For DES the block size is 64 bits. DES also uses a key to customize the transformation, so the ciphertext can only be decrypted by someone who knows the key. The DES key is 64 bits long in total, but only 56 bits are actually used; the remaining 8 bits serve only as parity checks. The effective key length is therefore 56 bits. When DES was introduced, breaking a DES key was estimated to be very hard, requiring tens of millions of USD and many years. With the development of very fast computers and computer networks, a DES key can now be broken in ever shorter time and at ever lower cost. Even so, this is still well beyond the capability of ordinary attackers, and DES continues to be used in many fields such as banking, commerce and communications.
In the application, the meaning file is encrypted with DES. Each keyword in the index file identifies a byte sequence in the meaning file that holds the meaning of that keyword. This content is DES-encrypted, and the key is stored in the last 8 bytes of the sequence. To display an entry correctly, the application therefore separates the key from the sequence and uses it to decrypt the remaining bytes, which yields the actual content.
Java provides the java.security and javax.crypto libraries, which support key generation and implement the standard encryption algorithms. The application uses these libraries to encrypt with DES. Key generation (with a KeyGenerator object) and encryption (with a Cipher object) are shown below; the result is the encrypted data together with its key [6]:
public byte[] EnCrypt(byte[] data) {
    key = null;
    try {
        KeyGenerator kg = KeyGenerator.getInstance("DES");
        Cipher cipher = Cipher.getInstance("DES");
        key = kg.generateKey(); // kept in the field so it can be stored with the data
        cipher.init(Cipher.ENCRYPT_MODE, key);
        return cipher.doFinal(data);
    } catch (Exception e) {
        return null; // encryption failed
    }
}

Decryption with the key works as follows; the result is the decrypted data:
public byte[] DeCrypt(byte[] data, Key key) {
    try {
        Cipher cipher = Cipher.getInstance("DES");
        cipher.init(Cipher.DECRYPT_MODE, key);
        return cipher.doFinal(data);
    } catch (Exception e) {
        return null; // decryption failed
    }
}

Encrypting the data with DES thus makes the dictionary data safer and helps protect it against theft and modification.
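The record layout described in this section (ciphertext followed by the 8-byte key) can be sketched with the standard javax.crypto classes as follows. This is an illustration under assumptions, not the application's code: the class `DesRecordSketch` and the methods `pack` and `unpack` are hypothetical, and the transformation "DES/ECB/PKCS5Padding" is an assumed choice, since the text only names "DES".

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

final class DesRecordSketch {

    private static final int KEY_BYTES = 8; // a DES key is encoded in 8 bytes
    private static final String TRANSFORMATION = "DES/ECB/PKCS5Padding"; // assumed mode

    /** Encrypts plain with a fresh DES key and appends the key to the ciphertext. */
    static byte[] pack(byte[] plain) {
        try {
            SecretKey key = KeyGenerator.getInstance("DES").generateKey();
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, key);
            byte[] body = cipher.doFinal(plain);
            byte[] record = Arrays.copyOf(body, body.length + KEY_BYTES);
            System.arraycopy(key.getEncoded(), 0, record, body.length, KEY_BYTES);
            return record;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("DES encryption failed", e);
        }
    }

    /** Splits the key off the end of the record and decrypts the remaining bytes. */
    static byte[] unpack(byte[] record) {
        try {
            int bodyLength = record.length - KEY_BYTES;
            SecretKey key = new SecretKeySpec(record, bodyLength, KEY_BYTES, "DES");
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, key);
            return cipher.doFinal(record, 0, bodyLength);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("DES decryption failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] record = pack("n. table".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(unpack(record), StandardCharsets.UTF_8));
    }
}
```

Because the key travels with the data, this layout protects the entries against casual copying and editing rather than against someone who knows the layout.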

5.5. Storing the application's configuration.


The application's settings are stored locally on the mobile device. There are several ways to store information on a mobile device, depending on the requirements. The information can be written to a text or XML file that the application reads back, either in the device's internal memory or on external storage such as a memory card. It can be stored in an SQLite database, which Android supports, and read back with a database query. It can also be stored on the web on one's own server. All of these approaches involve storing data in files and designing the corresponding objects and user interface controls by hand. Android, however, already provides an object that makes it easy to store an application's configuration simply by operating on the corresponding controls: SharedPreferences.
SharedPreferences is a general framework for writing and reading key-value pairs. It can store several data types: boolean, int, long, float and string. The stored values persist across user sessions, even after the application has been closed.
SharedPreferences is supported by a set of user interface controls (PreferenceScreen) that let the user change settings easily and store the information according to its data type. A PreferenceScreen offers controls similar to the basic Android View controls, for example EditTextPreference for text strings, CheckBoxPreference for logical (boolean) values and ListPreference for the value selected from a list of choices. Each of these controls has a corresponding key name under which its value is stored. When the user operates a control, the data is saved to the device automatically under that key.
<PreferenceScreen
xmlns:android="http://schemas.android.com/apk/res/android">
<CheckBoxPreference
android:defaultValue="true"
android:key="spelling"
android:summary="Chức năng tìm từ gần đúng"
android:title="Spelling suggestion" />
<ListPreference
android:defaultValue="2"
android:entries="@array/stemmer_display"
android:entryValues="@array/stemmer_value"
android:key="stemmer"
android:summary="Chức năng khôi phục từ gốc"
android:title="Stemming" />
<ListPreference
android:defaultValue="18"
android:entries="@array/font_size_display"
android:entryValues="@array/font_size_value"
android:key="fontsize"
android:summary="Chỉnh cỡ chữ của hiển thị từ điển"
android:title="Font size of dictionary" />
</PreferenceScreen>

Figure 5.24 PreferenceScreen

Figure 5.25 ListPreference

To read the stored values, the application first obtains the SharedPreferences object:
SharedPreferences prefs;
prefs=PreferenceManager.getDefaultSharedPreferences(
getApplicationContext());

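For example, to read the value of the CheckBoxPreference the application calls getBoolean(), and similarly for the other data types. Note that a ListPreference stores its selected value as a string, so the "stemmer" and "fontsize" settings are read with getString().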
if (prefs.getBoolean("spelling", true)) {
//do something
}

In this camera dictionary application, SharedPreferences stores three configuration values: whether approximate word search is on or off, the selected option for stemming, and the selected font size for displaying entries.

5.6. Pronouncing English words with the Android API


English is an inflectional language with a very large vocabulary (about 400,000 words), and the same word can be pronounced differently depending on its grammatical role. It is therefore not practical to record the pronunciation of every word, as can be done for Vietnamese. Instead, pronunciation has to be generated from reading rules derived from the study of the lexicon. This work combines linguistics and computer science.
The language to be pronounced in this application is English. Desktop dictionaries currently pronounce English words with Microsoft's Text To Speech SDK, which makes programming pronunciation much simpler. Android provides a comparable facility of its own, the Text-To-Speech (TTS) API.
Text To Speech (TTS) has been available on the Android platform since Android 1.6 (API Level 4) and supports English, French, German, Italian and Spanish. It also distinguishes regional variants; for example, both American and British English accents are supported. TTS only needs to know which language to read: "Paris", for example, is pronounced differently in English and in French.
Although all Android devices support TTS, some devices with limited storage may lack some of the language resource files. The TTS API lets an application query which resources are available and download and install additional ones. The first step when using the pronunciation feature is therefore to check for the presence of the TTS resources with the corresponding intent:
Intent checkIntent = new Intent();
checkIntent.setAction(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);
startActivityForResult(checkIntent, DATA_CHECK_CODE);

A successful check is reported with the result code CHECK_VOICE_DATA_PASS, which tells the application that it can create the TextToSpeech object. Otherwise the application starts the installation intent so that the missing data can be downloaded and installed (through Google Play). The following example shows these steps:
private TextToSpeech tts;
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    if (requestCode == DATA_CHECK_CODE) {
        if (resultCode == TextToSpeech.Engine.CHECK_VOICE_DATA_PASS)
            tts = new TextToSpeech(this, this);
        else {
            Intent install = new Intent();
            install.setAction(TextToSpeech.Engine.ACTION_INSTALL_TTS_DATA);
            startActivity(install);
        }
    }
}

This application needs to read English, so we check that English is supported and then select it with setLanguage().
if (status == TextToSpeech.SUCCESS) {
    // isLanguageAvailable() returns LANG_AVAILABLE or a higher value when English can be used
    if (tts.isLanguageAvailable(Locale.ENGLISH) >= TextToSpeech.LANG_AVAILABLE)
        tts.setLanguage(Locale.ENGLISH);
} else if (status == TextToSpeech.ERROR) {
    Toast.makeText(this, "Error when using TTS", Toast.LENGTH_LONG).show();
}

The TextToSpeech object is now initialized and configured. The application can pronounce a text string with the speak() method as follows.
tts.speak(text, TextToSpeech.QUEUE_ADD, null);

Text-To-Speech is a service shared by many applications on the same device. When the application has finished with TTS and no longer needs it, it should disconnect and return the service by calling shutdown() in the Activity's onDestroy().
protected void onDestroy() {
    // Close the Text to Speech Library
    if (tts != null) {
        tts.stop();
        tts.shutdown();
        // Log.d(TAG, "TTS Destroyed");
    }
    super.onDestroy();
}

With the Text To Speech API, the dictionary application on Android can thus pronounce English words directly without having to store a large amount of audio data.

5.7. Development environment


The camera dictionary project consists of the application installed on the Android phone (the .apk installation file) and the accompanying data files, which are copied to the memory card.
The tools and environments used to carry out the project are listed below:
- Tools for design, modelling and block diagrams: Microsoft Word, Microsoft Visio, Paint, Photoshop.
- Programming environment and tools:
  - Eclipse IDE for Java, Indigo release.
  - Android SDK (installed through the ADT plug-in for Eclipse).
  - Android NDK r8.
- Tools and libraries used:
  - Installer and source code of the Tesseract library, version 3.01, on Windows.
  - Source code of the image processing libraries Leptonica 1.68 and Libjpeg.
  - The tesseract-android-tools project in Eclipse.
- Installation and test environment:
  - Android emulator from the Android SDK with Android 2.3.3 and API level 14.
  - Sony Ericsson Neo MT15i phone running Android 2.3.4.
In addition, the application relies on some other components, such as English pronunciation through the Text To Speech API on Android.

5.8. Installation and user guide

5.8.1. Installing the program
The CameraDictionary program runs well on phones with Android 2.3 or later. The configuration and installation requirements are as follows:
- Android 2.3 (Gingerbread) or later.
- A phone with a camera that supports auto focus. Without auto focus the program can still be used with images of ordinary quality under good conditions.
- At least 20 MB of free internal memory for installing the program.
- 38 MB of free space on the memory card for the dictionary data files and the data of the Tesseract recognition engine.
- The installation file CameraDictionary.apk and the data folder CameraDictionary.
First copy the installation file to the memory card or the phone memory and run the installation. Then copy the CameraDictionary folder to the memory card.

Figure 5.26 Program icon after installation is complete

5.8.2. User guide
Starting the CameraDictionary program brings up the following main screen:

Figure 5.27 Program screen at startup

Looking up a word with the camera: use the rectangular frame in the middle of the screen to bound the region in which the word is recognized. The frame can be dragged with a finger to enlarge or shrink it until it fits the word in the text. Once the frame has been adjusted, press the camera icon marked 3 to recognize the word inside the frame. The recognized word is shown in the EditText bar at the top of the screen. To look up the recognized word, press icon 1 next to the EditText to display the meaning screen.

Figure 5.28 Screen displaying the meaning of the word


To hear the English pronunciation on the meaning screen, press the speaker icon in the right corner of the screen. To return to the main screen, press the arrow icon next to it.
Note: the pronunciation feature requires the Text To Speech system to be installed on the Android phone.
When the user presses the camera icon to look up a word, the camera's auto focus (if supported) runs by default so that the picture is as sharp as possible. The program also lets the user trigger auto focus manually when needed, by pressing icon 2 in the figure. When looking up words in poor light, the user can press icon 4 to turn the camera flash on or off (if the phone has a flash).
The left corner of the screen holds the zoom in/zoom out controls. If the text to be recognized is too far away or the print is small, zooming can improve recognition accuracy.
When a word is recognized incorrectly, the user can take the picture again or edit the recognition result directly.
The main purpose of the program is to look up words with the phone camera, reducing the time and effort of typing them in. Users who prefer the conventional way of typing a word and looking it up can do that as well: in the settings menu, choose the first icon, Dictionary, to switch to conventional lookup.

Figure 5.29 Screen with the settings menu at the bottom


As shown in the figure above, pressing the settings key on the phone opens the menu:
- Dictionary: switch to dictionary lookup by typing the word.
- Setting: turn the program's advanced features on or off.
- About: display detailed information about the program.
- Exit: quit the program.

Figure 5.30 Screen for conventional dictionary lookup

Figure 5.31 Settings screen

- Spelling suggestion: searches for approximately matching words when the recognition result is not exact.
- Stemming: restores the base form of English words. Stemming is applied when a word is looked up.
- Font size of dictionary: sets the font size used for display in the program.

5.9. Test results

5.9.1. Testing the character recognition module
The English vocabulary is fairly large and test pictures can only be taken by hand, so we selected about 100 common English words to photograph and evaluate. The data was divided into two sets printed on white paper, each containing 100 English words. The first set uses the Times New Roman font at size 12. The second set uses the Arial font, also at size 12. Both sets were printed in regular and in italic type.
Unless stated otherwise, the test pictures were taken in good light and with the phone camera's auto focus. The following table shows the test results on a Sony Neo MT15i phone with an 8-megapixel auto focus camera:
Table 5.5 Test results of the recognition module in the program
Font            | Style   | Total words | Wrong words | Error rate
Times New Roman | Regular | 100         | 9           | 9%
Times New Roman | Italic  | 100         | 16          | 16%
Arial           | Regular | 100         | 12          | 12%
Arial           | Italic  | 100         | 22          | 22%

The results show that for regular type the two fonts, Times New Roman and Arial, have similar word error rates (9% and 12%). Within the same font, however, regular and italic type differ clearly: regular type has a lower error rate than italic type. Overall, 59 of the 400 test words were misrecognized, an error rate of about 15%, which is below 20% even though Arial italic alone reaches 22%, so word recognition accuracy is fairly high. These errors can be overcome by taking the picture again several times to obtain a more accurate result.
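The per-row and overall error rates of Table 5.5 can be recomputed with a few lines of Java. This is only a convenience sketch: the class `ErrorRateSketch` and its method names are hypothetical, and the counts are the ones listed in the table (with 9 wrong words for Times New Roman regular, as implied by its 9% rate).

```java
final class ErrorRateSketch {

    /** Error rate in percent for one test row. */
    static double errorRatePercent(int wrong, int total) {
        return 100.0 * wrong / total;
    }

    /** Error rate in percent over all rows taken together. */
    static double overallPercent(int[] wrong, int[] total) {
        int wrongSum = 0;
        int totalSum = 0;
        for (int i = 0; i < wrong.length; i++) {
            wrongSum += wrong[i];
            totalSum += total[i];
        }
        return errorRatePercent(wrongSum, totalSum);
    }

    public static void main(String[] args) {
        int[] wrong = {9, 16, 12, 22};      // rows of Table 5.5, top to bottom
        int[] total = {100, 100, 100, 100};
        System.out.println(overallPercent(wrong, total) + "%");
    }
}
```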
Correct recognition results usually come from pictures taken in good light and in sharp focus. Words are usually misrecognized because they contain several similar strokes next to each other that are easily confused. In "common", for example, the two adjacent letters "m" produce many vertical strokes that are easily misread. The same happens in words with two adjacent letters "l". Letters can also be confused with digits or special symbols. The following table illustrates some misrecognized cases from test pictures taken on the Sony Neo running Android 2.3.4:
Table 5.6 Some incorrect recognition results
Captured image | Recognition result

Remarks: words are usually misrecognized because the picture was taken under poor conditions and is out of focus, so that several characters are misread. In this case the user only needs to adjust the camera and take the picture again to get a correct result. The test pictures show that most misrecognized words contain the character "e", which is easily mistaken for a different character. Other common confusions are "r" read as "I" or "1", "n" read as "II" and "o" read as "0".
These are the misrecognition cases most often encountered in the program. Most errors involving the character "e" are hard to correct; for the other characters, taking the picture again several times usually yields a correct result.
5.9.2. Testing the language processing module
The thesis also tested the language processing module, which comprises stemming and approximate word search. Some of the results are given below.
Stemming: sentences -> sentence

Figure 5.32 Result before lookup

Figure 5.33 Screen displaying the meaning of the word after stemming

Remarks: stemming works as required. In most cases where the recognized word carries an "ing" or "ed" ending, the recognition result is reduced to its base form before the dictionary lookup is performed. Stemming is built into the lookup and is optional. Lookup with stemming is fairly fast and completes within about 1 second.
Approximate word search: balow -> below

Figure 5.34 Screen displaying the list of words close to the recognition result


Nhn xt: tinh nng tim t gn ung hoat ng ung vi yu cu ra l lit
k ra cc danh sch t gn vi t nhn dang trong t in. Tuy nhin, cc trng
hp lit k ra cc t gn ung chi hoat ng hiu qu nu t nhn din chi c 1 ky
t bi sai, cc t c 2 hoc 3 ky t sai tr ln thi vic lit k danh sch t gn ung s
khng c chinh xc. Tc thc thi ca tinh nng tim t gn ung tng i
nhanh i vi cc t c it ky t, cn cc t cha nhiu ky t thi vic thc hin s
chim thi gian lu hn nhng trong khong thi gian chp nhn c (khong 3
giy). V y cung c th xem nh l 1 im han ch ca tinh nng tim t gn
ung.
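One common way to find near matches such as balow → below is the Levenshtein edit distance. The sketch below assumes that technique; this section does not spell out the exact algorithm the application uses. The names `edit_distance` and `suggest` and the `max_distance` parameter are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ca
                    current[j - 1] + 1,      # insert cb
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def suggest(word, dictionary, max_distance=1):
    """Return dictionary entries within max_distance edits, closest first."""
    scored = []
    for entry in dictionary:
        # A length gap larger than max_distance can never be bridged.
        if abs(len(entry) - len(word)) > max_distance:
            continue
        distance = edit_distance(word, entry)
        if distance <= max_distance:
            scored.append((distance, entry))
    return [entry for _, entry in sorted(scored)]


if __name__ == "__main__":
    print(suggest("balow", ["below", "balloon", "blow", "bellow"]))
```

A linear scan like this compares the query with every entry, which is consistent with the observation that longer words take more time. Raising `max_distance` to cover words with two or more wrong characters also lengthens the suggestion list considerably.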
5.9.3. Evaluation of results
The thesis summarizes the experimental results on a real device, the Sony Neo MT15i, and evaluates the execution speed of each component of the program.

Table 5.7 Execution speed of the program

No. | Feature | Execution speed
1 | Application startup | Starts immediately
2 | OCR word recognition module | Fairly fast; about 3 seconds on average
3 | Dictionary lookup and meaning display | Fast lookup with accurate meaning display; about 1 second
4 | Approximate word search | Fairly fast; speed depends on word length; about 4 seconds on average

5.9.4. Comparison with existing applications on the market
Among Android dictionaries that rely on ordinary typed input, the most popular at present is Andict, followed by Wordmate. Both still have many limitations and a rather basic feature set: Andict has pronunciation data for only 20,000 words and Wordmate has no pronunciation at all, lookups run against a single dictionary only, and the data is displayed on a black background that is uncomfortable and unfamiliar.
Among camera-based dictionary applications for Android, two comparable applications are on the market: mSPDict and CamDictionary. In mSPDict the main feature is typed lookup using data aggregated from Andict; camera-based text recognition is a secondary feature, and once recognition finishes the application looks the word up through Google Translate rather than in local data. CamDictionary comes from a company specializing in recognizing text in images, so its recognition is very good. For translation, however, CamDictionary relies on the Internet, and pronunciation is also processed on an Internet server, which is inconvenient and costs waiting time. The table below compares specific features of the thesis application, Camera-Dictionary, with the commercial application CamDictionary.
Table 5.8 Comparison between Camera-Dictionary and CamDictionary

Feature compared | Camera-Dictionary (thesis) | CamDictionary (market)
Installation files | The installer CameraDictionary.apk plus the dictionary data folder CameraDictionary | A single installer file
Size | 16.3 MB | 18.56 MB
Memory used while running | 396 KB | 512 KB
Application startup | 1-2 s, opens on the camera screen | 1-2 s, opens on the paragraph translation screen; the user then switches to the camera screen
Image-to-text recognition speed | 2-3 s on average | Very good recognizer, fast and accurate; 1 s on average
Convenience when capturing a word | Capture frame is restricted, so less convenient | No restricted capture frame; processes the text near the cursor, so more convenient to use
Dictionary data | English-Vietnamese dictionary: 106,376 words, 18.8 MB | Free version: data from Google Translate. Licensed version: supports 60% of the Oxford dictionary.
Lookup speed | <1 s | >1 s, 3-4 s on average
Time to display the meaning | Looking up "love" in the English-Vietnamese dictionary shows 39 lines with full senses and examples, in <1 s | Looking up "love" in the English-Vietnamese dictionary shows the word "Yêu", in 3 s
Pronunciation | English via Text To Speech on the local device, <1 s | Uses the Internet; processing takes a long wait, >1 s

CONCLUSION
RESULTS ACHIEVED
In carrying out the project "English-Vietnamese dictionary lookup via camera on Android mobile phones", we obtained the following results:
- Studied the types of mobile phones and application programming techniques for mobile devices.
- Studied the Android operating system and the devices that run it. Google's Android platform holds the number one position thanks to its wide range of versions and the support it receives from most major hardware manufacturers.
- Studied how to organize dictionary data efficiently in a mobile environment with tight resource constraints, and studied methods for encrypting and decrypting dictionary data.
- Successfully built the Camera-Dictionary application for mobile phones running Android 2.3 or later with camera support. It offers rich data, fast and accurate lookup, and stable operation. The application supports English pronunciation, which helps users learn more easily. Camera-Dictionary goes some way toward meeting user needs and contributes a convenient way of looking up words. Camera-Dictionary has the following features:
Table 5.9 Main features of the program

No. | Feature
1 | Look up words directly with the phone camera
2 | Turn the flash on or off when the capture environment needs more light
3 | Autofocus
4 | Zoom the camera lens in and out
5 | Look up words by typing on the keyboard
6 | Show the list of following words in the dictionary
7 | Restore the base form of words with the endings -s, -es, -ed, -ing
8 | Approximate word search: show a list of related suggestions when the word is not in the dictionary
9 | Change the font used to display the word's meaning
10 | Read aloud and pronounce English words

LIMITATIONS
- The application processes images well only when the phone camera supports auto focus. The higher the camera resolution, the sharper the image and the more effectively the optical character recognizer works.
- Capture-based lookup also depends on ambient light and on a suitable background color and typeface. When capturing text on objects outdoors, a larger text size gives more accurate recognition.
- The approximate word search algorithm is still limited: it handles only the case of one wrong character and does not yet handle several wrong characters at once. This is due to the limited processing speed of the local device.

FUTURE WORK
- Complete the dictionary data set, especially specialized dictionaries.
- Add optical character recognizers and dictionary data for other languages, especially those that do not use the Latin alphabet, such as Chinese, Japanese, Korean and Thai. This matters because these languages are difficult for conventional dictionaries: new users cannot type the word they want to look up on a mobile device keyboard.
- Improve image processing so that captured images are sharp under varied lighting conditions and from varied capture sources. This lets users capture and look up words anywhere, which is especially useful for tourists reading signboards and directions.
- Improve the optical recognition module so that it recognizes more characters without a restricted capture area, and add support for translating phrases and sentences with the phone camera.
- Add online lookup through Google Translate over the Internet (GPRS, 3G) for words that are not found in the local data.


