Monica Vladoiu
Digital libraries (DLs): a focused collection of digital objects, along with methods for access and retrieval, for selection and organization, and for maintenance of the collection.
Digital objects include text, 2D or 3D graphics, animation, audio, video, simulations, dynamic visualisations, and virtual reality worlds.
The definition accords equal weight to the user (access and retrieval) and to the librarian (organization, selection, and maintenance).
Issues to be addressed
What form are the documents in?
see: http://nzdl.sadl.uleth.ca/cgi-bin/library?e=d-00000-00---off-0hdl--00-0--0-10-0---0---
see: http://www.gutenberg.org/wiki/Main_Page
Music
Digital collections of music capture the popular imagination in ways that scholarly libraries never will.
The music representation is produced by an OMR (optical music recognition) program, which is similar to OCR and works with printed music (a scanned page of a music book).
Lyrics should be made available too.
MIDI (Musical Instrument Digital Interface) is the standard used by the electronic music industry.
A music DL needs two major capabilities: converting between different formats and locating the relevant information.
Example: the New Zealand Melody Index collection.
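The slide does not say how a music DL locates a tune. One simple, well-known technique is to compare melodic contours: each note is coded as up (U), down (D), or repeat (R) relative to the previous one, a simplified form of the Parsons code. The sketch below assumes melodies are stored as lists of MIDI note numbers (60 is middle C); the collection and the function names are hypothetical, and this is not necessarily the method the New Zealand Melody Index uses.

```python
# Sketch: melody lookup by pitch contour (simplified Parsons code).
# Assumption: melodies are stored as MIDI note numbers; names are hypothetical.


def parsons_code(notes: list[int]) -> str:
    """Encode each note as U (up), D (down) or R (repeat) relative to the previous note."""
    code = []
    for prev, cur in zip(notes, notes[1:]):
        if cur > prev:
            code.append("U")
        elif cur < prev:
            code.append("D")
        else:
            code.append("R")
    return "".join(code)


def find_melodies(query: list[int], collection: dict[str, list[int]]) -> list[str]:
    """Return titles whose contour contains the query's contour as a substring."""
    q = parsons_code(query)
    return [title for title, notes in collection.items() if q in parsons_code(notes)]


# Hypothetical toy collection of opening phrases.
collection = {
    "Twinkle Twinkle": [60, 60, 67, 67, 69, 69, 67],
    "Ode to Joy": [64, 64, 65, 67, 67, 65, 64, 62],
}

# A query in a different key still matches, because only the contour is compared.
print(find_melodies([62, 62, 69, 69], collection))  # ['Twinkle Twinkle']
```

Contour matching tolerates transposition and small pitch errors, which suits queries that are sung or hummed; the price is lower precision, since many tunes share short contours.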
see: http://www.nzdl.org - Melody Index
Searching (1)
Electronic document delivery is the first raison d'être for most digital libraries.
Conventional automated library searches are restricted to metadata; DLs have access to the entire contents of the objects they contain. This is a great advantage.
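A tiny illustration of why full-text access matters: a search restricted to metadata (here, title and author) misses a document whose body mentions the query term. The records and function names below are hypothetical, and plain substring matching stands in for a real index.

```python
# Sketch: metadata-only search vs full-text search over hypothetical records.

documents = [
    {"title": "Annual report", "author": "Smith",
     "text": "This year the library digitised its map collection."},
    {"title": "Map collection guide", "author": "Jones",
     "text": "A guide to the printed atlases held in the reading room."},
]


def metadata_search(term: str) -> list[str]:
    """Match the term against the title and author fields only."""
    term = term.lower()
    return [d["title"] for d in documents
            if term in d["title"].lower() or term in d["author"].lower()]


def full_text_search(term: str) -> list[str]:
    """Match the term against the metadata and the full document text."""
    term = term.lower()
    return [d["title"] for d in documents
            if term in d["title"].lower() or term in d["author"].lower()
            or term in d["text"].lower()]


print(metadata_search("map"))   # ['Map collection guide']
print(full_text_search("map"))  # ['Annual report', 'Map collection guide']
```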
Searching (2)
In DLs, especially those for non-scholarly users, search should satisfy the usual user needs, but more advanced search should also be possible. As Alan Kay, a leading early proponent of the visual paradigm for HCI, said: simple things should be simple, complex things should be possible.
Search options include the type of search (basic or advanced), the language, and the unit of search (paragraphs, sections, or whole documents as full text; section titles, document titles, and authors as metadata elements).
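The unit of search determines what a hit is. The sketch below answers one query at several units, full-text ones (paragraph, section, document) and metadata ones (section title, document title, author). The document structure, the unit names, and the `search` function are hypothetical, and substring matching stands in for a real index.

```python
# Sketch: the same query answered at different units of search.
# The document structure and unit names are hypothetical.

doc = {
    "title": "Digital libraries",
    "author": "Doe",
    "sections": [
        {"title": "Music",
         "paragraphs": ["MIDI is an industry standard.", "Lyrics should be available too."]},
        {"title": "Searching",
         "paragraphs": ["Recall and precision trade off.", "MIDI files can be indexed."]},
    ],
}


def search(term: str, unit: str) -> list[str]:
    """Return the units of `doc` that contain `term` (case-insensitive)."""
    term = term.lower()
    sections = doc["sections"]
    if unit == "paragraph":  # full-text units
        return [p for s in sections for p in s["paragraphs"] if term in p.lower()]
    if unit == "section":
        return [s["title"] for s in sections
                if any(term in p.lower() for p in s["paragraphs"])]
    if unit == "document":
        body = " ".join(p for s in sections for p in s["paragraphs"])
        return [doc["title"]] if term in body.lower() else []
    if unit == "section_title":  # metadata elements
        return [s["title"] for s in sections if term in s["title"].lower()]
    if unit in ("title", "author"):
        return [doc["title"]] if term in doc[unit].lower() else []
    raise ValueError(f"unknown unit of search: {unit}")


print(search("midi", "paragraph"))  # both paragraphs that mention MIDI
print(search("midi", "section"))    # ['Music', 'Searching']
print(search("midi", "title"))      # []
```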
A broad search that identifies virtually all the relevant documents is said to have high recall; one in which virtually all retrieved documents are relevant has high precision.
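In set terms, precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. A minimal sketch with hypothetical document ids; returning 0.0 for an empty set is a convention assumed here, not part of the definitions.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Compute (precision, recall) for one query from sets of document ids."""
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgements: four relevant documents, three retrieved, two of them relevant.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d1", "d2", "d5"}
print(precision_recall(retrieved, relevant))  # precision 2/3, recall 2/4
```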
Users must choose between casting a broad query, to be sure of retrieving all relevant material albeit diluted with many irrelevant answers, and addressing a narrow one, where most retrieved documents are of interest but others slip through the net because the query is too restrictive.
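The broad-versus-narrow tradeoff can be seen on a toy collection: matching any query term (OR) plays the role of the broad query, and requiring all terms (AND) the narrow one. The collection, the relevance judgements, and the `run` function are hypothetical; real systems use ranking rather than pure Boolean matching.

```python
# Hypothetical toy collection: each document is reduced to a set of index terms.
docs = {
    "d1": {"digital", "library", "music"},
    "d2": {"digital", "library", "search"},
    "d3": {"library", "search", "precision"},
    "d4": {"digital", "camera"},
    "d5": {"music", "concert"},
}
# Assumed relevance judgements for an information need about digital libraries.
relevant = {"d1", "d2", "d3"}


def run(query: set[str], mode: str) -> set[str]:
    """Broad mode matches ANY query term (OR); narrow mode requires ALL terms (AND)."""
    if mode == "broad":
        return {d for d, terms in docs.items() if query & terms}
    return {d for d, terms in docs.items() if query <= terms}


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


query = {"digital", "library"}
for mode in ("broad", "narrow"):
    p, r = precision_recall(run(query, mode), relevant)
    print(f"{mode}: precision={p:.2f} recall={r:.2f}")
# broad: precision=0.75 recall=1.00
# narrow: precision=1.00 recall=0.67
```

The broad query retrieves every relevant document but also the irrelevant d4; the narrow one returns only relevant documents but lets d3 slip through the net.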
see: http://nzdl.sadl.uleth.ca/cgi-bin/library?e=q-00000-00---off-0gberg--00-0--0-10-0---0--0prompt-10---4-----dtt--0-1l--11-en-50---20-about-digital--00-0-1-00-0-0-11-1-0utfZz-800&a=p&p=preferences
Without a word-level index (wli): only a document-level index is needed, but responses can take a long time, because many documents might have to be scanned, especially for common words in the query.
With a wli: the index is significantly larger, but response time is much smaller, and punctuation and white space can be indexed as well.
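The contrast can be sketched with a phrase query. A document-level index only says which documents contain each word, so every candidate document must be re-scanned to check that the words are adjacent; a word-level index also stores word positions (hence a larger index) and can answer from the index alone. The documents, the whitespace tokenizer, and the function names are hypothetical, and this sketch does not index punctuation or white space.

```python
from collections import defaultdict

# Hypothetical collection.
docs = {
    1: "the digital library stores music",
    2: "a library of digital music",
}


def tokenize(text: str) -> list[str]:
    return text.lower().split()


doc_index = defaultdict(set)                       # word -> set of document ids
word_index = defaultdict(lambda: defaultdict(list))  # word -> {doc id -> positions}
for doc_id, text in docs.items():
    for pos, word in enumerate(tokenize(text)):
        doc_index[word].add(doc_id)
        word_index[word][doc_id].append(pos)


def phrase_search_doc_level(phrase: str) -> list[int]:
    """Find candidates via the document-level index, then scan each candidate's text."""
    words = tokenize(phrase)
    candidates = set.intersection(*(doc_index.get(w, set()) for w in words))
    hits = []
    for d in sorted(candidates):
        toks = tokenize(docs[d])  # the costly step: re-reading the document
        if any(toks[i:i + len(words)] == words for i in range(len(toks))):
            hits.append(d)
    return hits


def phrase_search_word_level(phrase: str) -> list[int]:
    """Answer from stored positions only: word k must occur at position p + k."""
    words = tokenize(phrase)
    hits = []
    for d, positions in sorted(word_index.get(words[0], {}).items()):
        if any(all(p + k in word_index.get(w, {}).get(d, [])
                   for k, w in enumerate(words[1:], start=1))
               for p in positions):
            hits.append(d)
    return hits


print(phrase_search_doc_level("digital library"))   # [1]
print(phrase_search_word_level("digital library"))  # [1]
```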
see: http://m-w.com/dictionary/search
Phrase browsing
People want to browse information collections based on their subject matter.
That kind of browsing is well supported by displays based on hierarchical classification metadata associated with each document, but manual classification is expensive and tedious for large document collections.
To address this issue, one can build topical browsing interfaces based on phrase metadata, where the phrases have been extracted automatically from the full text of the documents themselves. Representative key phrases can be chosen.
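A very simple stand-in for automatic phrase extraction: collect word n-grams that do not begin or end with a stopword and keep those that occur in at least two documents, giving a phrase-to-documents map a browsing interface could list. The stopword list, the thresholds, the documents, and the function names are assumptions; this frequency heuristic is not the extraction method of any particular system.

```python
from collections import defaultdict

STOPWORDS = {"a", "an", "the", "of", "and", "in", "for", "to", "is"}  # assumed minimal list


def candidate_phrases(text: str, max_len: int = 3):
    """Yield 2- to max_len-word phrases that do not begin or end with a stopword."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    for n in range(2, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram)


def build_phrase_browser(docs: dict[str, str], min_docs: int = 2) -> dict[str, list[str]]:
    """Map each phrase found in at least min_docs documents to those document ids."""
    phrase_docs = defaultdict(set)
    for doc_id, text in docs.items():
        for phrase in candidate_phrases(text):
            phrase_docs[phrase].add(doc_id)
    return {p: sorted(ids) for p, ids in sorted(phrase_docs.items()) if len(ids) >= min_docs}


# Hypothetical documents.
docs = {
    "d1": "Digital libraries support phrase browsing of large collections.",
    "d2": "Phrase browsing helps users explore digital libraries.",
    "d3": "Manual classification of large collections is expensive.",
}
for phrase, ids in build_phrase_browser(docs).items():
    print(phrase, "->", ids)
# digital libraries -> ['d1', 'd2']
# large collections -> ['d1', 'd3']
# phrase browsing -> ['d1', 'd2']
```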