Table of Contents
Foreword
Part I Introduction
Part II ZoloPages
1 ZoloPages Extractor (ZPG)
Extractor - Surf and Collect
Extractor - Select and Save
Settings
Index
1 Introduction
ZoloPages is a Name extractor, Address extractor, Phone extractor, Fax extractor, URL extractor,
Email extractor for any web page with deployed data. Fully customizable, it enables you to develop
your own ZoloMasks to carry out tedious data mining tasks, such as (but not limited to) retrieving data
from the white pages or pink, yellow, green pages in almost any country in the world today. You can
grab data from the web in no time! ZoloPages can then save the data you've selected in Microsoft
Excel or Microsoft Outlook format for future use.
Visit http://www.zolopages.com for more recent versions of this software and manual/help file.
ZoloPages Extractor - captures data from online services. The extractor is supplied
with no guarantee implied, and with no templates except the one in the picture above. If you're
looking for specific ZPG templates for your own country, you will have to resort to third-party web
sites (including, but not limited to: http://www.zolomask.com in Asia).
ZoloMask Editor - lets you create and edit filters specific to particular web services.
2 ZoloPages
2.1 ZoloPages Extractor (ZPG)
Select a web source. The corresponding web page will be displayed in the integrated browser.
Check AUTOMODE if you wish to capture more than one page automatically.
Then click on the CAPTURE button when you are satisfied with the data displayed on the page.
ZoloPages is now ready to capture data. It will save that data in the second tab:
Select and Save.
This page holds the data captured from the web page, stored in a regular grid. Please note (below) that
no data has been selected at this point: no checkbox is checked yet.
Select the data you wish to save to disk: either individually, or several at a time by right-clicking on the
SELECT button.
With these menu items you can SELECT or UNSELECT all records. You can also
invert your selection and even delete part or all of it.
With these menu items you can save to various common formats, including XML, HTML,
and even ZoloPages' proprietary ZPD format.
The most common formats are MS Outlook (individual Contacts entries), CSV
(text-based) for Excel and Access, Excel itself, and Word.
Click on the item of your choice. The following dialog will pop up.
Give the new file the name of your choice and save it where you see fit. That's it!
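For readers who want to reason about the selection commands outside the program, here is a minimal Python sketch of the same operations (select all, unselect all, invert, delete) followed by a CSV export. It is not how ZoloPages is implemented: the `records` list, the `selected` flag, the field names and the phone numbers are made-up sample data chosen for illustration.

```python
import csv
import io

# Hypothetical stand-in for the Select and Save grid: one dict per captured record.
records = [
    {"selected": False, "name": "ANDREW FINNEGAN", "phone": "555-0100"},
    {"selected": False, "name": "BAILLY PAUL", "phone": "555-0101"},
    {"selected": False, "name": "JONAS DOE", "phone": "555-0102"},
]

def select_all(rows):
    """Check every record."""
    for row in rows:
        row["selected"] = True

def unselect_all(rows):
    """Uncheck every record."""
    for row in rows:
        row["selected"] = False

def invert_selection(rows):
    """Checked records become unchecked and vice versa."""
    for row in rows:
        row["selected"] = not row["selected"]

def delete_selected(rows):
    """Return a new list without the checked records."""
    return [row for row in rows if not row["selected"]]

def selected_to_csv(rows, fields=("name", "phone")):
    """Write only the checked records as CSV text, with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(fields)
    for row in rows:
        if row["selected"]:
            writer.writerow([row[field] for field in fields])
    return buffer.getvalue()
```

Checking the first record and then calling `invert_selection(records)` leaves the other two checked, so `selected_to_csv(records)` would export only those two rows.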
2.1.3 Settings
This part deals only with the way the data will be saved in MS Outlook.
1. NAME FORMAT
· Keep order the same: ANDREW FINNEGAN will be stored as ANDREW FINNEGAN in Outlook
Contacts.
· Invert Name+First Name: BAILLY PAUL will be stored as PAUL BAILLY in Outlook Contacts.
2. ADDRESS DISPLAY
The rest of the address: 63120, Sharjah will still be stored in the Zip Code and City fields.
3. CONTACT TYPE
A personal contact will be saved with the HOME PHONE and HOME ADDRESS selections, whereas...
A business contact will be saved with the BUSINESS PHONE and BUSINESS ADDRESS selections.
4. CONFIRMATION
An Office/Outlook confirmation screen will pop up so that you can verify the information entered by
ZoloPages in your Contacts, and make sure the Name format, Address Display and Contact Type
are correct.
WARNING: This is not ideal when you save multiple contacts, as many contact screens will pop up
simultaneously. Reserve this for single saving or for double checking the format of one record
before saving all the others!
WARNING: You must think carefully before saving multiple contacts to Outlook!
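The NAME FORMAT and CONTACT TYPE settings above can be pictured with a short Python sketch. It is only an illustration of the described behavior, not the program's code: the function names and the dictionary keys are hypothetical, and the inversion assumes that the first word of the captured name is the surname.

```python
def format_name(raw_name, invert=False):
    """Keep the word order, or move the leading surname to the end when invert is True."""
    parts = raw_name.split()
    if invert and len(parts) > 1:
        # Assumption: the first word is the surname, the rest are first names.
        return " ".join(parts[1:] + parts[:1])
    return " ".join(parts)

def contact_fields(name, phone, address, business=False, invert=False):
    """Build a hypothetical contact record using HOME or BUSINESS slots."""
    kind = "BUSINESS" if business else "HOME"
    return {
        "NAME": format_name(name, invert),
        f"{kind} PHONE": phone,
        f"{kind} ADDRESS": address,
    }

personal = contact_fields("BAILLY PAUL", "(215)1247847879", "Villavägen 27", invert=True)
company = contact_fields("ANDREW FINNEGAN", "(215)1247847879", "Villavägen 27", business=True)
```

Here `personal` uses the HOME PHONE and HOME ADDRESS slots with the name inverted, while `company` keeps the name order and uses the BUSINESS slots.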
What you need to do is work with the HTML code for each page/data template, and isolate the
elements that constitute a data entry on any given page.
1. Navigate to any web page you'll need to capture data from (first tab in left pane). Enter the URL and
press ENTER.
The HTML code for the current page is displayed. If there is too much HTML data, simply isolate one
address, and press LOAD HTML again. Only the browser selection will appear.
2. Isolate the boundary elements for one entry. Use the FIND button (CTRL+F) if necessary, as
some HTML pages are sometimes not very legible.
You can select text from the HTML View and drop it into the various fields below. If you press CTRL
while dragging, your text will not replace the current one. On the contrary, it will be added to it and the
marker "|" (or) will be inserted between the two.
Then go to the ONE ENTRY pane on the left. One entry only should be displayed there, as opposed to
the whole HTML code in the WHOLE DOCUMENT pane.
Then go to the second pane, called TEST FILTERS. Verify that your base markers are all correct
before proceeding to the phone, fax, http and email field definitions.
Then SAVE your ZOLOMASK by clicking on the rightmost button (the disk) in the first toolbar.
Give it a specific name. The extension .ZPG will be added to the filename.
Try to follow the naming conventions set by ZoloMask:
1. two-letter country code
2. followed by a dash
3. full descriptive name
Example:
IN-PagesRojes: Pages Rojes service of India (doesn't exist, of course)
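If you keep many masks, a small script can check names against this convention before you save them. The Python sketch below is an assumption-laden illustration: the function name `mask_filename` is hypothetical, and the rule that the descriptive part contains only letters and digits is a guess that the manual does not state.

```python
import re

# Two uppercase letters, a dash, then a descriptive name.
# Assumption: the descriptive part is limited to letters and digits.
MASK_NAME = re.compile(r"[A-Z]{2}-[A-Za-z0-9]+")

def mask_filename(country_code, description):
    """Build a mask file name such as IN-PagesRojes.ZPG, or raise ValueError."""
    name = f"{country_code.upper()}-{description}"
    if not MASK_NAME.fullmatch(name):
        raise ValueError(f"not a valid mask name: {name!r}")
    return name + ".ZPG"

example = mask_filename("in", "PagesRojes")
```

With the example from the text, `example` is the string `IN-PagesRojes.ZPG`.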
ZoloMask uses some basic hierarchical markers to isolate data from a web page.
<Add_Start>
<Name_Start>
Jonas Doe
<Name_End>
<Street_Start>
Villavägen 27, 27525 Öreby
// Also contains determiners for CITY and ZIP CODE
<Street_End>
// Other data: phone, fax, tollfree, http and email
<Add_End>
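The nesting above (an outer address block containing name and street blocks) can be sketched in Python as a two-level search between start and end markers. This is only a way to reason about the hierarchy, not ZoloMask's actual code: the helper names, the `markers` dictionary and the sample HTML fragments are all hypothetical.

```python
def between(text, start, end, pos=0):
    """Return (content, next_position) for the first start...end span at or after pos, else None."""
    i = text.find(start, pos)
    if i == -1:
        return None
    i += len(start)
    j = text.find(end, i)
    if j == -1:
        return None
    return text[i:j], j + len(end)

def extract_entries(html, markers):
    """Cut the page into address blocks, then look for each field inside its block."""
    entries = []
    pos = 0
    while True:
        found = between(html, *markers["add"], pos)
        if found is None:
            break
        block, pos = found
        entry = {}
        for field in ("name", "street"):
            inner = between(block, *markers[field])
            entry[field] = inner[0].strip() if inner else None
        entries.append(entry)
    return entries

# Hypothetical markers: each pair plays the role of a *_Start / *_End marker.
markers = {
    "add": ("<tr>", "</tr>"),
    "name": ("<b>", "</b>"),
    "street": ("<i>", "</i>"),
}
page = "<tr><b>Jonas Doe</b><i>Villavägen 27, 27525 Öreby</i></tr><tr><b>Bailly Paul</b></tr>"
entries = extract_entries(page, markers)
```

Searching for the fields only inside the current address block is what keeps a name from one entry from being paired with a street from the next; the second sample entry has no street, so that field is simply left empty.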
In your editing grid, these base markers can be defined as shown below.
A number of wild cards are available to help you work with variable elements in the HTML
code.
1. §
--> replaces a maximum of 25 characters (from 1 up to 25), including line breaks. Useful when
table cells have various colors from one line to the next (but not only).
Example:
2. ~
--> goes back 30 characters, instead of capturing 30 characters after the expression that
follows.
Examples:
fax: §
captures
fax: (215)1247847879
~(fax)
captures
(215)1247847879 (fax)
Example:
NW?11?N? 1N?N?
captures
E17 3HX
SW4 7AA
SW18 4DW
EC1A 9LH
etc.
4. | = OR
--> placed between two alternative expressions, looks for either one in the HTML code
Example:
<table width="365">|</tr></td>
corresponds to both
<table width="365">
and/or
</tr></td>
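To get a feel for the § and | wild cards, it can help to compare them with regular expressions. The Python sketch below translates a mask into a regular expression under two assumptions that the manual does not confirm: § is treated as a non-greedy run of 1 to 25 characters of any kind (line breaks included), and | as plain alternation. The function name `mask_to_regex` and the `bgcolor` sample are hypothetical, and the other wild cards are not handled.

```python
import re

def mask_to_regex(mask):
    """Translate the § and | wild cards into a compiled regular expression."""
    pieces = []
    for ch in mask:
        if ch == "§":
            # 1 to 25 characters of any kind, line breaks included, as few as possible.
            pieces.append(r"[\s\S]{1,25}?")
        elif ch == "|":
            # OR between two alternative expressions.
            pieces.append("|")
        else:
            # Every other character must match literally.
            pieces.append(re.escape(ch))
    return re.compile("".join(pieces))

either = mask_to_regex('<table width="365">|</tr></td>')
color_cell = mask_to_regex('<td bgcolor="§">')
```

In this sketch, `either` matches both the opening table tag and the closing `</tr></td>` pair, and `color_cell` matches a cell whatever its color value, as long as that value is between 1 and 25 characters long.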
5. °
--> replaces a maximum of 5 characters (from 1 up to 5), including line breaks. Useful when
table cells have various colors from one line to the next (but not only).
Example:
<table width="°">
corresponds to both