Datacamp
User’s and Administrator’s Guide
January 2010
info@knowerce.sk
www.knowerce.sk
knowerce|consulting
Document information
Creator Knowerce, s.r.o.
info@knowerce.sk
www.knowerce.sk
Document Restrictions
Copyright (C) 2009 Knowerce, s.r.o., Stefan Urbanek
This document is distributed under the Creative Commons License: Attribution-Noncommercial-Share Alike 3.0
Revision 1.2, 2010-02-02, Stefan Urbanek: added record status, appendices (search engine, external sources)
Offer
info@knowerce.sk
2
knowerce|consulting
Contents
Introduction
Project Page and Sources
  Support
Data Users
Main Screen
  Session bar
Data Browsing
  Data Catalogue
  Dataset Display
  Record Details
  Adding to Favourites
  Sharing
Searching
  Global Search
  Dataset Advanced Search
User’s Profile
  Profile
  Favourites
  Comments
Application Programming Interface
  API key
  Requests
  Errors
  API Command Line Tool
Data Management
Datasets and Records
  Record Status
Record Management
  Create Record
  Edit Record
  Record Status
Import Data
  File Selection
  Field Mapping
Administration
Dataset Management
  Dataset Categories
  Create Dataset
  Inspecting and Editing Dataset Description
  Create Dataset Field
  Edit Dataset Field
  Derived Fields
  Data Format
User Management
  Create User
  Edit User
  Roles and Rights
Appendices
A. Dataset Implementation
  Datasets
  Fields
  Summary
B. External Sources and ETL
C. Search Engine
  Predicates
D. API Shell Script
Introduction
Datacamp is a Web application for publishing, searching and managing data in the form of datasets. Each chapter of this guide presents the major features of the application, with examples of how to use them. The guide is split into three sections: a guide for data users, a guide for data managers and a guide for application administrators.
Project Page and Sources
Wiki Documentation:
http://wiki.github.com/Stiivi/datacamp/
Support
General Discussion Mailing List
http://groups.google.com/group/datacamp
Data Users
This section describes how to browse, search and discuss data.
Main Screen
Session bar
Data Browsing
Data Catalogue
When you open the Data Catalogue, you see a list of all published datasets, grouped by dataset category.
Dataset Display
On the dataset page you can:
■ get more information about the dataset, such as the data provider, update frequency and data sources
Record Details
Adding to Favourites
You can add any record or dataset to your list of favourites by pressing the add to favourites button under the dataset or record description.
You will be asked to add a short note to the record or dataset you are about to add to your
favourites.
Sharing
Any dataset or record can be shared by clicking on the Share link under the dataset or record description.
Searching
There are two ways to search for data: global search and advanced dataset search. Global search looks for the search query in all datasets and fields. Advanced dataset search allows you to specify search criteria more precisely, but it is limited to one dataset.
Global Search
To search through all datasets, you can use either the search on the front page or the search field that is always present on the right side of the menu bar.
Search Query
The database is searched for all words or phrases you type in the search field. Search examples:
■ john smith → searches for both words, john and smith
■ public television → searches for two words, public and television; matches all records that contain both words in any of their fields, possibly separated from each other
■ “public television” → searches for the whole phrase; matches only records in which a field contains the exact phrase
To exclude a word or phrase from the search, put a minus sign in front of it:
■ john -smith → searches for records that contain john, but not smith
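The query rules above (words, quoted phrases, minus sign for exclusion) can be illustrated with a small tokenizer. This is a minimal sketch, not Datacamp's search engine: parse_query is a hypothetical helper, and it assumes straight double quotes (") around phrases rather than typographic ones.

```bash
# Hypothetical helper: split a search query into included and excluded terms.
# A term is either a "quoted phrase" (straight double quotes) or a single word;
# a leading minus sign marks the term as excluded.
# An unterminated quote stops parsing at that point.
parse_query() {
  local query=$1
  # group 1: optional minus; group 2: phrase or word; group 3: rest of query
  local re='^[[:space:]]*(-?)("[^"]*"|[^[:space:]"-][^[:space:]"]*)(.*)$'
  local sign term
  while [[ $query =~ $re ]]; do
    sign=${BASH_REMATCH[1]}
    term=${BASH_REMATCH[2]}
    query=${BASH_REMATCH[3]}
    term=${term//\"/}   # drop the quotes around a phrase
    if [[ -n $sign ]]; then
      printf 'exclude: %s\n' "$term"
    else
      printf 'include: %s\n' "$term"
    fi
  done
}
```

For example, parse_query 'john -smith "public television"' prints one line per term: include john, exclude smith, include the phrase public television.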
Pattern matching
To match partial words, such as prefixes or suffixes, use the asterisk * to denote the missing part of a word:
■ *tech → matches all fields that end with tech, such as microtech or macrotech, but not technology
■ tech* → matches all fields that start with tech, such as technology, but not microtech or macrotech
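The asterisk semantics above correspond closely to shell glob matching, which makes them easy to try out. A minimal sketch, assuming the pattern is matched against the whole field value and that the comparison is case-insensitive (both assumptions); matches_pattern is a hypothetical helper.

```bash
# Hypothetical helper: succeed if VALUE matches PATTERN, where * stands for
# the missing part of a word (prefix or suffix).
# Note: other glob characters such as ? or [ would also be special here.
matches_pattern() {
  local value=${1,,} pattern=${2,,}   # lowercase both (assumed case-insensitive)
  # An unquoted right-hand side of == inside [[ ]] is treated as a glob pattern.
  [[ $value == $pattern ]]
}
```

With this helper, matches_pattern microtech '*tech' succeeds and matches_pattern technology '*tech' fails, mirroring the examples above.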
Advanced Query
Advanced users might want to refine their search by using advanced queries:
■ dataset:procurements → search only in datasets that contain the word procurements in their name
■ -dataset:donations → exclude datasets that contain the word donations in their name
■ field:name → search only in fields that have the word name in their title or identifier
■ -field:city → exclude fields that have the word city in their title or identifier
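The dataset: and field: qualifiers, with or without a leading minus sign, can be recognised token by token. The sketch below is hypothetical (parse_qualifier is not part of Datacamp) and only classifies a single token; matching the word against dataset names or field titles and identifiers is left to the search engine.

```bash
# Hypothetical helper: classify one query token as a dataset restriction
# (dataset:...), a field restriction (field:...) or a plain search term.
# A leading minus sign turns it into an exclusion.
parse_qualifier() {
  local token=$1 mode=include
  if [[ $token == -* ]]; then
    mode=exclude
    token=${token#-}
  fi
  case $token in
    dataset:*) printf '%s dataset %s\n' "$mode" "${token#dataset:}" ;;
    field:*)   printf '%s field %s\n' "$mode" "${token#field:}" ;;
    *)         printf '%s term %s\n' "$mode" "$token" ;;
  esac
}
```

For example, parse_qualifier -dataset:donations prints "exclude dataset donations".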
Dataset Advanced Search
First select the field you want to search in, then select an operator. The available operators depend on the type of the field.
You can add more conditions by pressing the add button. To remove a condition, press the remove button.
User’s Profile
Profile
The Profile tab is used to change basic information about the user: display name, email address or password.
Favourites
You can browse the records you have marked as favourites in the Favourites tab.
To display a favourite, click on the record reference. To show the dataset that contains the record, click on the dataset name. To delete a favourite, click on the trash can icon.
Comments
You can see all comments you have made on records or datasets. If you view another user’s profile, you see all comments that user has made.
Application Programming Interface
For example:
http://my-datacamp.org/api/dataset_description?api_key=abc123&dataset_id=1
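The example URL follows the pattern BASE/api/REQUEST?api_key=KEY&NAME=VALUE. A minimal sketch, assuming that pattern, of hypothetical helpers that build such a URL and percent-encode the argument values (the script in Appendix D appends arguments without encoding them). The encoder only handles ASCII input.

```bash
# Hypothetical helper: percent-encode an ASCII string for use in a URL query.
urlencode() {
  local s=$1 out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9.~_-]) out+=$c ;;                       # unreserved characters
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;         # e.g. space -> %20
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: api_url BASE REQUEST API_KEY [NAME=VALUE ...]
api_url() {
  local base=$1 request=$2 key=$3 arg
  shift 3
  local url="$base/api/$request?api_key=$(urlencode "$key")"
  for arg in "$@"; do
    url+="&${arg%%=*}=$(urlencode "${arg#*=}")"
  done
  printf '%s\n' "$url"
}
```

For example, api_url http://my-datacamp.org dataset_description abc123 dataset_id=1 reproduces the URL above; the result can then be fetched with curl or wget.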
API key
You can find your API key in the web application: go to your profile (top-right corner of the page) and select the API tab. If you think that your API key is being abused by someone else, you can generate a new key.
Requests
■ dataset_description (argument: dataset_id): returns dataset information and a list of dataset fields with their properties. Formats: Ruby, XML
Errors
If an error occurs during a request, an error reply is returned in XML format.
Error codes:
■ access_denied (HTTP status 401): invalid API key, or the key owner has no access to the requested method or object
■ object_not_found: returned only when the concrete id of an object is expected and no object with that id exists in the database. No error is returned when searching for an object using search criteria and no object is found: the search operation succeeded and found nothing.
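A client can check the HTTP status before reading the reply. Below is a sketch using curl's -w '%{http_code}' option; the helper names are hypothetical, and only the 401 (access_denied) mapping is taken from the list above. Any other non-2xx status is treated as a generic error whose XML reply is printed to stderr.

```bash
# Hypothetical helper: classify the HTTP status of an API reply.
classify_status() {
  case $1 in
    2??) echo ok ;;
    401) echo access_denied ;;
    *)   echo error ;;
  esac
}

# Hypothetical helper: fetch URL; print the body on success, otherwise print
# the classification and the XML error reply to stderr and fail.
api_get() {
  local url=$1 body status kind
  body=$(mktemp) || return 1
  # -o writes the body to a file, -w prints only the HTTP status code
  status=$(curl -s -o "$body" -w '%{http_code}' "$url") || { rm -f "$body"; return 1; }
  kind=$(classify_status "$status")
  if [[ $kind == ok ]]; then
    cat "$body"
  else
    echo "API error ($kind, HTTP $status):" >&2
    cat "$body" >&2
  fi
  rm -f "$body"
  [[ $kind == ok ]]
}
```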
Data Management
This section is about creating and editing records and importing data from a file.
Datasets and Records
Record Status
Records are “live” entities that might change over time. Datacamp has tools to manage the status of records, comparable to the status of a service customer or of an article in a content management system. There are five record statuses:
■ loaded, visible to data managers: the record was loaded into the datastore by the ETL process. The ETL process uses this status to know which records were actually imported, so that it can do additional finalisation of new records. This status should never be seen in the application.
■ new, visible to data managers: a manually created record or a record created by CSV import. The record requires review before publishing.
■ suspended (hidden), visible to data managers: there are issues with the record that might confuse potential viewers, or there are other reasons for not publishing it, such as quality issues or trust in the source.
■ deleted (closed), visible to data managers, requires an explicit filter to list: the record has no further use in the database, either because it is redundant or no longer relevant, or because it is obsolete.
■ destroyed (not an actual status), visible to no one: a pseudo-status. The record no longer exists in the database, and all references to and dependencies on this record were removed.
Records should be persistent and should be destroyed (deleted from the database) only when really necessary. Reasons for destroying a record might be a failed load, multiple imports of the same file or another serious problem.
The following diagram shows the record statuses and the possible transitions between them:
[Diagram: statuses loaded, new, active (published), suspended (hidden), deleted (closed); transitions labelled ETL finalize, check/publish, publish, suspend, delete, undelete, destroy]
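The status lifecycle can be expressed as a small transition table. This sketch is an interpretation of the diagram and status descriptions, not Datacamp code: the arrow directions are assumed (loaded → new by ETL finalisation, new → active by check/publish, suspended → active by publish, active → suspended by suspend, active or suspended → deleted by delete, deleted → destroyed by destroy), and undelete is left out because its target status is not stated in the text. record_transition is a hypothetical helper.

```bash
# Hypothetical helper: print the status a record moves to when ACTION is
# applied in status FROM; fail for transitions not in this (assumed) table.
record_transition() {
  local from=$1 action=$2
  case "$from:$action" in
    loaded:finalize)                  echo new ;;
    new:publish)                      echo active ;;
    suspended:publish)                echo active ;;
    active:suspend)                   echo suspended ;;
    active:delete|suspended:delete)   echo deleted ;;
    deleted:destroy)                  echo destroyed ;;   # super-users only
    *)
      echo "invalid transition: $action from $from" >&2
      return 1 ;;
  esac
}
```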
Record Management
Create Record
To create a record:
1. open the dataset you want to add the record to
2. click on create record at the top of the dataset display
Edit Record
To edit a record:
1. find and open the record you want to edit
2. click on Edit in the top right corner of the record display
Record Status
Each record has a publishing status which can be one of these:
■ new
■ published (active)
■ hidden (suspended)
■ deleted (closed)
Records should not be removed from the database except in exceptional cases, such as:
■ incorrect import
■ redundant import
■ unintentional redundancy
Status Overview
■ new, visible to data managers: the record was just created, either manually, by importing from a file or by a background loading process
■ hidden (suspended), visible to data managers: records that are not intended to be published because of quality issues, controversy, redundancy, uncertainty or any other reason
■ deleted (closed), visible to data managers when explicitly requested: records that are no longer in use; unlike destroyed records, they remain in the database
Import Data
To import data:
1. open the Data Dictionary
2. click on import in the menu bar
File Selection
1. select the dataset description to which you want to import new data
3. optionally, specify the title of the file and its source, for reference
4. choose a file template. Currently two templates are available: plain CSV, and CSV with multiple header rows (for example, one row might contain human-readable column titles and another the field identifiers for automatic field matching)
5. optionally, specify the format of the CSV file: column separator and number of header lines
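A CSV file with two header rows might look like the one in the comments below. The sketch assumes a semicolon separator and that the last header row holds the field identifiers; both the file layout and the helper names are hypothetical, shown only to illustrate what the header-lines setting controls.

```bash
# Example file with 2 header lines (human-readable titles, field identifiers):
#   Name;Surname;City
#   name;surname;city
#   John;Smith;Bratislava

# Hypothetical helper: print the header row holding field identifiers
# (assumed to be the last of HEADER_LINES header rows).
csv_identifier_row() {
  local file=$1 header_lines=$2
  sed -n "${header_lines}p" "$file"
}

# Hypothetical helper: print the data rows that follow the header rows.
csv_data_rows() {
  local file=$1 header_lines=$2
  tail -n "+$((header_lines + 1))" "$file"
}
```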
Field Mapping
After you confirm the file you want to import, the mapping screen is displayed.
(a) change import settings: specify another dataset or change the file format
(b) guess field mapping from file headers (see below)
(c) revert to predefined column mapping
Columns in the file are matched to dataset fields based on the settings specified in Dataset Description → Import Settings. You can override the mapping by choosing dataset fields from the field lists.
To get the field mapping from the file, press the guess button (b). This action tries to match the file header to dataset field identifiers.
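The guess action compares file header names with dataset field identifiers. A minimal sketch of that idea, assuming exact but case-insensitive matching; guess_mapping is a hypothetical helper, not the application's implementation.

```bash
# Hypothetical sketch of guessing a field mapping: compare each column name in
# the identifier header row with the dataset field identifiers and print
# "COLUMN_NUMBER COLUMN -> FIELD" for every column.
guess_mapping() {
  local header=$1 separator=$2
  shift 2                      # remaining arguments: dataset field identifiers
  local -a columns
  IFS=$separator read -r -a columns <<< "$header"
  local i column field mapped
  for i in "${!columns[@]}"; do
    column=${columns[i]}
    mapped="(unmapped)"
    for field in "$@"; do
      if [[ ${column,,} == "${field,,}" ]]; then   # case-insensitive (assumed)
        mapped=$field
        break
      fi
    done
    printf '%d %s -> %s\n' "$((i + 1))" "$column" "$mapped"
  done
}
```

For example, guess_mapping 'name;surname;note' ';' name surname city maps the first two columns and leaves note unmapped.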
Press the revert button (c) if you have messed up the mappings and want to return to the predefined dataset mapping.
When you are satisfied with the file-to-dataset mapping, confirm it to import the records from the file into the specified dataset.
Technical note: all imports are recorded in the database, even though there is no user interface for them. Each record refers to the import batch it came from, so you can identify which records came from which file.
Administration
This section is for application administrators. It covers creating new datasets and fields, managing users, and assigning user rights and roles.
Dataset Management
Datasets are managed in the Data Dictionary section. On this page you see the descriptions of all datasets in the database.
Dataset Categories
To create a category, press the button for adding a category. You will be asked for a title for the new category.
To rename or remove a category, hover the mouse over the category name to show the category actions and press the desired action.
Note: if you remove a category, its datasets are not removed; they become uncategorised.
Create Dataset
To create a new dataset, go to the Data Dictionary and choose New Dataset from the menu.
You can also edit the dataset description while the dataset is open.
Field Descriptions
This tab contains a list of all fields in the dataset. For more information, see the section about field descriptions.
Information
This tab shows basic information about the dataset.
Derived Fields
You might have fields that are derived from other fields; for example, you might combine name and surname into one field. To create a derived field, check the “Derived” checkbox and write a derive expression.
The derive expression is an SQL expression, so for now its syntax depends on the SQL server used for storing the datasets. You can use other fields’ identifiers in the expression. At the moment, derived fields cannot be used to derive other fields.
Example: create a derived field named “Full Name” with the derive expression:
CONCAT(name, ' ', surname)
Data Format
You can specify how data are formatted in the application, much as in spreadsheet applications. The data format does not affect the stored data, only their presentation to the user.
■ Number: number with localised thousand and decimal separators. Parameter: none. Example: 123456,789 is displayed as 123 456,79
■ Currency: number with localised thousand and decimal separators. Parameter: currency symbol. Example: 123456,789 is displayed as 123 456,79 Sk
■ Size in bytes: the value is converted to a human-readable size in bytes, with the order of magnitude adjusted. Parameter: none. Example: 12345678 is displayed as 11,77 mb
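The example outputs in the list can be reproduced with a small awk-based sketch, useful for checking what users will see. The helpers are hypothetical, not part of Datacamp. The unit labels (b, kb, mb, gb, tb) and the 1024 divisor are assumptions consistent with the 11,77 mb example, and the input uses a dot as the decimal separator.

```bash
# Hypothetical helper: 123456.789 -> "123 456,79" (space thousands separator,
# comma decimal separator, two decimals).
format_number() {
  LC_ALL=C awk -v n="$1" 'BEGIN {
    s = sprintf("%.2f", n)
    split(s, parts, ".")
    ip = parts[1]; frac = parts[2]; sign = ""
    if (substr(ip, 1, 1) == "-") { sign = "-"; ip = substr(ip, 2) }
    out = ""
    while (length(ip) > 3) {                  # group digits by three
      out = " " substr(ip, length(ip) - 2) out
      ip = substr(ip, 1, length(ip) - 3)
    }
    print sign ip out "," frac
  }'
}

# Hypothetical helper: number followed by a currency symbol.
format_currency() {
  printf '%s %s\n' "$(format_number "$1")" "$2"
}

# Hypothetical helper: 12345678 -> "11,77 mb" (1024-based units, assumed).
format_size() {
  LC_ALL=C awk -v b="$1" 'BEGIN {
    split("b kb mb gb tb", unit, " ")
    i = 1
    while (b >= 1024 && i < 5) { b = b / 1024; i++ }
    s = sprintf("%.2f", b)
    sub(/\./, ",", s)
    print s " " unit[i]
  }'
}
```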
User Management
Users are managed through Settings → Users.
Create User
1. open the user management page
2. click on the “New User” button
3. fill out the form
4. set up user roles and rights (see the section about Roles and Rights)
5. confirm the new user data
Edit User
1. open the User Management page
2. click on a user
3. change the user properties
4. submit the changes
Roles
[Table: rights by role. Columns: Right Category, Right, Data Editor, Datastore Manager, Power User, User Manager]
Super-user
There is a special kind of user called a super-user. A super-user does not need any rights or roles assigned and is allowed to do and access anything in the application. Actions available only to super-users:
■ make another user a super-user
■ change dataset or field identifier
■ destroy record
Appendices
Additional information, such as technical notes and concepts.
A. Dataset Implementation
Datasets
Currently, datasets are created as tables in a relational database, in the datastore schema. The table name is constructed from the prefix ds_ and the dataset identifier. For example, the dataset with identifier public_procurements has the table name ds_public_procurements.
Metadata for dataset records are stored in the same table as the records. This might change in the future.
Fields
Fields are implemented as relational table columns.
Summary
■ record: implemented as a table row; referenced by the dataset-unique record id number (in the _record_id column)
■ record metadata: implemented as dataset table columns; referenced by the metadata identifier, in the same dataset table and the same row as the record
Important note: the implementation of datasets might change in the future, therefore you should not rely on this structure; use the Datastore API for all dataset and record operations instead.
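For inspecting the database by hand, for example while investigating an import, the naming rule above can be turned into a query. A sketch with hypothetical helpers, relying only on the ds_ prefix and the _record_id column described in this appendix; as the note says, applications should use the Datastore API instead.

```bash
# Hypothetical helper: table name = "ds_" + dataset identifier.
dataset_table() {
  printf 'ds_%s\n' "$1"
}

# Hypothetical helper: SQL query for one record by its _record_id
# (for manual inspection only).
record_query() {
  local dataset=$1 record_id=$2
  [[ $record_id =~ ^[0-9]+$ ]] || { echo "record id must be a number" >&2; return 1; }
  printf 'SELECT * FROM %s WHERE _record_id = %s;\n' "$(dataset_table "$dataset")" "$record_id"
}
```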
B. External Sources and ETL
[Diagram: External Sources (foreign database, web) → Extraction → tables from external sources, temporary tables, temporary files → Transformation → result table → Loading → Dataset Store]
C. Search Engine
Datacamp contains a simple predicate-based search engine. Each search uses a query composed of predicates.
The search engine is a separate module, so it can be replaced as needed, either by a more sophisticated engine or by an engine that is part of another kind of datastore.
Predicates
Allowed predicates by data type:
■ integer: greater, less, greater or equal, less or equal, equal, not equal
■ date: within last days, within last weeks, within last months, greater, less, greater or equal, less or equal, equal, not equal
■ string: contains, begins with, ends with, does not contain, matches
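To make the predicates concrete, here is a hypothetical translation of some of them into SQL conditions. It is not Datacamp's implementation: the "matches" and "within last ..." predicates are left out, the field name is assumed to be a trusted identifier, and LIKE wildcards (% and _) inside the value are not escaped. Comparison values are quoted as SQL string literals.

```bash
# Hypothetical sketch: predicate_sql FIELD PREDICATE VALUE -> SQL condition
predicate_sql() {
  local field=$1 predicate=$2 value=$3
  local q="'"
  local v=${value//"$q"/"$q$q"}   # escape single quotes for an SQL literal
  local op
  case $predicate in
    greater)            op=">" ;;
    less)               op="<" ;;
    "greater or equal") op=">=" ;;
    "less or equal")    op="<=" ;;
    equal)              op="=" ;;
    "not equal")        op="<>" ;;
    contains)           echo "$field LIKE '%$v%'"; return ;;
    "begins with")      echo "$field LIKE '$v%'"; return ;;
    "ends with")        echo "$field LIKE '%$v'"; return ;;
    "does not contain") echo "$field NOT LIKE '%$v%'"; return ;;
    *) echo "unsupported predicate: $predicate" >&2; return 1 ;;
  esac
  # comparison predicates
  echo "$field $op '$v'"
}
```

For example, predicate_sql name 'begins with' tech yields name LIKE 'tech%'.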
D. API Shell Script
#!/bin/bash
# Send a request to a Datacamp application and print the server reply.
DATACAMP_BASE_URL=${DATACAMP_BASE_URL:-http://localhost:3000}
DATACAMP_GET_METHOD=${DATACAMP_GET_METHOD:-curl}

# Build the request URL: BASE_URL/api/METHOD?api_key=KEY&ARG1&ARG2...
function datacamp_request_url() {
    METHOD=$1
    shift
    ARGS="api_key=${DATACAMP_API_KEY}"
    while [ $# -gt 0 ]; do
        ARGS="${ARGS}&$1"
        shift
    done
    CALL_URL="${DATACAMP_BASE_URL}/api/${METHOD}"
    if [ "$ARGS" != "" ]; then
        URL="${CALL_URL}?${ARGS}"
    else
        URL="${CALL_URL}"
    fi
    echo "$URL"
}

# Fetch the request URL with curl or wget and print the reply to stdout
function datacamp_request() {
    URL="$(datacamp_request_url "$@")"
    case $DATACAMP_GET_METHOD in
        curl)
            COMMAND=(curl -s);;
        wget)
            COMMAND=(wget -q -O -);;
        *)
            echo "ERROR: Unknown get method ${DATACAMP_GET_METHOD}" >&2
            exit 1;;
    esac
    "${COMMAND[@]}" "$URL"
}

function print_help() {
    cat >&2 << EOF
Usage: $0 [-h] [OPTIONS] REQUEST [ARGUMENTS]
Send REQUEST to a Datacamp application and return server reply.
Options:
  -b url         specify base URL for Datacamp. Default: http://localhost:3000
  -k api_key     specify API key for accessing Datacamp data
  -f format      request different format, if available. Options are: xml
  -g get_method  method of accessing the datacamp: curl (default), wget
Environment variables:
  DATACAMP_BASE_URL
  DATACAMP_API_KEY
  DATACAMP_FORMAT
  DATACAMP_GET_METHOD
Example:
  $0 version
  $0 datasets
EOF
}

# Command line options override the environment variables.
# -f only sets DATACAMP_FORMAT; passing the format to the server
# is not handled in this script.
while getopts "hb:k:f:g:" OPTION; do
    case $OPTION in
        h) print_help; exit 0;;
        b) DATACAMP_BASE_URL=$OPTARG;;
        k) DATACAMP_API_KEY=$OPTARG;;
        f) DATACAMP_FORMAT=$OPTARG;;
        g) DATACAMP_GET_METHOD=$OPTARG;;
        *) print_help; exit 1;;
    esac
done
shift $((OPTIND - 1))

if [ $# -eq 0 ]; then
    echo "No API method specified. Use $0 -h for more information." >&2
    exit 1
fi
datacamp_request "$@"