I. Motivation
Mobile technologies represent an opportunity to improve the effectiveness and efficiency of
generating economic value added in many areas. One of these areas is statistical data collection,
which involves costly operations such as censuses and surveys. Mobile technologies have been used
in statistical operations with varying degrees of success (Brown, Vanable and Eriksen, 2008;
Couper, 2005; Reades, Calabrese, Sevtsuk, and Ratti, 2007; Stopher, 2009; Vijayaraj and
DineshKumar, 2010). One such method is the Computer Assisted Personal Interview (CAPI), which has
been used for the last two decades and for which various levels of success have been reported
(Gravlee, 2002; Baker, Bradburn, and Johnson, 1995). One of the main weaknesses of all the methods
that have been proposed so far is the cost of the device that must be given to each enumerator.
Once the cost of one device is multiplied by the number of enumerators, the total cost of the
survey remains high, which reduces the comparative advantage over traditional survey methods.
However, another way of using mobile technologies for statistical operations has been suggested by
ECA management that would solve the problem of the cost of the mobile devices: the idea is to
design approaches that use the mobile devices already owned by the enumerators, in order to avoid
the cost of distributing a mobile device to each enumerator. In line with this idea, this paper
investigates the possibility of sending statistical questionnaires using regular SMS. SMS
technology has been chosen because it is available on every mobile phone and the cost of sending
an SMS message is relatively low.
II. Shannon information theory and application to sending statistical questionnaire by SMS
According to Shannon information theory (Shannon, 2001), the information capacity of a
transmission channel can be quantified using the following formula:
$$H = \log S^{n} = n \log S$$
where S is the number of possible symbols that can be used for the communication, and n is the
number of symbols in the transmission. In the case of an SMS message, let us call $nb\_char$ the
number of characters we can use in the SMS message, and consider the length of the SMS message in
characters. The capacity of the SMS message is then given by the formula:
$$H = (\text{length of the SMS message}) \times \log(nb\_char)$$
For one question with n possible answers, the maximum information content is:
$$1 \cdot \log(n) = \log(n)$$
For q questions with n possible answers each, the maximum information content is:
$$q \log(n)$$
For a general questionnaire with p questions indexed by i varying from 1 to p, and $n_i$ possible
answers for question i, the maximum information content is:
$$\sum_{i=1}^{p} \log(n_i) = \log\left(\prod_{i=1}^{p} n_i\right)$$
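For q identical questionnaires, the maximum information content is:
$$q \sum_{i=1}^{p} \log(n_i) = q \log\left(\prod_{i=1}^{p} n_i\right)$$
This content cannot exceed the capacity of one SMS message, taken here as 160 characters drawn
from 85 usable characters. Which gives:
$$q = \mathrm{int}\left(\frac{160 \log(85)}{\log\left(\prod_{i=1}^{p} n_i\right)}\right)$$
For a questionnaire whose answers satisfy $\prod_{i=1}^{p} n_i = 20^{20}$ (for example, 20 questions
with 20 possible answers each):
$$q = \mathrm{int}\left(\frac{160 \log(85)}{\log(20^{20})}\right) = 11$$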
It means it is possible to send up to 11 questionnaires in a single SMS message using the right
compression algorithm. But this is just a theoretical limit, and we need to investigate feasible
practical implementations.
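The arithmetic above can be checked with a short script. This is only a minimal sketch: the function names are illustrative, and the defaults of 160 characters and 85 usable symbols are simply the figures used in the worked example, not properties of every SMS network.

```python
import math


def sms_capacity(length=160, nb_char=85):
    """Capacity of one SMS message, n * log(S), in natural-log units."""
    return length * math.log(nb_char)


def questionnaire_content(answer_counts):
    """Maximum information content of one questionnaire: sum of log(n_i)."""
    return sum(math.log(n) for n in answer_counts)


def max_questionnaires(answer_counts, length=160, nb_char=85):
    """Largest whole number of identical questionnaires fitting in one SMS."""
    ratio = sms_capacity(length, nb_char) / questionnaire_content(answer_counts)
    return int(ratio)


if __name__ == "__main__":
    # Hypothetical questionnaire: 20 questions with 20 possible answers each.
    print(max_questionnaires([20] * 20))  # prints 11
```

The base of the logarithm cancels in the ratio, so natural logarithms give the same result as any other base.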
The set of possible answers forms a hypercube of dimensions $n_1 \times n_2 \times \dots \times n_q$,
which can be converted into a hypercube of dimensions $85 \times 85 \times \dots \times 85$ (m
times). In this case the coordinates of the questionnaire (or combined questionnaires) on each
dimension will be represented by a single character, and the answer is converted into a string of
m characters that can be sent by SMS. An idea of how this transformation is done in two dimensions
is given in Figure 1 below.
Using this method we can obtain a reversible function that reduces the q dimensions of the
answer set to m dimensions, with one character corresponding to the coordinate on each of the m
dimensions, which immediately gives the chain of characters to be sent by SMS.
The drawback of this approach is that it requires relatively complex calculations on both the
coding and the decoding side. Therefore, for the coding to be done reliably by the enumerator, a
special coding program has to be installed on her/his mobile phone.
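To give an idea of the kind of calculation such a coding program would perform, here is a minimal sketch of one possible reversible mapping. It treats the answers as digits of a mixed-radix number and rewrites that number in base 85. This is an assumed construction chosen for illustration, not necessarily the one depicted in Figure 1; the symbol set and the function names are likewise assumptions.

```python
import math
import string

# Assumption: 85 symbols made of digits, lowercase and uppercase letters,
# followed by 23 punctuation characters.
SYMBOLS = (string.digits + string.ascii_lowercase + string.ascii_uppercase
           + "!\"#$%&'()*+,-./:;<=>?@{")
BASE = len(SYMBOLS)  # 85


def string_length(sizes):
    """Smallest m such that 85**m covers every combination of answers."""
    total = math.prod(sizes)
    m = 1
    while BASE ** m < total:
        m += 1
    return m


def encode(answers, sizes):
    """Map answers (answers[i] in 1..sizes[i]) to a string of m symbols."""
    if len(answers) != len(sizes):
        raise ValueError("one answer is required per question")
    value = 0
    for answer, size in zip(answers, sizes):
        if not 1 <= answer <= size:
            raise ValueError(f"answer {answer} outside 1..{size}")
        value = value * size + (answer - 1)  # mixed-radix accumulation
    chars = []
    for _ in range(string_length(sizes)):
        value, digit = divmod(value, BASE)
        chars.append(SYMBOLS[digit])
    return "".join(reversed(chars))


def decode(text, sizes):
    """Inverse of encode: recover the list of answers from the string."""
    value = 0
    for ch in text:
        value = value * BASE + SYMBOLS.index(ch)
    answers = []
    for size in reversed(sizes):
        value, remainder = divmod(value, size)
        answers.append(remainder + 1)
    return answers[::-1]


if __name__ == "__main__":
    sizes = [2000, 20, 3, 120]   # hypothetical numbers of possible answers
    answers = [211, 3, 2, 93]
    coded = encode(answers, sizes)
    print(len(coded), decode(coded, sizes) == answers)
```

With these hypothetical sizes the four answers fit in four characters, because 2000 x 20 x 3 x 120 = 14,400,000 is smaller than 85^4 = 52,200,625.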
We now investigate less optimal approaches that require no special equipment beyond the mobile
phone and good training of the enumerator.
Figure 1: Converting a hypercube with dimensions greater than 85 to a hypercube with all dimensions lower than 85
in a reversible manner (two-dimensional case).
Number   Coded string   Number   Coded string   Number   Coded string
86       10             171      20             1276     f0
87       11             172      21             1277     f1
88       12             173      22             1278     f2
89       13             174      23             1279     f3
90       14             175      24             1280     f4
91       15             176      25             1281     f5
92       16             177      26             1282     f6
...      ...            ...      ...            ...      ...
162      1/             247      2/             1352     f/
163      1:             248      2:             1353     f:
164      1;             249      2;             1354     f;
165      1<             250      2<             1355     f<
166      1=             251      2=             1356     f=
167      1>             252      2>             1357     f>
168      1?             253      2?             1358     f?
169      1@             254      2@             1359     f@
170      1{             255      2{             1360     f{
New code   Character   Description
0          0           Zero
1          1           One
2          2           Two
3          3           Three
4          4           Four
5          5           Five
6          6           Six
7          7           Seven
8          8           Eight
9          9           Nine
10         a           Lowercase a
11         b           Lowercase b
12         c           Lowercase c
13         d           Lowercase d
14         e           Lowercase e
15         f           Lowercase f
16         g           Lowercase g
17         h           Lowercase h
18         i           Lowercase i
19         j           Lowercase j
20         k           Lowercase k
21         l           Lowercase l
22         m           Lowercase m
23         n           Lowercase n
24         o           Lowercase o
25         p           Lowercase p
26         q           Lowercase q
27         r           Lowercase r
28         s           Lowercase s
29         t           Lowercase t
30         u           Lowercase u
31         v           Lowercase v
32         w           Lowercase w
33         x           Lowercase x
34         y           Lowercase y
35         z           Lowercase z
36         A           Uppercase A
37         B           Uppercase B
38         C           Uppercase C
39         D           Uppercase D
40         E           Uppercase E
41         F           Uppercase F
42         G           Uppercase G
43         H           Uppercase H
44         I           Uppercase I
45         J           Uppercase J
46         K           Uppercase K
47         L           Uppercase L
48         M           Uppercase M
49         N           Uppercase N
50         O           Uppercase O
51         P           Uppercase P
52         Q           Uppercase Q
53         R           Uppercase R
54         S           Uppercase S
55         T           Uppercase T
56         U           Uppercase U
57         V           Uppercase V
58         W           Uppercase W
59         X           Uppercase X
60         Y           Uppercase Y
61         Z           Uppercase Z
62         !           Exclamation mark
63         "           Double quotes (or speech marks)
64         #           Number sign
65         $           Dollar
66         %           Percent sign
67         &           Ampersand
68         '           Single quote
69         (           Open parenthesis (or open bracket)
70         )           Close parenthesis (or close bracket)
71         *           Asterisk
72         +           Plus
73         ,           Comma
74         -           Hyphen
75         .           Period, dot or full stop
76         /           Slash or divide
77         :           Colon
78         ;           Semicolon
79         <           Less than (or open angled bracket)
80         =           Equals
81         >           Greater than (or close angled bracket)
82         ?           Question mark
83         @           At symbol
84         {           Opening brace
85         }           Closing brace
To facilitate the conversion by the enumerator, the one or two characters will be shown in front of
each option for each question on the questionnaire. To generate the chain of characters to send by
SMS, all the enumerator needs to do is concatenate the character chains in the order of the
questions. The character chain will be decoded at the receiving side, where basic checks can be
run immediately, with an SMS sent back to the enumerator if an error is found.
Simple example with a questionnaire of a few questions:
Identification of the household:
Possible answers: 1 to 2000.
Current answer: 211.
Corresponding string: 2E.
Identification of the individual in the household:
Possible answers: 1 to 20.
Current answer: 3.
Corresponding string: 3.
Sex:
Possible answers: 1 or 2 or 3 (for n.a.)
Current answer: 2.
Corresponding string: 2.
Age:
Possible answers: 1 to 120.
Current answer: 93.
Corresponding string: 17.
Education level:
A few possible answers (ISCED) with corresponding strings:
24 Lower secondary education:
241 Insufficient for level completion or partial level completion, without direct access to upper
secondary education (2))
242 Sufficient for partial level completion, without direct access to upper secondary education
(2*)
243 Sufficient for level completion, without direct access to upper secondary education (2+)
244 Sufficient for level completion, with direct access to upper secondary education (2,)
251 Insufficient for level completion or partial level completion, without direct access to upper
secondary education (2=)
252 Sufficient for partial level completion, without direct access to upper secondary education
(2>)
253 Sufficient for level completion, without direct access to upper secondary education (2?)
254 Sufficient for level completion, with direct access to upper secondary education (2@)
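The correspondence between answers and strings used in this example can be sketched in a few lines. The rule below is inferred from the code table and the coded strings shown in this paper (a single character for values up to 85, two characters in base 85 beyond that). The function names and the upper limit of 7310 are assumptions made for illustration.

```python
import string

# The 86 characters of the code table, in order of their new codes 0..85.
ALPHABET = (string.digits + string.ascii_lowercase + string.ascii_uppercase
            + "!\"#$%&'()*+,-./:;<=>?@{}")


def encode_answer(number):
    """Return the one- or two-character string for an answer number."""
    if 0 <= number <= 85:
        return ALPHABET[number]
    if 86 <= number <= 7310:  # assumed upper limit of the two-character range
        first, second = divmod(number - 1, 85)
        return ALPHABET[first] + ALPHABET[second]
    raise ValueError(f"{number} cannot be coded with one or two characters")


def decode_answer(code):
    """Inverse of encode_answer."""
    if len(code) == 1:
        return ALPHABET.index(code)
    if len(code) == 2:
        first, second = ALPHABET.index(code[0]), ALPHABET.index(code[1])
        if first < 1 or second > 84:
            raise ValueError(f"{code!r} is not a valid two-character code")
        return 85 * first + second + 1
    raise ValueError("a code has one or two characters")


if __name__ == "__main__":
    print(encode_answer(211))   # household 211 -> 2E
    print(encode_answer(242))   # ISCED 242 -> 2*
    # Concatenation in the order of the questions (household, individual, sex):
    print("".join(encode_answer(a) for a in [211, 3, 2]))   # 2E32
```

Because an answer may be coded with one or two characters, the receiving side needs to know how many characters each question contributes in order to split the concatenated string again.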
Figure 2: The simulated electronic questionnaire with data entered using the following pattern: #Question
number*Selected option#Question number*Selected option
Figure 3: The simulated electronic questionnaire working on the encoding of the data.
This questionnaire has only the basic encoding functionality, but it can be extended with an SMS
sending module to send the coded string directly without using another mobile device, a
mechanism to transfer the coded string directly to the mobile phone of the enumerator (cable or
Bluetooth), additional memory to store the manuals and dictionaries for consultation by the
enumerator, and so on. The possibilities are limited only by the cost of the additional modules.
The electronic circuit and the code that drives the microcontroller are available on demand.
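As an illustration of the data entry pattern given in the caption of Figure 2 (#Question number*Selected option#...), the following minimal sketch parses such an entry into question/option pairs. It is not the microcontroller code referred to above. It assumes that question numbers and options are typed as plain decimal numbers, and the function name is hypothetical.

```python
def parse_entries(text):
    """Parse '#question*option#question*option...' into {question: option}."""
    entries = {}
    for chunk in text.split("#"):
        if not chunk:
            continue  # skip the empty piece before the leading '#'
        question, star, option = chunk.partition("*")
        if not (star and question.isdecimal() and option.isdecimal()):
            raise ValueError(f"malformed entry: {chunk!r}")
        entries[int(question)] = int(option)
    return entries


if __name__ == "__main__":
    print(parse_entries("#1*211#2*3#3*2"))   # {1: 211, 2: 3, 3: 2}
```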
IV. Conclusions
Sending structured information by SMS has many potential applications, not only for surveys but
also for administrative data collection, civil registration (registering births by SMS), etc. It
is a very cost-effective approach because it does not require any special infrastructure on the
side of the one collecting the data. It allows automatic checking of the answers in a centralized
way and can immediately inform the enumerator, via SMS, of errors to correct. It can be a very
efficient way of leveraging mobile technologies for statistical operations in developing
countries. What is needed is to develop the programs at the receiving end that can receive and
decode the strings, as well as check the answers and send notifications to the enumerator in case
errors are detected.
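A program of this kind at the receiving end could look like the minimal sketch below. It assumes that the received string has already been split into one code per question, a step that depends on conventions not fixed here. The question list, the function names and the wording of the reply are all hypothetical.

```python
import string

# The 86 characters of the code table, in order of their new codes 0..85.
ALPHABET = (string.digits + string.ascii_lowercase + string.ascii_uppercase
            + "!\"#$%&'()*+,-./:;<=>?@{}")

# Hypothetical questionnaire: (question name, highest valid answer).
QUESTIONS = [("household", 2000), ("individual", 20), ("sex", 3)]


def decode_answer(code):
    """Decode a one- or two-character code; raise ValueError if invalid."""
    if len(code) == 1:
        return ALPHABET.index(code)
    if len(code) == 2:
        first, second = ALPHABET.index(code[0]), ALPHABET.index(code[1])
        if first < 1 or second > 84:
            raise ValueError("invalid two-character code")
        return 85 * first + second + 1
    raise ValueError("a code has one or two characters")


def check_answers(codes, questions=QUESTIONS):
    """Return (answers, errors) for a list holding one code per question."""
    answers, errors = {}, []
    if len(codes) != len(questions):
        errors.append(f"expected {len(questions)} answers, got {len(codes)}")
    for (name, highest), code in zip(questions, codes):
        try:
            value = decode_answer(code)
        except ValueError:
            errors.append(f"{name}: unreadable code {code!r}")
            continue
        if 1 <= value <= highest:
            answers[name] = value
        else:
            errors.append(f"{name}: {value} is outside 1..{highest}")
    return answers, errors


def build_reply(errors):
    """Text of the SMS sent back to the enumerator."""
    return "OK" if not errors else "ERROR " + "; ".join(errors)


if __name__ == "__main__":
    print(build_reply(check_answers(["2E", "3", "2"])[1]))
    print(build_reply(check_answers(["2E", "3", "9"])[1]))
```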
V. Short bibliography
Baker, Reginald P., Norman M. Bradburn, and A. Johnson. "Computer-assisted Personal
Interviewing: An experimental evaluation of data quality and costs." Journal of Official
Statistics 11 (1995): 415-434.
Brown, Jennifer L., Peter A. Vanable, and Michael D. Eriksen. "Computer-assisted self-interviews:
A cost effectiveness analysis." Behavior Research Methods 40.1 (2008): 1-7.
Couper, Mick P. "Technology trends in survey data collection." Social Science Computer Review
23.4 (2005): 486-501.
Gravlee, Clarence C. "Mobile computer-assisted personal interviewing with handheld computers:
The Entryware System 3.0." Field Methods 14.3 (2002): 322-336.
Reades, J., Calabrese, F., Sevtsuk, A. and Ratti, C. "Cellular census: Explorations in urban data
collection." IEEE Pervasive Computing 6.3 (2007): 30-38.
Shannon, Claude Elwood. "A mathematical theory of communication." ACM SIGMOBILE Mobile
Computing and Communications Review 5.1 (2001): 3-55.
Stopher, Peter R. "Collecting and processing data from mobile technologies." In Bonnel, P.,
Lee-Gosselin, M., and Zmud, J. (2009): 361-391.
Vijayaraj, A., and P. DineshKumar. "Design and Implementation of Census Data Collection System
Using PDA." International Journal of Computer Applications 9.9 (2010).
Code   Symbol    Description
32     (space)   Space
33     !         Exclamation mark
34     "         Double quotes (or speech marks)
35     #         Number sign
36     $         Dollar
37     %         Percent sign
38     &         Ampersand
39     '         Single quote
40     (         Open parenthesis (or open bracket)
41     )         Close parenthesis (or close bracket)
42     *         Asterisk
43     +         Plus
44     ,         Comma
45     -         Hyphen
46     .         Period, dot or full stop
47     /         Slash or divide
48     0         Zero
49     1         One
50     2         Two
51     3         Three
52     4         Four
53     5         Five
54     6         Six
55     7         Seven
56     8         Eight
57     9         Nine
58     :         Colon
59     ;         Semicolon
60     <         Less than (or open angled bracket)
61     =         Equals
62     >         Greater than (or close angled bracket)
63     ?         Question mark
64     @         At symbol
65     A         Uppercase A
66     B         Uppercase B
67     C         Uppercase C
68     D         Uppercase D
69     E         Uppercase E
70     F         Uppercase F
71     G         Uppercase G
72     H         Uppercase H
73     I         Uppercase I
74     J         Uppercase J
75     K         Uppercase K
76     L         Uppercase L
77     M         Uppercase M
78     N         Uppercase N
79     O         Uppercase O
80     P         Uppercase P
81     Q         Uppercase Q
82     R         Uppercase R
83     S         Uppercase S
84     T         Uppercase T
85     U         Uppercase U
86     V         Uppercase V
87     W         Uppercase W
88     X         Uppercase X
89     Y         Uppercase Y
90     Z         Uppercase Z
91     [         Opening bracket
92     \         Backslash
93     ]         Closing bracket
94     ^         Caret - circumflex
95     _         Underscore
96     `         Grave accent
97     a         Lowercase a
98     b         Lowercase b
99     c         Lowercase c
100    d         Lowercase d
101    e         Lowercase e
102    f         Lowercase f
103    g         Lowercase g
104    h         Lowercase h
105    i         Lowercase i
106    j         Lowercase j
107    k         Lowercase k
108    l         Lowercase l
109    m         Lowercase m
110    n         Lowercase n
111    o         Lowercase o
112    p         Lowercase p
113    q         Lowercase q
114    r         Lowercase r
115    s         Lowercase s
116    t         Lowercase t
117    u         Lowercase u
118    v         Lowercase v
119    w         Lowercase w
120    x         Lowercase x
121    y         Lowercase y
122    z         Lowercase z
123    {         Opening brace
124    |         Vertical bar
125    }         Closing brace
126    ~         Equivalency sign - tilde
127              Delete
Source: http://www.ascii-code.com/