0 with the use
of the t-statistic, here taking the value -1.01, suggests we cannot reject the null hypothesis since
the associated p-value (31.5%) is rather large (compared to 5%). This means that there is no
statistically significant evidence of a linear relationship between assets and sex of head, all
other things being equal. For such individual tests of significance, the t-statistic is the ratio
of the estimate of the regression coefficient to its standard error. The larger the magnitude of
the t-statistic, the more convincing the evidence that the true value of the parameter is not zero
(as this means the numerator, i.e., the estimate, is quite far from zero relative to its standard
error).
Note that aside from the overall F test and individual t tests, you could also perform
other hypothesis tests. You could, for instance, perform F tests for user-specified hypotheses.
Suppose that from the region variable, which records where the household resides
(1 for Dhaka, 2 for Chittagong, 3 for Khulna and 4 for Rajshahi), we create four binary
indicator variables, and we decide to add the regn1 variable (representing whether or not the
household resides in Dhaka) to the earlier regression model and jointly test the importance of
the variables famsize and regn1 in the model:
tab region, gen(regn)
quietly regress hassetg educhead famsize sexhead regn1
Introduction to Methods for Research in Official Statistics 108 of 179
The first command above yields the four binary indicator variables from the categorical
variable region. The second command performs the regression without showing the output,
since this regression model is merely an intermediate step toward the final result, i.e., testing
the significance of the famsize and regn1 variables.
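The command that carries out the joint test itself is not reproduced above. As a minimal sketch of what such a test looks like, the block below uses Stata's shipped auto dataset as a stand-in; the dataset and its variables (price, mpg, weight, foreign) are assumptions for illustration and are not part of the household data.

```stata
* Sketch only: the shipped auto dataset stands in for hh.dta
sysuse auto, clear
quietly regress price mpg weight foreign
* Joint F test of H0: the coefficients of weight and foreign are both zero
test weight foreign
* test leaves its results in r(): F statistic, degrees of freedom, p-value
display "F(" r(df) ", " r(df_r) ") = " r(F) "   p = " r(p)
```

With the household model fitted above, the analogous command would presumably be test famsize regn1, issued right after the quietly regress line.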
Stata's estimates command makes it easier to store and present different sets of estimation
results, say, identifying regression coefficients:
quietly reg hassetg educhead
estimates store model1
quietly reg hassetg educhead agehead famsize sexhead
estimates store model2
estimates table model1 model2, stat(r2_a) b(%5.3f) star title("Models of household assets")
or presenting precision of estimated regression coefficients:
estimates table model1 model2, stat(r2_a) b(%5.3f) se(%5.3f) p(%4.3f) title("Models of household assets")
It is assumed that in building the regression model we properly chose explanatory
variables backed by an appropriate theory or practical reasons. Transforming either or both
independent and dependent variables may be needed to facilitate analysis. Choice of
transformation can be based on theory, logic or scatter diagrams. Occasionally, some
transformation may induce linearity. Consider the relationship
Here, by taking logarithms on both sides, we get:
$$\ln Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$
When we do not have an idea about what transformation we may need, it may be helpful to
use the Box-Cox transformation as illustrated earlier. Log transformations are useful, as
small changes in the log of a variable are approximately equal to proportional changes in the
variable: a change of 0.01 in the log corresponds to roughly a one percent change in the
variable.
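The link between log differences and percentage changes can be checked with a few lines of arithmetic. The sketch below uses made-up values (100, 101 and 150) purely for illustration; it shows that the approximation is close for a small change and noticeably off for a large one.

```stata
* A 1% rise (100 to 101): the log difference is close to 0.01
scalar dlog = ln(101) - ln(100)
display "log difference for +1%  = " scalar(dlog)
* A 50% rise (100 to 150): the log difference is well below 0.50
scalar dlog_big = ln(150) - ln(100)
display "log difference for +50% = " scalar(dlog_big)
```

This is why log differences are usually read as percentage changes only when the changes involved are small.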
Sometimes, our explanatory variables may be very highly correlated with one another. This is
known as multicollinearity. It means that little or no new information is provided by some
variables, and this leads to unstable coefficients as some variables become proxy indicators of
other variables. When we add a new variable into the regression that is strongly related to the
explanatory variables already in the model, multicollinearity is suggested when we obtain
(a) substantially higher standard errors; (b) unexpected changes in regression coefficient
magnitudes or signs; or (c) nonsignificant individual t-tests despite a significant overall model
F test.
Following http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter2/statareg2.htm ,
we can measure multicollinearity by way of calculating the variance inflation factors (VIFs):
use http://www.ats.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear
quietly regress api00 acs_k3 avg_ed grad_sch col_grad some_col
vif
Variable VIF 1/VIF
avg_ed 43.57 0.022951
grad_sch 14.86 0.067274
col_grad 14.78 0.067664
some_col 4.07 0.245993
acs_k3 1.03 0.971867
Mean VIF 15.66
All of these variables measure the education of the parents and the very high VIF values
indicate that these variables are possibly redundant. For example, after you know grad_sch and
col_grad, you probably can predict avg_ed very well. In practice, we ought to be concerned
when some variables have VIF values greater than 5.
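For reference, the standard definition behind these figures can be written out as follows, where $R_j^2$ denotes the $R^2$ obtained when the $j$-th explanatory variable is regressed on all the other explanatory variables (the symbol $R_j^2$ is introduced here for illustration and does not appear in the output):

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\frac{1}{\mathrm{VIF}_j} = 1 - R_j^2 .
% Worked example from the table above, for avg_ed:
% 1/VIF = 0.022951, so R_j^2 = 1 - 0.022951 = 0.977049,
% and VIF = 1 / 0.022951, which is approximately 43.57.
```

Read this way, the 1/VIF column (the tolerance) says that only about 2.3 percent of the variation in avg_ed is not already accounted for by the other regressors.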
Now, let us try to re-do the analysis but with some variables removed:
regress api00 acs_k3 grad_sch col_grad some_col
vif
Variable VIF 1/VIF
col_grad 1.28 0.782726
grad_sch 1.26 0.792131
some_col 1.03 0.966696
acs_k3 1.02 0.976666
Mean VIF 1.15
Notice that the VIF values are now better and the standard errors have been reduced. There are
other ways of addressing multicollinearity apart from dropping variables in the regression. The
researcher may even have the option of not addressing it at all if the regression model is to be
used for prediction purposes only. The interested reader may refer to Draper and Smith
(1998), for example.
As the number of explanatory variables grows, the number of regression models to
compare will also grow considerably. For 10 explanatory variables, we have to compare 1023
models. Instead of choosing the best model among all possible regressions, we may adopt
computational algorithms for regression model variable selection:
Forward Inclusion
Backward Elimination
Forward Stepwise
Backward Stepwise
These can be readily implemented in Stata. For instance, forward stepwise is implemented
with:
use c:\intropov\data\hh, clear
sw, pr(0.10) pe(0.05) forward: reg hassetg educhead agehead famsize sexhead
Note: both pr and pe are specified as options for the probability to remove and the probability
to enter, respectively; the forward option is also used, as otherwise backward stepwise is
implemented. The above choice of pr and pe makes it more difficult for a regressor to enter the
model (its p-value must be below 0.05) than to stay in the model once entered (it is removed
only if its p-value is at least 0.10). Forward stepwise is a modification of forward inclusion. It
starts with no regressors in the model, and proceeds as in forward inclusion, entering first the
explanatory variable with the largest simple correlation with the dependent variable. It is
entered if the F statistic for the estimated regression model with it in exceeds the F value
corresponding to pe=0.05. Then, it enters a second regressor, which has the largest correlation
with the dependent variable after adjusting for the effect on the dependent variable of the first
regressor entered. (Such a correlation is called a partial correlation.) The computational
procedure then reassesses included regressors based on their partial F-statistics for exclusion. It
drops the least significant variable from the model if its partial F-statistic is less than the
pre-selected F corresponding to the pr value. Then, it proceeds as in forward inclusion and enters
the next regressor, and iterates on the entry and reassessment subroutines.
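The four selection algorithms listed earlier correspond to different combinations of the pr(), pe() and forward options. The sketch below spells out that mapping using Stata's shipped auto dataset as a stand-in (the dataset and the regressors price, mpg, weight, length and foreign are assumptions for illustration, not the household data); stepwise is the unabbreviated name of the sw prefix.

```stata
* Sketch only: the shipped auto dataset stands in for hh.dta
sysuse auto, clear
* Forward inclusion: only pe() is given
stepwise, pe(0.05): regress price mpg weight length foreign
* Backward elimination: only pr() is given
stepwise, pr(0.10): regress price mpg weight length foreign
* Backward stepwise: pr() and pe() together, no forward option
stepwise, pr(0.10) pe(0.05): regress price mpg weight length foreign
* Forward stepwise: pr() and pe() together, plus forward
stepwise, pr(0.10) pe(0.05) forward: regress price mpg weight length foreign
```

Comparing the four fitted models on the same data is a quick way to see that the algorithms need not agree on the final set of regressors.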
The computational algorithms (except for all possible regressions) do not necessarily
produce the best regression model. The various algorithms need not yield exactly the same
result. Other software, e.g. R and SAS, use other criteria, e.g. adjusted R2, Mallows' Cp,
Akaike's Information Criterion (AIC), or the Schwarz Bayesian Information Criterion (BIC),
rather than probabilities (or F values) to remove or enter variables in the model.
Note that Stata allows a hierarchical option for a hierarchical selection of variables, and
a lock option that forces the first (set of) regressor variable/s listed to be included in the model.
Aside from multicollinearity, the need to use transformations on the variables, and the
use of stepwise techniques, there are other issues on the use of regression models, e.g. the need
for outlier detection and influence diagnostics, and testing for misspecification. We provide in
the next sub-sections a discussion of logistic regression, and common multivariate tools that
may be of help in official statistics research.
4.3.2. Logistic Regression and Related Tools with STATA
Logistic regression, also called logit analysis, is essentially regression with a binary {0,1}
dependent variable. In a nutshell, it is regression with an outcome variable (or dependent
variable) that is a dichotomous categorical variable and with explanatory variables that can be
either continuous or categorical. In other words, the interest is in predicting which of two
possible events is going to happen given certain other information. STATA provides two commands,
logistic and logit, for implementing logistic regression:
logit reports coefficients
logistic reports exponentiated coefficients
To illustrate the use of these commands, consider again the hh.dta database, and
suppose that we define households as poor or non-poor through the variable povind depending
on whether their household assets per capita is less than 15,000. Now let us generate a per
capita asset variable, which we will call assetpc, and the povind variable; let us also run a
logistic regression with Stata's logistic command:
use c:\intropov\data\hh, clear
gen assetpc=hassetg/famsize
gen povind=1 if assetpc<15000
replace povind=0 if assetpc>=15000
logistic povind educhead
The results indicate that years of schooling of the head (educhead) is a statistically significant
predictor of povind (i.e., a household being poor), with the Wald statistic taking a value
z = -5.47, with a very small p-value (reported as 0). Likewise, the test of the overall model is
statistically significant, with a likelihood ratio chi-squared statistic of 32.47 and a
corresponding negligible p-value (reported as 0.00).
If we were to run the logit command:
logit povind educhead
we obtain the iteration log and coefficient estimates,
which shows that the estimation procedure for logistic regression involves maximizing the
(natural) log of the likelihood function. Maximizing this nonlinear function involves numerical
methods. At iteration 0 of the algorithm, the log likelihood describes the fit of a model
including only the constant. The final log likelihood value describes the fit of the final model
L = 0.5760263 - 0.1489314*educhead
where L represents the predicted logit (or log odds of a household being poor), that is:
L = ln [ Pr(poor) / Pr(nonpoor) ]
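To make the fitted equation more concrete, the predicted logit can be converted back into a probability, and the slope into an odds ratio (the exponentiated coefficient that logistic reports). The sketch below simply plugs in the two coefficients quoted above; the scalar names and the choice of 0 and 10 years of schooling are illustrative assumptions.

```stata
* Coefficients copied from the fitted equation in the text
scalar b0 = 0.5760263
scalar b1 = -0.1489314
* Predicted probability of being poor at 0 and at 10 years of schooling
scalar p0  = invlogit(scalar(b0))
scalar p10 = invlogit(scalar(b0) + scalar(b1)*10)
display "p(poor | educhead = 0)  = " scalar(p0)
display "p(poor | educhead = 10) = " scalar(p10)
* Odds ratio for one additional year of schooling
scalar or_educ = exp(scalar(b1))
display "odds ratio per year of schooling = " scalar(or_educ)
```

An odds ratio below 1 is consistent with the negative coefficient: each extra year of schooling of the head lowers the estimated odds of the household being poor.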
An overall likelihood ratio $\chi^2$ test (with degrees of freedom equal to one less than the
number of parameters, i.e., 2-1 = 1) evaluates the null hypothesis that all coefficients in the
model, except the constant, equal zero:
$$\chi^2 = -2(L_i - L_f) = -2[-357.0325 - (-340.79956)] = 32.47$$
where $L_i$ is the initial log likelihood and $L_f$ is the final iteration's log likelihood. A
convenient but less accurate test is given by the asymptotic Z statistic associated with a
logistic regression coefficient. Note that the logit Z and the likelihood ratio $\chi^2$ tests
sometimes disagree, though they agree here. The $\chi^2$ test has more general validity.
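The likelihood ratio statistic and its p-value can be reproduced by hand from the two log likelihoods quoted above. This is a small sketch; the scalar names are arbitrary, and chi2tail() is Stata's upper-tail chi-squared probability function.

```stata
* Log likelihoods copied from the text: constant-only model and final model
scalar ll_i = -357.0325
scalar ll_f = -340.79956
* Likelihood ratio chi-squared with 1 degree of freedom
scalar lr = -2*(scalar(ll_i) - scalar(ll_f))
display "LR chi2(1) = " %6.2f scalar(lr)
* Upper-tail probability of a chi-squared(1) variate exceeding lr
display "p-value    = " chi2tail(1, scalar(lr))
```

The p-value is far below any conventional significance level, which is why the output rounds it to zero.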
In the STATA output, we find a pseudo $R^2$ statistic,
$$\text{pseudo } R^2 = 1 - (L_f / L_i) = 1 - (-340.79956)/(-357.0325) = 0.04546628$$
which provides a quick way to describe or compare the fit of different models for the same
dependent variable. Note that unlike the $R^2$ in classical regression, pseudo $R^2$ statistics
lack the straightforward explained-variance interpretation.
A number of add-on programs in Stata, such as the package spostado, have been
developed for ease in the running and interpretation of logistic regression models. To
download the complete spostado package, merely enter the following in the Stata command
window:
net from http://www.indiana.edu/~jslsoc/stata/
net install spost9_ado
Consider now the utilities developed by J. Scott Long & Jeremy Freese:
listcoef
We could also specify that percent change coefficients be listed:
listcoef, percent
or request the listing of some model fit statistics with the fitstat command:
fitstat
Another model fit statistic is the Hosmer & Lemeshow goodness-of-fit indicator given by the
lfit command:
lfit
After logit (as in regress), we can issue the predict command. Here, we generate predicted
probabilities.
predict phat
label variable phat "predicted p(poor)"
graph twoway connected phat educhead, sort
This generates the logistic curve shown in Figure 4-22.
[Figure: connected line plot of predicted p(poor) on the vertical axis (about .2 to .6) against education (years) of hh head on the horizontal axis (0 to 15).]
Figure 4-22. Logistic curve.
The goal in logistic regression is to estimate the probability that the observation
is from some sub-population 1 (here, the poor households):
$$p(y=1 \mid \mathbf{x}) = \frac{\exp(\beta_0 + \boldsymbol{\beta}_1^{t}\mathbf{x})}{1 + \exp(\beta_0 + \boldsymbol{\beta}_1^{t}\mathbf{x})}$$
The logit, or log odds, is defined as the log of the odds, i.e., the ratio of the probability
of falling in sub-population 1 to the probability of falling in the other sub-population:
$$g(\mathbf{x}) = \log\left[\frac{p(y=1 \mid \mathbf{x})}{1 - p(y=1 \mid \mathbf{x})}\right]$$
which can be shown to equal:
$$g(\mathbf{x}) = \beta_0 + \boldsymbol{\beta}_1^{t}\mathbf{x}$$
Thus, as the explanatory variable increases by one unit, the log of the odds changes by
$\beta_1$ units. Or equivalently, exp [...]
$$MSA_i = \frac{\sum_{j \neq i} r_{ij}^2}{\sum_{j \neq i} r_{ij}^2 + \sum_{j \neq i} a_{ij}^2}$$
and subsequently the Kaiser-Meyer-Olkin (KMO) index:
$$KMO = \frac{\sum\sum_{i \neq j} r_{ij}^2}{\sum\sum_{i \neq j} r_{ij}^2 + \sum\sum_{i \neq j} a_{ij}^2}$$
The KMO index can help in determining whether or not it is worthwhile to perform a factor
analysis on the variables. In particular, you can interpret this index in relation to performing a
factor analysis as follows:
Marvelous: Greater than 0.90
Meritorious: In the 0.80s
Middling: In the 0.70s
Mediocre: In the 0.60s
Miserable: In the 0.50s
Unacceptable: Below 0.5
To illustrate how to perform a factor analysis with STATA, consider the nations dataset and let
us obtain the correlation matrix and the KMO index with the STATA commands corr and
estat kmo (the latter issued after factor), respectively:
use http://www.ats.ucla.edu/stat/stata/examples/sws5/nations, clear
corr pop-school3
factor pop-school3
estat kmo, novar
These commands yield the results shown in Figure 4-32 and Figure 4-33. It can be observed
that a number of variables are highly correlated; the KMO index indicates that the data are
meritorious for performing a factor analysis.
Figure 4-32. The correlation matrix for variables in the nations dataset.
Figure 4-33. Results of the estat kmo command.
In STATA, we can apply a number of factor extraction methods as an option of the
factor command, namely:
pf (principal factor)
ipf (iterated principal factor)
ml (maximum likelihood)
pc (principal components)
factor pop-school3, pf
We can readily represent the variables (in standard form) in terms of the eight factors and a
unique component. Thus, for instance, according to the STATA output and the factor
analysis model, the (standardized form of the) population variable is:
pop = 0.01722 F1 - 0.18702 F2 + 0.37855 F3 + 0.11405 F4 +
0.13907 F5 - 0.05462 F6 + 0.01500 F7 + 0.01468 F8 +
0.78566
Statisticians prefer principal components and maximum likelihood extraction methods,
while Social Scientists prefer principal factor methods. For the latter, factors are inferred.
Principal components factor (PCF) analysis is a factor analysis extraction method and is
not equivalent to principal components analysis (PCA). The method called singular value
decomposition (SVD) that is used to extract principal components in PCA is used to extract
factors in principal components for factor analysis. SVD is applied to the correlation matrix in
both PCA and PCF. But the two procedures differ in succeeding steps. In PCA, all the
eigenvectors extracted by SVD are used in estimating the loadings of all p variables on all p
principal components. In PCF, only eigenvectors corresponding to m <<p factors are used in
estimating the loadings. The procedure is such that the choice of m affects the estimates.
Principal factor (PF) analysis extracts the eigenvectors from the reduced correlation
matrix. The reduced correlation matrix is like the correlation matrix, but instead of 1s on the
main diagonal, it has estimates of the communalities on the main diagonal. Communalities refer
to the variances that the X variables have in common. If the X variables have large
communalities, a factor analysis model is appropriate. The estimate commonly used for the
communality of the i-th variable is the $R^2$ obtained by regressing it on the other X variables.
Regardless of the method employed, it is important to obtain simple and interpretable
factors. Factor loadings, which vary from -1.00 to +1.00, represent the degree to which each
of the variables correlates with each of the factors. In fact, these factor loadings are the
correlation coefficients of the original variables with the newly derived factors (which are
themselves variables). Inspection reveals the extent to which each of the variables contributes
to the meaning of each of the factors. You can use the following guide to interpret factor
loadings:
0.40 important
0.50 practically significant
Note that: (a) an increase in the number of variables decreases the level necessary to consider a
loading significant; (b) an increase in sample size decreases the level necessary to consider a
loading significant; (c) an increase in the number of factors extracted increases the level
necessary to consider a loading significant.
To determine the number of factors to use, you can be guided by:
A priori criterion (that is, you can choose the number of factors prior to the
analysis, perhaps guided by some theory or practical convenience)
Latent root/Kaiser criterion, which retains the factors whose eigenvalues are
greater than 1; eigenvalues represent the amount of variance in the data
described by the factors.
Percent of variance explained
A scree plot (elbow rule)
In the output of the command earlier, we see that the latent root criterion suggests the use of
two factors. If we would be satisfied with a ninety percent cut off for the proportion of
variation explained, then likewise, we would select two factors. (If we are satisfied with 80%
variation explained, then one factor would suffice).
The scree plot (or elbow rule) is a plot of the eigenvalues against the factor number. It is
suggested that we find the value where the smooth decrease of eigenvalues appears to level off.
This is the number of factors to be used. You can generate a scree plot in STATA by entering:
greigen, yline(1)
which yields the plot shown in Figure 4-34, and this plot suggests the use of two or three
factors.
Figure 4-34. Scree plot.
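In Stata releases from version 9 onward, greigen has been superseded by the screeplot command, although the older name is still understood. A minimal sketch of the newer form follows, using the shipped auto dataset as a stand-in (the dataset and the variable list are assumptions for illustration, not the nations data).

```stata
* Sketch only: the shipped auto dataset stands in for the nations data
sysuse auto, clear
* Principal factor extraction on a handful of numeric variables
quietly factor price mpg weight length headroom trunk, pf
* Plot eigenvalues against factor number, with a reference line at 1
screeplot, yline(1)
```

The yline(1) reference line makes it easy to read the latent root criterion off the same graph as the elbow.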
If the resulting factors are difficult to interpret and name, you may try rotating the
factor solution. A rotational method distributes the variance from earlier factors to later factors
by turning the axes about the origin until a new position is reached. The purpose of a rotated
factor solution is to achieve a simple structure. In STATA, a rotated factor solution can be
obtained (after generating the factor solution) if you enter the command
rotate
The default rotation is a varimax rotation, which involves an orthogonal rotation of the factors;
other options include promax, factors, horst.
Once you are comfortable with the factors, you ought to generate the factor scores
(linear composites of the variables). Several methods can be employed for obtaining the
resulting factor score coefficients. The default method in STATA, as in most software, is the
regression factor score.
The following commands generate the factor scores for two factors formed by
standardizing all variables, and then weighting with factor score coefficients and summing for
each factor; list the nations (and their factor scores); and generate a scatterplot of the two
factors.
factor death-school3, factor(2)
rotate
predict f1 f2
label var f1 "demographics"
label var f2 "development"
list country f1 f2
sum f1 f2
correlate f1 f2
graph twoway scatter f2 f1, yline(0) xline(0) mlabel(country) mlabsize(medsmall)
The resulting plot (shown in Figure 4-35) illustrates a clustering of countries according to their
economic development status. Developing countries are on the lower portion of the scatterplot,
with the very poor countries on the lower right portion.
[Figure: scatterplot of the second factor score, labeled "development" (vertical axis, about -1 to 3), against "Scores for factor 1" (horizontal axis, -3 to 1), with each point labeled by country name.]
Figure 4-35. Scatterplot of first two factor scores.
4.3.4. Time Series Analysis with STATA
In the previous sections, we looked into the analysis of cross section data. In the current
section, we consider time series analysis. Time series modeling involves the analysis of a
sequence of numerical data obtained at regular time intervals. Unlike the analyses of a random
sample of data that are discussed in the context of most other statistics, the analysis of a time
series is based on the assumption that successive values in the data file represent consecutive
measurements taken at equally spaced time intervals. The following yearly production data
Year: 1997 1998 1999 2000 2001 2002 2003
Prod: 75.3 74.2 78.5 79.7 80.2 75.3 74.2
is a time series. The daily market prices of a certain stock at the market closing over a 6-month
period constitute a time series. Time series that are of interest primarily for economic analysis
include the Gross Domestic Product, the Gross National Product, the Balance of Payments, the
Unemployment levels and rates, the Consumer Price Index and the Exchange Rate Data.
Time series can be either continuous or discrete. Frequently, continuous time series are
discretized by measuring the series at small, regular time intervals.
Time may be viewed as just another variable that we observe together with some
endogenous dependent variable, but time is an ordered variable. Typically, we denote the
time periods by successive integers. We may also consider a continuous time index, but here
we anchor ourselves at some point t=0 in time, and we have a sense of past, present and future.
The future cannot influence the past but the past can, and often does, influence the future.
Time series analysis involves discovering patterns, identifying cycles (and their turning
points), forecasting and control. Analysts are sometimes interested in long term cycles, which
are indicators of growth measures, and so it is helpful if shorter length cycles are removed.
Out-of-sample forecasts may be important for setting policy targets (e.g. a forecast of next
year's inflation rate can lead to a change in monetary policy).
Time series models involve identifying patterns in the time series, formally describing
these patterns and utilizing the behavior of past (and present) trends and fluctuations in the time
series in order to extrapolate future values of the series. In model building, it is helpful to
remember the words of time series guru George E.P. Box: "All models are wrong, but some are
useful." Several models can be used to describe various features of a time series.
A distinguishing characteristic of a time series is autocorrelation (since data are taken
on the same object over time). Estimated autocorrelations can be exploited to obtain a first
impression of possible useful models to describe and forecast the time series.
To use the time series commands in STATA, it is important to let STATA know that a
time index is found in the data set. Suppose we want to read the Excel file dole.xls (partially
displayed in Figure 4-36).
Figure 4-36. Screen shot of dole.xls
Note that this data file (representing the monthly total of people on unemployment
benefits in Australia from January 1956 to July 1992) is available from the
website http://robjhyndman.com/forecasting/gotodata.htm. Make sure to save only the data, i.e.
remove the second column information, and save the file in csv format, say into the
c:\intropov\data folder.
After converting the file into CSV format, you can then read the file, generate a time
variable called n (based on the internal counter _n), employ the tsset command (to let
STATA know that n is the time index), and get a time plot of the data with graph twoway line
as shown:
insheet using c:\intropov\data\dole.csv, clear
rename v1 ts
gen n=_n
tsset n
graph twoway line ts n
which yields the time plot in Figure 4-37.
[Figure: line plot of ts on the vertical axis (0 to 800000) against n on the horizontal axis (0 to 400).]
Figure 4-37. Time plot of the dole series.
As was earlier pointed out, a time series may be thought of as a set of data obtained at
regular time intervals. Since a time series is a description of the past, forecasting makes use of
these historical data. If the past provides patterns, then it can provide indications of what we
can expect in the future. The key is to model the process that generated the time series, and
this model can then be used for generating forecasts. Thus, while a complete knowledge of the
exact form of the model that generates the series is hardly possible, we choose an approximate
model that will serve our purposes of explaining the process that generated the series and of
forecasting the next value of the series.
We may investigate the behavior of a time series either in the time domain or in the
frequency domain. An analysis of the time domain involves modeling a time series as a process
generated by a sequence of random errors. In the frequency domain analysis of a time series,
cyclical features of a time series are studied through a regression on explanatory variables
involving sines and cosines that isolate the frequencies of the cyclic behavior. Here, we will
consider only an analysis in the time domain. Thus, we assume that a time series consists of
a systematic pattern contaminated by noise (random error) that makes the pattern (also called a
signal) difficult to identify and/or extract. Noise represents the components of a time series
reflecting erratic, nonsystematic, irregular random fluctuations that are short in duration and
non-repeating. Most time series analysis tools to be explored here involve some form of
filtering out of the noise in order to make the signal more salient and thus amenable to
extraction.
Decomposing a time series highlights important features of the data. This can help with
the monitoring of a series over time, especially with respect to the making of policy decisions.
Decomposition of a time series into a model is not unique. The problem is to define and
estimate components that fit a reasonable theoretical framework as well as being useful for
interpretation and policy formulation. When choosing a decomposition model, the aim is to
choose a model that yields the most stable seasonal component, especially at the end of the
series.
The systematic components in a time series typically comprise trends, seasonal
patterns, and cycles. Trends are overall long-term tendencies of a time series to move upward
(or downward) fairly steadily. The trend can be steep or not. It can be linear, exponential or
even less smooth displaying changing tendencies which once in a while change directions.
Notice that there is no real definition of a trend, but merely a general description of what it
means, i.e., a long term movement of the time series. The trend is a reflection of the underlying
level of the series. In economic time series, this is typically due to influences such as
population growth, price inflation and general economic development.
When time series are observed monthly or quarterly, often the series displays seasonal
patterns, i.e., regular upward or downward swings observed within a year. This is also not
quite a definition but it conveys the idea that seasonality is observed when data in certain
seasons display striking differences to those in other seasons, and similarities to those in the
same seasons. Every February, for instance, we expect sales of roses and chocolates to go
upward (as a result of Valentine's Day). Toy sales rise in December. Quarterly unemployment
rates regularly go up at the end of the first quarter due to the effect of graduation. In
Figure 4-38, we see 216 observations of a monthly series (representing a monthly index from
January 1976 to December 1993 of French Total Industry Production excluding Construction) that
comprise an upward trend and some seasonal fluctuations.
Figure 4-38. Monthly Index of French Total Production excluding Construction
(Jan 76 - Dec 93).
When a calendar year is considered the benchmark, the number of seasons is four for
quarterly data, and 12 for monthly data.
Cycles in a time series, like seasonal components, are also upward or downward swings
but, unlike seasonal patterns, cycles are observed over a period of time beyond one year, such
as 2-10 years, and their swings vary in length. For instance, cyclic oscillations in an economic
time series may be due to changes in the overall economic environment not caused by seasonal
effects, such as recession-and-expansion. In Figure 4-39, we see a monthly time series of 110
observations (representing Portugal Employment in Manufacture of Bricks, Tiles and
Construction Products from January 1985 to February 1994). Notice a downward trend, as well as
irregular and short term cyclical fluctuations.
Figure 4-39. Portugal Employment in Manufacture of Bricks, Tiles and Construction
Products (Jan 85 - Feb 94).
These components may be thought of as combining either additively or multiplicatively to
yield the time series. Aside from trends, seasonal effects, cycles and random components, a
time series may have extreme values/outliers, i.e., values which are unusually higher or lower
than the preceding and succeeding values. Time series encountered in business, finance and
economics may also have Easter / Moving Holiday effects, and Trading Day Variation. A
description of these components can be helpful to account for the variability in the time series,
and for predicting future values of the time series more precisely.
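The additive and multiplicative views of the components can be written compactly as follows. The symbols are introduced here for illustration only: $Y_t$ is the observed series, $T_t$ the trend, $S_t$ the seasonal component, $C_t$ the cycle and $I_t$ the irregular (random) component.

```latex
% Additive decomposition
Y_t = T_t + S_t + C_t + I_t
% Multiplicative decomposition
Y_t = T_t \times S_t \times C_t \times I_t
% Taking logs turns the multiplicative form into an additive one
% (assuming all components are positive)
\ln Y_t = \ln T_t + \ln S_t + \ln C_t + \ln I_t
```

The last line is one reason a log transformation is often applied before decomposing a series whose seasonal swings grow with its level.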
In addition to describing these components of a time series, it can be helpful to inspect
whether the data fluctuations exhibit a constant size (i.e., a nonvolatile, constant variance,
homoskedastic pattern) or whether the sizes of the fluctuations are not constant (i.e., a
volatile, heteroskedastic pattern).
It may be also helpful to see if there are any changes in the trend or shifts in the
seasonal patterns. Such changes in patterns may be brought about by some triggering event
(that may have a persistent effect on the time series).
For such descriptive analysis of a time series, it is useful to employ the following tools:
- Historical plot: a plot with time (months or quarters or years) on the x axis and the
  time series values on the y axis;
- Multiple time chart (for series observed monthly, quarterly, semestrally or for other time
  periods within the year): superimposed plots of the series values against time, with
  each year being represented by one line graph;
- Table of summary statistics by periods, e.g., months.
A key feature of time series data (in contrast with other data) is that the data are
observed in sequence, i.e., at time $t$, observations from previous times $1, 2, \ldots, t-1$ are
also known but not those of the future. Also, because of autocorrelation, we can exploit
information in the current and past values to explain or predict future values.
Denote a time series by $Y_t$, which may have been observed at times $t = 1, 2, \ldots, n$. Note
that, in practice, the series to be analyzed might have undergone some transformation of an
original time series, say, a log transformation in order to stabilize the variance.
Time series analysis often involves lagged variables, i.e., values of the same variable
but from previous times, and leads, i.e., forward values of the series. That is, for a given time
series $Y_t$, you may be interested in a first order lag
$$L Y_t = Y_{t-1}$$
representing the original series but one period behind.
To get a first order lag on the time series ts in STATA, either enter
gen tsl1=ts[_n-1]
or alternatively, enter
gen tsl1=L.ts
in the command window. To get the lag 2 (also called the second order lag) variable, enter
gen tsl2=L2.ts
Now to see the original time series, the first order and second order lags:
list n ts tsl1 tsl2
Likewise, you may be interested in the first order lead
$$F Y_t = Y_{t+1}$$
of a time series $Y_t$, representing the original series but one period ahead.
To get a first order lead on the time series ts in STATA, either enter
gen tsf1=ts[_n+1]
or alternatively, enter
gen tsf1=F.ts
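To make the lag and lead operators concrete outside STATA, here is a minimal Python sketch. The function names lag and lead and the sample values are illustrative assumptions, not part of the manual's data; positions with no available value are marked None, much as STATA leaves them missing.

```python
def lag(series, k=1):
    """Return the series shifted k periods back (value at t is Y[t-k])."""
    k = min(k, len(series))
    return [None] * k + list(series[:len(series) - k])


def lead(series, k=1):
    """Return the series shifted k periods ahead (value at t is Y[t+k])."""
    k = min(k, len(series))
    return list(series[k:]) + [None] * k


ts = [10, 12, 15, 11, 14]      # made-up values, for illustration only
tsl1 = lag(ts, 1)              # first order lag
tsl2 = lag(ts, 2)              # second order lag
tsf1 = lead(ts, 1)             # first order lead

# Print the original series next to its lags and lead, row by row
for row in zip(ts, tsl1, tsl2, tsf1):
    print(row)
```

Each printed row lines up a value with the values one and two periods before it and one period after it, which is what the list command shows for the STATA variables.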
Sometimes, you may be interested in a first order difference
$$\Delta Y_t = Y_t - Y_{t-1} = (1 - L) Y_t$$
for a given time series $Y_t$.
To get a first order difference on the time series ts in STATA, enter
gen tsd1=D.ts
while the 2nd order difference
$$\Delta^2 Y_t = \Delta Y_t - \Delta Y_{t-1} = (1 - L)^2 Y_t$$
is obtained with
gen tsd2=D2.ts
Since ts is a monthly series, the seasonal difference is obtained with:
gen tss12=S12.ts
This is not a 12th-order difference but rather a first difference at lag 12. It yields
differences, say, between the Dec 2003 and Dec 2002 values, the Nov 2003 and Nov 2002 values,
and so forth.
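The distinction between a second order difference and a difference at a seasonal lag can be seen in a small Python sketch. The helper names and the numbers are assumptions made for illustration; a period of 4 is used instead of 12 only to keep the example short.

```python
def difference(series, lag=1):
    """Y[t] - Y[t-lag]; None where no earlier value exists."""
    return [None if t < lag else series[t] - series[t - lag]
            for t in range(len(series))]


def second_difference(series):
    """Difference of the first differences, i.e. (1 - L)^2 applied to Y[t]."""
    d1 = difference(series)
    return [None if t < 2 else d1[t] - d1[t - 1] for t in range(len(series))]


ts = [3, 5, 9, 15, 23, 33]                     # made-up values
tsd1 = difference(ts)                          # first order difference
tsd2 = second_difference(ts)                   # second order difference

quarterly = [10, 20, 30, 40, 12, 23, 35, 46]   # made-up, two "years" of 4 periods
tss4 = difference(quarterly, lag=4)            # seasonal difference at lag 4

print(tsd1)
print(tsd2)
print(tss4)
```

The seasonal difference compares each period with the same period one "year" earlier, whereas the second order difference applies the first difference twice.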
Recall that a distinguishing characteristic of a time series is that it is autocorrelated.
The autocorrelation coefficients are the correlation coefficients between a variable and itself at
particular lags. For example, the first-order autocorrelation is the correlation of $Y_t$ and
$Y_{t-1}$; the second-order autocorrelation is the correlation of $Y_t$ and $Y_{t-2}$; and so forth.
A correlogram is a graph of the autocorrelation (at various lags) against the lags. With
STATA, we can generate the correlogram with the corrgram command. For instance, for the ts
time series, you can get the correlogram with:
corrgram ts, lags(80)
This yields autocorrelations, the partial autocorrelations and Box-Pierce Q statistics
for testing the hypothesis that all autocorrelations up to and including each lag are zero. Partial
autocorrelation is an extension of autocorrelation, where the dependence on the intermediate
elements (those within the lag) is removed. For the Box-Pierce Q statistics, small p values
show significant autocorrelations.
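As a rough illustration of what a correlogram tabulates, the Python sketch below computes sample autocorrelations and the plain Box-Pierce statistic, Q = n times the sum of squared autocorrelations up to a given lag. The function names and data are assumptions for illustration; statistical packages may report the Ljung-Box refinement of Q instead, so values need not match software output exactly.

```python
def autocorr(series, k):
    """Sample autocorrelation of the series with itself at lag k."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((y - mean) ** 2 for y in series)
    num = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
    return num / denom


def box_pierce_q(series, max_lag):
    """Box-Pierce Q: n * sum of squared autocorrelations for lags 1..max_lag."""
    n = len(series)
    return n * sum(autocorr(series, k) ** 2 for k in range(1, max_lag + 1))


ts = [1, 2, 3, 4, 5, 6, 7, 8]      # made-up, strongly trending series
r1 = autocorr(ts, 1)
q1 = box_pierce_q(ts, 1)
print(r1, q1)
```

Under the null hypothesis of no autocorrelation, Q is compared with a chi-square distribution whose degrees of freedom equal the number of lags; that step is left out here to keep the sketch free of external libraries.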
A more refined correlogram can be generated with:
ac ts, lags(80)
which is shown in Figure 4-40.
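An ARMA(p,q) model of the form
$$y_t = \mu + \phi_1 (y_{t-1} - \mu) + \phi_2 (y_{t-2} - \mu) + \cdots + \phi_p (y_{t-p} - \mu) + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}$$
can be fit to a time series. ARMA(p,q) processes should be stationary, i.e., there should be no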
trends, and there should be a long-run mean and a constant-variance observed in the time
series. To help guide us in identifying a tentative ARMA model to be fit on a stationary series,
note the following characterizations of special ARMA models:
- AR(1): ACF has an exponential decay; PACF has a spike at lag 1, and no correlation for
  other lags.
- AR(2): ACF shows a sine-wave pattern or a set of exponential decays; PACF has
  spikes at lags 1 and 2, and no correlation for other lags.
- MA(1): ACF has a spike at lag 1, and no correlation for other lags; PACF damps out
  exponentially.
- MA(2): ACF has spikes at lags 1 and 2, and no correlation for other lags; PACF shows a
  sine-wave pattern or a set of exponential decays.
- ARMA(1,1): ACF decays exponentially starting at lag 1; PACF also decays exponentially
  starting at lag 1.
In other words, we could summarize the signatures of AR, MA, and ARMA models on the
autocorrelation and partial autocorrelation plots with the following table:
                 ACF: Decay    ACF: Cutoff
PACF: Decay      ARMA          MA
PACF: Cutoff     AR            -
This summary table can be helpful in identifying the candidate models. If we find three spikes
on the partial autocorrelation plot (which may also look like a decay), then we could try to
fit either an ARMA model or an AR(3) model.
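The cutoff and decay signatures can be checked numerically. The sketch below is an illustration, not the manual's own procedure: it applies the Durbin-Levinson recursion to turn a list of autocorrelations into partial autocorrelations, and feeds it the theoretical ACF of an AR(1) process with coefficient 0.5 and of an MA(1) process with coefficient 0.5. Both coefficients are assumed values chosen for the example.

```python
def pacf_from_acf(rho):
    """Durbin-Levinson recursion.

    rho[0] must be 1 and rho[k] the autocorrelation at lag k.
    Returns the partial autocorrelations at lags 1, 2, ..., len(rho) - 1.
    """
    pacf = []
    prev = []                                   # coefficients of the order k-1 fit
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


ar1_acf = [0.5 ** k for k in range(4)]          # AR(1): ACF decays as 0.5^k
ma1_acf = [1.0, 0.4, 0.0, 0.0]                  # MA(1), theta = 0.5: rho_1 = 0.5 / 1.25

ar1_pacf = pacf_from_acf(ar1_acf)               # spike at lag 1, then zero (cutoff)
ma1_pacf = pacf_from_acf(ma1_acf)               # shrinks gradually in size (decay)
print(ar1_pacf)
print(ma1_pacf)
```

For the AR(1) input, the partial autocorrelation is nonzero only at lag 1, while for the MA(1) input it shrinks gradually in magnitude, in line with the characterizations listed above.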
For instance, the differenced series of ts is showing no trend, but the variability is
smaller for the first half of the series than for the second half of the series:
graph twoway line D.ts n
graph twoway line D.ts n if n>220
so we instead analyze only the second half of the series. Both the autocorrelation and the
partial autocorrelation decay, so we may want to use an ARMA(1,1) model on the
differenced series:
ac D.ts if n>220, lags(40)
pac D.ts if n>220, lags(40)
arima D.ts if n>220, arima(1,0,1)
or equivalently, use an ARIMA(1,1,1) on the original ts series:
arima ts if n>220, arima(1,1,1)
Alternatively, you can use the menu bar:
Statistics > Time Series > ARIMA models
The underlying idea in Box-Jenkins models is that only past observations on the variable being
investigated are used to describe the behavior of the time series. We suppose that the
time-sequenced observations in a data series may be statistically dependent. The statistical
concept of correlation is used to measure the relationships between observations within the
series.
Often we would like to smooth a time series to break down the data into a smooth and
a rough component:
Data = smooth + rough
Then, we may want to remove the obscurities and subject the smooth series to trend analysis.
In some situations, rough parts are largely due to seasonal effects and removal of the seasonal
effects (also called deseasonalization) may help in obtaining trends. For instance, the time series
shown in Figure 4-42, representing deseasonalized quarterly Gross Domestic
Product (GDP) and Gross National Product (GNP) in the Philippines from the first quarter of
1990 to the first quarter of 2000, show a steady increase in the economy except for the slump
due to the Asian financial crisis. The effect on GNP, though, is not as severe as the effect on
GDP because of the income transfers from overseas.
Figure 4-42. Gross Domestic Product (i) and Gross National Product (ii) from the first
quarter of 1990 up to the first quarter of 2000.
Smoothing involves some form of local averaging of data in order to filter out the
noise. Smoothing methods can also provide a scheme for short-term forecasting. The
seasonal adjustment behind the series shown above is rather elaborate. We discuss here
simpler smoothing techniques.
The most common smoothing technique is a moving average smoother, also called
running averages. This smoother involves obtaining a new time series derived from the simple
average of the original time series within some "window" or band. The result is dependent
upon the choice of the length of the period (or window) for computing means. For instance, a
three-year moving average is obtained by firstly obtaining the average of the first three years,
then the average of the second to the fourth years, then the average of the third to the fifth
years, and so on, and so forth.
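The windowing idea can be sketched in a few lines of Python. The function name, the sample values and the handling of the end points (left as None when a full window is unavailable) are assumptions made for illustration; statistical packages may treat the end points differently.

```python
def moving_average(series, lags=1, leads=1):
    """Equal-weight average of each point with `lags` earlier and `leads` later points.

    Points that do not have a full window are returned as None.
    """
    n = len(series)
    width = lags + 1 + leads
    out = []
    for t in range(n):
        if t - lags < 0 or t + leads >= n:
            out.append(None)                       # incomplete window
        else:
            out.append(sum(series[t - lags:t + leads + 1]) / width)
    return out


yearly = [4, 6, 8, 10, 12]                         # made-up yearly values
ma3 = moving_average(yearly)                       # three-year moving average

monthly = list(range(1, 10))                       # made-up monthly values 1..9
ma7 = moving_average(monthly, lags=3, leads=3)     # seven-term moving average
print(ma3)
print(ma7)
```

A wider window gives a smoother result but leaves more points at each end without a full set of neighbours.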
Consider the ts time series from the dole.xls data set. To obtain a 7-month moving
average, you have to use the tssmooth command, specify that it is a moving average with all
the seven points having the same weight and then graph it with:
tssmooth ma ts7=ts, window(3 1 3)
graph twoway line ts7 n, clwidth(thick) || line ts n, clwidth(thin) clpattern(solid)
This results in Figure 4-43, which shows the rough parts of the original time series being
removed through the moving average smoothing.
[Figure: line plot of the moving average (window(3 1 3)) and of ts against n.]
Figure 4-43. Original and smooth versions of ts.
An alternative to entering the commands above is to select from the menu bar
Statistics > Time Series > Smoothers > Moving Average Filter
and then enter in the resulting pop-up window the information shown in Figure 4-44
regarding the new name of the variable (ts7) based on the original variable (or expression to
smooth) ts, with equal weights for the current (middle) portion of the series and for the lagged
and lead portions of the series.
Figure 4-44. Pop-up window for performing a moving average smoother.
An alternative to moving averages (which provides equal weights to all the data in a
window) is exponential smoothing, which provides most weight to the most recent observation.
Simple exponential smoothing uses weights that decline exponentially; the smoothing
parameter $\alpha$ drives the recursion for the smoothed data:
$$S_t = \alpha X_t + (1 - \alpha) S_{t-1}$$
based on the original time series $X_t$.
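The recursion can be traced by hand with a short Python sketch. The function name, the data and in particular the starting value (taken here as the first observation) are assumptions; packages differ in how they initialize the recursion and in whether they report the smoothed value or the one-step forecast.

```python
def exp_smooth(series, alpha, s0=None):
    """Apply S[t] = alpha * X[t] + (1 - alpha) * S[t-1] to the series.

    s0 is the starting value S[0]; it defaults to the first observation.
    """
    s_prev = series[0] if s0 is None else s0
    out = []
    for x in series:
        s_prev = alpha * x + (1 - alpha) * s_prev
        out.append(s_prev)
    return out


x = [10, 20, 30]                    # made-up values
smoothed = exp_smooth(x, alpha=0.5)
print(smoothed)
```

With alpha close to 1 the smoothed series follows the data closely; with alpha close to 0 it changes slowly and gives more weight to older observations.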
You can ask STATA to obtain an exponentially smoothed series from the ts time series by
tssmooth exponential tse = ts
or by selecting on the menu-bar:
Statistics > Time Series > Smoothers > Simple exponential smoothing
This will generate a pop-up window. Replies to the window are shown in Figure 4-45.
Figure 4-45. Pop-up window for exponential smoothing.
The default option here is to choose the best value for the smoothing parameter $\alpha$
based on the smallest sum of squared forecast errors criterion.
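One way to picture this criterion is a simple grid search, sketched below in Python. This is an illustration under stated assumptions (the starting value is the first observation, and candidate values of the smoothing parameter come from a coarse grid), not a description of how STATA performs its search.

```python
def sse_one_step(series, alpha):
    """Sum of squared one-step-ahead forecast errors for a given alpha."""
    s = series[0]                         # assumed starting value
    sse = 0.0
    for x in series[1:]:
        sse += (x - s) ** 2               # s is the forecast of x made one step earlier
        s = alpha * x + (1 - alpha) * s   # update the smoothed value
    return sse


def best_alpha(series, grid):
    """Return the grid value of alpha with the smallest sum of squared errors."""
    return min(grid, key=lambda a: sse_one_step(series, a))


trending = [1, 2, 3, 4, 5, 6]             # made-up, steadily rising series
grid = [i / 10 for i in range(1, 11)]     # candidate values 0.1, 0.2, ..., 1.0
alpha_star = best_alpha(trending, grid)
print(alpha_star, sse_one_step(trending, alpha_star))
```

For a steadily rising series the search favours a large alpha, since the most recent value is then the best available guide to the next one.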
Exponential smoothing was actually developed by Brown and Holt for forecasting the
demand for spare parts. The simple exponential smoother assumes no trend and no seasonality.
Other exponential smoothers implemented in STATA include:
dexponential    double exponential
hwinters        Holt model (with trend but no seasonality)
shwinters       Winters model (assumes a trend and a multiplicative seasonality)
nl              nonlinear filters
Smoothing techniques are also helpful in generating short-term forecasts.
Sometimes our goal in time series analysis is to look for patterns in smoothed plots. In
other instances, the rough part or residuals are of more interest.
gen rough= ts-tse
label var rough "Residuals from exp smoothing"
graph twoway line rough n
The last command above generates the time plot shown in Figure 4-46.
[Figure: residuals from exponential smoothing plotted against n.]
Figure 4-46. Time plot of residuals.
Typically, the goal in time series analysis is to produce forecasts. Smoothing
techniques can help yield short-term forecasts. A very simple forecasting method entails
using the growth rate:
$$\text{growth rate} = \frac{\text{current value} - \text{previous value}}{\text{previous value}} \times 100 \text{ percent}$$
Consider the second half of the time series (since the residuals show a change in the variability
for the first and second half of the series, and thus, it may be best to only consider the second
half of the time series). Month-on-month growth rates on the ts variable can be readily obtained
using differences and lags:
gen growth_month=100*(D.ts/L.ts) if n>220
Similarly, you can get year-on-year growth rates from the seasonal differences and the values a
year before the current period:
gen growth_year=100*(S12.ts/L12.ts) if n>220
A forecast based on the growth rates can be calculated from the following equation:
$$\text{forecast} = \text{most recent value} \times (1 + \text{growth rate in decimals})$$
so that you can then obtain forecasts from both the month-on-month and year-on-year growth
rates:
gen forecast1=ts*(1+(growth_month/100)) if n>220
gen forecast2=ts*(1+(growth_year/100)) if n>220
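The arithmetic behind these commands is shown in the Python sketch below. The function names and the three data points are assumptions made for illustration only.

```python
def growth_rates(series, lag=1):
    """Percentage growth of each value over the value `lag` periods earlier."""
    return [None if t < lag
            else 100.0 * (series[t] - series[t - lag]) / series[t - lag]
            for t in range(len(series))]


def growth_forecast(most_recent, growth_pct):
    """Forecast = most recent value * (1 + growth rate in decimals)."""
    return most_recent * (1 + growth_pct / 100.0)


ts = [100.0, 110.0, 121.0]                 # made-up values growing 10% per period
growth = growth_rates(ts)                  # period-on-period growth, in percent
next_value = growth_forecast(ts[-1], growth[-1])
print(growth, next_value)
```

Passing lag=12 to growth_rates on monthly data would give year-on-year growth rates in the same way.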
We can alternatively use trend models that model time as an explanatory variable. You can run
a linear trend model with
reg ts n if n>220
predict forecast3 if n>220
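What the regression on the time index computes can be illustrated with the ordinary least squares formulas for a single regressor. The Python sketch below uses made-up data and an assumed function name; it is a stand-in for the idea, not a replacement for the reg and predict commands.

```python
def fit_linear_trend(y):
    """Least squares fit of y on the time index t = 1, 2, ..., n.

    Returns (intercept, slope).
    """
    n = len(y)
    t = list(range(1, n + 1))
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    slope = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
             / sum((ti - t_bar) ** 2 for ti in t))
    intercept = y_bar - slope * t_bar
    return intercept, slope


y = [3, 5, 7, 9, 11]                       # made-up series with an exact linear trend
intercept, slope = fit_linear_trend(y)
forecast_next = intercept + slope * (len(y) + 1)   # extrapolate one period ahead
print(intercept, slope, forecast_next)
```

Extrapolating the fitted line beyond the observed time points is what turns the trend model into a forecasting tool.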
To account for nonlinearity in the trend, you can try a quadratic trend model by entering
gen nsq=n^2 if n>220
reg ts n nsq if n>220
predict forecast4 if n>220
If you believe that there is a strong seasonal effect that can be accounted for by monthly
additive effects in addition to the quadratic effects, you can run the following:
gen mnth=mod(n,12)
tab mnth, gen(month)
reg ts n nsq month1-month12 if n>220
predict forecast5 if n>220
Assessment of the forecasts may be done by inspecting the within-sample mean absolute
percentage error (MAPE) of the forecasts:
gen pe1=100*abs(ts-forecast1)/ts
gen pe2=100*abs(ts-forecast2)/ts
gen pe3=100*abs(ts-forecast3)/ts
gen pe4=100*abs(ts-forecast4)/ts
gen pe5=100*abs(ts-forecast5)/ts
mean pe1 pe2 pe3 pe4 pe5
which suggests that the forecasts using month-on-month growth rates give the best
forecasting performance.
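The MAPE calculation itself is easy to reproduce. The Python sketch below uses made-up actual and forecast values and an assumed function name; pairs in which either value is missing (None) are skipped, similar to the way missing values drop out of the STATA mean.

```python
def mape(actual, forecast):
    """Mean absolute percentage error over the pairs where both values exist."""
    pairs = [(a, f) for a, f in zip(actual, forecast)
             if a is not None and f is not None]
    return sum(100.0 * abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


actual = [100, 200, 400, 500]              # made-up observed values
forecast = [110, 190, 400, None]           # made-up forecasts, last one missing
error = mape(actual, forecast)             # percentage errors of 10, 5 and 0
print(error)
```

A smaller MAPE indicates forecasts that are, on average, closer to the observed values in relative terms.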
In addition to past values of a time series and past errors, we can also model the time
series using the current and past values of other time series, called input series. Several
different names are used to describe ARIMA models with input series. Transfer function
model, intervention model, interrupted time series model, regression model with ARMA errors,
and ARIMAX model are all different names for ARIMA models with input series. Here, we
consider a general structural model relating a vector of dependent variables $y$ with covariates
$X$ and disturbances $\mu$:
$$y_t = X_t \beta + \mu_t$$
where we suppose that the disturbances $\mu_t$ can themselves be modeled in terms of an ARIMA process.
You can still use the arima command for this model.
How can you tell if it might be helpful to add a regressor to an ARIMA model? After
fitting an ARIMA model, you should save the residuals of the ARIMA model and then look at
their cross-correlations with other potential explanatory variables. If we have two time series,
we can also explore relationships between the series with the cross-correlogram command
xcorr.
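To give a feel for what a cross-correlogram measures, the Python sketch below computes the sample correlation between one series and another series k periods later. The function name, the data and the sign convention for k are assumptions made for illustration; the convention used by a particular package's cross-correlogram may differ.

```python
import math


def cross_corr(x, y, k):
    """Sample correlation between x[t] and y[t+k], for k >= 0."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    num = sum((x[t] - mx) * (y[t + k] - my) for t in range(n - k))
    den = math.sqrt(sum((v - mx) ** 2 for v in x) * sum((v - my) ** 2 for v in y))
    return num / den


x = [0, 1, 0, 0, 1, 0, 0, 0]               # made-up input series
y = [0, 0, 1, 0, 0, 1, 0, 0]               # made-up series that repeats x one period later
r0 = cross_corr(x, y, 0)
r1 = cross_corr(x, y, 1)
print(r0, r1)
```

A large cross-correlation at a positive lag, as for r1 here, suggests that past values of the candidate regressor carry information about the series being modeled.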
When modeling, you must be guided by the principle of parsimony: that is, you ought
to prefer a simple over a complex model (all things being equal). We must realize also that the
goodness of fit often depends on the specification, i.e., what variables are used as explanatory
variables, what functional form is used for the variable being predicted, etc. Thus, what is
crucial is to work on an acceptable framework for analysis, before one attempts to use any
statistical models to explain and/or predict some variable.
Chapter 5. Report Writing
5.1. Preparations for Drafting
It has been stressed throughout this training manual that it is important to plan the
research undertaking by coming up with the research proposal, even though there may actually
be a need to change some elements of the research plan as the research gets going. Drafting a
research report is certainly no different as it will entail planning, and revising of plans as a
researcher realizes something that causes him/her to rethink the research or the research results.
Therefore, it requires good skills in both structuring and phrasing the discoveries and thoughts.
These skills are acquired through experience, but can also be taught. Unlike a newspaper
article, a novel or an essay, a research report is a technically rigid text with a structure, but it
also carries some subjective intellectual style that reflects some personal opinions and beliefs
of the researcher. In coming up with the report, a draft (and perhaps several drafts) will have to
be written. The draft report is likely to be revised, perhaps even several times, before a final
version of the report ensues. Thus, the plan for the draft need not be extremely detailed.
The research report ought to explain concisely and clearly what was done, why it was
done, how it was done, what was discovered, and what conclusions can be drawn from the
research results. It should introduce the research topic and emphasize why the research is
important. The report should describe the data, as well as explain the research methods and
tools used; it ought to give enough detail for other researchers to repeat the research. Simple
examples ought to be given to explain complex methodologies that may have been used. The
clarity of the report should be on account of composition, reasoning, grammar and vocabulary.
Booth et al. (1995) point out that writing is a way not to report what is in that pile of notes,
but to discover what you can make of it; they suggest that regardless of the complexity of the
task of coming up with the research report, the plan for a draft report should consist of (a) a
picture of who your readers are, including what they expect, what they know, what opinions
they have; (b) a sense of the character you would like to project to your readers; (c) the
research objectives (written in such a way as to also suggest gaps in the body of knowledge
you are addressing) and the significance of the study; (d) your research hypotheses, claims,
points and sub-points; (e) the parts of the research paper. When should the report be started?
Some researchers prefer to come up with some major results before commencing the writing
process; others start it even immediately after the research proposal is finalized.
Starting the writing process can be quite frightening, even to most researchers. Laws of
physics suggest that when a body is at rest, it takes force to move the body. Similarly, it takes
some energy to begin writing up a research report, but once it starts there is also some inertia
that keeps the writing moving unless we get distracted. There are different ways of getting
started in writing and in continuing to write. Booth et al. (1995) suggest the importance of
developing a routine for writing: everyone has a certain time of day or days of the week when
they are at their most creative and productive states, thus you ought to schedule the creative
phase of your writing for these times. Other times may be more suitable for detailed work such
as checking spelling and details of the argument. Microsoft Word can also be used for spell
checking but beware that a correctly spelled word is not necessarily the right word to use.
Booth et al. (1995) also stress the need to avoid plagiarism, which they define as:
You plagiarize when, intentionally or not, you use someone else's words or
ideas but fail to credit that person. You plagiarize even when you do credit the
author but use his (or her) exact words without so indicating with quotation
marks or block indentation. You also plagiarize when you use words so close to
those in your source, that if you placed your work next to the source, you would
see that you could not have written what you did without the source at your
elbow. When accused of plagiarism, some writers claim, "I must have somehow
memorized the passage. When I wrote it, I certainly thought it was my own."
That excuse convinces very few.
Some researchers make great progress in writing up the report with the aid of a
structured technique. In this case, an outline is developed that identifies an overall structure of
the draft research report in a hierarchical manner to arrive at the specifics of the report. Some
people, however, find that outlining stifles their creativity. An advantage to outlining is that the
pieces of the report are conceived of before the report is written, and this may help give focus
to drafting it.
Microsoft Word, currently the most popular word processing software package,
supports outlining. Merely click on View, then select Outline. The Outline View displays
the Word document in outline form. It uses heading styles to organize the outline; headings can
be displayed without the text. If you move a heading, the accompanying text moves with it. In
outline view, you can look at the structure of the document as well as move, copy, and
reorganize texts in the document by dragging headings. You can also collapse a document to
see only the main headings, or expand it to see all headings and even the body text.
Typically, a research report would carry the following general topic outline:
1. Introduction
2. Review of Related Literature
3. Data & Methodology
4. Results & Discussion
5. Summary, Conclusions & Recommendations
Such a skeleton may help in the early stage of a researcher's thinking process. These
topics could be further subdivided into sub-topics, and sub-sub-topics, and so forth. Elements
of the research proposal, such as the objective, hypothesis, conceptual and operational
framework, relevance of the research, may comprise the introduction section of the draft report,
together with some background on the study. That is, the first section on Introduction could
contain:
1.1. Background
1.2. Research Objectives
1.3. Research Hypotheses
1.4. Conceptual and Operational Framework
1.5. Significance of Study
The third section on Data & Methodology might include a description of the data and a
discussion of the data analysis method, including its appropriateness for arriving at a solution
to test the operational hypothesis, so the section could be divided as follows:
3.1. Introduction
3.2. Data
3.2.1. Sampling design
3.2.2. Data processing
3.2.3. Data limitations
3.3. Data Analysis Methods
Data can challenge the theory that guided their collection, but any analysis on poorly generated
data will not be valid, thus the manner in which data was collected and processed has to be
described. Statistical models help in distinguishing genuine patterns from random variation, but
various models and methods can be used. It is important to discuss the analytical methods
employed in order to see the extent of validity of research results.
When the research will require more synthesis, interpretation and analysis, a researcher may
not necessarily have a clear sense of the results. The problem may not even be too clear, and in
this case the act of drafting will itself help in the analysis, interpretation and evaluation. There
may be a lot of moments of uncertainty and confusion. A review of the research questions and
identification of key points may help put a structure to the report. It may also be important to
combine a topic outline with a point-based outline that identifies points within each topic. Note
that there may be more than one way to arrange elements and points of the report, but
preferably go from short and simple to long and more complex, from more familiar to less
familiar. Consider the following example of an outline of a research report that analyses results
of a survey conducted by the Philippine National Statistics Office:
Research Topic/Title: Contraceptive Use, Discontinuation and Switching Behavior in the
Philippines: Results from the 2003 National Demographic and Health Survey
Researchers: Socorro Abejo, Elizabeth Go, Grace Cruz, Maria Paz Marquez
Outline
I. Introduction:
a. Increasing trends in contraceptive prevalence rates attributed to increasing use
of modern methods
b. last five years CPR unchanged (47% in 1998; 49% in 2003); one in four births
mistimed
c. some discontinuation of use of contraceptives due to method failure
II. Significance of Study: understanding contraceptive dynamics an important input
for family planning advocacy and information/education campaign
III. Objectives of Study:
a. To determine level and determinants of contraceptive method choice of women
by selected background variables
b. To determine twelve-month discontinuation rates and median duration of use
by specific methods and by selected background variables
c. To determine twelve-month switching rates by specific methods and by
selected background variables; and
d. To determine level and determinants of contraceptive use of men during the
last sexual activity by selected background variables.
IV. Conceptual Framework: largely from Bulatao (1989)
V. Data and Methodology
a. Data Source: NDHS, mention support of ORC Macro; sampling design, only
some sections to be analyzed; calendar data file
b. Unit of analysis: currently married who were not pregnant at time of interview
or unsure about pregnancy
c. Background Variables: basic socio-econ and demographic characteristics
d. Methodology: descriptive analysis (frequencies, cross-tabs); multinomial
logistic regression and multiple classification analysis; life table technique
VI. Results
a. Contraceptive Practices of Women:
i. trends in contraceptive use;
ii. differentials in contraceptive prevalence
iii. Determinants of Contraceptive Use
iv. Contraceptive Discontinuation
v. Contraceptive Switching Behavior
b. Contraceptive Practices of Men
i. Male Sexuality
ii. Knowledge, attitude toward and ever use of contraception
iii. Contraceptive prevalence during last sex
iv. Differentials in method choice
v. Differentials in type of method
vi. Multinomial logit regression of male contraceptive use
vii. Characteristics of male method users vs non users
VII. Conclusions and recommendations:
a. contraceptive practices among women associated with demographic factors;
use is higher among older women, among women who are in a legal union;
and among women who desire a smaller number of children; higher among
women with higher socio economic status
b. positive effect of education on contraception use;
c. need for special interventions to make FP supplies and services more
accessible to poor couples
d. degree of communication between woman and partner related to contraceptive
practices
e. poor attitude toward contraceptive use among men
f. extra marital practices or ex-nuptial relationships exposes people to risk; need
for strengthening values
g. differentials in contraceptive use and withdrawals could help FP managers in
properly designing programs and strategies
h. some problems with information; services especially for education needs
improvement
Such an outline is helpful, as it goes beyond topics and provides thoughts and relations
among claims. In addition, here interrelations among the parts of the research report are
conceptualized, especially in coming up with points, claims and organization of the arguments.
Also, it is easy to observe whether issues need connectives that will bring together parts of the
text and reveal their relations.
After coming up with the outline, it is now time to go to the level of the paragraph. The
boundaries between a paragraph and a sub-section are not sharp. Sub-sections allow you to go
deeper into a topic, and require several related paragraphs. Within each sub-section, you may
want to use the topic sentence to define the paragraphs. The idea is to write a single sentence
that introduces the paragraph, and leave the details of that paragraph for later. This expanded
form of the outline may constitute the skeleton of the initial draft of the report.
Outlining need not only be part of the pre-drafting stage. Sometimes, one comes up
with a long draft report based on a general outline, and then an outline like the one above may help further
improve the draft by cutting out much of what was written (and even throwing away the entire
draft). There may even be several iterations of this drafting and outlining before the research
report is finalized. The point is to recognize that there will be changes in the outline, but it may
be much more satisfactory and productive to modify a well-considered outline than to start
without any plan whatsoever.
5.2. Creation of the Draft
After planning, outlining and organizing arguments into paragraphs, now comes the
writing of the draft report. The technical report is the
story of the research undertaking; it ought to be readable and coherent so that readers (and
other researchers) can determine what was done and how well it was done. Clear, concise,
direct, unequivocal and persuasive writing is important in scientific technical reports. A good
communicator must always understand that communication involves three components: the
communicator, the message and the audience. Thus a good criterion for selecting a writing
style is for the writer to think of the intended reader. The writer is challenged to keep it short
and simple (KISS). Conciseness should, however, not mean that clarity is sacrificed. Booth et
al. (1995) observe that:
Beginning researchers often have problems organizing a first draft because
they are learning how to write and at the same time, they are discovering what
to write. As a consequence, they often lose their way and grasp at any principle
of organization that seems safe.
Claims will have to be supported and communicated not merely textually, but even
with visual aids, such as tables, graphs, charts and diagrams. Visuals facilitate understanding of
the research and the results; they can convince readers of your point, as well as help you
discover patterns and relationships that you may have missed. Knowing what type of visual to
use in a report is crucial. You must keep the end result (the research report) and the key point
of the research in focus. Depending on the nature of the data, some visuals might be more
appropriate than others in communicating a point. Be aware that a visual is not always the best
tool to communicate information; sometimes textual descriptions can provide a better
explanation to the readers. Visuals must also be labeled appropriately. The title of a figure is
put below the figure, while the title of a table is put above the table. These visuals must be
located as close as possible to the textual description, which must make proper references to the
visual. Readers ought to be told what you want them to see explicitly from a table or figure.
Whether the draft (and the final report) will contain visual or textual evidence (or both) will
depend on how readers can understand information and how you want your readers to respond
to the information you present. Beware that most software packages, especially spreadsheets,
may generate visuals that are aesthetically good but do not necessarily communicate good
information. For instance, pie charts, though favorites in non-technical print media, are
generally not recommended for use in research reports. Some psychological research into
perceptions of graphs (see, e.g., Cleveland and McGill, 1984) suggests that pie charts present
the weakest display among various graphs; they are quite crude in presenting information
especially for more than four segments. Thus, it is recommended that pie charts be avoided in
research reports although they may be used in presentations if you wish to have your audience
see a few rough comparisons where differences are rather distinctive. Booth et al. (1995)
suggest that a researcher ask the following questions when using visual aids:
How precise will your readers want the data to be? Tables are more precise
than charts and graphs (but may not carry the same impact)
What kind of rhetorical and visual impact do you want readers to feel? Charts
and graphs are more visually striking (than tables); they communicate ideas
with more clarity and efficiency. Charts suggest comparisons; graphs indicate
a story.
Do you want readers to see a point in the data? Tables encourage data
interpretation; charts and graphs make a point more directly.
According to Booth et al. (1995), some people write as fast as possible and correct later,
i.e., the "quick and dirty" way; others write carefully without leaving any problems, i.e., the
"slow and clean" way; and others, or perhaps most people, may be somewhere in between
these extremes. In the latter case, it may be important to have notes of things that can be
checked on later. Booth et al. (1995) also provide five suggestions that a researcher may use in
drafting a report, viz:
Determine where to locate your point: express main claim especially at the last
sentence of introduction or in the conclusion. The same guideline ought to be
used for major sections and sub-sections of the report.
Formulate a working introduction: the least useful working introduction announces
only a topic; it is important to provide a background/context, and if you can,
state the problem and even an idea of the solution. Most readers will not bother
proceeding to read the report if the introduction is not well stated.
Determine Necessary Background, Definitions, Conditions: decide what
readers must know, understand or believe; spell out the problem in more detail
by defining terms, reviewing literature, establishing warrants, identifying scope
and delimitations, locating the problem in a larger context. But this summary
MUST NOT dominate the paper.
Rework Your Outline: rearrange elements of the body of an argument to make
your points more organized: review what is familiar to readers, then move to
the unfamiliar; start with short, simple material before getting into long, more
complex material; move from uncontested issues to more contested ones. Note
that these suggestions may pull against each other.
Select and Shape Your Material: research is like gold mining: a lot of raw
material may be dug; a little is picked out and the rest is discarded. You know
you have constructed a convincing argument when you find yourself discarding
material that looks good, but not as good as what you keep. Ensure that your
presentation of material highlights key points, such as patterns evident from
your data.
The overall plan is to have a research paper with a flow in its structure involving its
various sections, viz., the introduction, data and methods section, results and discussions,
conclusions and recommendations. The section on results and discussions is the main portion
of the research report. It ought to contain answers to the research questions, as well as the
requisite support and defense of these answers with arguments and points. Conflicting results,
unexpected findings and discrepancies with other literature ought to be explained. The
importance of the results should be stated, as well as the limitations of the research
undertaking. Directions for further research may also be identified. A good research report
transfers original knowledge on a research topic; it is clear, coherent, focused, well argued and
uses language without ambiguities. The research report must have a well-defined structure and
function, so that other researchers can repeat the research.
In achieving this final form of the research report, you will first have to come up with
the draft. The first draft of the technical report need not necessarily use correct English or a
consistent style. These matters, including correct spelling, grammar and style can be worked
out in succeeding drafts, especially the final draft. One must merely start working on the draft,
and keep moving until the draft is completed. It may even be possible that what is written in a
draft may not end up in the final form of the report. Thus, it is important to start drafting as
soon as possible, and it is generally helpful not to be too much of a perfectionist on the first
draft. Booth et al. (1995) suggest that some details on the draft be left for revisions:
if you stumble over a sentence, mark it but keep moving. If paragraphs sound
disconnected, add a transition if one comes quickly or mark it for later. If points
don't seem to follow, note where you became aware of the problem and move on.
Unless you are a compulsive editor, do not bother getting every sentence perfect,
every word right. You may be making so many changes down the road that at this
stage, there is no point wasting time on small matters of style, unless, perhaps you are
revising as a way to help you think more clearly... Once you have a clean copy with the
problems flagged, you have a revisable draft.
5.3. Diagnosing and Revising the Draft
The draft of the report will typically have to undergo several iterations of diagnoses and
revisions before the final form of the report is produced. After coming up with a first draft, it
may even be better to put it aside for a while, say a day, and then re-read the draft, which may
enable you to more easily see mistakes in spelling, punctuation, and grammar. When using a
word processor, it may sometimes be difficult to catch mistakes; printing a hard copy of the
draft may be useful in revising the draft. Revisions involve some level of planning and
diagnosing. The process involves identifying the frame of the report: introduction and
conclusion (each of which should carry a sentence that states the main claim and solution to the
problem); as well as the main sections of the body of the paper, including beginning and final
parts of each of these sections. In addition, it is important to analyze the continuity of the
themes in the paper, as well as the overall shape and structure of the paper. Repetitions of
words in the same paragraph, especially in consecutive sentences, generally ought to be
avoided. Verbosity, i.e. wordiness, and the use of redundant words also ought to be avoided.
Arguments have to be sound: it is one thing to establish correlation; it is another to establish
causality. Variables investigated may be driven by other variables that confound the
relationship between the investigated variables. Remove excessively detailed technical
information and the details of computer output and put them instead in an appendix.
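The difference between correlation and causality can be illustrated with a small simulation. The sketch below is an illustration only and is not taken from any analysis in this manual: the variable names z, x and y, the sample size and the seed are all assumptions. A hidden variable z drives both x and y, so x and y are correlated even though neither affects the other; once the linear effect of z is removed from each, the correlation should largely disappear.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def residuals(y, z):
    """Residuals of y after a simple least-squares regression of y on z."""
    n = len(y)
    mean_y, mean_z = sum(y) / n, sum(z) / n
    slope = (sum((u - mean_z) * (v - mean_y) for u, v in zip(z, y))
             / sum((u - mean_z) ** 2 for u in z))
    return [v - mean_y - slope * (u - mean_z) for u, v in zip(z, y)]


random.seed(1)  # arbitrary seed, used only for reproducibility
n = 5000        # hypothetical sample size
z = [random.gauss(0, 1) for _ in range(n)]   # hidden confounder
x = [zi + random.gauss(0, 1) for zi in z]    # driven by z, not by y
y = [zi + random.gauss(0, 1) for zi in z]    # driven by z, not by x

raw_corr = pearson(x, y)
partial_corr = pearson(residuals(x, z), residuals(y, z))
print(f"correlation of x and y:         {raw_corr:.2f}")
print(f"after removing the effect of z: {partial_corr:.2f}")
```

In a real report the confounder is usually not observed so conveniently; the point of the sketch is only that a sizeable correlation between two variables does not by itself show that one causes the other.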
It may be helpful to read and re-read your draft as if it were written by someone else.
The key is to put oneself into the shoes of a reader, i.e. to see the draft from and through the
eyes of a reader, imagining how they will understand it, what they will object to, what they
need to know early so they can understand something later (Booth et al., 1995). This will
entail diagnosing mistakes in organization, style, grammar, argumentation, etc., as well as
revising the text to make it more readable to a reader. It may help to ask yourself a few
questions, such as:
Is the introduction captivating (to a reader)?
Are the text structure, arguments, grammar and vocabulary used in the draft
clear?
Does the text communicate to its readers what you want it to? That is, can the
reader find what you wanted to say in the draft? Are the visuals, e.g., graphs,
charts, and diagrams communicating the story of the research effectively?
Does the draft read smoothly? Are there connectors among the various ideas
and paragraphs? Is there a flow to the thoughts? Are they coherent?
Is the draft as concise as possible? Are there redundant thoughts? Can the text
be shortened?
Does the draft read like plain English?
Is there consistency between the introduction and conclusion?
Were the research objectives identified actually met?
Booth et al. (1995, pp. 232-233) provide some concrete and quick suggestions for
coming up with revisions to the writing style in the draft. They are presented below in toto:
If you don't have time to scrutinize every sentence, start with passages where you
remember having the hardest time explaining your ideas. Whenever you struggle
with content, you are likely to tangle up your prose as well. With mature writers, that
tangle usually reflects itself in a too complex, nominalized style.
FOR CLARITY:
Diagnose
1. Quickly underline the first five or six words in every sentence. Ignore short
introductory phrases such as "At first", "For the most part", etc.
2. Now run your eye down the page, looking only at the sequence of underlines to see
whether they pick out a consistent set of related words. The words that begin a
series of sentences need not be identical, but they should name people or concepts
that your readers will see are clearly related. If not, you must revise.
Revise
1. Identify your main characters, real or conceptual. They will be the set of named
concepts that appear most often in a passage. Make them the subject of verbs.
2. Look for words ending in -tion, -ment, -ence, etc. If they appear at the beginning
of sentences, turn them into verbs.
FOR EMPHASIS:
Diagnose
1. Underline the last three or four words in every sentence.
2. In each sentence, identify the words that communicate the newest, the most
complex, the most rhetorically emphatic information; technical-sounding words
that you are using for the first time; or concepts that the next several sentences will
develop.
Revise
Revise your sentences so that those words come last.
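The two "underlining" diagnoses above can be roughly mimicked on a computer. The following is a minimal sketch, not a tool recommended by Booth et al.: the function names are hypothetical, the sample draft is invented, and sentences are split naively at a period, exclamation mark or question mark followed by whitespace (an assumption that fails for abbreviations such as "e.g.").

```python
import re


def split_sentences(text):
    """Naively split text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def opening_words(sentence, k=6):
    """First k words of a sentence (the words to underline for clarity)."""
    return sentence.split()[:k]


def closing_words(sentence, k=4):
    """Last k words of a sentence (the words to underline for emphasis)."""
    return sentence.split()[-k:]


# Invented sample text, used only to show the output format.
draft = (
    "The survey covered rural households. "
    "The households were selected at random. "
    "Estimation of poverty rates was done with sampling weights."
)

for s in split_sentences(draft):
    print("opens:", " ".join(opening_words(s)),
          "| closes:", " ".join(closing_words(s)))
```

Reading down the "opens" column shows whether the sentences start with a consistent set of characters; here the third sentence opens with the nominalization "Estimation", which the revision advice above would turn into a verb.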
In looking through the draft, you ought to avoid colloquialisms and jargon; pay
attention to using proper, unambiguous words. Try also to use a dictionary/thesaurus,
especially if there is one in your word processor. This may help in remedying repetition of
words in a paragraph. Spell-checks also ought to be used, but be careful that these may not
always yield the correct words you want. When introducing technical and key terms in the
research report, it is preferred to structure the sentence so that the term appears in the last
words. The same may be true of a complex set of ideas, which ought to be made as readable as
possible.
Non-native writers of English have the tendency to write in their native language and
then translate their thoughts into English. This tends to be too much work unless only notes are
made rather than full sentences and texts. Even native speakers and writers of English are not
spared problems in communication because of the rich grammatical constructions in
English. Grammar essentially involves linking, listing and nesting words together to
communicate ideas. Sentences consist of coordinated words, and clauses embedded and glued
together. Booth et al. (1995) stress the importance of grammar in a research report: "Readers
will judge your sentences to be clear and readable to the degree that you can make the subjects
of your verbs name the main characters in your story." To assist in the revision of the structure
and style of the draft research report, we provide in the Appendix of this manual a review of a
few grammar concepts and principles.
5.4. Writing the Introduction, Executive Summary and Conclusion
After a draft of the report has been written, a researcher is challenged to come up with
an introduction, conclusion and even an executive summary either while a revised draft is
being written or after the revised draft has been generated. The executive summary,
introduction, and conclusion ought to be clear, logical, coherent, and focused. The executive
summary should be coupled with good arguments and well-structured writing to enable the
paper to get widely read, and possibly published. The introduction and the conclusion likewise
ought to be well written as well as coherent in content. The key points stressed in the
introduction should not conflict with those in the conclusion. The introduction may promise
that a solution will be presented in the concluding section. Because of the importance of the
introduction and conclusion, as well as the executive summary, some writers prefer to write
these last. Even in this case, a working introduction and working conclusion will still have to
be initially drafted.
5.4.1. The Introduction and Conclusion
After the title [4], the introduction will be read. Thus, the introduction ought to give
readers a sense of where they are being led; otherwise, a reader may not proceed to read
the entire report. The introduction ought to move from the broad topic towards
more specific and narrowly defined research problems and questions. It ought to state and
discuss the research objectives in clear language. The introduction should place the research
question in its scientific context, and state clearly what is new in this research and how it
relates to other research; that is, the introduction should provide a sense of the study's
significance, and even an overview of related literature. The introduction uses the simple
present tense for referring to established knowledge or the past tense for the literature review.
Booth et al. (1995) emphasize that:
Readers never begin with a blank slate... They read with expectations; some they
bring with them, others you must create. The most important expectations you create
are in the research problem you pose. In your first few sentences, you must convince
your readers that you have discovered a research problem worth their consideration
and that you may even have found its solution. An introduction should never leave
them wondering, "Why am I reading this?"...
Common structure (of introductions) includes at least these two elements, in this
predictable order:
a statement of the research problem, including something we do not
know or fully understand and the consequences we experience if we leave
that gap in knowledge or understanding unresolved;
a statement of the response to that problem, either as the gist of its
solution or as a sentence or two that promises one to come.
And depending on how familiar readers are with the problem, they may also expect
to see before these two elements one more:
A sketch of a context of understanding that the problem challenges.
[4] It is suggested that the title contain only seven to ten words and avoid the use of complex grammar. Preferably,
the title should intrigue readers and attract interest and attention. Booth et al. (1995) suggest that the
title be "the last thing you should write... a title can be more useful if it creates the right expectations,
deadly if it doesn't... Your title should introduce your key concepts... If your point sentence is vague,
you are likely to end up with a vague title. If so, you will have failed twice: You will have offered
readers both a useless title and useless point sentences. But you will also have discovered something more
important: your paper needs more work."
They also provide an example of an introduction that states the context, the research problem
and the sense of the outcome:
First born middle-class native Caucasian males are said to earn more, stay employed
longer and report more job satisfaction. [5]
But no studies have looked at recent immigrants to find out whether they repeat that
pattern. If it doesn't hold, we have to understand whether another does, why it is
different, and what its effects are, because only then can we understand patterns of
success and failure in ethnic communities. [6]
The predicted connection between success and birth order seems to cut across ethnic
groups, particularly those from South-east Asia. But there are complications in
regard to different ethnic groups, how long a family has been here, and their
economic level before they came. [7]
The following example is taken from an introductory section of a draft working paper
by Nimfa Ogena and Aurora Reolalas on "Poverty, Fertility and Family Planning in the
Philippines: Effects and Counterfactual Simulations":
With the politicization of fertility in many countries of the world, the impact of
population research has expanded beyond mere demographics towards the wider
socio-cultural and political realm. The raging debates during the past year in
various parts of the country on the population and development nexus, at the macro
level, and why fertility continues to be high and its attendant consequences, at the
individual and household levels, created higher visibility for this ever important issue
and legitimized certain influential groups. Nevertheless, with the dearth of
demographic research in the country over the past decade, debates have very limited
findings and the empirical evidence to draw from to substantiate basic arguments...
Correlations are not sufficient basis for arguing that fertility induces poverty or
poverty creates conditions for higher fertility. Better specified models are needed to
examine causal linkages between fertility and poverty. This study aims to:
(1) analyze regional trends, patterns and differentials in fertility and poverty;
(2) identify factors influencing Filipino women's poverty status through their recent
fertility and contraceptive protection; and
(3) illustrate changes in the expected fertility and poverty status of women when
policy-related variables are modified based on fitted structural equations models
(SEM).
Expected to be clarified in this study are many enduring questions such as:
Does having a child alter a woman's household economic status?
How does the poverty situation at the local level impact on a woman's
fertility and household economic status?
[5] Shows context.
[6] States the research problem.
[7] Indicates a sense of the outcome.
As fertility declines, how much reduction in the poverty incidence is
expected?
If unmet need for family planning is fully addressed, by how much would
fertility fall?
Methodologically, this is the first time that the fertility-poverty status linkage is
examined at the individual level with a carefully specified recursive structural
equations model (SEM) that accounts for the required elements to infer causality in
the observed relationships. This paper also hopes to contribute to current policy
debates by providing different scenarios to illustrate possible shifts in women's
fertility and poverty status as selected policy-related variables are modified.
Notice that the authors provide the background of the research problem, the research
objectives and questions to be tackled, and the significance of the study.
The concluding section may offer a summary of the research findings. It ought to show
that the key objectives were met and the research questions were answered. All conclusions stated ought to be
based on research findings; some suggestions for further research may be stated, recognizing
what is still not known. A closing quote or fact may be presented. Regardless of whether we
do so or not, the conclusion should be in sync with the introductory section.
Booth et al. (1995, pp. 250-254) offer a few quick tips on dos and don'ts in the
choice and use of first and last words:
Your FIRST FEW WORDS:
Don't start with a dictionary entry: "Webster defines ethics as..."
Don't start grandly: "The most profound philosophers have for
centuries wrestled with the important question of..."
Avoid "This paper will examine..." and "I will compare...". Some published
papers begin like that, but most readers find it banal.
Three choices for your first sentence or two:
Open with a Striking Fact or Quotation (but) only if its language leads
naturally into the language of the rest of your introduction.
Open with a relevant anecdote (but) only if its language or content
connects to your topic.
Open with a general statement followed by more specific ones until
you reach your problem.
If you open with any of these devices, be sure to use language that leads
to your context, problem and the gist of your solution.
Your LAST FEW WORDS:
Not every research paper has a section titled "Conclusion", but they all have a paragraph
or two to wrap them up.
Close with your main point especially if you ended your introduction not
with your main point but with a launching point. If you end your
introduction with your main point, restate it more fully in your
conclusion.
Close with a New Significance or Application (which) could earlier
have answered the question "So what?" but perhaps at a level more general
than you wanted to aim at. If your research is not motivated directly
by a practical problem in the world, you might ask now whether its
solution has any application to one.
Close with a call for more research if the
significance of your solution is especially interesting.
Close with a coda, a rhetorical gesture that adds nothing substantive to
your argument but rounds it off with a graceful close. A coda
can be an apt quotation, anecdote, or just a striking figure of
speech, similar to or even echoing your opening quotation or
anecdote, one last way that introductions and conclusions
speak to each other. Just as you opened with a kind of prelude,
so can you close with a coda. In short, you can structure your
conclusion as a mirror image of your introduction.
5.4.2. The Executive Summary
The executive summary of the report may be viewed as a miniature version of the
report. Typically the executive summary will be up to 500 words. (See examples in Chapter 3
of Executive Summaries from the First Research-based Regional Training Course.) Many
people may have time only to look through an executive summary, and this part of the research
paper may even be the only part of the report that is available to the general public. The
executive summary will help readers decide whether to read further into the research
paper as a whole. Thus, everything that is important in the research report ought to be reflected
and summed up in the executive summary. It ought to contain an introductory statement of the
rationale and research objectives, perhaps even the research hypotheses, a discussion of the
importance of the study, the data and methodology, the key themes and research results,
conclusions and recommendations. The executive summary is an expanded version of an
abstract, which typically is not more than 100 to 150 words, containing the context, the
problem, and a statement of the main point, including the gist of the procedures and methods
used to achieve the research results.
Chapter 6. Reporting Research Results
6.1. Disseminating Research Results
Often there is a need to report the research results other than in the form of a research
report. Research reports have to be widely distributed if they are to reach a wide readership.
Orally presenting the research results in a public forum, e.g., a conference or a seminar, is
probably the most common and most rapid way to disseminate new findings. Current standards
of public speaking in many scientific conferences, however, are relatively low, perhaps
because many scientists, official statisticians included, are not trained in the art of marketing
their products and services. A good presentation can undoubtedly be memorable to the
audience. Thus, it is important to have a sense of some basic principles on presentation
techniques that can effectively transmit the information about the research and the research
results. A number of fairly recent works, e.g. Alley (2003), provide some discussions of
effective presentation techniques especially in scientific talks. What should be stressed is that
you must do justice to your topic and your work. If the research was worth doing, then its
presentation and dissemination must be worth doing well.
Certainly, it can be noted that some people are very effective public speakers (and can
probably sell any idea). However, effective presentation involves talents and competencies that
can be developed. Some hard work and practice can help us develop good, if not very good,
presentation skills.
The reality is that not all people who would attend a dissemination forum of your
research will read your report. Thus, your work and your research report will likely be judged
by the quality of your presentation. The primary purpose of a presentation is to educate, to
provide information to the audience. If your presentation is poorly expressed, then your
research results will be poorly understood. In this case, your research will be in danger of being
ignored, and your effort working on your research may go to waste. Too many details will
likely not be remembered by the audience, who might just go to sleep, and worse, snore during
your presentation!
6.2. Preparing for the Presentation
In writing the research report, it is important to take the stand of your reader, and to
plan what you write. Similarly, during a presentation, you ought to be considerate of your
audience, and realistically plan your talk. You ought to know your audience, identify areas of
common interest with them, and anticipate potential questions that they may raise during the
presentation. Neither underestimate nor overestimate your audience. However, you need not
plan your presentation in the same order as you wrote your research paper. While you
may have used a logical plan for your research report, your presentation will involve not only
an overall plan but a didactic plan that involves understanding the specific type of audience
you have, the type of talk you are to give and the duration of your talk.
Unlike the written research report, a presentation is a one-shot attempt to make a point
or a few points. Thus, it is vital that your presentation be well-constructed and organized, with
your points submitted in a logical, clear sequence. Often, your presentation may have to
last only ten to fifteen minutes. The shorter the talk, the more difficult it will generally be
to cover all the matters you wish to talk about. You can't discuss everything, only some
highlights; thus you need to be rather strict about including only essential points for the
presentation, and removing all non-essential ones. Try to come up with a simple presentation.
Of course, planning the content of a talk in relation to the actual length of the talk can be rather
difficult. It may be helpful to note that if you speak at about 100 words per minute, each
sentence covers about 15 words, and each point takes about 4 sentences, then a 15-minute
talk will roughly entail 25 concepts (and 25 slides, at 1 concept per slide). If you are given
more than 30 minutes for your presentation, while you have more time to cover material, you
have to find ways of enlivening your presentation as people's attention spans are typically
short. Also, in this case, there is a danger that your audience may be more attentive to your
assumptions and question them. Whether your talk will be short or long, as a rule of thumb,
people can only remember about five or fewer new things from a talk.
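The rule-of-thumb arithmetic above (100 words per minute, 15 words per sentence, 4 sentences per point) can be written out as a short calculation. The sketch below is only an illustration; the function name and its default values are assumptions taken from the figures quoted in the text, and a speaker with a different pace should substitute their own numbers.

```python
def estimate_points(minutes, words_per_minute=100, words_per_sentence=15,
                    sentences_per_point=4):
    """Rough number of points (and slides, at one point per slide) for a talk."""
    total_words = minutes * words_per_minute          # e.g. 15 * 100 = 1500 words
    total_sentences = total_words / words_per_sentence  # e.g. 1500 / 15 = 100 sentences
    return round(total_sentences / sentences_per_point)  # e.g. 100 / 4 = 25 points


for minutes in (10, 15, 30):
    print(minutes, "minutes ->", estimate_points(minutes), "points")
```

The estimate is an upper bound on what can be said, not on what will be remembered; the five-or-fewer rule in the paragraph above still applies.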
It is suggested that before your talk, you should create an outline of your talk as well as
plan how to say what you want to say. It should be clear to you whether you are expected to
present new concepts to your audience, or build upon their knowledge. That is, have people not
read your paper (in which case, how can you persuade them to do so?) or have they read it (in
which case, what specific point do you wish them to appreciate?). Either way, the basics of your
talk (the topic, concepts and key results) ought to be delivered clearly, and early on during
the talk, to avoid losing the attention of the audience. You ought to identify the key concepts
and points you plan to make. Determine which of these concepts and points will require
visuals, e.g., graphs, charts, tables, diagrams and photos, that are consistent with your
message. Preferably, these visuals should be big, simple and clear. The intention of using such
visuals is to provide insights and promote discussions. Although presenting data in tables can
be effective, these tables should not be big tables. Only small tables with data pertinent to the
point you want to deliver should be used. Figures may often be more effective during
presentations. All preparations for such visuals ought to be done way in advance.
If you will be presenting your report with the aid of a computer-based presentation,
such as PowerPoint, make sure to invest some time learning how to use this software.
PowerPoint is an excellent tool for organizing your presentation with slides and transparencies.
Inspect the various templates, slide transitions and custom animation available. While colorful
templates may have their use, you ought to be prudent in your choice of templates,
backgrounds and colors. Artwork, animation and slide transitions may improve the
presentation, but don't go overboard. Too much animation and too many transitions may reduce the
visuals' effectiveness and distract your audience from your message. Using cartoons
excessively may make your presentation appear rather superficial. Thus, make sure to be
artistic only in moderation. Try not to lose sight of the fact that you have an overall message to
deliver to your audience: the key point of your research. You are selling your research!
Artwork will not substitute for content. The earlier you start on your slides, the better they will
be, especially as you fine-tune them. However, also avoid excessive fine-tuning.
Be conscious that your slides are supports and guides for your talk. What you say should,
in turn, agree with the content of your slides. Your first or first two slides should give an overview
(or outline) of the talk. Your slides ought to be simple and informative; with only one basic
point in each slide. If you intend to give a series of points, it may be helpful to organize them
from the most to the least important. This way, your audience is more likely to remember the
important points later. Establish logical transitions from one point to another that link these
issues. You may, for instance, pose a question to introduce the next point.
Make sure to put the title, location, and date of the talk on each slide; run a spelling
check on the words in your slides. It can be embarrassing if people pay more attention to
mistakes in spelling in the slides than to the issues you are attempting to communicate to your
audience.
Some standard don'ts in using PowerPoint presentations:
Don't use small fonts. Make sure that your slides are readable. As a rule of
thumb, if it looks just about right on the computer screen, then it is likely
to be too small during the presentation. If it looks big, it may still be small
during the slide view. You should aim for outrageously large fonts (and
that goes for figures, charts, etc.). To do a simulation, put your slides on slide
show, and step back six feet away from your computer. You ought to be
able to read all the text very easily in your presentation; if not, you ought to
resize your fonts. You shouldn't write too much on each slide as the fonts get
automatically smaller if there is too much written. Try having only four or
five lines per slide with not more than six words per line. Preferably have the
size of the smallest font used be around 36 point.
Dont use pale colors, e.g. yellow (about one in ten people are said to be color
blind, these people cannot see yellow). Use highly saturated colors instead of
pastels, and complementary colors for your text and background to increase
the visibility of the text in the slide. Color increases visual impact.
Dont use only upper case (except for the presentation title). A mixture of
lower and upper cases tends to be easier to read than purely upper or purely
lower case.
Don't use complete sentences. It is better to have only key phrases, and to use bullets.
If you use sentences, use only short sentences with simple constructions.
Don't overcrowd slides, e.g., do not use very large tables copied directly from
an MS Word document; don't put too many figures or charts on one slide.
Don't change formats. Once you have made your choice on colors, fonts, font
size, etc., you ought to stick to them. Consistency in format will allow your
audience not to be distracted by anything other than your message.
Don't display slides too quickly. But don't spend too much time on one slide
either. The audience can scan the content of a slide (including visuals) within
the first three seconds after it appears. If you don't say anything during this
period, you allow your audience to absorb the information. Then, when you
have their attention, you can expand upon what the slide has to say.
Don't ever read off everything directly from your slides. The audience can
read! If you were to read everything directly off your slides, then
people could just be given copies of the slides without you needing to give
the presentation. You need to say some words not stated in your slides, e.g.,
elaborations of your points, anecdotes, and the like. You ought to maintain
eye contact with your audience, preferably almost always.
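The rule of thumb on text density given above (at most five lines per slide and at most six words per line) can be checked mechanically before a rehearsal. The following is only a minimal sketch in Python, which is not a tool used elsewhere in this manual; the function name check_slide, the limits passed as defaults, and the sample slide text are all hypothetical and merely illustrate the idea.

```python
# Hypothetical helper: flags a slide whose text exceeds the rule-of-thumb
# limits mentioned in the text (at most five lines, at most six words per line).
MAX_LINES = 5
MAX_WORDS_PER_LINE = 6


def check_slide(lines, max_lines=MAX_LINES, max_words=MAX_WORDS_PER_LINE):
    """Return a list of warning strings for one slide, given its text lines."""
    warnings = []
    # Ignore blank lines, which carry no text for the audience to read.
    text_lines = [line for line in lines if line.strip()]
    if len(text_lines) > max_lines:
        warnings.append(f"too many lines: {len(text_lines)} > {max_lines}")
    for number, line in enumerate(text_lines, start=1):
        word_count = len(line.split())
        if word_count > max_words:
            warnings.append(
                f"line {number} has {word_count} words (limit {max_words})"
            )
    return warnings


if __name__ == "__main__":
    # Made-up slide text, for illustration only.
    slide = [
        "Poverty incidence fell",
        "Household size matters",
        "Education of the household head is strongly associated with assets",
    ]
    for warning in check_slide(slide):
        print(warning)
```

A check of this kind only counts lines and words; it cannot judge font size or readability, so stepping back from the screen as described above remains necessary.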
Finally, stage your presentation: run through your talk at least once. A practice talk is
likely to be about 25% faster than the actual presentation. Rethink the sequencing of issues
and reorganize it to make the talk run more evenly. Rephrase your statements as needed. Delete
words, phrases, statements and issues that may be considered non-essential, bearing in mind
time constraints as well as the flow of ideas in your talk. Try to have some time left over
for questions at the end of the talk. You may want to prepare a script or notes for your
presentation to keep your slides in sync with what you want to say. The script or notes can be
quite useful, especially if you go astray during the presentation. It puts order to what you will
say, but it has to be organized well.
Practice your presentation, perhaps first in private. Listen to the words you use as well
as how you say them, and not to what you may think you are saying. Rehearsing your talk in
front of a mirror (making eye contact with an imaginary audience) can be a painful and
humbling experience, but also rather helpful as you observe your idiosyncrasies and
mannerisms, some of which may be distracting to your audience. After your private rehearsal,
you can try your presentation out in front of some colleagues, preferably some of whom should
not know too much about your topic and research. These colleagues can provide constructive
feedback, both on the content and style of presentation. Make the changes. Rehearse some
more (as the saying goes, "practice makes perfect"), perhaps around five times, then let it rest
(and even sleep on it). The night before the talk, sleep well.
6.3. The Presentation Proper
Just before the talk, try to relax and take it easy: dress comfortably and take a deep
breath before you start. Arrive at the venue at least ten minutes before your talk. Scan the
environment for possible problems in the audience's line of sight. Try to be familiar with the
equipment. Don't wait until the last few minutes before your presentation to go to the
toilet; check your appearance - including zippers, buttons and the like - before you reappear. It
can be rather embarrassing if people's attention is directed not at your talk but at open
zippers, buttons, etc. If you prepared PowerPoint slides, make sure to have several soft copies
and test-run a copy before your presentation. Check if the colors, fonts, bullets, formulas, etc.
are the same as what you prepared.
During the presentation, straighten up and face the audience. Be prepared for your
presentation. Express confidence: smile, speak slowly, clearly and loudly (and use gestures).
Be balanced in your tone of voice. If a microphone is available, do not say "does everybody
hear me?"; do not tap on the microphone; do not say "hello, hello, mike test" or
"one-two-three" on the microphone. Just speak up and begin your talk. If the microphone does not work,
it may be off. If there is a problem beyond the switch, just start your talk. Be as articulate as
possible. You may want to mention that more details are available in the research report.
Introduce your ending by giving your last slide, or last few slides, the title
"Conclusion." Here, you may want to offer a summary of the background (and thus the
significance of your research), as well as a recapitulation of the findings. Discuss directions for
future work. Then, finally, end your talk with a few words of thanks. An acknowledgement
slide may be helpful either at the very end or at the very start of the talk.
If you are interrupted during a conference or seminar, you can answer without delay but
don't lose control over the flow of your presentation. You ought to avoid being sidetracked or,
even worse, being taken over by the distraction. You can opt to defer your reaction to the
interruption by mentioning that this will be discussed later or at the end of the talk.
If there are questions raised during your presentation, it is good practice to repeat the
question, not only for your benefit and that of the person raising the question, but also for the
sake of others. Take some time to reflect on the question. If there are questions about the
assumptions in your research, you will have to answer the questions in detail. Thus, be
prepared and anticipate the audience's reactions. If there are questions about a point you made,
you will have to discuss this point and re-express it clearly. If the audience is largely composed
of non-specialists, you may want to delay your discussion until the end of your presentation or
have a private discussion. Technical questions call for technical answers. If
you don't understand the question raised or if the question is rather challenging, you may
honestly say so (but there is no need to apologize). Here, you may want to offer to do further research
on the issue, ask for suggestions from the floor or have further discussions after your presentation.
If you sense that your answer and discussion are getting prolonged, make an effort to get out of the
heat tactfully.
If you run out of time in your presentation, you may finish your current point, refer
interested listeners to further details in the research paper and jump to the concluding section of
your presentation.
Appendix: Some Grammar Concepts
I. Tense of a Verb: the time to which the verb refers. This ought to present few difficulties if
you place yourself in the position of the reader and consider the time to which the statement
refers at the time it is written. Within a paragraph, changes in time frame are indicated by
changing tense relative to that primary tense, which is usually either simple past or simple
present. As a general guideline, however, writers are advised not to shift from one tense to
another if the time frame for each action is the same.
Future: used for events in the future when the text was written. It is often used
in the Methodology section of the research proposal:
The study will utilize information on contraception gathered from
some 14,000 female respondents aged 15-49 and some 5,000 male
respondents aged 15-54, sourced from the 2003 National
Demographic and Health Survey (NDHS).
or for speculative statements in the Summary and Conclusions section of the
research report:
Enhancing men's knowledge about sterilization methods will increase
acceptance of these methods.
Past: used for events already in the past when the text was written.
A calendar data file was created by extracting calendar information
from the 2003 women data file using the Dynpak software package
of programs.
It is often used in the Methodology section of the research report. It is also
used for a result that is specific to the research undertaking. (If the statement is
a statement of fact, in general, the present tense is used.)
Present: used for statements that are always true, according to the researcher, for
some continuing time period; the statement may have been false before, but it
is true at the time of writing and for some time thereafter.
The 2003 NDHS is the first ever National Demographic Survey which
included male respondents.
It is also used for a result that is widely applicable, not just to the research.
Better educated, working women are more likely to have a higher
level of contraceptive use.
Past Perfect: used for events already in the past when another event in the
past occurred.
Future Perfect: used for future events that will have been completed when
another event in the future occurs.
II. Voice of a Verb: refers to whether the subject of the sentence performs the action expressed
in the verb (the so-called active voice):
Example: A researcher presented his research proposal.
or whether the subject is acted upon (the so-called passive voice), i.e., the agent
of the action appears in a "by the ..." phrase:
Example: The research proposal was presented by the researcher.
In the passive voice, sometimes there is no explicit agent (no "by the ..." phrase), only an
implied one. Note that in the last example, it is not actually necessary to
have the phrase "by the researcher" to make the sentence a complete sentence. Writers are
advised to avoid dangling modifiers caused by the use of the passive voice. A dangling modifier is
a word or phrase that modifies a word not clearly stated in the sentence. Instead of writing "To
cut down on costs, the survey consisted of 1000 respondents," one should write "To cut down
on costs, the team sampled 1000 respondents." In the first case, the construction of the sentence
indicates that the survey was the one that decided to cut down costs.
There is wide acceptance of the use of the active voice over the passive voice in non-scientific
writing, as the active voice yields clearer and more direct thoughts. The passive voice ought to
be used, however, when the object is much more important than the subject, especially when
the identity of the subject does not matter.
It is widely regarded as preferable to use the active voice.
Here it is more rhetorically effective to use an indirect expression. The passive voice
may also help avoid calling attention to oneself, i.e., rather than write:
I selected a sample of 1000 respondents.
you may choose to write:
A sample of 1000 respondents was selected.
An alternative way of writing the sentence is to use a third-person style, i.e., use
"the researcher" in place of "I":
The researcher selected a sample of 1000 respondents.
III. Punctuation: refers to breaking words into groups of thoughts, as signals to
readers that another set of thoughts is to be introduced. Punctuation marks make
it easier for the reader to understand the flow of thoughts of a writer, and help to emphasize
and clarify what a writer means. The rules for the use of a number of punctuation
marks are fairly standard, although they are not static, i.e., they have changed through
the years. These conventions and rules are created and maintained by writers to help
make their text more effective in communicating ideas.
The period (.) completes a sentence, while the semicolon (;) joins two complete
sentences where the second sentence is closely related to the first. An exclamation
point (!) also ends a sentence, but, unlike the period, it indicates surprise or emphasis.
A question mark (?) also ends a sentence, but it poses a question which could be
answered, either by the reader or the researcher/writer. (Note that rhetorical questions
need no answer.)
The colon (:) is placed at the end of a complete sentence to introduce a word, a
phrase, a sentence, a quotation, or a list.
The National Statistics Office has only one major objective: to
generate statistics that the public trusts.
The most common way of using a colon, however, is to introduce a list of items.
The regression model involved explaining per capita income of the
family with a number of characteristics of the family: family size,
number of working members in the household, years of education of
household head, and amenities.
Note that the colon should not be placed after the verb in a sentence, even when you
are introducing something, because the verb itself introduces the list and thus the
colon would be redundant. If you are not sure whether you need a colon in a particular
sentence, try reading the sentence, and when you reach the colon, substitute the word
"namely." If the sentence reads well, then you may need the colon. (Of course, there
are no guarantees!)
The comma (,) can be used in more varied ways: it can separate elements of a list, or it can
join an introductory clause to the main part of a sentence. Some writers can tell where
a comma is needed by merely reading their text aloud and inserting a comma where
there is a need for a clear pause in the sentence. When a reader encounters a comma,
the comma tells the reader to pause (as in this sentence). In the preceding sentence, the
comma is used to join the introductory clause of the sentence ("When a reader
encounters a comma") to the major part of the sentence ("the comma tells the reader to
pause"). Another reason for the pause could be that the words form a list, and the reader
must understand that the items in the list are separate. What is often unclear is whether
to include the comma between the last and second-to-last items in a list. In the past, it
was not considered proper to omit the final comma in a series. Modern writers believe
that conjunctions such as "and," "but," and "or" do the same thing as a comma, and
these writers argue that a sentence is more economical without the comma. Thus, you
actually have the option to choose whether or not to include the final comma. Many
writers, however, still follow the old rule and expect to see the final comma. Note also
that while we can use a semicolon to connect two sentences, more often we glue two
sentences together with a comma and a conjunction.
A regression model was run on the data, and then model diagnostics tools
were implemented.
If your sentence is rather short (perhaps between five and ten words), you may opt to
omit the comma.
The comma may also be used to attach one or more words to the front or back of the
core sentence, or when you insert a group of words into the middle of a sentence. If the
group of words can be viewed as non-essential, a comma will have to be put on both
sides, as in the following example.
The poverty data, sourced from the 2003 Family Income and Expenditure
Survey, suggests a fall in the percentage of poor people (compared to the
previous survey conducted three years ago).
For more grammar bits and tips, you may want to look over various sources from the internet,
such as http://owl.english.purdue.edu/handouts/grammar/
List of References
Albert, Jose Ramon G. (2008). Statistical Analysis with STATA Training Manual. Makati City:
Philippine Institute for Development Studies.
Alley, Michael. (2003). The Craft of Scientific Presentations. New York: Springer-Verlag.
Asis, Maruja M.B. (2002). Formulating the Research Problem, Reference Material in the University of
the Philippines Summer Workshops on Research Proposal Writing.
Babbie, Earl (1992). The Practice of Social Research (6th ed.), California: Wadsworth Publishing Co.
Booth, Wayne C., Gregory G. Colomb, and Joseph M. Williams. (1995) The Craft of Research. Illinois:
University of Chicago.
Cleveland, W. S. and R. McGill. (1984), Graphical Perception: Theory, Experimentation and
Application to the Development of Graphical Methods, Journal of the American Statistical Association,
79, 531-554.
Draper, Norman and Harry Smith (1998) Applied Regression Analysis, New York: John Wiley.
Gall, Meredith D., Walter R. Borg, Joyce P. Gall (2003) Educational Research: An Introduction, 7th
edition. New York: Pearson.
http://owl.english.purdue.edu/handouts/grammar/
Haughton, Jonathan, Shahidur R. Khandker (2009). Handbook on Poverty and Inequality. World Bank:
Washington
Huff, Darrell (1954). How to Lie with Statistics. New York: W.W. Norton
Hosmer, David W., Jr., and Stanley Lemeshow. (2000). Applied Logistic Regression, Second Edition.
New York: John Wiley & Sons.
Kuhn, Thomas S. (1970). The Structure of Scientific Revolutions, Second Edition. Chicago: University
of Chicago Press.
Meliton, Juanico B. (2002). Reviewing Related Literature, Reference Material, University of the
Philippines Summer Workshops on Research Proposal Writing.
Mercado, Cesar M.B. (2002). Overview of the Research Process, Reference Material, University of the
Philippines Summer Workshops on Research Proposal Writing.
Mercado, Cesar M.B. (2002). Proposal and Report Writing, Reference Material, University of the
Philippines Summer Workshops on Research Proposal Writing.
Nachmias, Chava F. and David Nachmias (1996). Research Methods in the Social Sciences (5th ed.),
New York: St. Martin's Press.
Phillips, Estelle M. and Derek S. Pugh (2000). How to Get a Ph.D.: A Handbook for Students and Their
Supervisors (3rd ed), Philadelphia: Open University Press.
Stevens, S. (Ed.) (1951) Handbook of Experimental Psychology. NY: Wiley.
Statistical Research and Training Center (SRTC) Training Manual on Research Methods.
University of the Philippines Statistical Center Research Foundation Training Manual on Statistical
Methods for the Social Sciences.
World Bank (2000) Integrating Quantitative and Qualitative Research in Development Projects. Michael
Bamberger (ed).