
2007 IEEE/WIC/ACM International Conference on Web Intelligence

Automatic Website Comprehensibility Evaluation

Ping Yan, Zhu Zhang, Ray Garcia
University of Arizona, Tucson, AZ 85721
pyan@email.arizona.edu, zhuzhang@email.arizona.edu, rgz@email.arizona.edu

Abstract

The Web provides easy access to a vast amount of informational content to the average person, who may often be interested in selecting websites that best match their learning objectives and comprehensibility level. Web content is generally not tagged for easy determination of its instructional appropriateness and comprehensibility level. Our research develops an analytical model, using a group of website features, to automatically determine the comprehensibility level of a website. These features, selected from a large pool of quantitatively measured website features, are statistically shown to be significantly correlated with website comprehensibility in empirical studies. The automatically inferred comprehensibility index may be used to assist the average person, interested in using web content for self-directed learning, to find content suited to their comprehension level and to filter out content which may have low potential instructional value.

1. Introduction

One of the challenges of using web content for learning is finding content which is instructional and within the comprehension level of the user. For example, a common strategy for finding web content, when attempting to learn about a particular topic, is to start by going to a web search engine and typing in a keyword; the keyword "Charles Darwin" returns millions of results, without any indication as to which of these results are of instructional value relative to one's comprehension level. The user may review each search result, guess which may satisfy their learning objectives, then review the website content and attempt to determine its comprehensibility and instructional value using their own judgment. This is largely inefficient and distracts from the primary intention of learning about the topic, diverting effort into finding content to use for learning.

Popular search engines, such as Google, use the PageRank algorithm to sort and filter search results; the ranking is based on link analysis of the importance of the contents within a set [1]. Similarly, the HITS (Hypertext Induced Topic Selection) algorithm measures the contents' authority by considering their "hubness" [2]. PageRank and HITS are based on the linkage of the documents. Directories, such as DMOZ (the open directory project, http://dmoz.org/) and its derivatives, provide an organic taxonomy of websites, but lack indicators of instructional value and comprehensibility. Virtual libraries have attempted to create directories of sources, but they are not inclusive of the vast resources available on the public Internet and also generally do not adequately address whether content is instructional or comprehensible. Self-directed learners, using the Web for learning, need some form of guidance in narrowing the choices and focusing their efforts on the most appropriate instructional content they can find.

To address the needs of self-directed learners using the Internet for learning, we automated the screening process of Web content by computing a comprehensibility score for each website of interest. We use the term comprehensibility here to denote the degree to which a web page provides direct access to the relevant information (i.e., substance in the hypertext space) with minimal distractions. In this study, we measure website characteristics regarding Information Value, Information Credibility, Media Instructional Value, Affective Attention, Organization and Usability. We then search for predictors among this set of features to determine the comprehensibility level of a website.

From July 2006 through October 2006, we collected a data corpus consisting of 800 websites that were manually browsed and evaluated by four professional
librarians. An empirical study based on the data corpus is used to understand the characteristics of a web page that may be better suited for learning from a comprehensibility perspective. We construct the feature space of websites by aggregating the page-level feature vectors. The aggregation considers the website topology and the user's browsing behavior. A computational model is created based on statistical analysis of the data corpus. We thus predict the comprehensibility score for a website by examining the website characteristics.

The remainder of this paper is structured as follows. Section 2 presents an overview of related research. In Section 3 we discuss the features associated with both manual and automated comprehensibility evaluation. Section 4 summarizes our empirical study. In Section 5 we describe our modeling techniques and present the findings. Section 6 concludes by reviewing our findings and discussing the implications.

2. Related works

Website evaluation research has actively focused on assessing website usability, accessibility, credibility and overall design issues. Design guidelines and evaluation tools, from either website developers' or website users' perspectives, exist in the literature. Fogg et al. investigated how different elements of websites affect people's perception of credibility by using an online questionnaire [3]. WebSAT [4] is a static analyzer tool that inspects HTML files for potential usability problems; it identifies problems related to, e.g., readability and maintainability according to its own set of usability rules or IEEE standards. A free online service, WebXACT (http://webxact.watchfire.com/), is another analysis tool evaluating an individual web page for quality, accessibility, and privacy issues. The Bobby World Wide Web Accessibility Tool is mainly used to inspect the compliance of website design with Web accessibility guidelines such as the World Wide Web Consortium's (W3C) Web Accessibility Initiative (WAI) guidelines (http://www.w3.org/WAI/). These studies provide valuable insight into accessibility issues, but, as our study suggests, they do not form a strong indicator of website comprehensibility quality. Brajnik [5] surveyed 11 automated website analysis methods and revealed that these tools address only part of the usability issues, such as download time and validation of HTML syntax. Evaluation of hypertext comprehensibility issues such as information consistency and information organization is absent from the literature.

Methods surveyed by Ivory and Hearst [6] entail analyzing server and other log file data or examining a web page's conformance with a set of usability, accessibility, or other guidelines. Researchers on the WebTango project [7] considered a number of quantitative measures of a website, such as informational, navigational, and graphical aspects, to rank its quality as poor, average, or good. They use such quality metrics to help non-professional designers improve their web site design.

Readability indexes have been used in education for many years. Readability index calculations include the Readability Test, the Gunning-Fog Index, the Flesch-Kincaid Readability Test, the SMOG Index and the Coleman-Liau Index. They usually measure vocabulary difficulty and sentence length to predict the difficulty level of a text, often suggesting an approximate reading age for the text. A full discussion can be found in [8].

Synthesizing the existing work discussed above, we see that the assessment of web comprehensibility for the purpose of facilitating learning processes has not been widely investigated. In particular, little of the existing comprehensibility research studies comprehensibility from an analytical perspective, so it is hardly feasible to automate. A recent study [9] related reading strategies to hypertext-based learning and cognition. Díaz, Sicilia and Aedo [10] proposed an evaluation framework made up of a number of criteria to test the utility and usability of educational hypermedia systems. They emphasized the importance of evaluating the efficiency of educational hypermedia systems, so that they can meet users' learning and teaching needs, and framed the evaluation approach around a number of analytical parameters such as readability, aesthetics and consistency, as well as content richness, completeness and hypertext structure. In general, they provide a theoretical framework but no evaluation. Our proposed work uses analytical approaches to automatically determine the comprehensibility score for each website in support of learning. The results are empirically evaluated against expert judgment.

The most closely related work is our earlier study [11], in which we reported a preliminary analysis of a collection of 300 websites. We now report a study much expanded over the previous report in the following ways. First, the analytical model is developed by inspecting and selecting the optimal feature sets among 191 features, rather than the 19 features investigated in the previous study, to predict website comprehensibility; it measures nearly all aspects of a website that can be determined by parsing and analyzing the content features, and features correlated with website comprehensibility are examined in finer granularity and more broadly. Second, in-depth analysis and experiments that test the model's predictive power under different settings are conducted. A non-linear modeling technique based on Support Vector Regression (SVR) is
applied in an attempt to better predict the website's comprehensibility level. Finally, we validate our analysis on a larger data set containing 800 website data points.

3. Comprehensibility implications

The main interest in comprehensibility is the extraction of the instructional value of the information content in the web pages and the related linked web pages. The instructional value is a measure of the knowledge contained within the page, including the degree of application, analysis, synthesis, inquiry or any other narrative that provokes learning activity.

Comprehensibility is the degree to which a group of web pages provides direct access to the substance of the information in the hypertext space without distractions. A hypertext space is the web page being viewed plus all related linked pages which are necessary for the reader to understand the information within the related web pages. The reader should easily be able to determine where they are in the hypertext space relative to where they started. We conceive that Web comprehensibility depends on the following aspects of a website: Information Value, Information Credibility, Media Instructional Value, Affective Attention, Organization and Usability.

• Comprehensibility encompasses ease of finding and understanding the concepts presented, assuming that the reading level of the text is equal to or below the reading level of the evaluator. Information Value (IV) checks the readability and the information richness and completeness in general. We compute 38 features such as the number of words in the title, in the meta contents and in the body text, and a number of readability indexes such as Gunning-Fog, SMOG and Flesch-Kincaid readability. For example, the Gunning-Fog index is computed according to the formula (words_per_sentence + percent_complex_words) * 0.4; a short sketch at the end of this section illustrates the computation.

• Comprehensibility also includes the sense of credibility and trust and access to the source, author and date of the information being presented. Information Credibility (IC) examines the knowledge of the information sources, the authority of the information and the correctness of the information. 16 features are computed, including the counts of HTML syntax errors and warnings reported by HTML Tidy [12] and the number of images indicating advertisements. Whether copyright information and a date of last update are present is also examined.

• Media Instructional Value (MV) is used to evaluate whether the use of graphics, icons, animation or audio enhances the clarity of the information and is necessary to communicate the concepts. 25 features are extracted from the HTML pages regarding graphical contents, such as the counts of images, the counts of animated artworks, the number of audio or video clips in the web page, and the maximum height and width of the images.

While the three measures above focus on the information content of web pages, another three measures examine the construction values of the websites.

• Comprehensibility is also enhanced by the ease of focusing attention on the most relevant information of a website. The consistency and uniformity of the presentation of the pages on the website should add to comprehensibility, while the arbitrary use of colors, fonts, backgrounds and images distracts from the ease of reading the text. The Affective Attention (AA) rating is determined by evaluating the format, appearance and aesthetics. 35 features regarding text formatting and page formatting, such as the number of words that are bolded, italicized or capitalized and the presence of style sheets, are examined.

• Comprehensibility is evidenced by web pages that allow for ease of linking through and selectively discovering the meaning relevant to the reader. The use of short paragraphs, bullet points, tables or other summary presentations allowing for quick scanning of the information to find the central ideas should also add to comprehensibility. Organization Structure (SO) indicates the effectiveness of navigation (uses of lists, tables, headings and links) and the consistency of website contents and layout design. Related features are the maximum crawling depth of a site, the number of hyperlinks in a page, the counts of page files of different types (PHP pages, ASP pages, TXT documents, etc.) and computed variances of the page-level features.

• Usability (UA) is established using 15 features. These features look at the average download time of a page (indicating whether the web page loads quickly) and ease of use (by examining the use of forms, framesets, etc.). The usability and accessibility of the website contribute to comprehensibility; if a site is not easy to use or cannot be adjusted for accessibility, its comprehensibility is diminished.

We examined a large pool of quantitatively computable features which are intended to provide sufficient conceptual equivalence to the above heuristics used by human evaluators when rating the websites. In total, 191 features are computed to quantify the heuristics; because of the limited space of this paper, full descriptions of the 191 features are not presented here.
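As a concrete illustration of one of the readability features mentioned above, the following is a minimal sketch of the Gunning-Fog calculation in Python. It is not the extraction code used in this study; the tokenization and the vowel-group syllable counter used to identify "complex" words (three or more syllables) are simplifying assumptions.

import re

def count_syllables(word):
    # Crude vowel-group heuristic; accurate syllable counting needs a dictionary.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text):
    """Gunning-Fog index: (words_per_sentence + percent_complex_words) * 0.4."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    words_per_sentence = len(words) / len(sentences)
    percent_complex = 100.0 * sum(count_syllables(w) >= 3 for w in words) / len(words)
    return 0.4 * (words_per_sentence + percent_complex)

# Example: score the visible text of a page after stripping HTML tags.
sample = "Charles Darwin proposed the theory of evolution. It explains biological diversity."
print(round(gunning_fog(sample), 1))

In the feature set described above, readability indexes such as this one are among the page-level IV features; the other indexes (SMOG, Flesch-Kincaid) follow the same pattern with different weights on sentence length and word complexity.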

4. Data collection

We downloaded the most recent version of the websites in a pre-compiled list of 800 URLs that are relevant to Science, Technology, Engineering and Mathematics (STEM) topics. The downloading was restricted to the HTTP and HTTPS protocols. Instead of creating a complete copy of a website, we capped the download quota for a site at 50 megabytes and set the maximum number of levels of the breadth-first search to 5. We did not save copies of multimedia files such as images, audio and video, as we do not perform multimedia processing in this study. Among the 800 target sites, 69 websites failed to download due to problems such as inactive hyperlinks, so we successfully downloaded 731 of the websites listed on the given entry page sheet. In total, around 0.7 million web pages were downloaded, averaging about 1000 pages per site.

Four professional librarians applied their judgment in the review and evaluation of these websites. The outcome serves as a good approximation to a gold standard because, in their day-to-day work, the librarian evaluators interface with the public to select a broad range of appropriate websites and pages for people interested in learning about a topic. The review process consists of accessing a website page, finding and reading the central concept, linking to related pages as necessary to understand the central concept, and evaluating the website for adequacy for learning or instructional purposes. The librarian evaluators rated 25 to 50 websites in a trial period to become familiar with the evaluation process and the criteria.

The librarians evaluated each website on each of the six criteria: Information Value, Information Credibility, Media Instructional Value, Affective Attention, Organization and Usability. Finally, an overall rating was given to each site indicating the comprehensibility of the hypertext space presented by the website. The ratings are scored on a 1-5 Likert scale, one indicating the lowest score for each criterion and five the highest. Each librarian reviewed approximately four hundred websites in four months, with an average of 10 minutes allocated per website. They obviously sampled a relatively small set of pages instead of reviewing every page within a site; the review duration is sufficient to meet our objective and is typical of many real settings where judgments regarding a website are made very quickly by Web users. Each site is evaluated by at least two evaluators, and the rating results are averaged to represent the final scores of the site. Excluding the websites that were skipped by the evaluators for various reasons (e.g., page not found), 540 websites are included in our analysis.

5. Analytical modeling

Our major objective in this study is to automatically determine website comprehensibility by computing a set of website features. We accomplish this by modeling relationships between the website comprehensibility and the set of page-level and site-level features. We use regression analysis to construct mathematical models that can best predict the website comprehensibility. Regression analysis is the most widely used method to both describe and predict one variable (the dependent variable) as a function of a number of independent variables (explanatory or predictor variables) from observed or experimental data [13]. The general form of our problem is y = f(x_1, x_2, ..., x_n), modeling the comprehensibility y as a function of n computed feature variables x_1, x_2, ..., x_n at page level or site level.

A feature vector is constructed for each web page first. The features of the pages within a website are then aggregated to produce a site-level feature vector according to the topological structure of the site. The topology is inferred from the linkage structure of the documents: when there is a hyperlink pointing from page p_i to page p_j, a directed edge between node i and node j is said to exist, so the pages and the links between them comprise a directed graph for the website. We mimic the browsing behavior of a learner by starting from an entry page (the first page from which a learner starts to navigate the site) and then picking a hyperlink to jump to another page. The probability of a learner visiting a particular page is approximated by a geometric function of the minimum number of hops to that page starting from the entry page. The minimum number of hops is computed on the constructed topological graph with a shortest-path algorithm. Denoting the minimum number of hops between the entry page of site i and page j as sp_ij, we assume that the probability of browsing the page is α^sp_ij, where α takes a fractional value so that the probability lies within [0, 1]. The site-level feature vectors are thus computed by aggregating the page-level features, weighting the j-th page by α^sp_ij: each of the n site-level features x_i (i = 1, 2, ..., n) is a weighted average of the page-level features x_ij (j = 1, 2, ..., p) over the p pages of the site, according to Equation (1):

    x_i = ( Σ_{j=1}^{p} x_ij * α^sp_ij ) / ( Σ_{j=1}^{p} α^sp_ij )        (1)
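To make the aggregation concrete, here is a minimal sketch, assuming a site is represented as an adjacency list of pages keyed by URL together with a per-page feature vector; it computes the minimum hop counts with a breadth-first search and then the weighted site-level features of Equation (1). The data structures and function names (hop_counts, site_features) are illustrative, not the actual implementation used in the study.

from collections import deque

def hop_counts(links, entry):
    """Minimum number of hops from the entry page to every reachable page (BFS)."""
    hops = {entry: 0}
    queue = deque([entry])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, []):
            if nxt not in hops:
                hops[nxt] = hops[page] + 1
                queue.append(nxt)
    return hops

def site_features(links, page_features, entry, alpha=0.9):
    """Weighted aggregation of page-level feature vectors, as in Equation (1)."""
    hops = hop_counts(links, entry)
    pages = [p for p in page_features if p in hops]
    n = len(next(iter(page_features.values())))
    weights = {p: alpha ** hops[p] for p in pages}
    total = sum(weights.values())
    return [sum(weights[p] * page_features[p][i] for p in pages) / total
            for i in range(n)]

# Toy example: a three-page site with two features per page.
links = {"index.html": ["about.html", "topic.html"], "topic.html": ["about.html"]}
feats = {"index.html": [12.0, 3.0], "about.html": [8.0, 1.0], "topic.html": [20.0, 5.0]}
print(site_features(links, feats, "index.html", alpha=0.9))

Setting α close to zero keeps essentially only the entry page, while α = 1 weights all reachable pages equally; the effect of α on predictive power is examined below.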
In the limiting case, when α goes to zero, only the entry page (the page j with sp_ij = 0) is effectively included in the model and x_i = x_ij for that page, as the weighting factor becomes zero for all the remaining pages.

We model the relationship between web comprehensibility and the feature vectors by regressing on the data set that has been evaluated by the human evaluators. The models are inferred with both a linear and a non-linear regression technique. The effects of different α values on the predictive power of the linear model are also discussed.

5.1. Linear regression modeling

The general form of a multiple linear regression model with n independent variables is y = β_0 + β_1*x_1 + β_2*x_2 + ... + β_n*x_n + ε, where β_0, β_1, ..., β_n are the regression coefficients to be estimated and ε is a random error term. Here y is the comprehensibility score of a website to be predicted, and x_1, x_2, ..., x_n are the n independent variables representing the features computed for the corresponding website.

A base regression model consists of all 191 feature variables. There are 540 labeled data points available from the observations. The multiple linear regression model has the rating values as the dependent variable and the 191 features as independent variables. Four outliers were removed by eliminating the data points with standardized residuals outside the outlier cutoff (plus or minus 2.5). The regression produced a model with 191 predictors and an Adjusted R Square of 0.356 (α = 0.9).

A backward selection procedure was then used to search for the optimal feature subset. It begins with all predictor variables in the regression equation and then sequentially removes them according to a specified criterion (the entry criterion is a significance of the F value <= .050 and the removal criterion is a significance of the F value >= .100). In our case, backward selection works better than the forward and stepwise selection methods because of the dependency between a few features; for example, the sizes of image files are approximated by the product of the height and width of the images. The resulting linear model contains 81 features, with an Adjusted R Square of 0.437 (α = 1.0).
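A rough sketch of this kind of backward elimination is given below, assuming the site-level feature matrix X (one row per website) and the averaged librarian ratings y are available as a pandas DataFrame and Series. It uses ordinary least squares from statsmodels and drops the weakest predictor until every remaining p-value is below the 0.10 removal threshold quoted above; for a single predictor this t-test criterion is equivalent to the partial F-test, but the sketch is only a simplified stand-in for the SPSS-style procedure used in the study.

import statsmodels.api as sm

def backward_eliminate(X, y, p_remove=0.10):
    """Iteratively drop the least significant feature until all p-values < p_remove."""
    features = list(X.columns)
    while features:
        model = sm.OLS(y, sm.add_constant(X[features])).fit()
        pvalues = model.pvalues.drop("const")
        worst = pvalues.idxmax()
        if pvalues[worst] < p_remove:
            return model, features
        features.remove(worst)  # remove the weakest predictor and refit
    return None, []

# model, kept = backward_eliminate(X, y)   # X, y: hypothetical feature matrix and ratings
# print(len(kept), round(model.rsquared_adj, 3))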

Table 1. Backward linear regression results with varying alpha value

Alpha (α)   Adjusted R Square   Std. Error of the Estimate
0.0         0.259               0.88547
0.1         0.272               0.88492
0.2         0.271               0.88220
0.3         0.293               0.88537
0.4         0.303               0.89559
0.5         0.318               0.81282
0.6         0.325               0.79087
0.7         0.343               0.75768
0.8         0.378               0.75281
0.9         0.419*              0.69883
1.0         0.437               0.64872

*ANOVA (model fitness when α = 0.9): Sum of Squares = 233.098; Degrees of Freedom = 81; Mean Square = 2.878; F value = 5.746; Sig. = .000

In the above aggregation model, we take every page in a website into the analysis: all pages contribute to characterizing the particular website according to the weighting factor. However, parsing nearly 1000 pages for each website is relatively computationally expensive, so only a subset of the pages may be used to lower the cost. First, we analyze the entry pages only, that is, the page from which browsing of a particular site starts. When alpha goes to zero, the weight of pages other than the entry page goes to zero, so only the entry pages are considered in this case; the result is shown in the first row of Table 1 (Adjusted R Square = 0.259). Second, instead of parsing a single page, pages whose URLs indicate a backup entry page are also considered (the aggregated features are the average over this set of pages). For example, http://abc/index.htm, http://abc/index1.asp or http://abc/default.html are all parsed along with http://abc/index.html. On average, 5 pages are computed for each website. Table 2 shows the linear regression statistics for this second case: we saved about 80% of the computing cost, while the predictive power indicated by the Adjusted R Square is 0.302.

Table 2. A linear regression prediction with a subset of pages

R       R Square   Adjusted R Square   Std. Error of the Estimate
0.655   0.430      0.302*              0.76880

*ANOVA (model fitness): Sum of Squares = 117.088; Degrees of Freedom = 63; Mean Square = 1.985; F value = 3.358; Sig. = .000

The linear model shows how each feature contributes to the prediction of the website's comprehensibility. The effects are discussed by feature category, as an individual feature does not explain the variation of the dependent variable. By running linear regression on the feature sets of the categories discussed in Section 3, we notice that: 1) features in the MV category, mainly graphic elements and formatting features, are most closely correlated with our dependent variable; and 2) text-element features such as readability indexes and word counts also play an important role in the evaluation. However, the numbers of features in the different categories are not even, i.e., one category may contain many more features than another, so comparing the predictive power of the categories requires caution. The regression results are shown in Table 3.

Table 3. Linear regression prediction with features by categories (α = 0.9)

Feature Category   R       R Square   Adjusted R Square   Std. Error of the Estimate
MV                 0.499   0.249      0.218               0.82070
IV                 0.486   0.236      0.197               0.83163
SO                 0.487   0.237      0.142               0.85970
AA                 0.391   0.153      0.111               0.87498
UA                 0.332   0.110      0.091               0.88471
IC                 0.320   0.102      0.075               0.89275

5.2. Support Vector Regression

Experiments employing support vector regression (SVR) from the open source package LIBSVM [14] are also conducted. SVR, based on statistical learning theory, is a useful tool for nonlinear regression problems. A nonlinear relation that may exist between the comprehensibility score and the feature vectors will thus be captured by the SVR method.

Our input space contains 536 vectors in 81 dimensions (the data set with α = 0.9); the features that had been eliminated by the backward selection procedure in the linear regression were not included. Each feature is linearly scaled to the range [0, 1], to prevent attributes in greater numeric ranges from dominating those in smaller numeric ranges and to avoid numerical difficulties during the calculation. We employed epsilon-SVR with a Radial Basis Function (RBF) nonlinear kernel; a detailed technical discussion of epsilon-SVR can be found in [15]. Parameter selection (model selection) is essential for obtaining good SVR models. We conducted a grid search for optimal parameters through the parameter space of the cost parameter (c), the epsilon of the loss function (p) and the gamma of the kernel function (g). The resulting parameters are then used in the regression process; they are listed under Table 4.

V-fold cross-validation is used to evaluate the model fitness: v random partitions of the data set are chosen such that v-1 out of v portions are used for SVR training and the last portion is held back as a test set. Table 4 summarizes the SVR regression results with 10-fold cross-validation.
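A rough sketch of this modeling step is shown below, using scikit-learn's SVR (which wraps LIBSVM) instead of the LIBSVM command-line tools. The feature matrix X and rating vector y are placeholders, and the parameter grid is only indicative of the kind of search described above; the closing comment shows how the Adjusted R Square reported in Table 4 follows from the cross-validated R Square with n = 536 samples and k = 102 predictors.

import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV, cross_val_score

# Placeholder data standing in for the site-level feature matrix and librarian ratings.
X, y = np.random.rand(536, 102), 1 + 4 * np.random.rand(536)

pipe = make_pipeline(MinMaxScaler(), SVR(kernel="rbf"))   # scale each feature to [0, 1]
grid = GridSearchCV(
    pipe,
    {"svr__C": [1, 4, 16, 64],
     "svr__gamma": [2 ** -9, 2 ** -8, 2 ** -7, 2 ** -6],
     "svr__epsilon": [2 ** -9, 2 ** -8, 2 ** -7]},
    cv=10, scoring="r2")
grid.fit(X, y)

r2 = cross_val_score(grid.best_estimator_, X, y, cv=10, scoring="r2").mean()
n, k = X.shape
print(grid.best_params_, round(r2, 3), round(1 - (1 - r2) * (n - 1) / (n - k - 1), 4))
# With R^2 = 0.658, n = 536 and k = 102, the same adjustment gives
# 1 - (1 - 0.658) * 535 / 433 = 0.5774, the Adjusted R Square in Table 4.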
When comparing the Adjusted R Square of the SVR model with that of the linear regression (0.419 on the same input space, α = 0.9), the nonlinear model shows clearly higher predictive power than the linear model. The natural explanation is that some features contribute to the comprehensibility nonlinearly. However, because of the difficulty of deciphering the black-box solutions generated by SVR models, how the different web characteristics contribute to the comprehensibility of a website cannot be read off the model; the linear models can be analyzed instead to shed light on the relations between comprehensibility and web characteristics, as shown earlier.

Table 4. Cross-validation SVR experiments (n = 536, k = 102)

Number of folds   R Square   Adjusted R Square
10                0.658      0.5774

Parameters: -c 16.0 -g 0.0078125 -p 0.00390625

6. Conclusion and Future Work

Self-directed learners seeking Web content that they can easily read and understand (i.e., content with some instructional value, within a comprehensibility level that satisfies their learning objective) are challenged to quickly evaluate the websites they find through typical search engine results, and would therefore benefit from an automated evaluation of website comprehensibility. Our research uses an analytical approach to improve the information retrieval process for self-directed learners by automatically evaluating web site comprehensibility using the web page characteristics shown to be most indicative of websites rated high on comprehensibility by professional librarians. We developed an artifact that quantitatively measures a large group of page-level and site-level features, and deduced analytical models based on a search for the set of optimal metrics that helps evaluate website comprehensibility. The analytical model was rigorously evaluated to see how well its assessment of website comprehensibility compares with the evaluations made by the librarians. The predictive performance of both a linear model and a nonlinear model based on SVR is reported. The linear model is easier to interpret, while the SVR model is superior to the linear model, with 16% higher predictive power. With about 60% of the variation in the comprehensibility of a website explained by the 81 measured Web characteristics in the SVR model, we see the developed artifact as an effective and reliable solution to the comprehensibility prediction problem.

The comprehensibility scoring is not deterministic, but it remains useful when applied appropriately. The scoring would highlight when a web page may be a challenge to access and comprehend. Re-sorting search results based on the comprehensibility scoring would benefit the learner by presenting websites with higher probable comprehensibility first. The comprehensibility scoring is useful for making a quick initial determination for volumes of web pages or for a specific web page that a learner wishes to assess.

So far, in this preliminary study, we have only conducted the experiments on regressing for the overall rating; experiments will be conducted for each of the sub-category ratings in the near future. Future research will also explore the possibility of filtering out websites that address specific audiences for which the content is not instructional. For example, evaluating an e-commerce site on comprehensibility may be of limited value, so identifying such categories of websites for removal from comprehensibility scoring may be necessary. Lastly, an interesting application of the comprehensibility scoring would be to include it within a focused crawler which finds the link path with the highest comprehensibility given a specific topic.

References

[1] Brin, S. and L. Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine". WWW7 / Computer Networks, 1998. 30: p. 107-117.

[2] Kleinberg, J., "Authoritative sources in a hyperlinked environment". Ninth Ann. ACM-SIAM Symp. on Discrete Algorithms. 1998: ACM Press, New York.

[3] Fogg, B., et al., "What Makes Web Sites Credible? A Report on a Large Quantitative Study". SIGCHI'01. 2001. Seattle, WA, USA.

[4] NIST, Web Static Analyzer Tool (WebSAT). 2002.

[5] Brajnik, G., "Automatic web usability evaluation: Where is the limit?" Proceedings of the 6th Conference on Human Factors and the Web. 2000. Austin, TX.

[6] Ivory, M.Y. and M.A. Hearst, "State of the art in automating usability evaluation of user interfaces". ACM Computing Surveys, 2001. 33(4): p. 470-516.

[7] Ivory, M.Y. and M.A. Hearst, "Improving Web Site Design". IEEE Internet Computing, Special Issue on Usability and the World Wide Web, 2002. 6(2).

[8] DuBay, W.H., The Principles of Readability. 2004.

[9] Salmerón, L., J.J. Cañas, and W. Kintsch, "Reading Strategies and Hypertext Comprehension". Discourse Processes, 2005. 40(3): p. 171-191.

[10] Díaz, P., M.-Á. Sicilia, and I. Aedo, "Evaluation of Hypermedia Educational Systems: Criteria and Imperfect Measures". International Conference on Computers in Education (ICCE 2002). 2002.

[11] Ma, J., Z. Zhang, and R. Garcia, "Automatically Determining Web Site Comprehensibility". The 16th Workshop on Information Technologies and Systems (WITS 2006). 2006. Milwaukee, WI, USA.

[12] Raggett, D., HTML Tidy for Linux/x86, released 1 September 2005. HTML Tidy Project Page: http://tidy.sourceforge.net/.

[13] Kleinbaum, D.G., L.L. Kupper, and K.E. Muller, Applied Regression Analysis and Other Multivariate Methods, Second Edition. 1988: PWS-KENT Publishing Company, Boston.

[14] Chang, C.C. and C.J. Lin, LIBSVM: a library for support vector machines. 2001.

[15] Smola, A.J. and B. Schölkopf, "A tutorial on support vector regression". Statistics and Computing, 2004. 14: p. 199-222.
