What Do Mobile App Users Complain About?

app quality is to determine the challenges users face.
The App Store lets users review their downloaded apps. Besides assigning star ratings (all of which are aggregated and displayed on a version-level and an app-level basis),
MAY/JUNE 2015 | IEEE SOFTWARE 71
FEATURE: MOBILE APPS
TABLE 1 (excerpt). The studied apps.

Rating group | App | Category | Rating | One- and two-star reviews | Sampled reviews
High (≥ 3.5 stars) | Adobe Photoshop Express | Photo & Video | 3.5 | 1,030 | 280
High (≥ 3.5 stars) | CNN | News | 3.5 | 1,748 | 315
Low (< 3.5 stars) | Epicurious Recipes & Shopping List | Lifestyle | 3.0 | 940 | 273
Low (< 3.5 stars) | FarmVille | Games | 3.0 | 10,576 | 371
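The sample sizes in the table's last column can be reproduced with a standard calculation: Cochran's formula for a 95 percent confidence level and a 5 percent confidence interval, plus a finite-population correction, which is what online sample-size calculators such as the one we used typically implement. A minimal sketch:

```python
def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Representative sample size at a 95 percent confidence level (z = 1.96)
    and a 5 percent confidence interval (margin), corrected for the finite
    number of reviews. p = 0.5 is the conservative default."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2       # Cochran's formula
    return round(n0 / (1 + (n0 - 1) / population))  # finite-population correction

# Reproduces the last column of Table 1:
print(sample_size(1030))   # 280 (Adobe Photoshop Express)
print(sample_size(10576))  # 371 (FarmVille)
```

Taking p = 0.5 maximizes the required sample size, so it is the conservative assumption these calculators use by default.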
fee for premium features.) We ensured that the apps had at least 750 reviews so that a few users didn't skew the tagged reviews we analyzed. We also ensured that half of the apps had an overall high rating (3.5 or more stars) and that the other half had an overall low rating (less than 3.5 stars) because we wanted to identify the complaints in both the high- and low-rated apps. The apps covered 15 of the 23 categories in the iOS App Store.

Collecting Reviews
The iOS App Store doesn't provide a public API for automatically retrieving reviews. So, we obtained the reviews from AppComments (http://appcomments.com), a Web service that collects reviews of all iOS apps. We built a Web crawler that visited each unique page with a specific iOS review. We parsed the reviews to extract data such as the app name, the review title, the rating, and the comments. We collected all the reviews for each of the apps during the first week of June 2012.

Selecting Reviews
The 20 apps had more than 250,000 one- and two-star reviews. As we mentioned before, we studied a statistically representative sample of the reviews. To determine the sample size, we used Creative Research Systems' Sample Size Calculator (www.surveysystem.com/sscalc.htm). We randomly chose the sample to

FIGURE 2. The review-tagging procedure. Inputs: all reviews (each with a review title and comment) and a list of complaint types, initially empty. Whenever a review doesn't match an existing complaint type, a new complaint type is added to the list and tagging restarts with the new list. Outputs: all reviews, tagged with the appropriate complaint types, and the list of complaint types. This iterative process helped minimize the threat of human error during tagging.
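The restart-on-discovery loop that Figure 2 describes can be sketched as follows; `classify` stands in for the human tagger's judgment, and the function and variable names are illustrative rather than from the study:

```python
def tag_reviews(reviews, classify):
    """Tag reviews iteratively, restarting whenever a new complaint type appears.

    `classify(review, known_types)` returns the set of complaint types a tagger
    assigns to the review; any type not yet in `known_types` triggers a restart
    with the updated list, per Figure 2.
    """
    complaint_types = []  # initially empty
    while True:
        tagged, restarted = [], False
        for review in reviews:
            types = classify(review, complaint_types)
            new_types = [t for t in types if t not in complaint_types]
            if new_types:
                complaint_types.extend(new_types)  # add the new complaint type(s)
                restarted = True                   # ...and restart tagging
                break
            tagged.append((review, types))
        if not restarted:
            return tagged, complaint_types
```

A review with no meaningful comment would simply be classified as Not Specific, and a review raising several issues receives several types, as described in the text.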
achieve a 95 percent confidence level and a 5 percent confidence interval. This means we're 95 percent confident that each result is within a 5 percent margin of error.
For example, Adobe Photoshop Express had 1,030 one- and two-star reviews. The statistically representative sample for 1,030 reviews, with a 95 percent confidence level and a 5 percent confidence interval, is 280 reviews. So, we randomly selected 280 of those reviews for manual examination.
In total, we manually examined 6,390 reviews. We performed our sampling on a per-app basis because different apps have varying numbers of reviews and we wanted to capture the complaints across the different apps. The number of randomly sampled reviews for each app ranged from 264 to 383 (see the last column of Table 1).

Tagging Reviews
To identify the complaint types, we performed coding, which turns qualitative information into quantitative data [5, 6]. One of us read each review to determine the type of complaint it mentioned.
Figure 2 shows how we tagged the reviews. Each time we identified a new complaint type, we went through all the previously tagged reviews to see whether to tag them with the new type. We had to restart tagging three times after discovering new types. Sometimes, a reviewer provided no meaningful comments (for example, simply saying the app was bad). We tagged such reviews as Not Specific. For reviews containing multiple complaints, we tagged them with multiple complaint types. For example, if a review mentioned a network problem and complained about the app crashing, we tagged the review with Network Problem and App Crashing.

Results
We ended up with 12 complaint types (see Table 2).

TABLE 2. The complaint types.

Complaint type | Description | Example review
App Crashing | The app often crashed. | "Crashes immediately after starting."
Compatibility | The app had problems on a specific device or an OS version. | "I can't even see half of the app on my iPod Touch."
Feature Removal | A disliked feature degraded the user experience. | "This app would be great, but get rid of the ads!"
Feature Request | The app needed additional features. | "No way to customize alerts."
Functional Error | The problem was app specific. | "Not getting notifications unless you actually open the app."
Hidden Cost | The full user experience entailed hidden costs. | "Great if you weren't forced to buy coins for REAL money."
Interface Design | The user complained about the design, controls, or visuals. | "The design isn't sleek and isn't very intuitive."
Network Problem | The app had trouble with the network or responded slowly. | "New version can never connect to server!"
Privacy and Ethics | The app invaded privacy or was unethical. | "Yet another app that thinks your contacts are fair game."
Resource Heavy | The app consumed too much energy or memory. | "Makes GPS stay on all the time. Kills my battery."
Uninteresting Content | The specific content was unappealing. | "It looks great, but the actual gameplay is boring and weak."
Unresponsive App | The app responded slowly to input or was laggy overall. | "Bring back the old version. Scrolling lags."
Not Specific | The user's comment wasn't useful or didn't point out a problem. | "Honestly the worst app ever."

The Frequency of Each Complaint Type
We calculated the frequency of the complaint types for each app. Then, we normalized the frequency (the number of complaints of a specific type divided by the total number of sampled reviews for an app) so that we could compare results across different apps with a varying number of reviews. Owing to the high variance of each complaint type's frequency between different apps, we used the median to summarize the frequency of each complaint type across the apps.
The first three columns of Table 3 show the complaint type, its rank, and its median percentage. Three types accounted for more than 50 percent of all complaints: Functional Error, Feature Request, and App Crashing.
To better understand Functional Error, the most frequent type, we examined the most frequently used terms in the related reviews. Then, we read through the review comments that used these terms. We found that 4.5 percent of the functional errors were about location issues and that 7.3 percent were about authentication problems. Here's an example of a functional-error review in which a user reported an authentication problem:

    "Don't do the update! When I try to log in, it just keeps refreshing the screen."

Examining Feature Request, we found that most requests were app specific. However, 6.12 percent of the requests were for better notification support.
Network Problem, Interface Design, and Feature Removal complaints were also frequent. Another complaint was Compatibility, which is an important issue for iOS devices. This refers to the app not working correctly on a specific device or OS version. Surprisingly, complaints about compatibility, resources, and app responsiveness weren't as frequent; we expected more of them.
We also examined whether the complaint types varied between the highest- and lowest-rated apps. We compared the frequency of each complaint type among the 10 highest-rated and 10 lowest-rated apps. To do this, we used a two-tailed Mann-Whitney U test with α < 0.05. We didn't find any statistically significant difference between the highest-rated and lowest-rated apps.
Our findings highlight the importance of software maintenance for iOS apps because many of the frequent complaints were related directly to developmental issues (for example, Functional Error, App Crashing, and Network Problem). We believe developers can avoid such low ratings by an increased focus on QA. Also, low ratings frequently contain information that can help developers identify features users want or really hate.

The Impact of Each Complaint Type
We determined which of the most common complaints were the most negatively perceived by users. We looked at the ratio of one- to two-star ratings for each complaint type (across all apps). For example, a ratio of 5 indicated that a type had five times as many one-star ratings as two-star ratings.
The last two columns of Table 3 show the rank and ratio for each complaint type.

TABLE 3. The most frequent and impactful complaint types.* (*The last column indicates the ratio of one- to two-star ratings across all apps.)

The most negatively perceived complaints differed from the most frequent complaints. Privacy and
Ethics, Hidden Cost, and Feature Removal were the three most negatively perceived complaints (and were mostly in one-star reviews). This means that users were bothered most by issues related to privacy invasion and the app developers' unethical actions (for example, unethical business practices or selling the users' personal data). To avoid such complaints, developers should access only the data (for example, the users' contacts or location) specified in the app's description.
Hidden Cost indicated users' dissatisfaction with the hidden costs needed for the full experience of an app. This complaint showed up in 15 of the apps. When an app was free to download but not free to use, the users were disappointed and often gave low ratings. For example, Hulu Plus is free to download but has a monthly subscription cost and ads in streaming videos. Because of the monthly subscription requirement, more than 55 percent of the low ratings for Hulu Plus were about the hidden costs. On closer examination, we found that the low ratings were due to the developers' poor description of the app or a misunderstanding by the user.
Developers should devote extra attention to App Crashing, Hidden Cost, and Feature Removal complaints because they're frequent and users perceive them negatively (see Table 3). Also, our study results stress the importance of developers establishing trust and expectations with app users.

Discussion
For many of the complaints, users reported they had recently updated their app. So, we wanted to study the apparent relationship between updates and complaints. We also examined the relevance of different types of complaints to software project stakeholders (for example, developers versus project managers).

Update-Related Complaints
We could know whether a complaint was update related only if the user mentioned it in the review; however, other complaints could also have been update related.
Approximately 11 percent of the sampled reviews mentioned that a recent update impaired existing functionality. In 22 percent of these
ABOUT THE AUTHORS

EMAD SHIHAB is an assistant professor in Concordia University's Department of Computer Science and Software Engineering. He's particularly interested in mining software repositories, software quality assurance, software maintenance, empirical software engineering, and software architecture. He received a Natural Sciences and Engineering Research Council of Canada Alexander Graham Bell Canada Graduate Scholarship. He has served on the program committees of the International Conference on Software Maintenance, the Working Conference on Mining Software Repositories (MSR), the International Conference on Program Comprehension, and the Working Conference on Reverse Engineering. He has also been an organizer of the MSR 2012 challenge and MSR 2013 data showcase and a program chair for the 2013 International Workshop on Empirical Software Engineering in Practice. Contact him at emad.shihab@concordia.ca.

MEIYAPPAN NAGAPPAN is an assistant professor in the Rochester Institute of Technology's Department of Software Engineering. He previously was a postdoctoral fellow in the Software Analysis and Intelligence Lab at Queen's University. His research centers on using large-scale software engineering data to address stakeholders' concerns. Nagappan received a PhD in computer science from North Carolina State University. He received a best-paper award at the 2012 International Working Conference on Mining Software Repositories. Contact him at mei@se.rit.edu; mei-nagappan.com.

AHMED E. HASSAN is the Natural Sciences and Engineering Research Council of Canada / BlackBerry Software Engineering Chair at the School of Computing at Queen's University. His research interests include mining software repositories, empirical software engineering, load testing, and log mining. Hassan received a PhD in computer science from the University of Waterloo. He spearheaded the creation of the International Working Conference on Mining Software Repositories and its research community. Hassan also serves on the editorial boards of IEEE Transactions on Software Engineering, Empirical Software Engineering, and Computing. Contact him at ahmed@cs.queensu.ca.

Also, 18.8 percent of post-update reviews included requests for a new or previously removed feature. In addition, 18.2 percent of the post-update reviews complained about frequent crashing.
Developers often release free apps in hopes of eventually monetizing them by transforming free content or features to paid ones. We found that 6.8 percent of the post-update reviews complained about this hidden cost. Another important post-update complaint dealt with changes in the interface design; 6.2 percent of the post-update reviews contained such complaints.
On the basis of these findings, we recommend that developers pay special attention (for example, through regression testing and user focus groups) to features they might consider removing, to fees they might add, and to user interface changes they might introduce. Even if users have previously liked an app, a bad update could be irritating enough to make them give the app a low rating.

Identifying Stakeholders
Because users review apps as a whole, they often raise issues that aren't directly the developers' responsibility; some complaints are directed toward product managers or other team members. To identify these stakeholders, we divided the complaints into three categories.
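These three categories amount to a fixed mapping from complaint type to the stakeholder group best placed to act on it. As a sketch of how such routing might look (the grouping follows the article; the code itself is only an illustration, not tooling from the study):

```python
# Complaint-type groupings as given in the Identifying Stakeholders section.
STAKEHOLDER_GROUP = {
    # Development-related: directly actionable by developers.
    "App Crashing": "development", "Functional Error": "development",
    "Network Problem": "development", "Resource Heavy": "development",
    "Unresponsive App": "development",
    # Strategic: primarily a project-manager concern.
    "Feature Removal": "strategic", "Feature Request": "strategic",
    "Interface Design": "strategic", "Compatibility": "strategic",
    # Content: the app's content, value, or business model.
    "Privacy and Ethics": "content", "Hidden Cost": "content",
    "Uninteresting Content": "content",
}

def route_complaint(complaint_type):
    """Return the stakeholder group responsible for a tagged complaint."""
    return STAKEHOLDER_GROUP.get(complaint_type, "unclassified")
```

Complaint types outside the three categories (such as Not Specific) fall through to "unclassified", since they point to no actionable stakeholder.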
Development-related complaints were related directly to developers. They included App Crashing, Functional Error, Network Problem, Resource Heavy, and Unresponsive App and constituted 45.6 percent of all the complaints. So, many of the complaints were directly related to problems developers could address.
Strategic complaints primarily concerned project managers but could also partially target developers. These complaints included Feature Removal, Feature Request, Interface Design, and Compatibility and constituted 22.7 percent of all complaints. The issues related to these complaints required greater knowledge of the project and priorities and usually didn't have a straightforward solution.
Content complaints concerned the content or value of the app itself; developers had little or no control over the issues related to these complaints. These complaints included Privacy and Ethics, Hidden Cost, and Uninteresting Content. Addressing these complaints would require rethinking the app's core strategy (the business model or the content offered). Although these complaints accounted for only 3.02 percent of all complaints, Privacy and Ethics and Hidden Cost had the most negative impact, as we mentioned before.

Potential Threats to Validity
Because we performed our study on only a sample of 20 iOS apps, our results might not generalize to all iOS apps. To mitigate this threat, we maximized the coverage of complaints by studying apps that covered most of the categories in the App Store.
Also, as we mentioned before, one of us manually tagged the reviews. During this process, human error or subjectivity could have led to incorrect tagging. To address this threat, the other authors randomly inspected the reviews and corresponding tags.

User reviews strongly affect developers and organizations that develop iOS apps. Low ratings negatively reflect on their apps' quality, thus affecting the apps' popularity and eventually their revenues. To compete in an increasingly competitive market, developers must understand and address their users' concerns.
Our findings point to new software engineering research avenues, such as how ethics, privacy, and user-perceived quality affect mobile apps. We plan to expand on this study by considering more apps and comparing our findings across other mobile platforms.

References
1. S. Perez, "iTunes App Store Now Has 1.2 Million Apps, Has Seen 75 Billion Downloads to Date," Techcrunch, 2 June 2014; http://techcrunch.com/2014/06/02/itunes-app-store-now-has-1-2-million-apps-has-seen-75-billion-downloads-to-date.
2. S. Agarwal et al., "Diagnosing Mobile Applications in the Wild," Proc. 9th ACM SIGCOMM Workshop Hot Topics in Networks, 2010, p. 22.
3. N. Hu, P.A. Pavlou, and J. Zhang, "Can Online Reviews Reveal a Product's True Quality? Empirical Findings and Analytical Modeling of Online Word-of-Mouth Communication," Proc. 7th ACM Conf. Electronic Commerce (EC '06), 2006, pp. 324-330.
4. J.A. Chevalier and D. Mayzlin, "The Effect of Word of Mouth on Sales: Online Book Reviews," J. Marketing Research, vol. 43, no. 3, 2006, pp. 345-354.
5. C.B. Seaman et al., "Defect Categorization: Making Use of a Decade of Widely Varying Historical Data," Proc. 2nd ACM-IEEE Int'l Symp. Empirical Software Eng. and Measurement, 2008, pp. 149-157.
6. C.B. Seaman, "Qualitative Methods in Empirical Studies of Software Engineering," IEEE Trans. Software Eng., vol. 25, no. 4, 1999, pp. 557-572.