An Analysis of Gender Bias in Sports Journalism
Osama Khalid, Kwunming Pang, Jessica Ip, Quentin Lee
Abstract: Gender bias is an important issue, especially in journalism. There is a plethora of work that deals with understanding gender bias in sports journalism; however, the methods currently employed by a large part of the literature are not scalable and are limited in their functionality, as they rely on hand-coded datasets. In contrast, we propose an unsupervised approach which uses a keyword-based classification method to quantitatively analyze the disparity that exists in the amount of coverage, and in the language used in that coverage, when the media represent women in sports. We compare our findings with prior literature and conclude that women in sports are more likely to be underrepresented, and to be described using less formal language, than their male counterparts.

I. INTRODUCTION

Following Hillary Clinton’s 2016 campaign, researchers showed that in the political sphere the criticism that female candidates face is more gendered and sexist in nature [23]. The work done by [10] underscored that the propagation of stereotypes is used by dominant groups to reinforce their own status quo, and argued that the portrayal of women by mass media plays a vital role in this reinforcement of gender stereotypes.

The gendered framing by the mass media is perhaps especially significant with regard to sports, as the media is the primary source of information for a vast majority of the population [1]. In the comparative analysis performed by [12], the authors showed that even the coverage of international sporting events like the Olympics is gendered, with the commentary on female events having more sexist undertones than the commentary on male events.

According to [21], the form and magnitude of this gender bias depends on multiple factors, such as the league of play and even the medium. The authors suggest that the language used by television news to frame gender is different from the language used in print media.

With these nuances in consideration, we limit the focus of our project to the print coverage of college-level sport. We develop an unsupervised method to quantitatively explore the extent to which the coverage of NCAA sports is gendered.

The two main research questions we set out to explore are:

RQ1: Is there a quantitative difference in the amount of coverage that women’s sports get as compared to men’s sports?

RQ2: Is there a statistical difference in the kind of language used in the coverage of women’s sports, compared to men’s sports?

In the remainder of this paper, Section II discusses related works in this domain and their main findings, Section III discusses the unsupervised method used to explore these research questions, Section IV builds upon the methods from Section III and contains the results of the experiments, Section V discusses the implications of our findings, and Section VI concludes this project by considering the possible extensions of our work.

II. RELATED WORKS

According to [5], the media not only reports on events occurring in society but is also tied to the ideology of the dominant group. It reflects the cultural myths of this dominant group. This subsequently leads to the reinforcement of stereotypes against the subordinate groups.

By quantitatively analyzing 1,470 minutes of television coverage of sports and athletes in Sweden during 1996, the authors of [14] concluded that, compared to men’s sports, only 11.7% of women’s sports is covered by television networks. The authors concluded that this disparity in coverage gives viewers the impression that women athletes are less important and less interesting than their male counterparts.

[3] sampled 424 photographs from the Washington Post’s sports section and 319 photographs from the Los Angeles Times’ sports section, published from July 1980 to June 1981, and looked at the representation of male and female athletes in the photographs. Like [14], they too concluded that women athletes are underrepresented, with males being represented in 94% of the Washington Post’s and 90% of the Los Angeles Times’ photos.
While the research conducted by [14] and [3] focused on the amount of coverage and its relationship to gender, [20] focused not on the amount but on the content of the coverage. The authors analyzed 504 covers of Sports Illustrated from 1957 to 1989 and concluded that out of the 769 images of athletes only 6.6% portrayed female athletes. Out of the subset of images which portrayed athletes in active poses, only 2.2% portrayed females.

In order to understand the representation of gender on an international stage, [12] compared the variations in the representation of women in the 1992 Summer Olympics and the 1996 Olympics. The authors analyzed the language used in the coverage of gymnastics and went on to conclude that women athletes were disproportionately more likely to be called by their first name compared to their male counterparts.

[9] proposed a language-model-based method to quantify the language directed towards female and male athletes. They applied their proposed model to tennis post-game interviews and concluded that female tennis players, unlike their male counterparts, are more likely to be asked questions about their haircuts and fashion choices.

This gender bias is not limited to the physical space. The language used in e-sports matches is also gendered [17]. By analyzing 275,396,751 messages posted in 927,247 channels on the popular social gaming platform Twitch, the authors of [17] concluded that the language directed towards female gamers is more likely to include words which deal with appearance, like “gorgeous” and “cute”,

    Function ArticleExtraction(HTML)
    Input: HTML code
    Output: Cleaned article text
    Define:
        <>           → 1 unit distance;
        D_i          → distance between text block i and i+1;
        ⟨D⟩          → average distance between neighboring text blocks;
        argmax(L_i)  → length of the longest word in text block i;
        W_i          → total words in text block i;
        T_ij         → i ‖ j (see footnote 1);
        Max          → maximum words in the set T;
    while not the end of document do
        find "<" and next ">";
        remove all text within "<" and ">";
    end
    while not the end of document do
        if argmax(L_i) > 30 then
            remove text block (see footnote 2);
        end
        if D_i < ⟨D⟩ then
            concatenate text blocks i and i+1
        end
    end
    for k in T do
        if W_k == Max then
            return k
        end
    end

Algorithm 1: Extracting article text from HTML code
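Algorithm 1 can be read as a three-step heuristic: strip tags, drop blocks that look like code, then merge nearby blocks and keep the largest. The following is a minimal Python sketch of that reading, not the authors' implementation. The function name `extract_article`, the use of a tag count as the "distance" between blocks, and the choice to keep counting tags across a dropped block are all assumptions made for illustration.

```python
import re

MAX_WORD_LEN = 30  # threshold taken from footnote 2 of the paper


def extract_article(html: str) -> str:
    """Return the largest merged text block of an HTML page (hypothetical helper)."""
    # re.split with a capturing group alternates: text, tag, text, tag, ...
    tokens = re.split(r"(<[^>]*>)", html)

    blocks = []  # surviving text blocks, in document order
    gaps = []    # gaps[i] = number of tags between blocks[i] and blocks[i + 1]
    tags_seen = 0
    for index, token in enumerate(tokens):
        if index % 2 == 1:  # odd positions hold the captured tags
            tags_seen += 1
            continue
        text = " ".join(token.split())
        if not text:
            continue
        # Assumption: a block containing an implausibly long "word" is code.
        if max(len(word) for word in text.split()) > MAX_WORD_LEN:
            continue
        if blocks:
            gaps.append(tags_seen)
        blocks.append(text)
        tags_seen = 0

    if not blocks:
        return ""

    # Merge neighbours that are closer together than the average gap.
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    merged = [blocks[0]]
    for gap, block in zip(gaps, blocks[1:]):
        if gap < mean_gap:
            merged[-1] = merged[-1] + " " + block
        else:
            merged.append(block)

    # The article is assumed to be the merged block with the most words.
    return max(merged, key=lambda block: len(block.split()))


page = (
    "<html><head><title>Site</title></head><body><div>"
    "<p>First sentence of the story.</p>"
    "<p>Second sentence of the story continues here.</p>"
    "</div><div><div><div><a>Menu</a></div></div></div></body></html>"
)
article = extract_article(page)
```

On the toy page above, the two paragraphs are separated by only two tags, fewer than the average gap, so they are merged into one block, while the title and the menu link stay separate and are discarded as shorter blocks.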
III. METHODS

A. Article Retrieval

About 1,300 colleges in the USA are part of the National Collegiate Athletic Association (NCAA). Each college is located within a particular city and each city has a list of associated newspapers. Using publicly available newspaper listings [13][19], we retrieved the list of newspapers for each city. This totaled 3,767 newspapers for the 1,300 colleges.

The NCAA is divided into three divisions and each division is made up of several conferences that partake in regional competitions. Our main focus is on the oldest of these conferences, the Big Ten Conference, and the colleges associated with it. We filtered the list of newspapers down to the scope of the Big Ten Conference, from 3,767 to 130 (Figure 1).

The NCAA sanctions 19 men’s sports and 21 women’s sports. Out of the 40 NCAA sports, we focus only on the 12 sports which can be paired up by gender. Ten of the sports have both a men’s and a women’s form; baseball, however, is played exclusively by men and softball exclusively by women. Since both baseball and softball are bat-and-ball sports with a similar playing mechanism and scoring system, we have decided to pair these two sports as well (see Table I).

1 Concatenation of all text blocks between text blocks i and j.
2 Since the longest non-coined word in a major dictionary, “pseudopseudohypoparathyroidism” [22], is thirty characters long, we assume that if a text block contains a longer word then that block probably has HTML code in it.
[Figure 1: flow diagram. Total schools in the NCAA = 1,305; schools in the Big Ten = 14; newspapers in the US = 3,767; newspapers in Big Ten cities = 130; #URLs = 10,195,139; #URLs = 332,750; relevant URLs = 4,296.]
Fig. 1: A Summary of the Data Collection

To retrieve news articles related to the 12 sports, we could have used an Optical Character Recognition (OCR) system. However, since that approach relies on the existence of a collection comprising all the editions of all 130 newspapers, and requires the scanning of each edition, we opted for a more scalable approach. We sampled one random article each from a set of 45 randomly chosen newspapers and checked whether the article existed on the newspaper’s website. In all 45 cases, the website had an online copy of the article. Based on this, we have assumed that newspapers upload copies of all their print articles to their websites.

Sport                  Total    Big Ten
Women’s Basketball     16138    199
Men’s Basketball       17707    213
Softball               19711    268
Baseball               34444    427
Women’s Ice Hockey      2192    112
Men’s Ice Hockey        4036    189
Women’s Lacrosse       11759    208
Men’s Lacrosse         13878    245
Women’s Volleyball     18049    239
Men’s Volleyball        1965     38
Women’s Soccer         28437    417
Men’s Soccer           24324    249
TABLE II: The number of players in the NCAA and the Big Ten Conference
Men’s Basketball ↔ Women’s Basketball
Baseball ↔ Softball
Men’s Ice Hockey ↔ Women’s Ice Hockey
Men’s Lacrosse ↔ Women’s Lacrosse
Men’s Volleyball ↔ Women’s Volleyball
Men’s Soccer ↔ Women’s Soccer
TABLE I: Paired Sports

Using each newspaper’s homepage as a seed link, we used a crawler to collect all the links which had the same domain as the original seed link. The crawler stopped once it had exhaustively crawled the entire website or once it had crawled a website for 6 hours. The 6-hour limit was imposed to ensure that the crawling did not run for an unmanageable amount of time. Using this method we managed to collect 332,750 URLs (see Fig. 1).

Using Algorithm 1, we cleaned up all the retrieved webpages to extract the article text. This method resulted in 332,750 documents corresponding to the 332,750 webpages. Since most of the 130 newspapers are general-interest papers, not all of the documents can be expected to be relevant to the 12 sports.

We define a relevant document as any document which can be associated with one of the 12 sports. One possible method to identify relevant documents would be to check for a mention of the sport. However, if an article is a continuation of an ongoing series, it might not explicitly mention the sport and this method would fail. Alternatively, we could use filters built from sport-specific keywords. If, for example, a document has words like “rebound” and “dribbling”, the document can be related to basketball. Domain knowledge is required to create these keyword-based filters.

For this paper, we opted for a filter created by using player names to return the relevant documents. This
type of filter has the advantage over the conventional mentioned-sport filter that it is able to retrieve documents which do not explicitly mention a sport by name. And unlike the keyword-based filter, this method is unsupervised and does not require domain knowledge. This unsupervised method can not only retrieve the relevant documents but can also classify them based on the gender of the mentioned players. We acknowledge that this retrieval method is somewhat limited, as it will not be able to retrieve documents which do not mention a player by name. As the current scope of our project deals with gendered documents, documents which do not explicitly make use of gendered language are not considered relevant to our experiments (see Table III).

Sport                Men     Women
Basketball           1086    1192
Baseball/Softball     475     126
Ice Hockey            537     118
Lacrosse              151      36
Volleyball             17     255
Soccer                161     142
TABLE III: The number of retrieved documents

Sport                Men     Women
Basketball            480     459
Baseball/Softball     736     766
Ice Hockey            222     151
Lacrosse               84     115
Volleyball             66     465
Soccer                172     287
TABLE IV: The number of games played

The set of keywords used in this method was generated by using a spider to retrieve the list of Big Ten players (see Table II) from the NCAA statistics website [18] for the 2017 season (see Section V). This method can be repeated to get the list of players for any season. Using the filters created from this player list, we were able to retrieve 4,296 documents out of the original set of 332,750 (see Fig. 1).

B. Statistically Over-represented Words

Since the documents for each sport type can only have two classes, male or female, a TF-IDF based approach would not be able to identify the statistically over-represented words. Instead, like [17], we use log-odds ratios with informative Dirichlet priors [15] to identify words that are over-represented in our datasets. Given a background corpus α and a sport s, we calculate the log-odds ratio $\delta_{ws}$ between the two classes, male, m, and female, f. The log-odds for a word in any sport corpus can be calculated as follows:

$$\delta_{ws}^{(m-f)} = \log\frac{y_{ws}^{m} + \alpha_w}{n^{m} + \alpha_0 - (y_{ws}^{m} + \alpha_w)} - \log\frac{y_{ws}^{f} + \alpha_w}{n^{f} + \alpha_0 - (y_{ws}^{f} + \alpha_w)} \tag{1}$$

Here $n^{m}$ is the size of the male corpus, $n^{f}$ is the size of the female corpus, $y_{ws}^{m}$ is the count of word w in corpus m for a sport s, and similarly $y_{ws}^{f}$ is the count of word w in corpus f for a sport s. $\alpha_0$ is the size of the background corpus and $\alpha_w$ is the frequency of word w in the background corpus. For our background corpus, we wanted to use a corpus which is not derived solely from news articles, as this might add an unwanted bias, and we opted for the Brown Corpus [8]. An unwanted artifact of our retrieval method is the over-representation of player names in our document sets, so in order to minimize this, we preprocessed our datasets to remove the detected first name, last name filter combinations.

IV. RESULTS

A. Coverage

Out of the 4,296 documents that were retrieved, 2,427 were labeled as male and 1,869 were labeled as female. One could argue that this difference in coverage might be a product of the men’s teams having played more games in the 2017 season than their female counterparts. In order to investigate this claim, we aggregated the total number of games that each university played (see Table IV). In total 4,003 games were played, and out of these 2,243 were played by women’s teams and 1,760 by their male counterparts.

Basketball was the outlier in terms of the number of articles published (see Fig. 2). Both the men’s and the women’s basketball teams, compared to all the other 10 sports, received a disproportionately large amount of coverage for the number of games that they played.

We calculated the correlation coefficient between the men’s sports and the coverage they receive to be 0.7555, and between the women’s sports and the coverage they receive to be 0.9752 (excluding basketball in both cases). From this we can see that, separately, the coverage both men’s and women’s sports received is strongly correlated with the number of matches they play. However, when gender is ignored and both datasets are combined, the overall correlation coefficient drops to 0.4099 (excluding basketball).

If the gender of the players was not a significant feature, then the correlation of the combined dataset would have been similar to the correlation of the separate datasets. The drop in correlation can be interpreted as
being indicative of the fact that the difference between the numbers of articles published is gender dependent.

Fig. 2: Relationship between number of games played and number of articles published

This gender bias is further illustrated in Figure 3, which shows the proportion of articles that were published for each gender for each sport and the proportion of games played by each gender for each sport. From Tables III and IV, it can be seen that even though men and women played a comparable number of baseball/softball matches (49.0% and 51.0% respectively), 79.0% of the articles published were about baseball and just 21.0% were about softball. This disparity is even more extreme in the case of lacrosse, in which even though men played just 42.2% of the games, 80.8% of the articles were published about the men’s team. The only exceptions to this trend are basketball and volleyball. In the case of basketball, 51.1% of the games were played by the men’s team and 47.6% of the articles were about the men’s team.

For men, lacrosse had the greatest difference (38.6%) between the proportion of games played and the proportion of articles published. In the case of women, the greatest difference was for volleyball. Women played 87.6% of the volleyball games and were covered in 93.8% of the articles, a difference of 6.2%. Overall it can be observed that women are underrepresented in 4 out of the 6 sports under consideration.

We represent the relationship between the number of articles and games played, for both genders, using the linear regression models shown in equations 2 and 3.

$$A_m = 0.5489\,P_m + 127.49 \tag{2}$$
$$A_f = 0.1272\,P_f + 89.99 \tag{3}$$

In the above equations, $A_m$ and $A_f$ represent the number of articles published about men’s and women’s teams respectively, and $P_m$ and $P_f$ represent the number of games played by men’s and women’s teams respectively.

It can be seen from the above models that, on average, for every one game that a men’s team plays about 0.5489 articles are published, but for every one game that a women’s team plays, only 0.1272 articles are published.

B. Content

In order to analyze the linguistic difference in the contents of the documents, for each sport pair, we ranked all the words according to their log-odds scores (equation 1). In order to avoid the bias created by the over-representation of our search filters, we removed all the filter terms. We retrieved the top 10 and the bottom 10 ranks. The top 10 ranks represented words that were statistically over-represented for the male datasets, and the bottom 10 ranks represented words
that were statistically over-represented for the female datasets. Table V shows the 10 most over-represented words for each gendered sport.

(a) Proportion of articles published for each gender (b) Proportion of games played by each gender
Fig. 3: Gender and Proportionality

Using the consensus of a grad student and a domain expert, we were able to classify the retrieved words into three distinct types.

Type I: Technical Words: a word that is associated with the sport. It can include technical terms like ‘netminders’, location names like ‘loyola’, or events and trophies named after past players like ‘Naismith’ and ‘tenpac12’.

Type II: Player Names: Since our dataset does not contain the keywords we used to filter the documents, this type of word includes instances of just a player’s first name like ‘cara’ or last name like ‘seider’, names of players on opposing teams, or names of players who play the sport professionally like ‘olofsson’.

Type III: Unrelated Words: This type includes words not directly related to the sport, like ‘barack’.

Using this classification we can observe that the top 9 over-represented words for baseball were all Type II, whereas for softball all the words were Type I. It can also be seen that for all sports, the male sports had an over-representation of Type II words.

One explanation of this over-representation of Type II words can be that articles which cover women’s sports are more focused and primarily discuss one individual player (the filter word), while for men’s sports the articles are more diverse and do not just talk about one player but also discuss the opposing teams. This would explain the occurrences of names like ‘Jaylon’, who played basketball at the University of Evansville, which is not in the Big Ten. The occurrences of professional players like ‘Olofsson’ can be interpreted as indicative of the fact that players on men’s teams are more likely to be compared to professional players and mentioned in the same article as them.

From the sample of top 10 words, it can be observed that out of the 32 names present in the male corpus, only one, ‘kevin’, was a first name. For women, 3 out of the 13 names were first names. This result seems to be consistent with the findings of [14], who in their study of media coverage of cross-country skiing reported that commentators used men’s first names 5.1% of the time, whereas women’s first names were used 21.5% of the time. Last names were used 29.1% of the time for men and 18.2% of the time for women.

The disproportionate use of first names for women can be understood using the semiotics framework proposed by [6]. In this context, the use of first names can symbolize informality. [11] argues that members of dominant groups are more often referred to formally, using their last names, while the subordinate members are referred to more informally, thus creating a power differential. Language can be used as a tool to reinforce social power. In this context the disparity in the media’s use of language underscores sports as a masculine activity and projects women’s participation as anomic, as sports are incongruent with the female stereotype.

V. DISCUSSION AND LIMITATIONS

As discussed in the previous section, more Type I words are used in relation to women than in relation to men. The use of language can be considered to be ideologically mediated [2], and one could argue that the overuse of Type I words stems from the newspapers’ perception of their audience. The newspapers might believe that the readers of the articles about female athletes are themselves not well versed in the nuances of the sport, and that technical terms are used to help
this audience better understand the sport. Conversely, in the case of men’s sports, the use of more Type II words might signify that the newspapers believe that sports are an integral part of the habitus [4] of the readers and as such they do not need to explain the technicalities to their audience. If sports are perceived as a masculine activity, then women athletes are considered outsiders with not as much power as their male counterparts. Language can be used as a tool to signal and reinforce the power of male athletes in the sporting world.

Sport          Gender   Words
Base/Softball  Men      struttmann, tenpac12, rowson, browns, leidner, cullen, seider, diggs, klans, staal
Base/Softball  Women    spokeswoman, Backandforth, allconference, Sevengame, Midseason, basepaths, mankato, Out-hit, Redhot, Mitt
Basketball     Men      tournamentncaa, supersectional, sportswright, skywalker, statencaa, braggin, peoria, jaylon, wentz, uis
Basketball     Women    mccutcheon, 14thranked, naismith, andone, hurdler, boxing, 11seed, touted, perrys, 8of8
Lacrosse       Men      metrodome, skywalker, bannons, redskins, fivestar, stinner, howen, sydow, brandi, kevin
Lacrosse       Women    cummings, everyones, sparking, rescigno, mindset, 11of18, Bushs, top20, ashs, uni
Soccer         Men      zuckerberg, hennepin, Threerun, freeland, kislyak, renfroe, godley, stanek, torey, lhp
Soccer         Women    Onegoal, Skillset, Scorers, Barack, Libero, loyola, Cara, Rick, Nov, usa
Ice Hockey     Men      potterfield, szmatula, olofsson, calgary, topline, dumba, disney, lhes, csu, ahl
Ice Hockey     Women    Goalscorers, Netminders, Hardfought, Secondbest, josephs, starters, Libero, boyle, Katy, Cara
Volleyball     Men      saintfrancis, jovanovski, boogaard, mcgrath, Reagan, wilkins, gillies, Mens, bray, Li
Volleyball     Women    rosenbaum, californias, followup, blaisdell, watkins, aanhpi, revzen, viejo, lorin, aclu
Legend: Type I: Technical Words; Type II: Player Names; Type III: Unrelated Words
TABLE V: Statistically over-represented gendered words in each sport

Using an unsupervised approach, we were able to answer the two research questions we had initially set out to explore.

RQ1: Is there a quantitative difference in the amount of coverage that women’s sports get as compared to men’s sports?

Our findings (see IV-A) show that there is indeed a statistical difference in the amount of coverage that the two genders receive. If both the men’s and women’s teams play the same number of games, the men’s teams on average receive 4.32 times more coverage than their female counterparts.

RQ2: Is there a statistical difference in the kind of language used in the coverage of women’s sports, compared to men’s sports?

We have observed (see IV-B) that the articles about women’s sports are more likely to mention the players’ first names and are more likely to explain the various basics of the sports, and subsequently are more likely to assume less knowledge amongst the readership. The articles about men, in contrast, are more likely to use the players’ last names and are less likely to explain basic concepts.

Overall our findings are consistent with previous studies conducted on professional-level sports [12]. One major takeaway from our work is that it shows that gender bias is present even at non-professional levels, and it can only be minimized by putting in a collective effort at all levels, from the most basic to the most professional.

At this point we would like to acknowledge that our work is not exhaustive, as it relies on just 12 sports and considers only 14 out of potentially 1,300 colleges. It is also limited in terms of its temporal scope, as it only focuses on the coverage from the year 2017.

VI. FUTURE WORK

In the future we would like to scale up our work to include a retrospective study that looks at how the differences in coverage and language have changed
over the years. We would also like to perform a more exhaustive analysis of all 1,300 colleges.

Our current work analyzes the text of the articles at a word level. In the future we would like to expand this and use LIWC [16] to analyze the text at an attribute level (see footnote 3).

3 In this context we define the classes used by LIWC as attributes.

ACKNOWLEDGEMENTS

We would like to thank Lily Smith, the Visual Arts Director at The Daily Iowan, for helping us with the domain knowledge required to interpret Table V.

VII. CONTRIBUTIONS AND LESSONS

A. Quentin Lee

Wrote the second progress report, coordinated group members on their responsibilities for the progress reports.

“I learned the process of sifting through large amounts of data to narrow down the data to manageable amounts so that that data can then be processed and analyzed. I also learned more about writing scientific papers that apply to computer science topics and what is necessary to make sure that the paper is a high enough quality to be comparable to published scientific papers.”

B. Jessica Ip

Counted the raw number of articles that are relevant to each school for each sport and for both genders. Created charts that compared the number of articles per city with each sport and the number of articles per city with the revenue of the school. Analyzed the number of articles against the win/loss ratio of each sport; schools that do not have a team for a sport we are analyzing were given 0-0 as the entry.

“The revenue of each school does not necessarily indicate that the school will receive more attention from newspapers: Ohio St and Michigan have the highest revenues, respectively, out of the Big Ten schools, but the number of articles that pertain to the school does not reflect that statistic. Additionally, there is no direct correlation between the publicity a school gets and the performance of its teams. Because the condition of the newspaper data is that any newspaper based in the city of the Big Ten counts towards its article count, larger cities receive more hits than smaller cities. Minneapolis is the largest city in the Big Ten and therefore there are more article hits because there are more newspapers that report on the athletics in that city, but that does not accurately reflect the performance of the team.”

C. Kwunming Pang

Collected raw scores for the log-odds ratios, did the literature survey.

“I learned a lot of things in the project, including how to communicate and work with the team members, as well as the process to do a research.”

D. Osama Khalid

Wrote this entire report, and basically did all the non-trivial work and most of the trivial work as well, including but not limited to designing the algorithms, running the experiments and tabulating the results.

Since I have written this entire report, all the discussions and inferences in this report represent my learning.

REFERENCES

[1] Ahlin, T. Kulturvanor i Norden. En undersökning av kultur- och medievanor i de nordiska länderna. Culture activities in the Nordic countries (pp. 88-91). Statistical Reports of the Nordic Countries, No. 62. Stockholm: Nordstedts Tryckeri AB, 1993.
[2] Bakhtin, Mikhail M. “The dialogic imagination: Four essays by M. M. Bakhtin (M. Holquist, ed.; C. Emerson & M. Holquist, trans.).” (1981).
[3] Blackwood, Roy E. “The content of news photos: roles portrayed by men and women.” Journalism Quarterly 60.4 (1983): 710-714.
[4] Bourdieu, Pierre. Outline of a Theory of Practice. Vol. 16. Cambridge University Press, 1977.
[5] Carll, Elizabeth K. “News portrayal of violence and women: Implications for public policy.” American Behavioral Scientist 46.12 (2003): 1601-1610.
[6] De Saussure, Ferdinand. Cours de linguistique générale: édition critique. Vol. 1. Otto Harrassowitz Verlag, 1989.
[7] Eastman, Susan Tyler, and Andrew C. Billings. “Sportscasting and sports reporting: The power of gender bias.” Journal of Sport and Social Issues 24.2 (2000): 192-213.
[8] Francis, W. Nelson. “Brown Corpus Manual.” Brown Corpus Manual, clu.uni.no/icame/manuals/BROWN/INDEX.HTM.
[9] Fu, Liye, Cristian Danescu-Niculescu-Mizil, and Lillian Lee. “Tie-breaker: Using language models to quantify gender bias in sports journalism.” arXiv preprint arXiv:1607.03895 (2016).
[10] Guillaumin, Colette. Racism, sexism, power and ideology. Routledge, 2002.
[11] Henley, Nancy M., and Sean Harmon. “The nonverbal semantics of power and gender: A perceptual study.” Power, dominance, and nonverbal behavior. Springer, New York, NY, 1985. 151-164.
[12] Higgs, Catriona T., Karen H. Weiller, and Scott B. Martin. “Gender bias in the 1996 Olympic Games: A comparative analysis.” Journal of Sport and Social Issues 27.1 (2003): 52-64.
[13] “IndexMundi - Country Facts.” IndexMundi - Country Facts, www.indexmundi.com/.
[14] Koivula, Nathalie. “Gender stereotyping in televised media sport coverage.” Sex Roles 41.7-8 (1999): 589-604.
[15] Monroe, Burt L., Michael P. Colaresi, and Kevin M. Quinn. “Fightin’ words: Lexical feature selection and evaluation for identifying the content of political conflict.” Political Analysis 16.4 (2008): 372-403.
[16] Pennebaker, James W., Martha E. Francis, and Roger J. Booth. “Linguistic inquiry and word count: LIWC 2001.” Mahway: Lawrence Erlbaum Associates 71.2001 (2001): 2001.
[17] Nakandala, Supun, et al. “Gendered Conversation in a Social Game-Streaming Platform.” arXiv preprint arXiv:1611.06459 (2016).
[18] NCAA Career Statistics, web1.ncaa.org/stats/StatsSrv/careersearch.
[19] NewsMap, newsmap.mhlakhani.com/.
[20] Salwen, Michael B., and Natalie Wood. “Depictions of female athletes on Sports Illustrated covers, 1957-1989.” Journal of Sport Behavior 17.2 (1994): 98.
[21] Semetko, Holli A., and Patti M. Valkenburg. “Framing European politics: A content analysis of press and television news.” Journal of Communication 50.2 (2000): 93-109.
[22] “What Is the Longest English Word?” AskOxford, 12 Apr. 2012, http://www.askoxford.com/asktheexperts/faq/aboutwords/longestword.
[23] Wilz, Kelly. “Bernie Bros and Woman Cards: Rhetorics of Sexism, Misogyny, and Constructed Masculinity in the 2016 Election.” Women’s Studies in Communication 39.4 (2016): 357-360.
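As an illustration of Equation 1 (Section III-B), the log-odds score with a background-corpus prior can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: the function name `log_odds_delta` and the toy word counts are invented for the example, and the function assumes every scored word occurs at least once in the background corpus so that both logarithms are defined.

```python
import math
from collections import Counter


def log_odds_delta(word, male_counts, female_counts, background_counts):
    """Equation 1: log-odds of `word` in the male corpus minus the female corpus."""
    n_m = sum(male_counts.values())            # size of the male corpus
    n_f = sum(female_counts.values())          # size of the female corpus
    alpha_0 = sum(background_counts.values())  # size of the background corpus
    alpha_w = background_counts[word]          # background frequency of the word
    y_m = male_counts[word]
    y_f = female_counts[word]
    odds_m = (y_m + alpha_w) / (n_m + alpha_0 - (y_m + alpha_w))
    odds_f = (y_f + alpha_w) / (n_f + alpha_0 - (y_f + alpha_w))
    return math.log(odds_m) - math.log(odds_f)


# Toy counts, invented for illustration only.
male = Counter({"goal": 6, "defense": 2, "smith": 4})
female = Counter({"goal": 6, "defense": 2, "cara": 4})
background = Counter({"goal": 10, "defense": 10, "smith": 1, "cara": 1, "the": 78})

scores = {w: log_odds_delta(w, male, female, background) for w in background}
# Highest scores lean male, lowest scores lean female.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Words used equally often in both corpora score zero, so only "smith" and "cara" move to the two ends of the ranking, which mirrors how the top and bottom ranks are read in Section IV-B.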

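The regression models of Equations 2 and 3 (Section IV-A) can be checked against Tables III and IV with an ordinary least-squares fit. The sketch below is an illustration rather than the authors' procedure: the helper `fit_line` is hypothetical, and leaving basketball out of the fit is an assumption carried over from the way the paper reports its correlation coefficients.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Order: Baseball/Softball, Ice Hockey, Lacrosse, Volleyball, Soccer.
# Basketball is left out (assumption, see the lead-in).
GAMES_MEN = [736, 222, 84, 66, 172]          # Table IV, men
GAMES_WOMEN = [766, 151, 115, 465, 287]      # Table IV, women
ARTICLES_MEN = [475, 537, 151, 17, 161]      # Table III, men
ARTICLES_WOMEN = [126, 118, 36, 255, 142]    # Table III, women

slope_m, intercept_m = fit_line(GAMES_MEN, ARTICLES_MEN)
slope_f, intercept_f = fit_line(GAMES_WOMEN, ARTICLES_WOMEN)

# Articles gained per extra game, men relative to women.
coverage_ratio = slope_m / slope_f
```

Under that assumption the fitted slopes agree with the published 0.5489 and 0.1272 to about three decimal places, and their ratio is the source of the "4.32 times more coverage" figure in Section V. The men's intercept from this fit comes out near 127.7 rather than the printed 127.49; the reason for that small difference is not recoverable from the text.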