
Summarizing and Referring: Towards Cohesive Extracts

Patricia Nunes Gonçalves
PUCRS
Porto Alegre - Brasil
patt.nunes@gmail.com

Lucia Rino
UFSCAR
São Carlos - Brasil
lucia@dc.ufscar.br

Renata Vieira
PUCRS
Porto Alegre - Brasil
renata.vieira@pucrs.br

ABSTRACT
In this paper we propose and evaluate a system for summary post-editing, which aims at replacing referential expressions in order to avoid referential cohesion problems. To propose expressions that best represent the evoked entity, the system uses knowledge about coreference chains. We evaluate the system with knowledge provided by both manual and automatic annotation of coreference chains.

Categories and Subject Descriptors
J.5 [Computer Applications]: Linguistics; H.4.2 [Information Systems Applications]: Decision support

General Terms
Algorithms, Experimentation

Keywords
Referential Cohesion, Coreference Chains, Automatic Summarization

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
DocEng’08, September 16–19, 2008, São Paulo, Brazil.
Copyright 2008 ACM 978-1-60558-081-4/08/09 ...$5.00.

1. INTRODUCTION
A common problem in extractive summaries is the occurrence of referential expressions that are difficult to interpret. In this paper we propose and evaluate a system for summary post-editing, which aims at replacing referential expressions in order to avoid problems of broken referential linkage. The system is based on a hybrid approach: it employs methods from deep approaches to verify and recover the quality of summaries that were first generated by a superficial approach. CorrefSum is thus a system developed to verify the referential cohesion of extractive summaries using knowledge about coreference chains. This paper presents some experiments on the basis of this tool. It is structured as follows: Section 2 presents concepts related to referential cohesion and coreference. Section 3 describes the CorrefSum tool. Section 4 presents experiments using CorrefSum. Section 5 presents related work using coreference resolution and summarization. Conclusions are presented in Section 6.

2. REFERENTIAL COHESION AND COREFERENCE
A text is more than a simple sequence of words. There are many internal relationships among them in the discourse. One way to establish such text cohesion is through referring expressions [7]. When they address the same entity, they signal a coreference chain. In the example below, extracted from the corpus (Section 3), there is a coreference chain: “the agronomist Miguel Guerra, from Universidade Federal de Santa Catarina”, “Guerra”, “the agronomist”.

    The discussion about national biotechnology has been understood as transgenic. That is the opinion of the agronomist Miguel Guerra, from Universidade Federal de Santa Catarina. Guerra has cited the plant micro-propagation as an example of low cost biotechnology. According to the agronomist, Brazil must search the development of transgenic.

This work considers that the referential cohesion of extractive summaries can be improved if the coreference chains of the source text are analyzed. Suppose the extractive summary results in: “Guerra has cited the plant micro-propagation as an example of low cost biotechnology”. The reader cannot identify the referent of the expression “Guerra”. If the coreference chain is considered, the expression “Guerra” can be replaced by “the agronomist Miguel Guerra, from Universidade Federal de Santa Catarina”. In this work we show experiments where the linguistic knowledge is given both by a corpus of manually annotated coreference chains and by an automatic coreference resolution system [15].

3. SYSTEM ARCHITECTURE
The goal of the developed system is to treat problems of referential cohesion found in extractive summaries, using knowledge about coreference chains from the source text to replace noun phrases whose referential interpretation may be difficult. The system was implemented in Java (http://java.sun.com/), allowing portability across operating systems and free distribution. The development and evaluation of this tool was based on the Summ-it corpus [5], composed of 50 newspaper texts from the science section of Folha de São Paulo, written in Brazilian Portuguese. The corpus has been processed by the PALAVRAS parser [3]. This parser encodes a Constraint Grammar that

allows all annotation information to be encoded as word-based tags. The corpus has been manually annotated with coreference information using the MMAX tool [12]. Each text has been annotated and reviewed by 2 annotators using the same annotation reference manual. Experiments were undertaken with summaries produced by the Portuguese summarizers GistSumm [13] and SuPor-2 [9]. GistSumm is a generic extractive summarizer. SuPor-2 is an extractive summarizer developed on the basis of machine learning algorithms. We have chosen these summarizers because they were available, ready to use, and easily applied to the texts from our corpus. These systems were evaluated in previous work [10] and [2].

The CorrefSum system has the following modules: file reading, reference score processing and summary reviewing.

The goal of the file reading module is to read and store information about the summaries and the source text.

The reference score processing module is designed to search the source text for the sentences included in the summary, select all coreference chains related to noun phrases that are present in the summary, and apply a scoring scheme to the coreference chain elements. The scores are based on the following criteria; each criterion that is met adds one point to the score of the noun phrase:

Proper Name: if the noun phrase contains any proper name.
Size: if the noun phrase is the longest one in the chain, considering character length.
First: if the noun phrase is the first element in its chain.
Apposition: if the noun phrase contains commas (generally used as an apposition mark).

All chain elements are scored on the basis of the above features. The points are cumulative. They are the selection criteria for replacement of referring expressions in the original summary. The summary recovery module uses the scoring scheme to select the most complete chain element for replacing the original summary term. If previous sentences in the summary contain referring expressions that belong to the coreference chain under investigation, the expressions are not replaced. This module is also responsible for maintaining the compression rate as configured by the user. If the recovered summary exceeds the rate, and there is an apposition in the noun phrase to be replaced, then only the first part is selected. Parenthetical material may also be disregarded when the compression rate is exceeded.

4. EXPERIMENTS AND EVALUATION
In this section we present experiments with the CorrefSum system on the basis of the Summ-it corpus. The recovered summaries generated by CorrefSum were evaluated automatically (Rouge), subjectively (human judges) and qualitatively (author’s analysis). For the subjective evaluation, we used 5 human judges, native speakers of Portuguese. The judges filled in a questionnaire about the extractive summaries, both recovered and non-recovered, aimed at evaluating informativity. On average, the source texts contain 11.72 coreference chains whereas summaries contain 6.60. For the GistSumm summaries, the system performed a total of 89 replacements, 1.78 on average per text. Out of the 330 chains (found in 50 summaries), 241 did not need replacement. For the SuPor-2 summaries there were 75 replacements (1.5 on average per text). The largest number of replacements that occurred in a single summary was 4. There were also cases in which no replacements were necessary; however, there were replacements in most summaries. The average compression rates for the original summaries were 25.30%/23.14%, and the summaries recovered after the application of the CorrefSum system had average compression rates of 28.36%/25.52%, an increase of roughly 2 to 3 percentage points over their original size, due to replacements of less informative referential expressions by more complete ones. Although the compression rate increases, it is still below the targeted 30%.

4.1 Quantitative Evaluation
Table 1 shows the Rouge measures for the original summaries generated by GistSumm and SuPor-2 against the summaries recovered by CorrefSum. ROUGE [11] stands for Recall-Oriented Understudy for Gisting Evaluation. It includes measures to automatically determine the quality of a summary by comparing it to other (ideal) summaries created by humans. The measures count the number of overlapping units such as n-grams, word sequences, and word pairs between the computer-generated summary to be evaluated and the ideal summaries created by humans. Rouge-N is based on n-gram co-occurrences between reference and candidate. High-order Rouge-N, with n-gram length greater than 1, estimates the fluency of summaries. Rouge-L is based on the longest common subsequence of two sequences X and Y (a longest common subsequence of X and Y is a common subsequence with maximum length). As reference summaries, we used summaries manually generated by professional summarizers [4].

Table 1: Rouge

           GistSumm               SuPor-2
           Original   Recovered   Original   Recovered
Rouge-1    0.49249    0.52275     0.56085    0.56496
Rouge-2    0.32250    0.34320     0.42391    0.41735
Rouge-3    0.28382    0.29699     0.38510    0.36811
Rouge-4    0.26615    0.27218     0.36196    0.33566
Rouge-L    0.45497    0.47911     0.53368    0.53639

We noticed that by applying CorrefSum, the F-measure increased in most cases, especially for GistSumm. The replacements performed with the goal of recovering referential cohesion in the summaries indicate, according to this measure, improved informativity.

4.2 Subjective Evaluation
This kind of evaluation is difficult to perform because it is based on human judgement, and it is often highly time consuming. Due to these difficulties we performed a subjective evaluation on only a part of our corpus: 10 texts were chosen, corresponding to 20% of the corpus. Since we wanted to confirm whether a difference in Rouge is actually perceived as an informativity improvement by human readers, the texts chosen were those whose F-measure (Rouge) presented the largest differences before and after recovery. Five judges received two versions of the summaries, Original (O) and Recovered (R), and were asked to indicate which one was more informative, or else whether Both (B) were considered equally informative. Informativity was briefly defined as “Informativity is what determines the information level of each summary”, so the judges were working on their intuitive understanding of the concept. Table 2 presents the subjective evaluation results on informativity. In the informativity evaluation, we noticed a common agreement that the recovered summaries are more informative. This evaluation confirms the results indicated by the Rouge measure. The recovered summaries are preferred by the human judges also for SuPor-2, even though the Rouge differences were not so evident for this summarizer. Only judge 4 does not agree with the others, indicating that both summaries (original and recovered) are equally informative.

Table 2: Subjective Evaluation on Informativity

           GistSumm        SuPor-2
Judge      O    R    B     O    R    B
Judge 1    1    9    0     1    8    1
Judge 2    1    9    0     2    7    1
Judge 3    1    9    0     2    7    1
Judge 4    2    7    1     0    2    8
Judge 5    1    8    1     0    9    1

4.3 Qualitative Evaluation
In this section, we discuss the qualitative evaluation of the substitutions performed by the system, based on an analysis conducted by the author. The analyzed replacements are the 89 substitutions performed by the CorrefSum system in the GistSumm summaries. To carry out this analysis the following scoring criteria were adopted:

1 point: irrelevant replacement; for example, the replacement of “human skin” by “regular human skin”.
2 points: a replacement that has helped somewhat in the comprehension of the summary by adding extra information; for example, the replacement of “Science and Technology” by “The Ministry of Science and Technology (MST)”.
3 points: a replacement that meaningfully contributes to the comprehension of the summary and that solved a referential cohesion mistake; for example, the replacement of “Guerra” by “the agronomist Miguel Guerra, from UFSC (Universidade Federal de Santa Catarina)”.

Table 3: Qualitative Analysis

            Total   Percentage
1-Point     36      40.45%
2-Points    17      19.10%
3-Points    36      40.45%

Table 3 shows the result of the analysis. About 60% were meaningful replacements, whereas 40% did not have a major impact on the summary informativity. Generally, in 1-point replacements, the replaced noun phrases do not have any anaphoricity feature: they are complete expressions whose meaning is self-contained, and their interpretation is, to some extent, context-independent. We believe the system can still be improved with the development of an anaphoricity classification module, so that a substitution is only performed if the system identifies a need for it.

Other interesting cases are substitutions in which the expression chosen for replacement presented a referential cohesion problem in an internal noun phrase. For example:

The minister → The Minister of Justice of the country, Elisabeth Guigou
This insect → a being that breaks into bodies and controls other people’s minds, forcing them to do what he orders
This study → Allen and his colleagues’ work

The noun phrase “a being that breaks into bodies and controls other people’s minds, forcing them to do what he orders” turns out to be a problem, since it is impossible to identify that the expression “being” refers to an “insect”. Another problem is in the expression “Allen and his colleagues’ work”, in which we cannot identify the referent of the name “Allen”. Dealing with internal expressions in the noun phrase is another aspect in which this work can be improved. For example, the substitution of “The minister” by the more complete noun phrase “The Minister of Justice in the country, Elisabeth Guigou” has created a referential cohesion problem, in which one cannot identify to which country the expression “the country” refers.

4.4 Evaluation of CorrefSum with a coreference resolution system
The coreference resolution system considered in this work is called ACRoPos (Automatic Coreference ResOlution system for POrtugueSe) [15]. The process consists in first identifying pairs of expressions that are anaphoric and then grouping the pairs that relate to each other, thus forming the coreference chains. The implemented solution uses machine learning to infer a classifier that receives as input pairs of noun phrases and their features. It is a first proposal of such a system for the Portuguese language that considers all noun phrase types. The evaluation reported for the system is a precision of about 97.14%, a recall of 42.11% and an F-measure of 57.96%. Here we present Rouge numbers for the original summaries compared against the recovered GistSumm summaries on the basis of our two coreference knowledge sources, first the manually annotated corpus and then the automatic annotation. As the system has been trained on the same corpus, we consider here only the results for the 20 texts of the test set, to avoid biased results. As expected, the manual gold standard always produces a greater impact on Rouge. We can see, however, that the coreference resolution system produces chains that seem useful for summary recovery. Considering that this coreference resolution can still be improved, we consider this a promising line of research.

Table 4: Rouge - ACRoPos and CorrefSum

           Original   Recovered         Recovered
                      (Manual Annot.)   (Autom. Annot.)
Rouge-1    0.51306    0.52735           0.52528
Rouge-2    0.34727    0.33950           0.33690
Rouge-3    0.30840    0.29155           0.28954
Rouge-4    0.28881    0.26496           0.26420
Rouge-L    0.47745    0.48163           0.47690

5. RELATED WORK
Azzam et al. [1] describe the use of coreference chains for summary production. Unlike our work, Azzam et al. use coreference chain information to select the sentences that will be part of the summary, considering that the “best” chain refers to the most relevant topic in the text. Kashani et al. [8] report an algorithm for pronoun generation in automatic summaries, aimed at avoiding repetition. While Kashani et al. intend to prevent repetition by the inclusion of pronouns, our work tries to avoid the occurrence of an unresolved pronoun (that is, one without a proper antecedent) by replacing such a pronoun with a complete expression, if necessary. Steinberger et al. [16] propose two ways of using coreference in summarization. The first one is summary generation by exploring lexical information and coreference resolution. Experiments comparing summaries generated by using lexical information only and by making use of both lexical information and anaphora resolution show an increase in Rouge-1 from 53.14% to 59.27%. The second approach, similarly to our proposal, verifies the summary with the goal of recovering referring expressions. For the

coreference resolution, the GUITAR system [14] has been used. The basic strategy is to replace anaphoric nouns with the first element in the coreference chain. As an alternative, our work proposes a scoring scheme for choosing the substitute, as described in Section 3. In [6] we presented a first preliminary evaluation based only on the Rouge-1 measure. Here we present a more detailed evaluation considering several Rouge measures as well as a subjective and a qualitative evaluation. We also evaluate the integration of our system with a coreference resolution module.

6. CONCLUSIONS
One of the most common problems in extractive summaries is the occurrence of referential expressions that are difficult to interpret. This informational gap can often cause misunderstandings. In this paper we have proposed and evaluated a system for automatic summary post-editing, which aims at rewriting referential expressions in the most coherent way possible, trying to avoid problems of referential linkage. To achieve this, the noun phrases in the summaries are analyzed on the basis of coreference chains, with the goal of identifying the expressions that best represent the evoked entity and performing the corresponding substitution when appropriate. The experiments conducted in this paper considered two summarizers previously evaluated for Portuguese: GistSumm and SuPor-2. The results show an increase in F-measure for both experiments. The linguistic knowledge is given both by a corpus of manually annotated coreference chains and by a coreference resolution system [15]. In the subjective evaluation, most judges considered the recovered summaries more informative for both summarizers. Also, in order to improve the performance of the system, we intend to consider the classification of anaphoric expressions to verify the need for substitution, to solve the referential cohesion problems in internal noun phrases, and also to generate alternative referential expressions based on the coreference chains, instead of just replacing them. A further step in this research is to build and evaluate automatic summarizers that take the coreference chains into consideration in the choice of relevant sentences.

7. REFERENCES
[1] Saliha Azzam, Kevin Humphreys, and Robert Gaizauskas. Using coreference chains for text summarization. In Proceedings of The Relation of Discourse/Dialogue Structure and Reference, 1999.
[2] Pedro Paulo Balage Filho, Thiago Alexandre Salgueiro Pardo, and Maria das Graças Volpe Nunes. Sumarização automática de textos científicos: Estudo de caso com o sistema gistsumm. Technical report, Departamento de Computação, Universidade Federal de São Carlos. São Carlos-SP. NILC-TR-07-11, 2007.
[3] Eckhard Bick. The Parsing System “PALAVRAS” - Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework. PhD thesis, Department of Linguistics, University of Århus, DK., 2000.
[4] Jorge César Barbosa Coelho. Uso de informação de correferência e anáfora para verificação da coesão e coerência textual na sumarização automática. Trabalho de Conclusão de Curso de Letras. Unisinos - São Leopoldo, June 2007.
[5] Sandra Collovini, Thiago Carbonel, Juliana Thielsen Fuchs, Jorge César Coelho, Lucia Rino, and Renata Vieira. Summit: Um corpus anotado com informações discursivas visando à sumarização automática. In 5º Workshop em Tecnologia da Informação e da Linguagem Humana (TIL’2007), Rio de Janeiro, RJ, 2007. Proceedings of the SBC.
[6] Patricia Nunes Gonçalves, Renata Vieira, and Lucia Rino. Correfsum: Referencial cohesion recovery in extractive summaries. In 8th Workshop on Computational Processing of Written and Spoken Language (PROPOR’2008), Aveiro, Portugal, 2008. Springer.
[7] M.A.K. Halliday and R. Hasan. Cohesion in English. London: Longman, 1976.
[8] Mehdi M. Kashani and Fred Popowich. Pronoun generation for text summarization and question answering. In Proceedings of 5th Slovenian and 1st international Language Technologies Conference 2006, 2006.
[9] Daniel Leite and Lucia Rino. Supor: extensões e acoplamento a um ambiente para mineração de dados. Technical report, Departamento de Computação, Universidade Federal de São Carlos. São Carlos-SP. NILC-TR-06-07, 2006.
[10] Daniel Leite, Lucia Rino, Thiago Pardo, and Maria da Graça Volpe Nunes. Extractive automatic summarization: Does more linguistic knowledge make a difference? In C. Biemann, I. Matveeva, R. Mihalcea, and D. Radev (eds.), Rochester, NY, USA, 2007. Proceedings of the HLT/NAACL Workshop on TextGraphs-2: Graph-Based Algorithms for Natural Language Processing.
[11] Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. In Workshop on Automatic Summarization, Philadelphia, USA, 2000. Proceedings of ACL-02.
[12] Christoph Müller and Michael Strube. Mmax: A tool for the annotation of multi-modal corpora. In Proceedings of the 2nd IJCAI Workshop on Knowledge and Reasoning in Practical Dialogue Systems, pages 45–50, Seattle, Washington, 2001.
[13] T.A.S. Pardo. Gistsumm - gist summarizer: Extensões e novas funcionalidades. Technical report, NILC-TR-05-05. São Carlos-SP, 2005.
[14] Massimo Poesio and Mijail A. Kabadjov. A general-purpose, off the shelf anaphoric resolver. In Proceedings of International Conference on Language Resources and Evaluation, Lisboa, Portugal, 2004.
[15] José Guilherme Souza, Patricia Nunes Gonçalves, and Renata Vieira. Automatic coreference resolution applied to Portuguese. In 8th Workshop on Computational Processing of Written and Spoken Language (PROPOR’2008), Aveiro, Portugal, 2008. Springer.
[16] Josef Steinberger, Massimo Poesio, Mijail Kabadjov, and Karel Jezek. Two uses of anaphora resolution in summarization. In Information Processing and Management, 2007.
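
As an illustration of the scoring scheme described in Section 3, the following Python sketch scores each element of a coreference chain on the four criteria (Proper Name, Size, First, Apposition) and picks the highest-scoring element as the replacement. It is only a sketch under stated assumptions, not the CorrefSum implementation (which the paper reports was written in Java): the names NounPhrase, score_chain, best_replacement and trim_apposition are hypothetical, the proper-name flag is assumed to be supplied by the parser annotation, tied lengths all receive the Size point, and ties in the total score are assumed to go to the earliest chain element.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NounPhrase:
    text: str
    has_proper_name: bool  # assumption: taken from the parser's annotation


def score_chain(chain: List[NounPhrase]) -> List[int]:
    """Return one cumulative score (0 to 4) per chain element."""
    longest = max(len(phrase.text) for phrase in chain)
    scores = []
    for position, phrase in enumerate(chain):
        score = 0
        score += int(phrase.has_proper_name)        # Proper Name
        score += int(len(phrase.text) == longest)   # Size (character length)
        score += int(position == 0)                 # First element of the chain
        score += int("," in phrase.text)            # Apposition (comma present)
        scores.append(score)
    return scores


def best_replacement(chain: List[NounPhrase]) -> NounPhrase:
    """Pick the highest-scoring element of the chain."""
    scores = score_chain(chain)
    # Assumption: on a tie, the earliest element in the chain wins.
    best_index = max(range(len(chain)), key=lambda i: scores[i])
    return chain[best_index]


def trim_apposition(text: str) -> str:
    """Keep only the part before the first comma, as a fallback for when
    the compression rate would otherwise be exceeded."""
    return text.split(",", 1)[0].strip()


chain = [
    NounPhrase("the agronomist Miguel Guerra, from Universidade Federal "
               "de Santa Catarina", True),
    NounPhrase("Guerra", True),
    NounPhrase("the agronomist", False),
]
print(score_chain(chain))                             # [4, 1, 0]
print(best_replacement(chain).text)
print(trim_apposition(best_replacement(chain).text))  # the agronomist Miguel Guerra
```

With the chain from the example in Section 2, the full first mention collects all four points, so it would be chosen to replace “Guerra”. The trimmed form shows what could be used when the full expression would push the summary past the configured compression rate.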

