Conclusion

One of the main goals of this Studienprojekt was to find out whether the extra effort of linguistically sensitive noun phrase extraction and of lexical chain building that takes the referential structure of the text into account is worth the trouble. Such an extra effort certainly has a negative influence on the runtime behavior, and as our evaluation reveals, the summarization results are seldom worth it.

As it appears, many factors determine the quality of a summary produced by SumIt!, among them the text itself and the coverage of WordNet. On the iPod text, SumIt! gave a rather weak performance. We believe that this is mainly because "iPod" and the whole lexical-semantic field connected with it have no entries in WordNet. In the text about the Vatican's beatification policy, however, SumIt! turned out to be just as good as its competitors, sometimes even better. This does not come as a big surprise: since most discourse units relating to the Pope, the Vatican, and Rome are covered by WordNet, SumIt! can create very long lexical chains.

The text about the Pope's health problems may contain discourse units covered by WordNet, but the text itself is rather uncommon. Basically, it is one long chronological enumeration of papal illnesses; a thematic structure is de facto non-existent, nor is there a variety of discourse units. The text is mainly about the Pope, who is mentioned in nearly every sentence. A human summarizer can judge which disease or illness is the gravest of all; most people chose Parkinson's disease and the cancerous tumor. On the one hand, they probably had background information on Pope John Paul II; on the other hand, they knew about the nature and gravity of these diseases. The machine text summarizer does not.

There are many things that we could have done differently, or that still need improvement. Surprisingly, most human summarizers included the heading of the text, whereas SumIt! treated the heading just like any other sentence: if it ranked low, it was not included. The summaries in the gold standard show that the lower the summarization rate, the more the heading contributes to a coherent summary of the text. Perhaps we should have taken this into account and equipped SumIt! with a heuristic for headline/title detection and a multiplication factor that boosts the heading's score.
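Such a heading boost could look like the following minimal sketch. All names here, as well as the crude "first sentence is the heading" detection, are assumptions for illustration; this is not SumIt!'s actual code:

```python
def score_sentences(sentences, chain_score, heading_boost=2.0):
    """Score each sentence by some lexical-chain-based function and
    multiply the heading's score by a fixed factor.

    Assumption for this sketch: the first sentence of the text is the
    heading. A real heuristic would also look at length, punctuation,
    or layout cues.
    """
    scores = []
    for i, sentence in enumerate(sentences):
        score = chain_score(sentence)
        if i == 0:  # crude headline detection
            score *= heading_boost
        scores.append(score)
    return scores
```

A multiplicative factor (rather than always forcing the heading into the summary) keeps the heading out of summaries of texts whose headings carry no content at all.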

Another open issue is the handling of noun phrases. As the lexical chains produced for the Pope texts show, 'Paul XI' is linked to 'John Paul II', since both contain 'Paul'. This is definitely one of the major faults of this hack, although in other cases it has proven quite effective, as has the hack for genitive attributes. Our linguistic heuristics are well-intentioned, but since they are designed above all for robustness, they lack the necessary sensitivity. This would be a very good starting point for further improvements.
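The flaw can be illustrated with a sketch of a token-overlap test. This is an approximation of the hack as described above, not the exact implementation; a stricter containment test is shown as one possible refinement:

```python
def tokens_overlap(np_a, np_b):
    """Sketch of the overlap hack: two noun phrases are treated as
    coreferent if they share any token. This is how 'Paul XI' gets
    wrongly linked to 'John Paul II' via the shared token 'Paul'."""
    return bool(set(np_a.split()) & set(np_b.split()))

def np_contained(np_a, np_b):
    """Stricter variant: one phrase's tokens must be a subset of the
    other's, so 'Paul XI' no longer matches 'John Paul II', while
    'Pope' still matches 'the Pope'."""
    a, b = set(np_a.split()), set(np_b.split())
    return a <= b or b <= a
```

Even the containment test is only a heuristic; genuinely sensitive matching would need the referential analysis discussed above.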

Regarding the lexical chaining algorithm, there is no denying that with long texts the runtime behavior is unpredictable. The problem with calculating the complexity is that it is quite difficult to determine what N is. Every text is different, and in our case the runtime does not depend solely on the number of noun phrases a text contains. Rather, it is determined by the lengths of LOR and LLK and by the sizes of the extensions of the noun phrases they contain. Our testing has shown, however, that it is above all the access to WordNet that slows things down. If we process a text using only repetitions, the only component that makes SumIt! so much slower than OTS or MS-Word is the tagger.
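The repetition-only case can be sketched as follows. The head extraction (last token) and the data structures are assumptions for illustration only; the point is that without WordNet lookups, chaining reduces to cheap dictionary operations and stays roughly linear in the number of noun phrases:

```python
def chain_by_repetition(noun_phrases):
    """Minimal sketch of chaining by pure repetition: noun phrases with
    the same head are merged into one chain. With a dict lookup per
    phrase this is roughly linear, which illustrates why WordNet access,
    not the chaining loop itself, dominates the runtime."""
    chains = {}
    for np in noun_phrases:
        head = np.split()[-1].lower()  # assumption: head = last token
        chains.setdefault(head, []).append(np)
    return list(chains.values())
```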

In the end, we must concede that our general approach has turned out to be more similar to that of Barzilay and Elhadad (1997) than we had wished. Nonetheless, our implementation will be a usable tool available to anyone, and it is far more firmly grounded in referential semantics. The latter, however, may have been a problem: we spent too much time on the construction of the lexical chains instead of working out a cleverer approach to grading with them. After the evaluation, we came up with some ideas about which multiplication factors and other grading hacks to use in order to get a summary closer to the gold standard. These ideas include the consideration of the heading/title; that we completely ignored it may have been our greatest flaw. But we are going to make up for it in future versions.