Dec. 03, 2019

With the rapid growth of online text and ever less time available to evaluate it, there is a growing need for an automated method of extracting a summary from a written text, such as an article or an interview, for further processing. Most available solutions are language-dependent and require training the algorithms on large volumes of text. Now BGN Technologies, the technology transfer company of Ben-Gurion University of the Negev (BGU), has introduced a novel, automated and language-independent tool for summarizing text. The method can be applied to extracting summaries from articles, magazines and databases, both within the media itself and by users of such media, including libraries, academic research engines and general search engines.

The novel technology was invented by Prof. Mark Last, Dr. Marina Litvak, and Dr. Menahem Friedman at BGU's Department of Software and Information Systems Engineering. It produces language-independent summaries using a genetic algorithm that ranks a document's sentences by statistical sentence features, which can be calculated for sentences in any language, and then extracts the top-ranking sentences into a summary. The method, called MUSE (Multilingual Sentence Extractor), was tested on nine languages: English, Hebrew, Arabic, Persian, Russian, Chinese, German, French, and Spanish. Its summarization quality was evaluated on four of them (English, Hebrew, Arabic and Persian) and showed a high level of similarity to human-generated summaries.
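The rank-and-extract idea described above can be illustrated with a minimal Python sketch. It is not the MUSE implementation: the three features (sentence position, relative length, and word-frequency coverage), the hand-picked weights, and the function names are all assumptions chosen for illustration. The genetic algorithm that the announcement mentions is not implemented here; fixed placeholder weights stand in for a trained ranking model.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter on ., ! or ? followed by whitespace (illustrative assumption).
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_features(sentences):
    # \w+ is Unicode-aware, but it does not segment languages written without spaces.
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    doc_counts = Counter(tok for toks in tokenized for tok in toks)
    total = sum(doc_counts.values()) or 1
    max_len = max((len(toks) for toks in tokenized), default=0) or 1
    n = len(sentences)
    features = []
    for i, toks in enumerate(tokenized):
        position = 1.0 - i / n            # earlier sentences score higher
        length = len(toks) / max_len      # length relative to the longest sentence
        # Share of the document's word occurrences covered by this sentence's vocabulary.
        coverage = sum(doc_counts[t] for t in set(toks)) / total
        features.append((position, length, coverage))
    return features


def summarize(text, weights=(0.3, 0.2, 0.5), top_k=2):
    # weights are hypothetical placeholders, not values from the MUSE model.
    sentences = split_sentences(text)
    if not sentences:
        return []
    scores = [
        sum(w * f for w, f in zip(weights, fv))
        for fv in sentence_features(sentences)
    ]
    # Pick the top_k highest-scoring sentences, then restore document order.
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(best)]


if __name__ == "__main__":
    sample = "Cats sleep. Cats and dogs play in the yard with cats. Birds fly."
    print(summarize(sample, top_k=1))
```

Because every feature in this sketch is computed from token counts and positions alone, the same scoring function can be applied to text in another language without language-specific resources, which is the property the announcement emphasizes.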

Experimental results show that after initial training of the algorithms on an annotated corpus of summarized documents, where each document is accompanied by several human-generated summaries, the software does not need to be retrained on a summarization corpus in each new language, and the same sentence-ranking model can be used across several languages.

Prof. Mark Last said, "Extractive summarization, which selects a subset of the most relevant sentences from a source text, via ranking them by a relevance score and selecting the top-ranking sentences into a summary, is invaluable for being able to quickly summarize large quantities of text in a language-independent manner. This ability is crucial for search engines as well as other end-users, such as researchers, libraries and the media."

Zafrir Levy, Senior VP Business Development, BGN Technologies, added, "This tool will be a valuable addition to our ability to benefit from the vast amounts of text available online. After filing a patent to protect the technology, we are currently looking for potential partners for further development and commercialization of this promising invention."

Media Coverage:
JPost
The Times of Israel
Breaking Israel News