The method was tested on nine languages: English, Hebrew, Arabic, Persian, Russian, Chinese, German, French, and Spanish. Its summarization quality was evaluated on four of them (English, Hebrew, Arabic, and Persian), showing a high level of similarity to human-generated summaries
BEER-SHEVA, Israel – November 25, 2019 – With the huge increase in online textual data, and with ever less time available to evaluate the vast amount of published text, there is a growing need for an automated method for extracting a summary from a text, such as an article or an interview, for further processing. Most available solutions are language-dependent and require training the algorithms on large volumes of text. Now BGN Technologies, the technology transfer company of Ben-Gurion University of the Negev, is introducing a novel, automated, and language-independent tool for summarizing text. The method is applicable to summary extraction from articles, magazines, and databases, both within the media itself and by users of such media, including libraries, academic research engines, and general search engines.
The novel technology, invented by Prof. Mark Last, Dr. Marina Litvak, and Dr. Menahem Friedman at the Department of Software and Information Systems Engineering of Ben-Gurion University, provides language-independent summaries of texts. It is based on a genetic algorithm that ranks document sentences using statistical sentence features, which can be calculated for sentences in any language, and then extracts the top-ranking sentences into a summary. The method, called MUSE (Multilingual Sentence Extractor), was tested on nine languages: English, Hebrew, Arabic, Persian, Russian, Chinese, German, French, and Spanish. Its summarization quality was evaluated on four of these languages (English, Hebrew, Arabic, and Persian), showing a high level of similarity to human-generated summaries.
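The ranking-and-extraction step described above can be illustrated with a minimal Python sketch. This is not the MUSE implementation: the three features (position, length, and word frequency), the weighted-sum scoring, and the names `tokenize`, `sentence_features`, and `summarize` are assumptions chosen for illustration, and the input is assumed to be already split into sentences.

```python
import re
from collections import Counter


def tokenize(sentence):
    # \w is Unicode-aware in Python 3, so this works for scripts that
    # separate words with spaces or punctuation (an assumption; Chinese
    # text would need a dedicated word segmenter).
    return re.findall(r"\w+", sentence.lower())


def sentence_features(sentences):
    """Return one (position, length, frequency) tuple per sentence.

    These three statistical features are illustrative stand-ins, not the
    feature set used by the original method.
    """
    tokens = [tokenize(s) for s in sentences]
    counts = Counter(t for toks in tokens for t in toks)
    max_count = max(counts.values(), default=1)
    max_len = max((len(t) for t in tokens), default=1) or 1
    n = len(sentences)
    features = []
    for i, toks in enumerate(tokens):
        position = 1.0 - i / n            # earlier sentences score higher
        length = len(toks) / max_len      # longer sentences score higher
        # Mean document frequency of the sentence's words, scaled to [0, 1].
        freq = sum(counts[t] for t in toks) / (len(toks) * max_count) if toks else 0.0
        features.append((position, length, freq))
    return features


def summarize(sentences, weights, k):
    """Score each sentence as a weighted sum of its features and return
    the k top-ranking sentences in their original document order."""
    feats = sentence_features(sentences)
    scores = [sum(w * f for w, f in zip(weights, row)) for row in feats]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    doc = [
        "Online text is growing quickly.",
        "Readers have little time to evaluate all of the published text.",
        "A summary of the text helps readers decide what to read.",
        "The weather was pleasant.",
    ]
    # The weights here are hand-picked placeholders.
    print(summarize(doc, weights=(0.2, 0.3, 0.5), k=2))
```

Because the features are computed from token counts and sentence positions alone, the same scoring function can in principle be applied to text in any language that the tokenizer handles.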
Experimental results show that, after an initial training of the algorithms on an annotated corpus of summarized documents, in which each document is accompanied by several human-generated summaries, the software does not need to be retrained on a summarization corpus in each new language; the same sentence-ranking model can be used across several languages.
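As a rough illustration of how a genetic algorithm might learn such a sentence-ranking model once and then reuse it, the Python sketch below evolves a vector of feature weights against a toy annotated corpus. Everything in it is an assumption made for the example: the selection, crossover, and mutation scheme, the function names `evolve_weights`, `top_k`, and `make_fitness`, the invented feature values, and a fitness measure that counts how many selected sentences match sentences chosen by human annotators. A real setup would more likely compare the text of each extracted summary with the human-generated summaries.

```python
import random


def top_k(feature_rows, weights, k):
    """Indices of the k sentences with the highest weighted feature sum."""
    scores = [sum(w * f for w, f in zip(weights, row)) for row in feature_rows]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def make_fitness(corpus, k):
    """corpus: list of (feature_rows, reference_indices) pairs, where
    reference_indices is the set of sentences picked by human annotators."""
    def fitness(weights):
        total = 0.0
        for feature_rows, reference in corpus:
            total += len(top_k(feature_rows, weights, k) & reference) / k
        return total / len(corpus)
    return fitness


def evolve_weights(fitness, n_weights, pop_size=20, generations=30,
                   mutation_rate=0.2, seed=0):
    """Simple genetic algorithm: truncation selection, one-point
    crossover, Gaussian mutation, and elitism (the best individual of
    each generation is carried over unchanged)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        children = [ranked[0]]
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_weights) if n_weights > 1 else 0
            child = a[:cut] + b[cut:]
            child = [w + rng.gauss(0.0, 0.1) if rng.random() < mutation_rate else w
                     for w in child]
            children.append(child)
        population = children
    return max(population, key=fitness)


# Toy annotated corpus with two features per sentence (values invented).
corpus = [
    ([(0.9, 0.1), (0.2, 0.8), (0.8, 0.3), (0.1, 0.9)], {0, 2}),
    ([(0.1, 0.7), (0.7, 0.2), (0.95, 0.5), (0.3, 0.9)], {1, 2}),
]
fitness = make_fitness(corpus, k=2)
best = evolve_weights(fitness, n_weights=2)
print(best, fitness(best))
```

Once trained, the weight vector `best` is just a list of numbers, so nothing in it is tied to the language of the training corpus. That is the property that would allow one model to be applied to documents in other languages, provided the features themselves are language-independent.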
Prof. Mark Last said, "Extractive summarization, which selects a subset of the most relevant sentences from a source text, via ranking them by a relevance score and selecting the top-ranking sentences into a summary, is invaluable for being able to quickly summarize large quantities of text in a language-independent manner. This ability is crucial for search engines as well as other end-users, such as researchers, libraries and the media."
Zafrir Levy, Senior VP Business Development, BGN Technologies, added, "This tool will be a valuable addition to our ability to benefit from the vast amounts of text available online. After filing a patent to protect the technology, we are currently looking for potential partners for further development and commercialization of this promising invention."
About BGN Technologies
BGN Technologies is the technology transfer company of Ben-Gurion University, Israel. The company brings technological innovations from the lab to the market and fosters research collaborations and entrepreneurship among researchers and students. To date, BGN Technologies has established over 100 startup companies in the fields of biotech, hi-tech, and cleantech, and has initiated leading technology hubs, incubators, and accelerators. Over the past decade, it has focused on creating long-term partnerships with multinational corporations such as Deutsche Telekom, Dell-EMC, IBM, PayPal, and Bayer, securing value and growth for Ben-Gurion University as well as for the Negev region. For more information, visit the BGN Technologies website.
Global Media Liaison, BGN Technologies