Oct. 26, 2021



Speaker: Prof. Valia Kordoni, Humboldt-Universität zu Berlin

Title: Benchmarking in Natural Language Processing 


The plight of benchmark-driven Natural Language Processing (NLP) research has prompted widespread concern about the assumptions underlying standard benchmarks and widespread interest in alternative models of evaluation.

Collecting examples on which current models fail is neither necessary nor sufficient to create a useful benchmark. This approach can create a counterproductive incentive for researchers to develop models that are different without being better, since a model can top the leaderboard either by producing fewer errors or by simply producing different errors.


In this talk, I will focus on the concerns about standard benchmarks that motivate alternative methods, debating whether these are justified. I will also look into the criteria that adequate benchmarks should satisfy. To this end, I will draw on the knowledge gathered from our recent ACL 2021 Workshop on “Benchmarking: Past, Present and Future” (https://aclanthology.org/events/acl-2021/#2021-bppf-1), in which, together with Ken Church and Mark Liberman, we had an amazing collection of invited speakers who shared with us first-hand knowledge of how benchmarking became important in Information Retrieval, and then in speech (starting around 1975), and then in language (in 1988).


In the last part of this talk, I will turn to Benchmarking in NLP for Medicine. Nowadays there are Language and Deep Learning models trained or fine-tuned on biomedical domain-specific corpora. Few works exist, though, that benchmark the performance of such models against state-of-the-art Machine Learning models and prognostic scoring systems on publicly available medical datasets. I will briefly survey some promising research directions based on hybrid data collection protocols, which also involve larger-scale data validation.



PD Dr. Valia Kordoni is a faculty member of the Department of English at the Humboldt-Universität zu Berlin (Germany). She conducts research on Language Technology (LT), Data Science and Artificial Intelligence (AI), focusing on Robust Natural Language Analytics, Computational Semantics, Discourse and Human Cognition Modeling, as well as Machine Learning for the automated acquisition of knowledge. She has been the president of the MWE (Multiword Expressions) Group of SIGLEX (Special Interest Group on Lexicon) of the ACL (Association for Computational Linguistics). She was the Local Chair of ACL 2016, the 54th Annual Meeting of the Association for Computational Linguistics.

She has coordinated and contributed to many projects funded by the EU, the DFG (Germany), the BMBF (Germany), the DAAD (Germany), as well as the NSF (USA). Among the latest of these are "TraMOOC: Translation for Massive Open Online Courses", an EU-funded Horizon 2020 collaborative project aiming at providing reliable Neural Machine Translation for Massive Open Online Courses (MOOCs), and her current project on "Metaphor and Metonymy as Register Phenomena" (part of the Collaborative Research Centre 1412 "Register" funded by the DFG at the Humboldt-Universität zu Berlin, Germany).