Oct. 19, 2021

Building 96, Room 001


Speaker: Roman Vainshtein

Title: Representation of Datasets and Machine Learning Pipelines for Metalearning and AutoML Optimization 


The construction of complex empirical machine learning models is largely a manual process, requiring a team of data scientists and subject matter experts. The opportunity exists to build models that speed up scientific discovery, enhance corporate intelligence, improve logistics and workforce management, etc., but capitalizing on this opportunity is fundamentally limited by the availability of data scientists. The widespread use of machine learning algorithms and the high level of expertise required to utilize them have fueled the demand for solutions that can be used by non-experts. There is a clear need for automated model discovery systems that enable users with subject matter expertise, but no data science background, to create empirical models of real and complex processes. One of the challenges of automated machine learning applications is the automatic selection of a machine learning pipeline and model for a given problem.

The main objective of our research is to develop a novel and resource-efficient approach for machine learning model/pipeline selection. To achieve this objective, our approach combines several sources of information: meta-features extracted from the data itself, word-embedding features extracted from a large corpus of academic publications, graphical dataset representation (and analysis), and information (meta-features) regarding the candidate pipeline solutions.

In our experiments, we demonstrated the effectiveness of the proposed approach. Extensive evaluation on hundreds of datasets, covering both regression and classification tasks, shows that our approach either outperforms or is comparable to popular top-performing, computationally heavy approaches.



Roman Vainshtein is a doctoral student in the Department of Software and Information Systems Engineering at Ben-Gurion University of the Negev (BGU), Beer-Sheva, Israel, studying under the supervision of Prof. Bracha Shapira and Prof. Lior Rokach. Roman received his B.Sc. and M.Sc. degrees in Software and Information Systems Engineering from BGU. His research interests include various topics in machine learning and automated machine learning (AutoML). The main topic of his PhD research is AutoML and meta-learning. The objective of the research is to develop novel and resource-efficient approaches for machine learning model/pipeline generation or selection.