Speaker: Asaf Harari
Title: Automatic feature engineering from external information sources
Data scientists apply machine learning algorithms to analyze vast amounts of data, but the datasets themselves are often not well suited to such algorithms. This difficulty may stem from a poor representation of the data or from a lack of relevant contextual information. To overcome these limitations, data scientists often employ Feature Engineering (FE) techniques, which manipulate the existing features of the original dataset to create additional ones. The generated features often give learning algorithms an additional perspective, enabling them to better analyze the augmented dataset. Beyond analyzing the dataset itself, human experts often rely on additional (external) information to augment their datasets. Extracting features from external sources is especially beneficial when useful information is scarce in, or entirely absent from, the original data.
While multiple solutions for automating FE have been proposed in recent years, the vast majority focus on extracting features from the analyzed dataset itself rather than from external sources.
In this talk, I'll present two frameworks for automatic feature engineering from external sources. The first framework automatically matches the entities in the analyzed dataset to those in the external data source, and then generates a large and diverse set of candidate features from both structured and unstructured content. A meta-learning-based ranking approach is used to efficiently process the large number of generated features.
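The match, generate, rank pipeline described above can be illustrated with a toy sketch. This is not the speaker's framework: entity matching here is a hypothetical exact match on normalized names, only numeric attributes of the matched external records become candidate features, and the meta-learning ranker is replaced by a plainly swapped-in proxy score (absolute Pearson correlation with the target, weighted by how many rows were matched). All function names and the scoring rule are illustrative assumptions.

```python
import math
from typing import Dict, List, Optional, Tuple

Record = Dict[str, object]


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so 'Gamma ' and 'gamma' match."""
    return " ".join(name.lower().split())


def match_entities(rows: List[Record], key: str,
                   external: Dict[str, Record]) -> List[Optional[Record]]:
    """Map each row to the external record with the same normalized name, or None."""
    index = {normalize(name): record for name, record in external.items()}
    return [index.get(normalize(str(row[key]))) for row in rows]


def _numeric(record: Optional[Record], attr: str) -> Optional[float]:
    """Return the attribute as a float if it is numeric, else None."""
    value = record.get(attr) if record else None
    return float(value) if isinstance(value, (int, float)) else None


def generate_candidates(matches: List[Optional[Record]]) -> Dict[str, List[Optional[float]]]:
    """Turn every numeric attribute seen in the matched records into a candidate column."""
    names = sorted({attr for m in matches if m
                    for attr, value in m.items() if isinstance(value, (int, float))})
    return {attr: [_numeric(m, attr) for m in matches] for attr in names}


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns 0.0 when it is undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_candidates(candidates: Dict[str, List[Optional[float]]],
                    target: List[float]) -> List[Tuple[str, float]]:
    """Proxy ranker (NOT meta-learning): |correlation| on matched rows times coverage."""
    scores = {}
    for name, column in candidates.items():
        pairs = [(x, float(y)) for x, y in zip(column, target) if x is not None]
        coverage = len(pairs) / len(target)
        xs = [x for x, _ in pairs]
        ys = [y for _, y in pairs]
        scores[name] = abs(pearson(xs, ys)) * coverage
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In the talk's framework the ranking step is learned from previously processed datasets; the correlation proxy is used here only because it is simple and self-contained, and it would be the natural place to plug in a trained ranker.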
The second framework uses a BERT-based architecture to automatically generate high-quality features from free text. It relies on a novel fine-tuning strategy: the analyzed dataset is reformulated as a sentence classification task, an approach that enables fine-tuning BERT both on previously analyzed datasets and on the analyzed dataset itself. Both frameworks achieved state-of-the-art (SOTA) performance.
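To make the idea of recasting a dataset as sentence classification concrete, here is a minimal sketch that serializes each tabular row into a sentence and pairs it with the row's label. The abstract does not say how rows are serialized, so the "column is value." template, the rule of skipping empty cells, and the function names are all hypothetical choices for illustration.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, object]


def row_to_sentence(row: Row, feature_columns: Sequence[str]) -> str:
    """Serialize one row as 'col is value.' clauses (hypothetical template), skipping empty cells."""
    clauses = [f"{col} is {row[col]}" for col in feature_columns
               if row.get(col) not in (None, "")]
    return ". ".join(clauses) + "." if clauses else ""


def to_sentence_classification(rows: Iterable[Row], feature_columns: Sequence[str],
                               label_column: str) -> List[Tuple[str, object]]:
    """Recast a tabular dataset as (sentence, label) pairs for a text classifier."""
    return [(row_to_sentence(row, feature_columns), row[label_column]) for row in rows]
```

Because every dataset ends up in the same (text, label) shape, one plausible reading of the strategy is that the same serialization can be applied to earlier datasets and to the analyzed one, and the pairs fed to a standard sequence-classification fine-tuning setup (for example, Hugging Face `transformers` with `AutoModelForSequenceClassification`). How the framework handles differing label sets across datasets is not described in the abstract.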
Asaf is a Ph.D. student in the Software and Information Systems Engineering Department. He received his BSc and MSc from the Industrial Engineering Department in 2015 and 2018, respectively. He also serves as a data scientist in the cyber labs of Ben-Gurion University (CBG). Asaf's main interests are in the fields of feature engineering, knowledge discovery, and NLP.