Events

Aug. 21, 2019, 13:00-14:00

96-001

Speaker: Abraham Itzhak Weinberg, Ben-Gurion University of the Negev

Title: Interpretable Decision-Tree Induction and Streaming Algorithms Synergy in Big Data Parallel Framework

Abstract: 

The talk will be divided into two parts. The first part focuses on interpretable decision-tree induction in a big data parallel framework. We present and evaluate several methods for choosing a single representative model from an ensemble of decision trees induced from multiple subsets of a large training dataset using a parallel, distributed framework such as MapReduce. The proposed methods compute the similarity between the different models and select the model that is most similar to the others as the best representative of the entire dataset. The similarity-based approach is implemented with three similarity metrics: a syntactic metric, a semantic metric, and a linear combination of the two. We compare this tree-selection methodology with a popular ensemble algorithm (majority voting) and with a baseline that randomly chooses one of the local models. In addition, we evaluate two alternative tree-selection strategies: choosing the tree with the highest validation accuracy, and reducing the original ensemble to the five most representative trees. The syntactic similarity approach, named SySM (Syntactic Similarity Method), achieves significantly higher test accuracy than the semantic and combined approaches. Compared to ensemble algorithms, the representative models selected by the proposed methods are more compact and interpretable, and they classify new instances faster.
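The general idea of similarity-based selection (pick the local model with the highest average similarity to all the other models) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the speaker's implementation: it assumes a "semantic" similarity defined as the fraction of shared validation instances on which two models predict the same label, and it does not attempt to reproduce the syntactic SySM metric, whose definition is not given here. The helper names (`semantic_similarity`, `select_representative`, `majority_vote`, `make_stump`) and the toy one-threshold "trees" are hypothetical. A majority-voting ensemble is included for comparison.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical model type: any callable mapping a feature vector to a class label.
Model = Callable[[Sequence[float]], int]


def semantic_similarity(m1: Model, m2: Model, X_val: Sequence[Sequence[float]]) -> float:
    """Assumed illustrative metric: fraction of validation instances on which
    the two models predict the same label."""
    agree = sum(1 for x in X_val if m1(x) == m2(x))
    return agree / len(X_val)


def select_representative(models: List[Model], X_val: Sequence[Sequence[float]]) -> int:
    """Return the index of the model with the highest mean similarity to all other models."""
    n = len(models)
    if n == 1:
        return 0
    best_idx, best_score = 0, float("-inf")
    for i in range(n):
        # Mean pairwise similarity of model i to every other model.
        score = sum(
            semantic_similarity(models[i], models[j], X_val) for j in range(n) if j != i
        ) / (n - 1)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx


def majority_vote(models: List[Model], x: Sequence[float]) -> int:
    """Ensemble baseline: predict the label chosen by the most models."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]


def make_stump(threshold: float) -> Model:
    """Hypothetical stand-in for a locally induced decision tree: a single split on feature 0."""
    return lambda x: int(x[0] > threshold)


# Toy "local models", as if each were induced on a different data subset.
models = [make_stump(0.5), make_stump(0.4), make_stump(0.9)]
X_val = [[0.1], [0.45], [0.6], [0.95]]

representative = select_representative(models, X_val)
print("representative model index:", representative)
print("ensemble prediction for 0.6:", majority_vote(models, [0.6]))
```

In this toy run, the model with threshold 0.5 agrees with each of the other two on three of the four validation points, so it has the highest mean similarity and is selected. Unlike the majority-vote ensemble, the selected model is a single tree, which reflects the compactness and inference-speed argument made in the abstract.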
In the second part, we present EnHAT (Ensemble Combined with Hoeffding Adaptive Tree), which demonstrates a synergy between two popular data-streaming algorithms: the Hoeffding Adaptive Tree (HAT) and an ensemble. According to our experiments, combining the two yields better classification performance than either algorithm alone, and this improvement does not come at the expense of additional computational resources.

About the speaker:

Avi has spent over 25 years in the fields of software and information systems. He served for six years in the Israeli Air Force (IAF) and retired as a captain. More recently, he has managed a Business Intelligence (BI) unit and data warehouse projects, and has consulted on data science projects as well as on the integration of big data and cybersecurity. He holds B.Sc. degrees in Industrial Engineering and Management and in Computer Science, and an M.Sc. degree in Industrial Engineering. In addition to working in the cybersecurity industry, he is currently pursuing his Ph.D. at the Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Israel.