The Data Science seminars take place on Wednesdays from 13:00 to 14:00 in the seminar room of Building 96 (unless announced otherwise).

When: Wednesday, August 21, 2019, 13:00-14:00, 96/001

Speaker: Abraham Itzhak Weinberg, Ben-Gurion University of the Negev
Title: Interpretable Decision-Tree Induction and Streaming Algorithms Synergy in Big Data Parallel Framework
The talk will be divided into two parts. The first part will focus on interpretable decision-tree induction in a big data parallel framework. In this part, we present and evaluate several methods for choosing one representative model out of an ensemble of decision-tree models induced from multiple subsets of a big training dataset using a parallel, distributed framework such as MapReduce. The proposed methods compute the similarity between the different models and choose the model that is most similar to the others as the best representative of the entire dataset. The similarity-based approach is implemented with three different similarity metrics: a syntactic one, a semantic one, and a linear combination of the two. We compare this tree selection methodology to a popular ensemble algorithm (majority voting) and to the baseline of randomly choosing one of the local models. In addition, we evaluate two alternative tree selection strategies: choosing the tree with the highest validation accuracy, and reducing the original ensemble to the five most representative trees. The syntactic similarity approach, named SySM (Syntactic Similarity Method), provides a significantly higher testing accuracy than the semantic and the combined approaches. Compared to ensemble algorithms, the representative models selected by the proposed methods are more compact and interpretable, and they provide faster inference on new instances.
In the second part, we present EnHAT (Ensemble Combined with Hoeffding Adaptive Tree). In this part, we demonstrate a synergy between two popular data streaming algorithms: HAT and ensemble. According to our experiments, the combination of the two models yields better classification performance than either of the individual algorithms. This improvement does not come at the expense of additional computational resources.
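As an illustration of the selection idea in the first part of the talk (pick the model that is, on average, most similar to the others), here is a minimal Python sketch. It is not the speaker's SySM implementation: the similarity used here is an assumed stand-in, namely the fraction of instances of a shared validation set on which two models predict the same label, and the names `agreement` and `most_representative` as well as the toy predictions are hypothetical.

```python
from typing import List, Sequence


def agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of instances on which two models predict the same label.

    Assumed stand-in for a similarity metric; the talk's syntactic and
    semantic metrics are not specified in the abstract.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("prediction vectors must be non-empty and equal in length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def most_representative(predictions: List[Sequence[int]]) -> int:
    """Return the index of the model with the highest mean similarity to the others."""
    n = len(predictions)
    if n < 2:
        raise ValueError("need at least two models to compare")
    best_idx, best_score = 0, -1.0
    for i in range(n):
        # Mean similarity of model i to every other model in the ensemble.
        score = sum(
            agreement(predictions[i], predictions[j]) for j in range(n) if j != i
        ) / (n - 1)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx


# Toy data: each row holds one local model's predictions on the same
# six validation instances (hypothetical values).
preds = [
    [0, 0, 1, 1, 0, 1],
    [0, 1, 1, 1, 0, 1],
    [0, 1, 1, 0, 0, 1],
    [1, 1, 0, 1, 1, 0],
]
chosen = most_representative(preds)
```

With this toy data, `chosen` is 1: the second model agrees with the others more often on average than any other model does. A single selected tree can then be inspected and applied directly, whereas majority voting requires evaluating every tree in the ensemble.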

About the speaker:
Avi has spent over 25 years in the fields of software and information systems. He served for six years in the IAF (Israeli Air Force) and retired as a captain. Recently, he has managed a BI (Business Intelligence) unit and data warehouse projects, and has consulted on data science projects as well as on the integration of big data and cybersecurity. His academic background consists of B.Sc. degrees in Industrial Engineering and Management and in Computer Science, and an M.Sc. degree in Industrial Engineering. Currently, in addition to working in the cybersecurity industry, he is pursuing his Ph.D. studies at the Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Israel.



When: Wednesday, September 4, 2019, 13:00-14:00, 96/001

Speaker: Aviad Cohen, Ben-Gurion University of the Negev

Title: A Triple-Layered Machine Learning-Based Methodology for Enhancing the Security of Email Eco-system

In this research, we introduce a triple-layered machine learning-based methodology for enhancing the security of the email ecosystem. All three layers employ machine learning methods; each layer addresses the security of a different level of the email ecosystem. The first layer detects malicious non-executable files attached to emails. The second layer detects malicious emails by analyzing the entire email structure quickly and independently of any external resource. The third layer detects, in a trusted manner, whether the email server has been compromised by malware. For each layer, we present a published journal paper that proposes a novel approach and demonstrates the feasibility and applicability of that layer of the methodology.

About the speaker:

Aviad Cohen is a senior security researcher at the Malware-Lab, Cyber Security Research Center (CSRC) at Ben-Gurion University of the Negev. Aviad pursued his Ph.D. studies between 2015 and 2019 in BGU's Department of Software and Information Systems Engineering. His research is aimed at the development of a triple-layered machine learning-based methodology for enhancing the security of the email ecosystem. He is a co-author of several papers dealing with the analysis and detection of malicious non-executable files, malicious emails, and compromised virtual machines. His main areas of interest are cyber security, machine learning, and data science.