Large-scale storage systems lie at the heart of the big data revolution. As these systems grow in scale and capacity, their complexity grows accordingly, as they build on new storage media, hybrid memory hierarchies, and distributed architectures. Numerous layers of abstraction hide this complexity from applications, but they also hide valuable information that could considerably improve system performance.
I will demonstrate how to bridge this semantic gap in the context of erasure codes, which are used to guarantee data availability and durability. Current theoretical research focuses on codes that reduce the storage, network, and compute overheads of the systems that use them without sacrificing reliability. However, the semantic gap makes it difficult to realize the theoretical benefits of these codes in real implementations. Using regenerating codes and locally recoverable codes as examples, I will show the key challenges in applying optimal erasure codes to real systems and how they can be addressed. This part is based on joint work with Matan Liram, Oleg Kolosov, Eitan Yaakobi, Itzhak Tamo and Alexander Barg.
I will then briefly describe the challenges that the semantic gap introduces in other layers of the "storage stack", and my experience in addressing them. These include the memory hierarchy, flash-based solid-state drives, workload analysis, and aspects of data security.
Gala Yadgar is a senior researcher in the Computer Science Department at the Technion, where she received her Ph.D. in 2012, and a researcher in the Department of Systems in the School of Electrical Engineering at Tel Aviv University. She is an associate editor of ACM Transactions on Storage and serves on the program committees of SYSTOR, MSST, FAST, and USENIX ATC. Her research focuses on methods for improving the performance and reliability of storage in large-scale data centers, with an emphasis on complex hierarchies and enhanced storage interfaces.