The paper "Exact Lower Bounds for the Agnostic Probably-Approximately-Correct (PAC) Machine Learning Model," co-authored by Iosif Pinelis (Michigan Tech University) and Aryeh Kontorovich (BGU CS), has been accepted for publication in the prestigious journal Annals of Statistics.
While the paper falls under theoretical machine learning, it also has potential real-world applications to crowdsourcing. Crowdsourcing is widely used in industry as a mechanism for obtaining large, manually annotated data sets (e.g., by asking people to label the content of images). Such data sets, in turn, are a key component in training state-of-the-art artificial intelligence systems such as those based on deep learning. One complication, however, is that the human experts (that is, the people labeling the data) do not necessarily agree with each other. By providing the most precise lower bounds possible on the risk, the authors show how to calculate the absolute minimum amount of data a learning algorithm must obtain in order to achieve a desired accuracy level.