TY - JOUR
T1 - Learning from crowds with variational Gaussian processes
AU - Ruiz, Pablo
AU - Morales-Álvarez, Pablo
AU - Molina, Rafael
AU - Katsaggelos, Aggelos K.
N1 - Funding Information:
Pablo Morales-Álvarez received the B.Sc. degree in mathematics and the M.Sc. degrees in mathematical physics and data science from the University of Granada, Granada, Spain, in 2014, 2015, and 2016, respectively, where he is currently pursuing the Ph.D. degree with the Department of Computer Science and Artificial Intelligence under the supervision of Prof. R. Molina, funded by a Ph.D. Fellowship from the La Caixa Foundation. His research interests include probabilistic machine learning methods, especially Gaussian processes, and their application to image processing and classification problems.
Funding Information:
This work was supported by the Spanish Ministry of Economy and Competitiveness under project DPI2016-77869-C2-2-R, the US Department of Energy (DE-NA0002520) and the Visiting Scholar Program at the University of Granada. PMA received financial support through La Caixa Fellowship for Doctoral Studies (La Caixa Banking Foundation, Barcelona, Spain).
Publisher Copyright:
© 2018 Elsevier Ltd
PY - 2019/4
Y1 - 2019/4
N2 - Solving a supervised learning problem requires labeling a training set. This task is traditionally performed by an expert, who provides a label for each sample. The proliferation of social web services (e.g., Amazon Mechanical Turk) has introduced an alternative crowdsourcing approach. Anybody with a computer can register with one of these services and label, either partially or completely, a dataset. The effort of labeling is then shared among a large number of annotators. However, this approach introduces scientifically challenging problems such as combining the unknown expertise of the annotators, handling disagreements on the annotated samples, or detecting the existence of spammer and adversarial annotators. All these problems require probabilistically sound solutions which go beyond the naive use of majority voting plus classical classification methods. In this work we introduce a new crowdsourcing model and inference procedure which trains a Gaussian Process classifier using the noisy labels provided by the annotators. Variational Bayes inference is used to estimate all unknowns. The proposed model can predict the class of new samples and assess the expertise of the involved annotators. Moreover, the Bayesian treatment allows for a solid uncertainty quantification. Because some annotations may be available for a new sample at prediction time, we also show how our method can naturally incorporate this additional information. A comprehensive experimental section evaluates the proposed method with synthetic and real experiments, showing that it consistently outperforms other state-of-the-art crowdsourcing approaches.
KW - Bayesian modeling
KW - Classification
KW - Crowdsourcing
KW - Gaussian processes
KW - Variational inference
UR - http://www.scopus.com/inward/record.url?scp=85057800782&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85057800782&partnerID=8YFLogxK
U2 - 10.1016/j.patcog.2018.11.021
DO - 10.1016/j.patcog.2018.11.021
M3 - Article
AN - SCOPUS:85057800782
SN - 0031-3203
VL - 88
SP - 298
EP - 311
JO - Pattern Recognition
JF - Pattern Recognition
ER -