TY - GEN
T1 - Don't Say What You Don't Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search
T2 - 2nd Workshop on Natural Language Generation, Evaluation, and Metrics, GEM 2022, as part of EMNLP 2022
AU - King, Daniel
AU - Shen, Zejiang
AU - Subramani, Nishant
AU - Weld, Daniel S.
AU - Beltagy, Iz
AU - Downey, Doug
N1 - Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
N2 - Abstractive summarization systems today produce fluent and relevant output, but often “hallucinate” statements not supported by the source text. We analyze the connection between hallucinations and training data, and find evidence that models hallucinate because they train on target summaries that are unsupported by the source. Based on our findings, we present PINOCCHIO, a new decoding method that improves the consistency of a transformer-based abstractive summarizer by constraining beam search to avoid hallucinations. Given the model states and outputs at a given step, PINOCCHIO detects likely model hallucinations based on various measures of attribution to the source text. PINOCCHIO backtracks to find more consistent output, and can opt to produce no summary at all when no consistent generation can be found. In experiments, we find that PINOCCHIO improves the consistency of generation by an average of 68% on two abstractive summarization datasets, without hurting recall.
UR - http://www.scopus.com/inward/record.url?scp=85152908282&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85152908282&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85152908282
T3 - GEM 2022 - 2nd Workshop on Natural Language Generation, Evaluation, and Metrics, Proceedings of the Workshop
SP - 555
EP - 571
BT - GEM 2022 - 2nd Workshop on Natural Language Generation, Evaluation, and Metrics, Proceedings of the Workshop
PB - Association for Computational Linguistics (ACL)
Y2 - 7 December 2022
ER -