Abstract
With the advent of large language models, methods for abstractive summarization have made great strides, creating the potential for applications that help knowledge workers process unwieldy document collections. One such setting is the Civil Rights Litigation Clearinghouse (CRLC), which posts information about large-scale civil rights lawsuits, serving lawyers, scholars, and the general public. Today, summarization in the CRLC requires extensive training of lawyers and law students, who spend hours per case understanding multiple relevant documents in order to produce high-quality summaries of key events and outcomes. Motivated by this ongoing real-world summarization effort, we introduce Multi-LexSum, a collection of 9,280 expert-authored summaries drawn from ongoing CRLC writing. Multi-LexSum presents a challenging multi-document summarization task given the length of the source documents, which often exceed two hundred pages per case. Furthermore, Multi-LexSum is distinct from other datasets in offering multiple target summaries, each at a different granularity (ranging from one-sentence “extreme” summaries to multi-paragraph narrations of over five hundred words). We present extensive analysis demonstrating that, despite the high quality of the summaries in the training data (which adhere to strict content and style guidelines), state-of-the-art summarization models perform poorly on this task. We release Multi-LexSum for further summarization research and to facilitate the development of applications that assist in the CRLC's mission.
Original language | English (US) |
---|---|
Title of host publication | Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022 |
Editors | S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh |
Publisher | Neural Information Processing Systems Foundation |
ISBN (Electronic) | 9781713871088 |
State | Published - 2022 |
Event | 36th Conference on Neural Information Processing Systems, NeurIPS 2022, New Orleans, United States |
Duration | Nov 28 2022 → Dec 9 2022 |
Publication series
Name | Advances in Neural Information Processing Systems |
---|---|
Volume | 35 |
ISSN (Print) | 1049-5258 |
Conference
Conference | 36th Conference on Neural Information Processing Systems, NeurIPS 2022 |
---|---|
Country/Territory | United States |
City | New Orleans |
Period | 11/28/22 → 12/9/22 |
Funding
We thank the reviewers for their very helpful suggestions and feedback! We thank the following institutions and entities, which have generously supported the curation of the underlying Civil Rights Litigation Clearinghouse data over its 15-year history, including: University of Michigan Law School; Washington University in St. Louis School of Law; Center for Empirical Research in Law; Arnold Ventures, “Improving Criminal Justice Reformers’ Use of Litigation Information, Documents, and Insights” (2021-2023); Vital Projects Fund, “Revamping the Civil Rights Litigation Clearinghouse” (2021); Proteus Fund, “Revamping the Civil Rights Litigation Clearinghouse” (2021); National Science Foundation SES-0718831, “The Litigation Process in Government-Initiated Employment Discrimination Suits” (2007). The construction of the Multi-LexSum dataset was also funded in part by NSF Convergence Accelerator Award ITE-2132318.
ASJC Scopus subject areas
- Computer Networks and Communications
- Information Systems
- Signal Processing