TY - GEN
T1 - From data to information
T2 - 18th International Conference on Artificial Intelligence and Law, ICAIL 2021
AU - Paley, Andrew
AU - Zhao, Andong L. Li
AU - Pack, Harper
AU - Servantez, Sergio
AU - Adler, Rachel F.
AU - Sterbentz, Marko
AU - Pah, Adam
AU - Schwartz, David
AU - Barrie, Cameron
AU - Einarsson, Alexander
AU - Hammond, Kristian
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/6/21
Y1 - 2021/6/21
N2 - The U.S. court system is the nation's arbiter of justice, tasked with the responsibility of ensuring equal protection under the law. But hurdles to information access obscure the inner workings of the system, preventing stakeholders - from legal scholars to journalists and members of the public - from understanding the state of justice in America at scale. There is an ongoing data access argument here: U.S. court records are public data and should be freely available. But open data arguments represent a half-measure; what we really need is open information. This distinction marks the difference between downloading a zip file containing a quarter-million case dockets and getting the real-time answer to a question like "Are pro se parties more or less likely to receive fee waivers?" To help bridge that gap, we introduce a novel platform and user experience that provides users with the tools necessary to explore data and drive analysis via natural language statements. Our approach leverages an ontology configuration that adds domain-relevant data semantics to database schemas to provide support for user guidance and for search and analysis without user-entered code or SQL. The system is embodied in a "natural-language notebook" user experience, and we apply this approach to the space of case docket data from the U.S. federal court system. Additionally, we provide detail on the collection, ingestion and processing of the dockets themselves, including early experiments in the use of language modeling for docket entry classification with an initial focus on motions.
AB - The U.S. court system is the nation's arbiter of justice, tasked with the responsibility of ensuring equal protection under the law. But hurdles to information access obscure the inner workings of the system, preventing stakeholders - from legal scholars to journalists and members of the public - from understanding the state of justice in America at scale. There is an ongoing data access argument here: U.S. court records are public data and should be freely available. But open data arguments represent a half-measure; what we really need is open information. This distinction marks the difference between downloading a zip file containing a quarter-million case dockets and getting the real-time answer to a question like "Are pro se parties more or less likely to receive fee waivers?" To help bridge that gap, we introduce a novel platform and user experience that provides users with the tools necessary to explore data and drive analysis via natural language statements. Our approach leverages an ontology configuration that adds domain-relevant data semantics to database schemas to provide support for user guidance and for search and analysis without user-entered code or SQL. The system is embodied in a "natural-language notebook" user experience, and we apply this approach to the space of case docket data from the U.S. federal court system. Additionally, we provide detail on the collection, ingestion and processing of the dockets themselves, including early experiments in the use of language modeling for docket entry classification with an initial focus on motions.
KW - data analytics
KW - information extraction
KW - natural language processing
KW - notebook interface
KW - visualization
UR - http://www.scopus.com/inward/record.url?scp=85112351799&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85112351799&partnerID=8YFLogxK
U2 - 10.1145/3462757.3466100
DO - 10.1145/3462757.3466100
M3 - Conference contribution
AN - SCOPUS:85112351799
T3 - Proceedings of the 18th International Conference on Artificial Intelligence and Law, ICAIL 2021
SP - 119
EP - 128
BT - Proceedings of the 18th International Conference on Artificial Intelligence and Law, ICAIL 2021
PB - Association for Computing Machinery, Inc
Y2 - 21 June 2021 through 25 June 2021
ER -