Large language models as tax attorneys: a case study in legal capabilities emergence

John J. Nay*, David Karamardian, Sarah B. Lawsky, Wenting Tao, Meghana Bhat, Raghav Jain, Aaron Travis Lee, Jonathan H. Choi, Jungo Kasai

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Better understanding of Large Language Models' (LLMs) legal analysis abilities can contribute to improving the efficiency of legal services, governing artificial intelligence, and leveraging LLMs to identify inconsistencies in law. This paper explores LLM capabilities in applying tax law. We choose this area of law because it has a structure that allows us to set up automated validation pipelines across thousands of examples, requires logical reasoning and maths skills, and enables us to test LLM capabilities in a manner relevant to the real-world economic lives of citizens and companies. Our experiments demonstrate emerging legal understanding capabilities, with improved performance in each subsequent OpenAI model release. We experiment with retrieving and using the relevant legal authority to assess the impact of providing additional legal context to LLMs. Few-shot prompting, presenting examples of question-answer pairs, is also found to significantly enhance the performance of the most advanced model, GPT-4. The findings indicate that LLMs, particularly when combined with prompting enhancements and the correct legal texts, can perform at high levels of accuracy but not yet at expert tax lawyer levels. As LLMs continue to advance, their ability to reason about law autonomously could have significant implications for the legal profession and AI governance. This article is part of the theme issue 'A complexity science approach to law and governance'.
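
The abstract describes two prompting enhancements: retrieving the relevant legal authority to provide additional context, and few-shot prompting with question-answer exemplars. The sketch below illustrates how such a pipeline might be assembled with the OpenAI chat API. It is a minimal illustration only; the statute excerpt, exemplar questions, and prompt wording are assumptions for demonstration, not the authors' actual materials or results.

```python
# Illustrative sketch only: combines a retrieved statutory excerpt with
# few-shot question-answer exemplars before querying an OpenAI chat model.
# The statute text, exemplars, and prompt wording are placeholders, not the
# prompts or data used in the paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical retrieved legal authority (e.g., from a US Code embedding index).
retrieved_statute = (
    "26 U.S. Code § 61(a): gross income means all income from whatever "
    "source derived, including compensation for services."
)

# Hypothetical few-shot exemplars: prior tax questions paired with answers.
few_shot_examples = [
    {"question": "Is a $500 cash birthday gift from a parent gross income?",
     "answer": "No; gifts are excluded from gross income under section 102(a)."},
    {"question": "Are wages of $40,000 included in gross income?",
     "answer": "Yes; compensation for services is gross income under section 61(a)(1)."},
]

def answer_tax_question(question: str) -> str:
    """Ask the model a tax question with retrieved context and few-shot examples."""
    messages = [
        {"role": "system",
         "content": "You are a careful U.S. tax law assistant. "
                    "Use the provided statute text when answering."},
        {"role": "user", "content": f"Relevant authority:\n{retrieved_statute}"},
    ]
    # Present each exemplar as a prior user question and assistant answer.
    for ex in few_shot_examples:
        messages.append({"role": "user", "content": ex["question"]})
        messages.append({"role": "assistant", "content": ex["answer"]})
    messages.append({"role": "user", "content": question})

    response = client.chat.completions.create(model="gpt-4", messages=messages)
    return response.choices[0].message.content

if __name__ == "__main__":
    print(answer_tax_question(
        "An employee receives a $2,000 year-end bonus in 2023. "
        "Is the bonus included in the employee's gross income?"))
```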

Original language: English (US)
Article number: 20230159
Journal: Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
Volume: 382
Issue number: 2270
State: Published - Apr 15, 2024

Funding

The Mercatus Center at George Mason University funded Meghana Bhat's work related to this research and some of the computing costs of running the experiments. Pierre-Loic Doulcet, of Stanford University's CodeX Center for Legal Informatics, contributed to the US Code vector embeddings work.

Keywords

  • artificial intelligence
  • computational law
  • large language models
  • law informs code
  • law-informed AI
  • machine learning

ASJC Scopus subject areas

  • General Mathematics
  • General Engineering
  • General Physics and Astronomy
