
John Bliss 1/28/24.
A position paper argues that GPT-4’s performance on law exams does not provide evidence that AI is “set to redefine the legal profession.”[1] In a Substack post, the authors summarize their position as follows: “Will AI transform law? The hype is not supported by current evidence.”[2] I think the authors are right to counsel uncertainty, but they may go too far in their “anti-hype,” downplaying (and misconstruing) the current state of empirical research on legal AI.
The paper makes several important contributions. It outlines a typology of legal tasks on which AI’s capabilities should be assessed: information processing, tasks involving creativity or judgment, and predictions. It also explores limitations of research on these tasks, including potential contamination (e.g., if the LLM has already seen bar exam questions in its training data) and construct validity (e.g., if exam performance fails to reflect the legal competence we aim to test for). The authors make a compelling case for “socio-technical assessments” rooted in the real world of legal practice rather than in exams.
The piece concludes that empirical research has yet to address, and may not be capable of fully addressing, the question we ultimately seek to answer: how much real legal work can be accomplished or assisted by AI? But the authors miss two key points.
First, the piece does not cite the randomized controlled trial by Choi, Monahan, and Schwarcz, which simulated the real legal tasks of drafting a memo, a complaint, a contract, and an employee handbook.[3] The research participants were law students, not lawyers, but this study does push back against the claim that AI’s capabilities on legal tasks are untested and untestable. Moreover, the studies of exams and of real legal tasks have yielded similar findings, which may support the notion that the exam studies have real-world implications.[4] In both, using GPT-4 accelerated legal work and had an equalizing effect: it was especially helpful to students who would otherwise score near the bottom of the class.
The second key point I want to raise is that the authors do not acknowledge the mainstreaming of generative AI in legal research and writing, as lawyers are gaining access to large language models within legal research applications (e.g., Lexis, Westlaw, and other legal tech). At the same time, generative AI is being integrated into internet search and word processing applications. A recent survey suggests that most lawyers expect to use generative AI in their practice this year.[5]
We do not know how transformative this technology will be, as the authors of the position paper emphasize. But we have more evidence on legal AI capabilities than the piece acknowledges, and more work is in the pipeline. I am in conversation with other empirical researchers about next steps in examining lawyers’ use of generative AI. Meanwhile, law firms are running their own assessments. I am confident that we will continue to see rigorous research exploring the efficiency and quality of legal work that is produced or assisted by generative AI. We will also hear lawyers’ own perspectives, both anecdotal and surveyed, on how this technology plays out in their practice.
As we shift this research agenda toward practice contexts, I hope that we will also continue to study AI performance on law exams. Although such studies are limited in their generalizability to real-world legal tasks, they provide a clear benchmark of progress relative to prior AI systems, based on established grading metrics. For example, bar exam performance jumped from below the 1st percentile for GPT-3.5 to roughly the 62nd percentile for GPT-4, measured against first-time takers. As new foundation models and law-specific applications are released, it will be useful to know whether and to what extent they outperform prior technology. These evaluations can help gauge the evolving state of legal AI capabilities.
—–
[1] Sayash Kapoor, Peter Henderson & Arvind Narayanan, Promises and Pitfalls of Artificial Intelligence for Legal Applications, forthcoming in J. Cross-disciplinary Research in Computational Law, available at https://www.cs.princeton.edu/~sayashk/papers/crcl-kapoor-henderson-narayanan.pdf.
[2] Arvind Narayanan & Sayash Kapoor, Will AI Transform Law? The Hype Is not Supported by Current Evidence, AI Snake Oil (Jan. 24, 2024), https://www.aisnakeoil.com/p/will-ai-transform-law?r=5tyxj&utm_medium=ios&utm_campaign=post.
[3] Jonathan H. Choi, Amy Monahan & Daniel Schwarcz, Lawyering in the Age of Artificial Intelligence (Working Paper, Nov. 9, 2023), https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4626276.
[4] Jonathan H. Choi & Daniel Schwarcz, AI Assistance in Legal Analysis: An Empirical Study (Working Paper, Aug. 16, 2023), https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4539836.
[5] Wolters Kluwer, Future Ready Lawyer 2023 Report (2023), https://www.wolterskluwer.com/en/news/future-ready-lawyer-2023-report.