
John Bliss 1/19/24.
A new study suggests that generative AI’s legal hallucinations are “alarmingly prevalent,” providing “error-ridden legal answers” to 69-88% of legal questions.[1] But this finding is misleading and has been widely misinterpreted. The study itself is rigorous and has important implications. Yet, the response to the piece in mass and social media has almost universally missed a key point: this study is focused on last year’s technology, not the leading AI applications that are going mainstream in the legal profession and appear to hallucinate far less than the tech examined in the study.
For example, the authors tested the performance of GPT-3.5, released in November 2022, which we already know falters at legal tasks, failing the bar exam below the 1st percentile and barely passing law school exams (post). The authors did not test GPT-4, released in March 2023, which passed the bar exam at the 62nd percentile and scored above the median on law exams when prompted well (achieving as high as an A- or A) (post). Moreover, there are some indications that GPT-4 has a reduced hallucination rate relative to GPT-3.5.[2] Even the current free version of ChatGPT, powered by GPT-3.5 Turbo, seems to have a reduced hallucination rate.[3] Through the ChatGPT interface, a user can further reduce hallucinations by prompting the app to search the internet and cite its work.
The law-specific applications use GPT-4 in combination with RAG (retrieval augmented generation) and other techniques to reduce hallucinations by connecting the LLM to real legal sources. Lexis describes its GPT-4 integration in Lexis+ AI as "hallucination free," which may be an exaggeration, although my own experience with this app suggests that hallucinations (e.g., false cases) may be very rare.[4]
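For readers unfamiliar with the pattern, the RAG approach mentioned above can be sketched in a few lines of Python: retrieve relevant passages from a trusted legal corpus, then instruct the model to answer only from those passages, with citations. This is a minimal illustration under my own assumptions, not any vendor's actual pipeline; the tiny two-case corpus and crude keyword-overlap scoring stand in for the vector-search indexes real products use.

```python
# Minimal sketch of retrieval augmented generation (RAG) for legal Q&A.
# Hypothetical example: the toy corpus and keyword-overlap scoring stand in
# for a real search index; production systems use embedding-based retrieval.

CORPUS = {
    "Marbury v. Madison, 5 U.S. 137 (1803)":
        "Established the principle of judicial review.",
    "Erie R. Co. v. Tompkins, 304 U.S. 64 (1938)":
        "Federal courts apply state substantive law in diversity cases.",
}

def retrieve(question: str, k: int = 1) -> list[tuple[str, str]]:
    """Rank sources by crude keyword overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        CORPUS.items(),
        key=lambda item: len(q_words & set(item[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question: str) -> str:
    """Ground the LLM's answer in retrieved text and require citations."""
    sources = retrieve(question)
    context = "\n".join(f"[{cite}] {text}" for cite, text in sources)
    return (
        "Answer using ONLY the sources below, citing each one you rely on. "
        "If the sources do not answer the question, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

print(build_prompt("Which case established judicial review?"))
```

The grounding instruction in the prompt is what pushes the model toward real citations rather than invented ones; if retrieval returns nothing on point, the model is told to say so instead of fabricating authority.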
It is buried in the hallucination study, but the authors implicitly recognize GPT-4's relative lack of hallucinations in one striking way: they used GPT-4 as a research assistant to identify the hallucinations (contradictions) of GPT-3.5.
The study makes important contributions, introducing a typology of legal hallucinations and a framework for analyzing whether LLMs can identify their own hallucinations. Moreover, it is crucial that we understand the capabilities of non-leading AI. In the first year of ChatGPT, many lawyers and non-lawyers undoubtedly turned to GPT-3.5 for legal information, advice, and drafting. Yet, even the current free version of ChatGPT (powered by GPT-3.5 Turbo) may produce fewer hallucinations than the version tested in this study (GPT-3.5).
Further research is needed into the hallucinations of the leading applications proliferating in the legal profession (e.g. Lexis+ AI, Co-Counsel, Microsoft 365 Co-pilot, ChatGPT Plus) and whatever applications come next. As the technology improves, we may need to shift to a more nuanced discussion of hallucinations, focusing less on blatant falsehoods (which may be minimized by using the right tech and using it well) and focusing more on the AI’s very polished but sometimes low-quality choice of legal sources and arguments. In the meantime, there is a risk that legal professionals will see headlines about widespread hallucinations, without realizing that these headlines do not bear on the legal AI tools available to them, and thus they may be unnecessarily deterred from using generative AI.
[1] Isabel Gottlieb & Isaiah Poritz, Popular AI Chatbots Found to Give Error-Ridden Legal Answers, Bloomberg (Jan. 12, 2024), https://shorturl.at/acovG; Matthew Dahl, Varun Magesh, Mirac Suzgun, Daniel E. Ho, “Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models,” https://arxiv.org/abs/2401.01301.
[2] See Cade Metz, Chatbots May ‘Hallucinate’ More Often than Many Realize, N.Y. Times (Nov. 6, 2023), https://www.nytimes.com/2023/11/06/technology/chatbots-hallucination-rates.html; Ben Wodecki, Leaderboard: OpenAI’s GPT-4 Has Lowest Hallucination Rate, AI Business (Nov. 21, 2023), https://aibusiness.com/nlp/openai-s-gpt-4-surpasses-rivals-in-document-summary-accuracy.
[3] Id.
[4] LexisNexis Launches Lexis+ AI, a Generative AI Solution with Linked Hallucination-Free Legal Citations (Oct. 25, 2023), https://www.lexisnexis.com/community/pressroom/b/news/posts/lexisnexis-launches-lexis-ai-a-generative-ai-solution-with-hallucination-free-linked-legal-citations.