Abstract
This study examines the intersection between the concept of legal uncertainty and the semantic entropy approach, and discusses proposed solutions for detecting and managing high-entropy responses.
I. Introduction
The increasing use of artificial intelligence systems in the field of law has generated debates not only on a technical level but also on normative and conceptual grounds. Large Language Models (“LLMs”), owing to their capacity to comprehend and generate natural language, offer significant opportunities in areas such as legal text analysis, contract review, and the prediction of judicial decisions. However, how these technologies behave when confronted with ambiguous or indeterminate legal expressions has emerged as a fundamental issue concerning the reliability of such systems.
In a study published in Nature1 (2024), researchers developed a novel measurement method called “semantic entropy” to identify incorrect or confabulated responses generated by large language models. This method aims to quantify the level of uncertainty by assessing the semantic diversity among multiple alternative responses that a model provides to the same question. Experiments conducted on various question–answer datasets, including BioASQ, SQuAD, TriviaQA, and NQ-Open, revealed that responses exhibiting high semantic entropy were often incorrect or semantically inconsistent. Moreover, it was demonstrated that omitting responses to highly uncertain questions significantly improved the overall accuracy of the model. These findings indicate that semantic entropy functions not only as a theoretical construct but also as a practical tool for assessing the reliability of AI-assisted systems.
In this context, semantic entropy serves not merely as a technical metric but also as an essential safeguard in enabling AI-assisted legal decision-making processes to cope with situations involving legal uncertainty. The analysis of responses generated by large language models in situations involving legal uncertainty through semantic entropy contributes to the identification of semantic inconsistencies and to enhancing the reliability of AI-assisted legal processes.
II. Legal Uncertainty and Its Interaction with Artificial Intelligence
A. Legal Uncertainty
Legal uncertainty is an inherent feature of law, referring to the interpretive divergences and unpredictability that arise in the application of rules or principles to concrete cases2. This uncertainty may stem from various sources such as the general and abstract nature of legal norms, the dynamic character of judicial precedents, the ambiguity or indeterminacy of language, and the reflection of social change upon legal structures3. Law, rather than representing absolute certainty, functions as a dynamic system that is open to interpretation and adaptation in accordance with evolving social conditions. Although legal rules are founded upon established principles and norms, they are not designed to provide predetermined and absolute truths that can encompass every possible factual circumstance. Consequently, law operates within an inherent flexibility that allows for contextual interpretation and jurisprudential development, thereby inevitably embodying a certain degree of indeterminacy.
The concept of “open texture”, introduced by H. L. A. Hart, underscores that legal rules can never be defined with complete precision and that interpretive judgment is therefore unavoidable4. According to Hart, legislators cannot foresee all possible factual situations that may arise in the future; hence, legal texts necessarily leave a zone of uncertainty in so-called “penumbral cases”.
Ronald Dworkin, on the other hand, perceives law not merely as a set of rules but as a broader framework encompassing principles and policies5. This perspective adds an additional interpretive layer, requiring legal practitioners to consider not only the literal text but also the underlying spirit and purpose of the law. Such interpretive depth intensifies the semantic and contextual challenges that artificial intelligence systems face in their pursuit of a “correct” legal answer.
B. The Relationship between Legal Uncertainty and Artificial Intelligence
Artificial intelligence systems generate interpretations by learning patterns from existing datasets. However, the semantic richness of natural language and the unique interpretive dynamics inherent in legal reasoning make it challenging for AI to cope with legal uncertainty. When models process statutory provisions subject to divergent interpretations or conflicting judicial precedents, they may produce inconsistent and unpredictable outputs. This phenomenon poses a risk to achieving “normative stability”, a fundamental feature of law that ensures consistency and predictability within the legal system.
The decision-making mechanisms of large language models (LLMs) are often opaque due to the complex algorithms based on billions of parameters, rendering these systems a “black box.” In addition, AI models exhibit a tendency to confabulate—that is, to generate factually incorrect or fabricated information—which can have serious implications in the legal domain. Erroneous or confabulated outcomes may lead to significant and potentially irreparable harm. Within this framework, the semantic entropy method aims to enhance the reliability of AI-assisted legal processes by identifying such risks in advance and quantifying the degree of semantic uncertainty present in AI-generated responses.
III. Semantic Entropy and Law
A. The Concept of Semantic Entropy
Entropy is a concept originating from physics and information theory, used to measure the degree of disorder or uncertainty within a system6. In Claude Elwood Shannon’s classical theory of information, entropy quantifies the unpredictability or indeterminacy of a given piece of information. The term semantic, on the other hand, pertains to “meaning” in the linguistic sense, referring to the conceptual content of an expression, its relationship to context, and the degree of semantic similarity it conveys.
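In Shannon's formulation, the entropy of a discrete random variable X with outcome probabilities p(x) can be written as:

```latex
H(X) = -\sum_{x} p(x)\,\log_2 p(x)
```

Entropy is zero when a single outcome is certain and reaches its maximum when all outcomes are equally likely, which is the sense in which it quantifies unpredictability.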
The combination of these two notions gives rise to the concept of semantic entropy, which measures the extent to which a word or expression may carry different meanings across varying contexts7. In other words, the greater the semantic ambiguity of a linguistic unit, the higher its semantic entropy. This characteristic holds particular significance for artificial intelligence systems, especially large language models, which rely on natural language processing capabilities. As these models process text, they must account for the potential multiplicity of meanings embedded in words and sentences. Consequently, semantic entropy serves as an effective metric for evaluating both semantic diversity and the degree of consistency in a model’s generated responses.
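The measurement just described can be sketched in a few lines of code. This is a minimal illustration, not the implementation used in the Nature (2024) study: the `same` function below is a hypothetical placeholder using exact string matching, whereas the study judged semantic equivalence with a bidirectional-entailment model.

```python
import math

def cluster_answers(answers, equivalent):
    """Group sampled answers into meaning clusters via a pairwise
    equivalence judgment (bidirectional entailment in the study)."""
    clusters = []
    for ans in answers:
        for cluster in clusters:
            if equivalent(ans, cluster[0]):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    return clusters

def semantic_entropy(answers, equivalent):
    """Entropy over semantic clusters: near zero when the sampled
    answers agree in meaning, high when they scatter."""
    clusters = cluster_answers(answers, equivalent)
    n = len(answers)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)

# Placeholder equivalence check: normalized exact match.
same = lambda a, b: a.strip().lower() == b.strip().lower()

# Illustrative samples to one question: two meaning clusters (3 vs 1).
samples = [
    "The notice period is 30 days",
    "the notice period is 30 days",
    "The notice period is 60 days",
    "The notice period is 30 days",
]
h = semantic_entropy(samples, same)
```

Here the first, second, and fourth answers collapse into one cluster despite their surface differences, which is the key move that distinguishes semantic entropy from naive token-level uncertainty.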
B. Semantic Entropy and Semantic Ambiguity in the Legal Context
Since law is essentially aimed at establishing a normative order, precision in its meaning and terminology is of great importance. The potential of a word or expression in a legal text to carry multiple meanings may directly affect its openness to interpretation and create grounds for legal uncertainty. In this context, the concept of semantic entropy may serve as a quantitative criterion for semantic ambiguity in legal language.
High entropy indicates that the system is less predictable and that the information it conveys carries a higher degree of uncertainty8. For example, the word “right” in the field of law may take on various meanings in different contexts:
(i) In a subjective sense, it denotes the powers and interests granted to individuals, such as the right of ownership and the right to claim.
(ii) In a moral sense, it expresses the demand to do what is just and right, derived from ethical and natural law principles.
(iii) In the sense of authority, it indicates a person’s capacity to perform a certain legal act, such as the right to terminate a contract or the right to file a lawsuit.
(iv) As an abstract normative principle, the word expresses universal values; the concept of human rights, for example, represents the fundamental inviolabilities and freedoms possessed by individuals.
In light of these examples, it is clear that the word “right” has high entropy. Since the word can carry highly different and sometimes even contradictory meanings depending on its context, determining the correct meaning requires a semantic analysis and interpretation activity.
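The high entropy of “right” can be made concrete with a toy calculation. The sense probabilities below are invented purely for illustration; in practice they would have to be estimated from a corpus of legal usage.

```python
import math

# Hypothetical distribution over the four senses of "right"
# discussed above; the probabilities are illustrative only.
senses = {
    "subjective right (e.g. ownership)": 0.40,
    "moral right": 0.20,
    "right as legal power (e.g. termination)": 0.25,
    "abstract normative principle (human rights)": 0.15,
}

# Shannon entropy in bits over the sense distribution.
sense_entropy = -sum(p * math.log2(p) for p in senses.values())

# A word with one fixed sense would score 0 bits; four equally
# likely senses would give log2(4) = 2 bits, the maximum here.
```

Even with this rough distribution the word sits close to the 2-bit maximum, mirroring the observation that resolving “right” requires genuine interpretive work rather than a simple lookup.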
Semantic entropy measures uncertainty not only through word-level probabilities but also by considering the different yet semantically similar answers the model gives to the same question9. In this way, the semantic consistency and reliability of the answers produced by artificial intelligence systems can be evaluated. In particular, semantic entropy stands out as an effective method for distinguishing answers that carry a risk of confabulation and for increasing system reliability by declining to answer questions involving high uncertainty10. By detecting the uncertainty that AI-assisted systems encounter while processing legal texts, semantic entropy thus aims to secure safer outputs11.
IV. Managing Legal Uncertainty in Artificial Intelligence Systems through Semantic Entropy
The widespread use of artificial intelligence systems in the field of law has brought not only technical but also normative issues of security and responsibility to the agenda. In this context, semantic entropy comes to the forefront with its ability to measure the level of semantic uncertainty in the responses of LLMs and to detect in advance the situations in which they tend to produce confabulations12. The method introduced in the Nature (2024) study presents a scalable and unsupervised error prediction system developed to increase the safety of artificial intelligence in high-risk areas.
However, in order to evaluate the extent to which this method can be applied to legal practice, it is important to understand how semantic entropy functions not only as a technical measure but also on semantic and normative levels, how it can be applied in situations of legal uncertainty, and how it brings clarity to the “black box” nature of artificial intelligence systems.
A. Applicability to Legal Uncertainty
Semantic entropy analyzes the predictability and reliability of the model’s responses by measuring the semantic similarities or differences among multiple answers given to the same question. The basic assumption here is that semantically inconsistent responses reflect the model’s lack of confidence in its knowledge of that particular subject. For example, answers such as “Paris” or “The capital of France is Paris” to the question “What is the capital of France?” indicate low entropy, whereas contextually different answers such as “Harvard,” “Oxford,” and “Yale” to the question “Which is the best law school?” create high entropy. This difference reveals uncertainty not only at the stylistic but also at the semantic level13.
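The contrast between the two questions can be computed directly. As a simplification, the sketch below treats each distinct string as its own meaning cluster; a full treatment would first merge paraphrases such as “Paris” and “The capital of France is Paris” into one cluster, which would only lower the entropy of the factual question further.

```python
import math
from collections import Counter

def answer_entropy(answers):
    """Entropy over distinct answer strings, used here as a stand-in
    for entropy over semantic clusters."""
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Five sampled answers per question (illustrative data).
factual = ["Paris", "Paris", "Paris", "Paris", "Paris"]
contested = ["Harvard", "Oxford", "Yale", "Harvard", "Stanford"]

low = answer_entropy(factual)     # complete agreement
high = answer_entropy(contested)  # answers scatter across options
```

The factual question yields zero entropy, while the contested one scores well above one nat, matching the intuition that the latter has no single correct answer for the model to converge on.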
In the legal context, high semantic entropy points to situations where interpretive differences are intense or conceptual inconsistencies may arise. Within this framework, two main strategies have been developed:
1. Rejection Accuracy
The model may automatically refrain from generating an answer when confronted with questions that exhibit high semantic entropy. This not only reduces erroneous outputs but also increases the overall accuracy and predictability of the system14. In the Nature (2024) study, this strategy was measured using the AURAC metric, and it was demonstrated that refusing to respond to questions with high uncertainty significantly improved the model’s accuracy on the questions it did answer15.
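The rejection strategy itself is simple to express in code. The sketch below is not the AURAC computation (which sweeps over rejection thresholds); it shows a single threshold, with made-up entropy values chosen so that high-entropy answers tend to be wrong, as the study observed.

```python
def accuracy_with_rejection(items, threshold):
    """items: list of (entropy, is_correct) pairs.
    Returns (fraction answered, accuracy on answered questions)."""
    answered = [ok for h, ok in items if h <= threshold]
    if not answered:
        return 0.0, None
    return len(answered) / len(items), sum(answered) / len(answered)

# Illustrative data: (semantic entropy, whether the answer was correct).
data = [
    (0.1, True), (0.2, True), (1.5, False),
    (0.3, True), (1.8, False), (1.2, False),
]

frac, acc = accuracy_with_rejection(data, threshold=1.0)
# Answering everything would give 3/6 accuracy; rejecting the three
# high-entropy items leaves only correct answers among those given.
```

The trade-off is explicit: coverage drops to half, but accuracy on the answered subset rises, which is precisely the exchange the rejection strategy offers in high-risk legal settings.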
2. Confidence Score Integration and Human-Centered Supervision
The semantic entropy value can be presented together with a confidence score accompanying the LLM’s responses. This score provides the user with preliminary information about the extent to which the given information requires human supervision. Especially in legal decision-support systems, such scores constitute an important control layer in terms of ensuring transparency and auditability of decisions16.
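One simple way such a confidence score could be derived is by normalizing the entropy against its maximum for the number of samples drawn. The mapping and the 0.5 review threshold below are illustrative assumptions, not a scheme from the cited study.

```python
import math

def confidence_score(entropy, max_entropy):
    """Map semantic entropy to a [0, 1] confidence score for display
    alongside a model answer; 1.0 means full agreement among samples."""
    if max_entropy == 0:
        return 1.0
    return max(0.0, 1.0 - entropy / max_entropy)

# With 10 sampled answers, entropy is at most log(10) (all distinct).
max_h = math.log(10)

score = confidence_score(0.3, max_h)   # low entropy -> high confidence
flag_for_review = score < 0.5          # route to human supervision
```

Presenting the score alongside the answer, and routing low-confidence outputs to a human reviewer, operationalizes the control layer described above without requiring the user to interpret raw entropy values.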
B. The Black Box Problem and the Explanatory Role of Semantic Entropy
The decision-making processes of large language models are generally not transparent to human oversight due to their complex structures based on billions of parameters. With these characteristics, artificial intelligence systems possess the nature of a “black box”. In cases where the model produces inconsistent or erroneous information with high semantic entropy, this state of uncertainty deepens further and brings about reliability problems. In this context, the hallucinative responses produced by artificial intelligence systems—that is, contents that are false, contextually disconnected, or confabulated—constitute a risk area that must be managed with particular care. As emphasized in the Nature (2024) study, such hallucinations usually occur when the model gives inconsistent and random answers to the same question. Although these contents may appear formally convincing, they can be factually incorrect and may lead to serious misdirections in legal processes.
In the legal field, such errors do not only result in the presentation of incorrect information but also carry the risk of leading to consequences that are difficult to remedy. At this point, semantic entropy provides an early warning signal to the user regarding the reliability of the system’s decisions by measuring the semantic inconsistencies within the black box. In this way, it becomes possible to distinguish decisions that appear technically valid but are semantically problematic.
At this point, semantic entropy makes it possible to reveal contradictions, inconsistencies, and deviations occurring at the semantic level in the outputs of LLM systems that possess a black box nature, allowing for the early detection of the system’s tendency to produce hallucinations. Observing high semantic diversity among the answers given to the same question indicates that the model generates arbitrary outputs not based on knowledge, and this situation is directly evaluated as a hallucinative production form that falls under the category of confabulation. Particularly in areas such as the interpretation of legal texts, the analysis of precedents, or the application of norms to concrete cases, such semantic deviations constitute not only a technical error but also a ground that may give rise to serious consequences involving normative responsibility. For example, in a precedent analysis, if the model, with high entropy, presents different and contradictory justifications and produces inconsistent legal inferences regarding the same case, this is a clear indication of hallucination. In this respect, semantic entropy is not merely a tool for measuring accuracy; it also functions as a control mechanism for the monitoring and prevention of hallucinative productions through semantic consistency. Considering the potentially irreparable damages that hallucinations may cause in legal processes, the use of this method should be regarded as the digital counterpart of the legal duty of care and should be adopted as a fundamental tool that enhances reliability, especially in AI-based decision systems where human supervision is limited.
V. Conclusion
By systematically identifying semantic ambiguities in model outputs, semantic entropy makes visible the limitations of generative artificial intelligence systems, particularly in context-sensitive areas such as the interpretation of legal texts, the application of norms, and the transfer of judicial precedents. Semantically inconsistent answers given to the same question may indicate that the model produces results based on statistical prediction rather than genuine knowledge. In this context, semantic entropy provides a technical warning function to the user by detecting uncertainty through semantic diversity.
However, the measurement offered by this method is essentially a technical indicator and, on its own, is not sufficient to explain the nature of every uncertainty that carries legal significance. Legal uncertainty is not limited merely to linguistic ambiguity or conceptual diversity; it is shaped by multilayered structures such as interpretive differences, the historical context of jurisprudence, fundamental principles of law, and its relationship with social values. In this respect, legal uncertainty operates at a deeper level than what technical systems are capable of perceiving. Semantic entropy only detects semantic fluctuation within textual outputs; however, it cannot evaluate how such fluctuation aligns with the integrity of the legal system, its internal consistency, or the notion of justice.
Therefore, semantic entropy may be considered a useful measure that can contribute to the evaluation of content generated by large language models in legal contexts; nevertheless, this contribution provides limited visibility within the conceptual and interpretive depth of law and does not possess a complementary explanatory capacity in situations requiring normative assessment. Considering that uncertainty is not only a technical but also a normative problem, it does not seem possible to use measurement tools reduced solely to semantic diversity as ultimate determinants in the field of law. Legal decisions are shaped not only on the basis of logical consistency but also by elements such as context, the concept of justice, public interest, constitutional values, and interpretive methods. Such multilayered evaluative structures create an interpretive domain that cannot be fully captured by technical metrics.
The components of internal consistency, contextuality, and value judgment required by legal interpretation must be evaluated beyond existing technical measurement systems and within the framework of a distinct legal methodology. Law is not a system that operates solely through a binary distinction between right and wrong; on the contrary, the same text may often have multiple legitimate and reasonable interpretations. Therefore, although semantic diversity may be technically classified as “uncertainty,” such diversity may possess normative legitimacy from a legal standpoint. At this point, it should be remembered that methods such as semantic entropy are supportive but limited tools within legal evaluation processes. Ultimately, meaning-making in the legal context relies not only on linguistic analysis but also on interpretive principles, doctrines, and the normative reasoning developed through jurisprudence.
FOOTNOTES
1. Sebastian Farquhar / Jannik Kossen / Lorenz Kuhn / Yarin Gal, “Detecting Hallucinations in Large Language Models Using Semantic Entropy”, Nature, Vol. 630, No. 8017, 2024.
2. Gülriz Özkök, “Hukuki Belirsizlik Problemi Üzerine” [“On the Problem of Legal Uncertainty”], Ankara Üniversitesi Hukuk Fakültesi Dergisi, Vol. 51, No. 2, 2002, pp. 1-2.
3. Timothy Endicott, Vagueness in Law, 1st ed., Oxford University Press, Oxford, 2001, pp. 31-55.
4. Herbert Lionel Adolphus Hart, The Concept of Law, 3rd ed., Oxford University Press, Oxford, 2012, pp. 124-136.
5. Ronald Dworkin, Law’s Empire, Harvard University Press, Cambridge, 1986, pp. 45-113.
6. Constantino Tsallis, “Entropy”, Encyclopedia, Vol. 2, No. 1, 2022, p. 264.
7. Kuhn / Gal / Farquhar, Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation, pp. 1-2.
8. Claude E. Shannon, “A Mathematical Theory of Communication”, Bell System Technical Journal, Vol. 27, No. 3, 1948, p. 392.
9. Kossen / Jiatong Han / Muhammed Razzak / Lisa Schut / Shreshth Malik / Gal, Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs, 2024, p. 1 (accessed 31.07.2025).