Towards a Taxonomy of Logic for a Better Understanding of the Ostensible Reasoning of LLMs

Adnan Mahmud

Recent research, exemplified by "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity" (hereafter the Apple paper), reveals striking patterns in how even the most advanced reasoning models, including OpenAI's o1/o3 series, DeepSeek-R1, and Claude 3.7 Sonnet with "thinking" capabilities, handle complex logical tasks. These studies report that models experience complete accuracy collapse beyond certain complexity thresholds, exhibit a counterintuitive scaling limit in which reasoning effort rises with problem complexity up to a point and then declines, and fail to execute explicit algorithms consistently.

Such findings illuminate a critical gap in our understanding: without a systematic framework for categorising and evaluating different types of logical reasoning, we lack the conceptual tools necessary to diagnose precisely where and why these models fail. The current discourse around LLM reasoning often conflates different forms of logic—deductive, inductive, abductive, analogical—without acknowledging their distinct epistemological foundations and computational requirements.

This conceptual confusion has practical consequences.

The need for clarity becomes even more pressing when we consider the meta-logical properties that distinguish different reasoning types. Some forms of reasoning are ampliative (their conclusions add information that goes beyond the premises), while others are non-ampliative (their conclusions only preserve information already contained in the premises).

Without genuine taxonomic distinctions, the mixed results reported for reasoning models become difficult to interpret and even harder to improve upon.

Taxonomy Structure


Logical Reasoning
├── Deductive Reasoning [1, 2]
│   ├── Valid Arguments [1, 2]
│   │   ├── Sound Arguments - Valid + True Premises [3]
│   │   └── Unsound Arguments - Valid + False Premises [4]
│   ├── Invalid Arguments [2]
│   │   └── Unsound Arguments - Invalid Form, Regardless of Premises [2]
│   └── Rules of Inference
│       ├── Modus Ponens - p; if p then q; therefore q [5]
│       ├── Modus Tollens - not q; if p then q; therefore not p [6]
│       ├── Disjunctive Syllogism - p or q; not p; therefore q [7]
│       └── Hypothetical Syllogism - if p then q; if q then r; therefore if p then r [8]
├── Non-deductive Reasoning
│   ├── Inductive Reasoning - Individual → Universal [9, 10]
│   │   ├── Strong Inductive - High Probability [9, 10]
│   │   │   └── Cogent Arguments - Strong + True Premises [9, 10]
│   │   └── Weak Inductive - Low Probability [9, 10]
│   │       └── Uncogent Arguments - Weak or False Premises [9, 10]
│   ├── Abductive Reasoning - Inference to Best Explanation [11, 12, 13]
│   │   ├── Good Explanation - Simple, Consistent, Predictive [12]
│   │   └── Poor Explanation - Complex, Inconsistent [12]
│   ├── Analogical Reasoning - Similarity-based Transfer [14, 15]
│   │   ├── Strong Analogy - Relevant Similarities [14, 15]
│   │   └── Weak Analogy - Irrelevant Similarities [14, 15]
│   ├── Causal Reasoning - Cause-Effect Inference [16, 17]
│   │   ├── Direct Causation - Necessary/Sufficient Conditions [16]
│   │   └── Indirect Causation - Causal Chains/Networks [17]
│   ├── Probabilistic Reasoning - Uncertainty Quantification [18, 19]
│   │   ├── Bayesian Inference - Prior/Posterior Updates [18]
│   │   └── Frequentist Inference - Statistical Patterns [18, 19]
│   └── Counterfactual Reasoning - Hypothetical Scenarios [20, 21]
│       ├── Possible Worlds - Modal Alternatives [20]
│       └── Nearest Worlds - Minimal Changes [20]
├── Logical Systems - Formal Frameworks
│   ├── Classical Systems
│   │   ├── Aristotelian Logic - Syllogistic Reasoning [22]
│   │   ├── Propositional Logic - Truth-functional Operators [23]
│   │   └── Predicate Logic - Quantifiers and Relations [24]
│   ├── Extended Systems
│   │   ├── Modal Logic - Necessity/Possibility [25]
│   │   ├── Temporal Logic - Time Relations [26]
│   │   ├── Epistemic Logic - Knowledge and Belief [27]
│   │   └── Deontic Logic - Obligation and Permission [28]
│   ├── Alternative Systems
│   │   ├── Intuitionistic Logic - Constructive Proofs [29]
│   │   ├── Paraconsistent Logic - Contradiction Tolerance [30]
│   │   ├── Fuzzy Logic - Degrees of Truth [31]
│   │   └── Relevance Logic - Relevant Implication [32]
│   └── Computational Systems
│       ├── Automated Reasoning - Machine Inference [33]
│       ├── Defeasible Logic - Revisable Conclusions [34]
│       └── Non-monotonic Logic - Retractable Inference [35]
├── Fallacious Reasoning - Logical Errors
│   ├── Formal Fallacies - Structure Errors [36, 37]
│   │   ├── Affirming Consequent - q; if p then q; therefore p [36]
│   │   ├── Denying Antecedent - not p; if p then q; therefore not q [37]
│   │   └── Undistributed Middle - Syllogistic Error [22]
│   └── Informal Fallacies - Content/Context Errors [38, 39]
│       ├── Relevance Fallacies [38]
│       │   ├── Ad Hominem - Attack the Person [38]
│       │   ├── Appeal to Authority - Irrelevant Expertise [38]
│       │   └── Strawman - Misrepresentation [38]
│       ├── Causal Fallacies [38]
│       │   ├── Post Hoc - False Temporal Causation [38]
│       │   └── Correlation/Causation - Statistical Confusion [38]
│       ├── Ambiguity Fallacies [38]
│       │   ├── Equivocation - Ambiguous Terms [38]
│       │   └── Amphiboly - Grammatical Ambiguity [38]
│       └── Presumption Fallacies [38]
│           ├── False Dilemma - Limited Options [38]
│           └── Begging Question - Circular Reasoning [38]
├── Dialectical Reasoning - Argumentative Discourse
│   ├── Burden of Proof - Obligation Distribution [40]
│   ├── Argument Schemes - Presumptive Patterns [41]
│   └── Critical Questions - Scheme Evaluation [42]
├── Pragmatic Reasoning - Context-Dependent Inference
│   ├── Implicature - Conversational Inference [43]
│   ├── Presupposition - Background Assumptions [44]
│   └── Speech Acts - Performative Utterances [45]
└── Meta-logical Properties - Reasoning Characteristics
    ├── Ampliative vs Non-ampliative - Information Addition [46]
    ├── Defeasible vs Non-defeasible - Conclusion Stability [47]
    ├── Monotonic vs Non-monotonic - Information Accumulation [35]
    ├── Truth-preserving vs Truth-conducive - Guarantee vs Probability [48]
    └── Context-dependent vs Context-independent - Situational Sensitivity [49]
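
The distinction the taxonomy draws between rules of inference and formal fallacies can be checked mechanically. The following is a minimal sketch, assuming classical two-valued propositional logic; the helper names (implies, is_valid, forms) are illustrative and do not come from any cited source. It enumerates every truth assignment and treats a form as valid only if no assignment makes all premises true and the conclusion false.

```python
from itertools import product


def implies(a, b):
    # Material conditional: false only when a is true and b is false.
    return (not a) or b


def is_valid(premises, conclusion, n_vars):
    # A form is valid when no truth assignment makes every premise true
    # while making the conclusion false.
    for values in product([True, False], repeat=n_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False
    return True


# Each entry: (premises, conclusion, number of propositional variables).
forms = {
    "modus_ponens": (
        [lambda p, q: p, lambda p, q: implies(p, q)], lambda p, q: q, 2),
    "modus_tollens": (
        [lambda p, q: not q, lambda p, q: implies(p, q)], lambda p, q: not p, 2),
    "disjunctive_syllogism": (
        [lambda p, q: p or q, lambda p, q: not p], lambda p, q: q, 2),
    "hypothetical_syllogism": (
        [lambda p, q, r: implies(p, q), lambda p, q, r: implies(q, r)],
        lambda p, q, r: implies(p, r), 3),
    "affirming_consequent": (
        [lambda p, q: q, lambda p, q: implies(p, q)], lambda p, q: p, 2),
    "denying_antecedent": (
        [lambda p, q: not p, lambda p, q: implies(p, q)], lambda p, q: not q, 2),
}

validity = {name: is_valid(*spec) for name, spec in forms.items()}

for name, ok in validity.items():
    print(f"{name}: {'valid' if ok else 'invalid'}")
```

Under these assumptions the four rules of inference come out valid and the two formal fallacies come out invalid, which is the structural difference the tree encodes.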

Discussion

Upon mapping the diverse landscape of logical reasoning into distinct categories, we can begin to diagnose more precisely where current models succeed and where they fundamentally struggle.

Deductive Reasoning and the Algorithm Execution Problem

The Apple paper reveals a particularly troubling finding: even when provided with explicit algorithms (such as the recursive solution for Tower of Hanoi), models fail to execute these procedures correctly at higher complexity levels.

In our taxonomy, deductive reasoning represents the gold standard of logical inference—conclusions that follow necessarily from premises through valid argument forms. The failure of reasoning models to execute given algorithms suggests a fundamental deficit in what we might call "computational deduction"—the ability to apply formal rules consistently across extended inference chains. This finding is particularly significant because algorithm execution should be one of the most straightforward applications of deductive reasoning. The rules are explicit, the steps are deterministic, and the logical form is clear. If models cannot reliably perform this type of reasoning, it raises serious questions about their capacity for more complex deductive tasks.
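
To make concrete how explicit and deterministic such a procedure is, here is a minimal sketch of the standard recursive Tower of Hanoi solution together with a move checker. It is an illustration only: the function names, the peg labels "A", "B", "C", and the checker are assumptions of this sketch, not the prompt or simulator used in the Apple paper.

```python
def hanoi(n, source, target, spare, moves=None):
    # Standard recursion: move n-1 disks out of the way, move the largest
    # disk, then move the n-1 disks back on top of it.
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, source, spare, target, moves)
    moves.append((source, target))
    hanoi(n - 1, spare, target, source, moves)
    return moves


def is_legal_solution(n, moves):
    # Replays the moves from peg "A" and checks that no larger disk is ever
    # placed on a smaller one and that all disks end on peg "C".
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for src, dst in moves:
        if not pegs[src]:
            return False
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False
        pegs[dst].append(pegs[src].pop())
    return pegs["C"] == list(range(n, 0, -1))


moves = hanoi(3, "A", "C", "B")
print(len(moves), is_legal_solution(3, moves))
```

The recursion produces 2**n - 1 moves for n disks, so the required chain of rule applications grows exponentially even though each individual step is trivial. This is the sense in which long-horizon algorithm execution stresses consistency rather than insight.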

Non-deductive Reasoning and Pattern Recognition

Interestingly, the Apple paper shows that LLMs often excel in domains that align more closely with non-deductive reasoning forms. Their success in analogical reasoning tasks, for instance, suggests sophisticated pattern recognition capabilities that map well onto our taxonomy's treatment of similarity-based inference.

However, the inconsistent performance across different puzzle types—succeeding on Tower of Hanoi requiring ~100 moves while failing on River Crossing puzzles needing only ~10 moves—suggests that apparent analogical reasoning may be heavily dependent on training data familiarity rather than genuine analogical transfer. This points to a concerning pattern: what appears to be reasoning may actually be sophisticated memorization and pattern matching.

The probabilistic reasoning capabilities of LLMs also warrant examination through this taxonomic lens. Their failure to scale reasoning effort appropriately with problem complexity suggests that they may not be engaging in genuine Bayesian inference or other forms of principled probabilistic reasoning.
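
For reference, the kind of prior-to-posterior update that the taxonomy files under Bayesian inference can be sketched in a few lines. This is a minimal illustration for a single binary hypothesis; the function name and the numbers are made up for the example and are not taken from any cited study.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    # Bayes' rule for a binary hypothesis H and one piece of evidence E:
    # P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | not H) P(not H)]
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence


# Hypothetical numbers: a 1% prior, evidence that appears 90% of the time
# when H holds and 5% of the time when it does not.
p = posterior(prior=0.01, likelihood_if_true=0.90, likelihood_if_false=0.05)
print(round(p, 3))
```

A principled probabilistic reasoner would be expected to track updates of this form as evidence accumulates; whether LLM reasoning traces do so is an empirical question this sketch does not settle.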

Meta-logical Properties and LLM Limitations

Perhaps most revealing are the meta-logical properties that distinguish different reasoning types. The research findings suggest that current LLMs struggle particularly with:

Table 1: Categories of meta-reasoning and associated failure patterns in current LLMs.

Non-ampliative Reasoning: Tasks requiring strict preservation of information without adding new content. Algorithm execution falls into this category, and the consistent failures here suggest fundamental limitations in maintaining logical rigor.
Non-defeasible Reasoning: Tasks where conclusions must remain stable regardless of additional information. The "overthinking" phenomenon, where models find correct solutions early but then abandon them for incorrect alternatives, demonstrates a failure to recognize when conclusions should be treated as definitive.
Context-independent Reasoning: Tasks requiring universal logical principles rather than situational adaptation. The dramatic performance differences across puzzle types suggest over-reliance on context-specific patterns rather than universal logical principles.
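
The contrast between defeasible and non-defeasible conclusions can be illustrated with the textbook bird/penguin default. The sketch below is a toy, assuming a set-of-strings representation of facts; the function names are hypothetical and it is not an implementation of any particular defeasible or default logic.

```python
def flies(facts):
    # Defeasible default: a bird is presumed to fly unless it is known
    # to be a penguin. New information can retract this conclusion.
    return "bird" in facts and "penguin" not in facts


def is_animal(facts):
    # Strict rule: every bird is an animal. Adding facts never retracts it.
    return "bird" in facts


before = {"bird"}
after = before | {"penguin"}

print(flies(before), flies(after))          # defeasible conclusion changes
print(is_animal(before), is_animal(after))  # strict conclusion is stable
```

A model that revises the second kind of conclusion when given more context, or refuses to revise the first kind, is making a meta-logical error rather than a factual one.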

Implications and Path Forward

This taxonomic analysis suggests that current LLM evaluation frameworks may be fundamentally inadequate. Most assessments treat "reasoning" as a monolithic capability, but our taxonomy reveals it as a complex landscape of distinct logical processes with different epistemological foundations and computational requirements.

Future evaluation frameworks should:

Distinguish reasoning types explicitly: Rather than general "reasoning" scores, assessments should evaluate specific forms of logical inference separately.
Test meta-logical properties: Evaluations should explicitly test whether models can maintain consistency across non-defeasible reasoning tasks or appropriately adapt conclusions in defeasible contexts.
Assess compositional reasoning: The research shows that models fail as complexity increases, suggesting that true reasoning requires the ability to compose simple logical operations into complex inference chains.
Examine reasoning traces systematically: With thinking models, we can now analyze the quality of reasoning processes, not just outcomes. This requires frameworks grounded in logical theory rather than intuitive assessment.
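
As one way to picture the first recommendation, the sketch below reports accuracy per reasoning type instead of a single aggregate score. It assumes each evaluation item has already been labelled with a category from the taxonomy; the function name and the sample records are hypothetical placeholders, not measured results.

```python
from collections import defaultdict


def accuracy_by_reasoning_type(results):
    # results: iterable of (reasoning_type, is_correct) pairs.
    totals = defaultdict(int)
    correct = defaultdict(int)
    for reasoning_type, is_correct in results:
        totals[reasoning_type] += 1
        correct[reasoning_type] += int(is_correct)
    return {t: correct[t] / totals[t] for t in totals}


# Made-up records, for illustration only.
sample = [
    ("deductive", True), ("deductive", False),
    ("abductive", True), ("abductive", True),
    ("analogical", False),
]

scores = accuracy_by_reasoning_type(sample)
overall = sum(ok for _, ok in sample) / len(sample)
print(scores, overall)
```

With these placeholder records the single overall score hides the spread across categories, which is the information a taxonomy-aware evaluation is meant to expose.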

The illusion of thinking revealed in the Apple paper may persist until we develop models that can navigate, with consistency, a comprehensive taxonomy of logical reasoning. Only by understanding the structured landscape of logic can we hope to build systems that genuinely reason rather than merely simulate the appearance of reasoning.

References

[1] Copi, I. M., Cohen, C., & Rodych, V. (2018). Introduction to Logic (15th ed.). Routledge.

[2] Bergmann, M., Moor, J., & Nelson, J. (2013). The Logic Book (6th ed.). McGraw-Hill.

[3] Shapiro, S. (1998). Logical consequence: Models and modality. In The Philosophy of Mathematics Today (pp. 131-156). Oxford University Press.

[4] Priest, G. (2006). In Contradiction: A Study of the Transconsistent (2nd ed.). Oxford University Press.

[5] Robinson, J. A. (1965). A machine-oriented logic based on the resolution principle. Journal of the ACM, 12(1), 23-41.

[6] Davis, M., & Putnam, H. (1960). A computing procedure for quantification theory. Journal of the ACM, 7(3), 201-215.

[7] Loveland, D. W. (1978). Automated Theorem Proving: A Logical Basis. North-Holland.

[8] Andrews, P. B. (2002). An Introduction to Mathematical Logic and Type Theory (2nd ed.). Kluwer Academic Publishers.

[9] Mill, J. S. (1843/1973). A System of Logic, Ratiocinative and Inductive. University of Toronto Press.

[10] Skyrms, B. (2000). Choice and Chance: An Introduction to Inductive Logic. Wadsworth Publishing.

[11] Peirce, C. S. (1931-1958). Abduction and Induction in Collected Papers. Harvard University Press.

[12] Harman, G. (1965). The Inference to the Best Explanation. Philosophical Review, 74(1), 88-95.

[13] Josephson, J. R. & Josephson, S. G. (eds.) (1994). Abductive Inference: Computation, Philosophy, Technology. Cambridge University Press.

[14] Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7(2), 155-170.

[15] Forbus, K. D., Gentner, D., & Law, K. (1995). MAC/FAC: A model of similarity-based retrieval. Cognitive Science, 19(2), 141-205.

[16] Pearl, J. (2009). Causality: Models, Reasoning and Inference (2nd ed.). Cambridge University Press.

[17] Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, Prediction, and Search (2nd ed.). MIT Press.

[18] Jaynes, E. T. (2003). Probability Theory: The Logic of Science. Cambridge University Press.

[19] Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann.

[20] Lewis, D. (1973). Counterfactuals. Harvard University Press.

[21] Pearl, J. (2000). The Logic of Structure-Based Counterfactuals in Causality. Cambridge University Press.

[22] Smith, R. (trans.) (1989). Aristotle's Prior Analytics. Hackett.

[23] Hurley, P. J. (2014). A Concise Introduction to Logic (12th ed.). Cengage Learning.

[24] Mendelson, E. (2015). Mathematical Logic (6th ed.). CRC Press.

[25] Hughes, G. E., & Cresswell, M. J. (1996). A New Introduction to Modal Logic. Routledge.

[26] Prior, A. N. (1967). Past, Present and Future. Oxford University Press.

[27] Fagin, R., Halpern, J. Y., Moses, Y., & Vardi, M. Y. (1995). Reasoning About Knowledge. MIT Press.

[28] von Wright, G. H. (1951). An Essay in Modal Logic. Studies in Logic and the Foundations of Mathematics. North-Holland.

[29] Heyting, A. (1956). Intuitionism: An Introduction. North-Holland.

[30] Priest, G. (2006). In Contradiction: A Study of the Transconsistent (2nd ed.). Oxford University Press.

[31] Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338-353.

[32] Anderson, A. R. & Belnap, N. D. (1975). Entailment: The Logic of Relevance and Necessity. Princeton University Press.

[33] Robinson, J. A. (1965). A machine-oriented logic based on the resolution principle. Journal of the ACM, 12(1), 23-41.

[34] Nute, D. (1994). Defeasible Logic. In Handbook of Logic in AI, Vol. 3. Oxford University Press.

[35] Reiter, R. (1980). A logic for default reasoning. Artificial Intelligence, 13(1-2), 81-132.

[36] Hurley, P. J. (2015). A Concise Introduction to Logic (12th ed.). Wadsworth Cengage.

[37] Copi, I. M., Cohen, C., & McMahon, K. (2014). Introduction to Logic (14th ed.). Pearson.

[38] Walton, D. N. (1995). A Pragmatic Theory of Fallacy. University of Alabama Press.

[39] van Eemeren, F. H., & Grootendorst, R. (1987). Fallacies in pragma-dialectical perspective. Argumentation, 1(3), 283-301.

[40] Walton, D. N. (1988). Burden of proof. Argumentation, 2(2), 233-254.

[41] Walton, D. N., Reed, C., & Macagno, F. (2008). Argumentation Schemes. Cambridge University Press.

[42] Walton, D. N., & Godden, D. M. (2007). Critical questions in argumentation schemes. Informal Logic, 27(3), 267-292.

[43] Grice, H. P. (1975). Logic and conversation. In Syntax and Semantics 3 (pp. 41-58). Academic Press.

[44] Strawson, P. F. (1950). On referring. Mind, 59(235), 320-344.

[45] Austin, J. L. (1962). How to Do Things with Words. Harvard University Press.

[46] Peirce, C. S. (1878). How to Make Our Ideas Clear. Popular Science Monthly, 12, 286-302.

[47] Pollock, J. L. (1995). Cognitive Carpentry. MIT Press.

[48] Goldman, A. I. (1986). Epistemology and Cognition. Harvard University Press.

[49] Kaplan, D. (1989). Demonstratives. In Themes from Kaplan. Oxford University Press.