Integrating Deep Learning with Inductive Logic Programming: A Literature Review for Genesis-AI

Adnan Mahmud

Deep learning and Inductive Logic Programming (ILP) represent fundamentally complementary paradigms. Deep learning excels at scalable pattern recognition over high-dimensional data but produces models so opaque that an entire subfield, Explainable AI, exists largely to apologise for them; ILP learns human-readable logical rules from relational data but struggles with computational scalability. Their integration has been a core ambition of the neuro-symbolic AI domain for some time, yet no existing system simultaneously handles raw sensor data from massively parallel experiments, maintains interpretable logical hypotheses, supports abductive revision from streaming observations, and enforces domain-specific scientific constraints.

This review surveys the principal mechanisms by which deep learning and ILP have been connected in the literature from 2018 to 2026, organised around five thematic axes: (A) propositionalisation pipelines that convert relational logic into neural feature spaces; (B) differentiable ILP systems that enable gradient-based optimisation over symbolic inference; (C) broader neuro-symbolic architectures spanning constraint injection and probabilistic logic programming; (D) online and incremental ILP systems approaching real-time operation; and (E) abductive reasoning frameworks that bridge neural perception with logical hypothesis generation. The synthesis is framed through the requirements of Genesis-AI, a "Robot Scientist" under development by Professor Ross D. King at Chalmers University of Technology, which aims to autonomously conduct yeast systems biology at a thousandfold cost-benefit over human scientists using approximately 10,000 computer-controlled microbioreactors [6, 74].

Introduction: The Case for Neuro-Symbolic Scientific Reasoning

The development of comprehensive systems biology models of eukaryotic cells constitutes one of the most formidable challenges in modern science. Such models are central to the future of medicine (humans are eukaryotes), agriculture (plants, too, are eukaryotes), and biotechnology. Yet even a single-celled eukaryote like Saccharomyces cerevisiae, humble baker's yeast, thumbs its nose at human comprehension: approximately 6,000 genes, thousands of metabolic reactions distributed across eight cellular compartments, and tens of thousands of regulatory parameters, all conspiring to render traditional experimental science inadequate. To make matters worse, Duhem's thesis reminds us that no single experiment can condemn an isolated hypothesis but only a whole theoretical group [1], meaning that even when an experiment fails to confirm a prediction, the blame could lie anywhere in the model. This complexity compounds the challenge of model refinement and the generation of efficient experiments to discriminate between competing hypotheses [2].

Professor Ross D. King, a pioneer of autonomous scientific discovery, has spent over two decades developing Robot Scientists capable of automatically generating hypotheses, designing experiments, executing them physically, interpreting the results, and iterating [3, 4]. His Robot Scientist Adam was the first machine to autonomously discover new scientific knowledge, identifying orphan enzyme functions in yeast metabolism through closed-loop experimentation involving ILP-based hypothesis generation, robotic laboratory execution, and Bayesian experimental design [3, 4]. Eve, the second-generation Robot Scientist, extended this paradigm to drug discovery, using quantitative structure-activity relationship (QSAR) learning and automated compound screening to identify triclosan as an inhibitor of dihydrofolate reductase in malaria parasites—demonstrating that AI-selected compounds could outperform standard drug screening protocols [5].

Genesis-AI represents the third and most ambitious generation. Currently under construction at Chalmers University of Technology (and partly at the University of Cambridge), Genesis-AI centres on a microfluidic system comprising approximately 10,000 computer-controlled microbioreactors arranged in groups of 48 using the standard footprint of a microtiter plate [6]. Each microbioreactor can be configured in real time to run in batch, fed-batch, or continuous mode, enabling an extraordinarily wide range of biological conditions to be explored. The observables include growth rate, metabolic analysis of the growth medium (approximately 10 compounds), metabolic analysis of the internal state of the yeast (approximately 100 metabolites via mass spectrometry at a planned capacity of 10,000 measurements per day), and comprehensive gene expression levels for all 6,000 yeast genes via mRNA sequencing [6]. The system aims to demonstrate a thousand-fold cost-benefit over a good human scientist in a standard laboratory.

The AI component of Genesis faces a core architectural question: how should deep learning and symbolic reasoning (specifically Inductive Logic Programming) be integrated to enable real-time, closed-loop scientific discovery at this scale? Deep neural networks alone will not suffice, for four reasons that become clear the moment one tries: biological data is heterogeneous and relational, resisting the flat input vectors that neural networks prefer; structured background knowledge (metabolic models, stoichiometric constraints) does not slot neatly into a neural architecture; neural predictions are opaque in a domain where scientists need to verify why a model predicts growth or no growth, not merely that it does; and the search space is so vast that even 10,000 micro-bioreactors generating data continuously cannot cover it without hypothesis-led experimental design [6]. Classical ILP, for its part, handles the symbolic side beautifully but was never designed to learn in real time from constant data streams in a closed-loop system.

This review addresses that challenge by surveying the mechanisms by which deep learning and ILP have been connected in the research literature from 2018 to 2026. The scope is deliberately focused on the modern era in which deep learning became the dominant neural paradigm and ILP experienced a renaissance driven by improved hardware, new theoretical ideas, and growing recognition that purely connectionist approaches are insufficient for scientific reasoning [8].

A. Propositionalisation: Relational Logic as Neural Feature Space

The earliest and most conceptually straightforward strategy for connecting ILP with deep learning treats ILP as a feature constructor, operating in two decoupled stages. In the symbolic stage, relational data and background knowledge are fed into an ILP engine that induces first-order clauses, which are then propositionalised into fixed-length Boolean or real-valued feature vectors. In the neural stage, a standard deep neural network trains on these vectors; crucially, no gradients flow back to revise the symbolic features, making the two stages strictly sequential (Figure 1). This approach has a long history in ILP but acquired new significance when combined with deep learning's capacity for hierarchical representation learning.

Figure 1: ILP induces first-order clauses from relational data, which are converted into fixed-length feature vectors for standard deep learning. The two stages are decoupled (n.b. no gradients flow back to revise the symbolic features).
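The decoupled pipeline can be illustrated with a minimal sketch. The clause bodies below are hand-written stand-ins for what an ILP engine would induce, and the helper names (`has_atom`, `has_bond`, `propositionalise`) and the toy molecules are assumptions made for illustration only, not part of any cited system.

```python
from typing import Callable, Dict, List, Set, Tuple

Fact = Tuple[str, str, str]   # e.g. ("atom", "a1", "c") or ("bond", "a1", "a2")
Example = Set[Fact]
Feature = Callable[[Example], bool]


def has_atom(element: str) -> Feature:
    """Clause body: the molecule contains an atom of the given element."""
    return lambda ex: any(f[0] == "atom" and f[2] == element for f in ex)


def has_bond(e1: str, e2: str) -> Feature:
    """Clause body: some bond joins an atom of element e1 to one of element e2."""
    def test(ex: Example) -> bool:
        element_of: Dict[str, str] = {f[1]: f[2] for f in ex if f[0] == "atom"}
        return any(
            f[0] == "bond"
            and {element_of.get(f[1]), element_of.get(f[2])} == {e1, e2}
            for f in ex
        )
    return test


def propositionalise(examples: List[Example], features: List[Feature]) -> List[List[int]]:
    """Symbolic stage output: one fixed-length Boolean vector per example."""
    return [[int(feature(ex)) for feature in features] for ex in examples]


# Hypothetical "induced" features; a real pipeline would obtain them from an ILP engine.
features = [has_atom("o"), has_bond("c", "o"), has_bond("c", "c")]
with_oxygen = {("atom", "a1", "c"), ("atom", "a2", "o"), ("bond", "a1", "a2")}
carbon_only = {("atom", "a1", "c"), ("atom", "a2", "c"), ("bond", "a1", "a2")}

vectors = propositionalise([with_oxygen, carbon_only], features)
print(vectors)  # [[1, 1, 0], [0, 0, 1]]
```

The resulting vectors would then be passed to any standard neural network. Nothing in that second stage can alter `features`, which is the decoupling the figure describes.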

Deep Relational Machines (DRMs), introduced by Lodhi [9], instantiate this pipeline by using an ILP engine to induce first-order Horn clauses whose body literals become Boolean features; successive layers of Restricted Boltzmann Machines then learn hierarchical representations over these features. Dash, Srinivasan, Vig, Orhobor, and King [10] conducted the first large-scale evaluation of DRMs, testing across over 50 QSAR datasets from ChEMBL with industrial-strength background knowledge comprising approximately 100 predicates. A key finding was that stochastic random feature selection—sampling relational features without an ILP engine for guided selection—achieved comparable predictive performance to carefully ILP-selected features, a surprising result that partially resolves the combinatorial explosion problem inherent in propositionalisation. Dash, Srinivasan, Joshi, and Baskar [11] formalised this further through a discrete stochastic search framework with an optimal sampling distribution minimising expected feature search misses. Srinivasan and Vig [12] extended DRMs with logical explanations using Bayes-like relevance proxies, demonstrating that DRMs with randomised propositionalisation achieve state-of-the-art ILP benchmark performance whilst yielding interpretable symbolic explanations—a property of direct relevance to scientific discovery systems requiring human-verifiable hypotheses.

Bottom Clause Propositionalisation (BCP), introduced by França, Zaverucha, and Garcez [13], offers a complementary approach. BCP converts bottom clauses—the most specific clauses in an ILP system's hypothesis space—into binary vectors where each dimension indicates the presence or absence of a particular literal. Their CILP++ system integrated BCP with the C-IL²P neural-symbolic framework, achieving accuracy comparable to Aleph whilst running significantly faster. BCP naturally bounds feature dimensionality through ILP system parameters (clause length, variable depth), offering a principled partial solution to the feature explosion problem that plagues unrestricted propositionalisation. Lavrač et al. [14] articulated the deeper theoretical connection, arguing that propositionalisation and neural embeddings are fundamentally two representations of the same underlying transformation from relational to vectorial space.
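The encoding step of BCP can be sketched in a few lines, under the assumption that the bottom clauses have already been produced by an ILP system and are available as lists of body literals. The function name `bcp_vectors` and the literals are illustrative, and this is not the CILP++ implementation.

```python
from typing import List, Tuple


def bcp_vectors(bottom_clauses: List[List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Map each bottom clause to a binary vector over the literal vocabulary."""
    # One dimension per distinct body literal seen in any bottom clause.
    vocabulary = sorted({literal for clause in bottom_clauses for literal in clause})
    index = {literal: i for i, literal in enumerate(vocabulary)}
    vectors = []
    for clause in bottom_clauses:
        vector = [0] * len(vocabulary)
        for literal in clause:
            vector[index[literal]] = 1  # literal is present in this bottom clause
        vectors.append(vector)
    return vocabulary, vectors


# Hypothetical bottom-clause bodies for two examples.
clauses = [
    ["atom(A,c)", "bond(A,B)", "atom(B,o)"],
    ["atom(A,c)", "bond(A,B)", "atom(B,n)"],
]
vocabulary, vectors = bcp_vectors(clauses)
print(vocabulary)  # ['atom(A,c)', 'atom(B,n)', 'atom(B,o)', 'bond(A,B)']
print(vectors)     # [[1, 0, 1, 1], [1, 1, 0, 1]]
```

Because the vocabulary is bounded by the literals that the clause-length and variable-depth settings allow into a bottom clause, the vector width is bounded by the same parameters.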

For Genesis-AI, propositionalisation offers an immediately actionable integration path: ILP can generate relational features encoding pathway consistency, reaction motifs, and regulatory patterns from the yeast systems biology model, which then feed into neural networks trained on the high-throughput metabolomics and transcriptomics data. However, the approach is inherently two-stage, and the feature space is fixed once generated, introducing latency incompatible with real-time closed-loop operation and precluding the dynamic hypothesis revision that Genesis requires as new experimental data arrives continuously. These limitations motivate the differentiable approaches discussed in the following section.

B. Differentiable ILP: Making Logical Inference Amenable to Gradient Descent

The most transformative development in DL+ILP integration has been making logical inference differentiable, eliminating the two-stage bottleneck of propositionalisation entirely. In this paradigm, training data passes through a neural network that outputs continuous truth values (soft atoms in [0,1]) for logical atoms, which feed into a differentiable inference engine performing forward chaining or semiring-based reasoning. The inference engine evaluates these soft atoms against rule templates with learnable weights, and a loss function computed over the output drives gradients back through two paths simultaneously: one updating the neural network parameters, the other updating the rule weights (Figure 2). This joint optimisation of neural perception and logical hypothesis generation within a single computational graph is what distinguishes differentiable ILP from the decoupled pipeline of propositionalisation.

Figure 2: Neural networks produce continuous truth values for logical atoms, which feed into a differentiable inference engine. Gradients flow back through the symbolic layer (dashed lines), jointly updating both rule weights and neural parameters, thereby eliminating the two-stage bottleneck.

The foundational system in this space is ∂ILP by Evans and Grefenstette [15], which recasts ILP as a satisfiability problem with continuous semantics: ground atoms are mapped to valuations in [0,1], rule templates define the search space, and learnable softmax weights select clause instantiations. Forward chaining inference is implemented via differentiable tensor operations, and the system learns by backpropagation against cross-entropy loss. ∂ILP tolerates up to 20% label noise (unlike classical ILP, which typically fails with any mislabelled data) and can be hybridised with pre-trained convolutional neural networks for perceptual input. Its limitations include strong language bias requirements (predefined rule templates, predicates of arity at most two, and two body atoms per clause), scalability challenges from the propositionalisation of all clause instantiations, and memory-intensive tensor-based reasoning.
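A single soft forward-chaining step of this kind can be sketched for one ground target atom. The sketch assumes a product conjunction, a softmax mixture over candidate clause bodies, and a probabilistic-sum merge with the previous valuation; the function names, atoms, and weights are hypothetical, and the code illustrates the idea rather than reproducing the ∂ILP implementation (which operates on tensors over all ground atoms and is trained by automatic differentiation).

```python
import math
from typing import Dict, List, Tuple


def softmax(weights: List[float]) -> List[float]:
    """Numerically stable softmax over clause weights."""
    top = max(weights)
    exps = [math.exp(w - top) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def forward_step(
    valuation: Dict[str, float],
    head: str,
    bodies: List[Tuple[str, str]],
    weights: List[float],
) -> Dict[str, float]:
    """Derive a soft truth value for `head` from weighted candidate clause bodies."""
    mix = softmax(weights)
    # Product conjunction of the two body atoms, mixed by the softmax weights.
    derived = sum(m * valuation[b1] * valuation[b2] for m, (b1, b2) in zip(mix, bodies))
    old = valuation.get(head, 0.0)
    updated = dict(valuation)
    updated[head] = old + derived - old * derived  # probabilistic sum with the old value
    return updated


# Hypothetical soft facts, as a perception network might emit them.
valuation = {"parent(a,b)": 0.9, "parent(b,c)": 0.8, "parent(a,c)": 0.1, "parent(c,a)": 0.0}
bodies = [("parent(a,b)", "parent(b,c)"), ("parent(a,c)", "parent(c,a)")]
weights = [2.0, 0.0]  # learnable in a real system; fixed here

result = forward_step(valuation, "grandparent(a,c)", bodies, weights)
print(round(result["grandparent(a,c)"], 4))
```

Every operation here is smooth in both `weights` and the input valuations, which is what would allow a loss on `grandparent(a,c)` to send gradients to the clause weights and to the network that produced the soft facts.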

DeepProbLog by Manhaeve, Dumančić, Kimmig, Demeester, and De Raedt [16, 17] takes a probabilistic approach, extending ProbLog with neural annotated disjunctions: neural predicates map network inputs to probability distributions over logical atoms. Inference proceeds by grounding the program, instantiating neural predicates via softmax outputs, then using ProbLog's algebraic model counting to compute query probabilities. Gradients flow from the probabilistic loss through the semiring computation back to network parameters. The canonical demonstration—learning digit classification from only addition supervision over MNIST image pairs—achieves 96.5% accuracy and exemplifies how logical structure can provide indirect supervision for perception. DeepProbLog supports combined symbolic and subsymbolic reasoning, program induction via sketches, and probabilistic logic programming, but grounding can become prohibitively expensive for large domains such as genome-scale metabolic models.
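The indirect supervision in the addition task can be made concrete with a brute-force sketch: the probability of an observed sum is the total probability of all digit pairs that add up to it, and the loss is its negative logarithm. The function names and the toy distributions are assumptions for illustration; DeepProbLog itself obtains the same quantity through grounding and knowledge compilation rather than explicit enumeration.

```python
import math
from typing import List, Sequence


def sum_distribution(p_first: Sequence[float], p_second: Sequence[float]) -> List[float]:
    """P(sum = s) for two independent digit distributions."""
    out = [0.0] * (len(p_first) + len(p_second) - 1)
    for a, pa in enumerate(p_first):
        for b, pb in enumerate(p_second):
            out[a + b] += pa * pb  # every pair (a, b) is one proof of sum = a + b
    return out


def addition_loss(p_first: Sequence[float], p_second: Sequence[float], observed_sum: int) -> float:
    """Negative log-likelihood of the observed sum; no digit labels are needed."""
    return -math.log(sum_distribution(p_first, p_second)[observed_sum])


# Hypothetical classifier outputs for two images.
p_first = [0.0] * 10
p_first[3], p_first[5] = 0.9, 0.1
p_second = [0.0] * 10
p_second[4], p_second[2] = 0.8, 0.2

dist = sum_distribution(p_first, p_second)
loss = addition_loss(p_first, p_second, 7)
print(round(dist[7], 2), round(loss, 3))  # 0.74 0.301
```

Two proofs contribute to the sum 7 here (3+4 and 5+2). Minimising the loss raises the probability of whichever digit assignments explain the observed sums, which is how the logic supervises the classifier.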

NeurASP by Yang, Ishay, and Lee [18] takes a parallel approach using Answer Set Programming: neural network outputs become probabilistic facts in an ASP program, and the loss is the negative log-likelihood of observations under stable model semantics. The ASP solver Clingo enumerates stable models while neural gradients propagate through the probabilistic computation. NeurASP demonstrates that symbolic constraints (e.g., Sudoku rules) can significantly improve neural perception accuracy, but scalability is limited by the potentially exponential enumeration of stable models. Neural Logic Machines (NLM) by Dong, Mao, Lin, Wang, Li, and Zhou [19] take a purely neural approach, representing logic predicates as probabilistic tensors and implementing logic rules as neural operators with reduce/expand operations approximating quantifiers via max/min pooling. NLMs achieve perfect accuracy on family tree reasoning and sorting tasks and support reinforcement learning, but provide no explicit symbolic representation of learned rules.

Several systems have advanced beyond ∂ILP's initial limitations. Payani and Fekri [20] introduced dNL-ILP with specialised conjunctive/disjunctive neurons that directly learn Boolean functions, eliminating the need for restrictive rule templates. Shindo, Nishino, and Yamamoto [21] extended ∂ILP to handle structured examples with function symbols. Shindo, Pfanschilling, Dhami, and Kersting developed αILP [22], the first end-to-end differentiable ILP system for complex visual scenes, combining object-centric perception with beam-search-based clause discovery. Their subsequent system NEUMANN [23] compiles first-order logic programs into graph neural networks for message-passing-based reasoning, reducing memory complexity from exponential to linear, a critical advance for scalability. Gao, Inoue, Cao, and Wang [24] proposed DFORL, which searches for interpretable matrix representations of logic programs requiring only the number of variables as language bias. Most recently, the NeurRL system [25] eliminates candidate clause generation entirely, using autoencoders and differentiable clustering to learn symbolic rules from raw sequences.

For Genesis-AI, differentiable ILP offers the tantalising possibility of jointly training neural components (processing raw mass spectrometry and sequencing data) and symbolic components (learning metabolic rules and pathway hypotheses) in a single end-to-end framework. However, scalability remains a concern: yeast metabolism involves 4,058 reactions across eight cellular compartments [6], a scale that exceeds what ∂ILP and its direct descendants can handle. NEUMANN's graph neural network compilation [23] and Scallop's provenance semiring framework [26] represent the most promising paths toward tractable differentiable reasoning at this scale, but neither has been demonstrated on problems of comparable dimensionality. The transition from the toy domains on which differentiable ILP has been demonstrated (family trees, visual puzzles) to genome-scale biological reasoning remains an open engineering challenge.

C. Neuro-Symbolic Architectures: A Taxonomy of Integration Patterns

Beyond specifically ILP-oriented systems, a broader ecosystem of neuro-symbolic architectures provides frameworks for integrating neural and symbolic components at different levels of abstraction. One pattern of particular relevance to Genesis-AI is constraint injection, where domain knowledge expressed as first-order logic, stoichiometric rules, or conservation laws is compiled into a differentiable constraint loss. This constraint loss is combined with the standard data-fitting loss into a single objective, and the gradients from this combined loss update the neural network so that it simultaneously fits experimental data and satisfies known scientific constraints (Figure 3). These architectures define the design space within which any practical integration for Genesis-AI must operate.

Figure 3: Symbolic domain knowledge (first-order logic, stoichiometric rules, conservation laws) is compiled into a differentiable constraint loss, which is added to the standard data-fitting loss. The neural network simultaneously fits data and satisfies known scientific constraints.

Logic Tensor Networks (LTN) by Badreddine, Garcez, Serafini, and Spranger [27] ground first-order logic in what they term Real Logic: constants become real-valued tensors, predicates become neural networks outputting fuzzy truth values in [0,1], connectives are approximated by differentiable t-norms, and quantifiers are implemented via aggregation operators. The learning objective is best satisfiability, maximising the truth value of an entire knowledge base across classification, clustering, relational learning, and query answering within a single formalism. Neural Theorem Provers (NTP) by Rocktäschel and Riedel [28] implement differentiable backward chaining by replacing symbolic unification with a radial basis function kernel over dense vector representations, enabling gradient-based learning of both rule weights and symbol representations. Minervini et al. [29] introduced Conditional Theorem Provers that use learned reformulator modules to generate relevant rules on-the-fly, addressing the exponential complexity of enumerating all proof paths.
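The fuzzy semantics behind Real Logic can be sketched with one commonly used operator configuration: product t-norm, probabilistic sum, Reichenbach implication, and generalised-mean aggregators for the quantifiers. The truth values and the predicate names `enzyme` and `catalyses` are hypothetical; in LTN they would be outputs of neural predicates, and the operators would run on tensors.

```python
import math
from typing import Sequence


def t_and(a: float, b: float) -> float:
    return a * b                      # product t-norm


def t_or(a: float, b: float) -> float:
    return a + b - a * b              # probabilistic sum


def t_not(a: float) -> float:
    return 1.0 - a


def implies(a: float, b: float) -> float:
    return 1.0 - a + a * b            # Reichenbach implication


def forall(truths: Sequence[float], p: float = 2.0) -> float:
    """Universal quantifier as one minus the p-mean of the errors."""
    return 1.0 - (sum((1.0 - t) ** p for t in truths) / len(truths)) ** (1.0 / p)


def exists(truths: Sequence[float], p: float = 2.0) -> float:
    """Existential quantifier as the p-mean of the truth values."""
    return (sum(t ** p for t in truths) / len(truths)) ** (1.0 / p)


# Hypothetical fuzzy truth values of enzyme(x) and catalyses(x) for three individuals.
enzyme = [0.9, 0.2, 1.0]
catalyses = [0.8, 0.1, 1.0]

# Satisfaction of the formula: forall x. enzyme(x) -> catalyses(x)
satisfaction = forall([implies(e, c) for e, c in zip(enzyme, catalyses)])
print(round(satisfaction, 4))
```

Training for best satisfiability would maximise `satisfaction`, aggregated over every formula in the knowledge base, with respect to the parameters of the networks behind the predicates.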

Lifted Relational Neural Networks (LRNN) by Šourek, Aschenbrenner, Železný, and Kuželka [30, 31] define neural networks as sets of weighted first-order definite clauses, grounded into example-specific feed-forward networks with shared ("lifted") weights across all instances of the same rule. The 2021 paper [31] proves that LRNNs subsume standard graph neural network architectures (GCN, GAT, GIN), meaning any GNN can be expressed as a compact parameterised Datalog program; for Genesis-AI, this implies that metabolic graph representations could be enriched with logical structure without sacrificing GNN scalability. DeepStochLog by Winters, Marra, Manhaeve, and De Raedt [32] uses stochastic definite clause grammars rather than ProbLog's distribution semantics, enabling SLG resolution for polynomial-time inference, orders of magnitude faster than DeepProbLog's weighted model counting. Scallop by Li, Huang, and Naik [26] introduces the provenance semiring framework with 18 built-in provenance strategies, achieving superior runtime efficiency across eight neurosymbolic benchmarks.

The taxonomy of integration points has been systematised across several surveys. Dash's thesis [7] identifies three principal avenues: changing the input representation (propositionalisation), changing the loss function (constraint injection), and changing the model architecture. Kautz's six-type taxonomy [33] refines this further, with Type 5 (tensorised logic, e.g., LTN, ∂ILP) and Type 6 (symbolic reasoning embedded inside neural engines) being most relevant to DL+ILP integration. Von Rueden et al. [34] provide the broadest treatment, organising approaches along three dimensions: knowledge source, representation, and integration point in the machine learning pipeline.

The constraint injection pattern illustrated in Figure 3 has several concrete realisations. The semantic loss function by Xu, Zhang, Friedman, Liang, and Van den Broeck [35] derives from first principles a loss term equal to the negative log-probability of obtaining a satisfying assignment when sampling according to the neural output probabilities. DL2 by Fischer, Balunović, Drachsler-Cohen, Gehr, Zhang, and Vechev [36] translates logical constraints over numerical values into differentiable loss functions with a soundness guarantee. Hu, Ma, Liu, Hovy, and Xing [37] proposed an iterative knowledge distillation framework where a rule-regularised "teacher" network transfers knowledge from first-order logic rules to neural network weights via the posterior regularisation principle [38].
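The definition of semantic loss can be written down directly as a brute-force sketch: enumerate every Boolean assignment, keep those that satisfy the constraint, and sum their probabilities under independent Bernoulli outputs. The names `semantic_loss` and `exactly_one` are illustrative, and the enumeration is exponential in the number of variables, whereas the original work relies on compiled circuits for tractability.

```python
import math
from itertools import product
from typing import Callable, Sequence, Tuple


def semantic_loss(
    probs: Sequence[float],
    constraint: Callable[[Tuple[int, ...]], bool],
) -> float:
    """-log P(a sampled assignment satisfies the constraint)."""
    satisfied_mass = 0.0
    for assignment in product((0, 1), repeat=len(probs)):
        if not constraint(assignment):
            continue
        weight = 1.0
        for p, x in zip(probs, assignment):
            weight *= p if x else (1.0 - p)  # independent Bernoulli outputs
        satisfied_mass += weight
    # math.log raises ValueError if no satisfying assignment has positive probability.
    return -math.log(satisfied_mass)


def exactly_one(assignment: Tuple[int, ...]) -> bool:
    """Example constraint: exactly one output is true (one-hot)."""
    return sum(assignment) == 1


confident = semantic_loss([0.9, 0.05, 0.05], exactly_one)
undecided = semantic_loss([0.5, 0.5, 0.5], exactly_one)
print(round(confident, 3), round(undecided, 3))  # 0.196 0.981
```

The loss is lower for the near-one-hot output than for the undecided one, so adding it to the data-fitting loss pushes the network toward outputs that respect the constraint even on unlabelled inputs.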

The Physics-Informed Neural Networks (PINNs) paradigm by Raissi, Perdikaris, and Karniadakis [39] provides the most directly relevant analogy for scientific constraint injection: partial differential equation residuals computed via automatic differentiation are added as loss terms, encoding physical laws as soft constraints. Beucler et al. [40] enforced conservation of energy and mass in atmospheric convection models through both hard constraints (architectural enforcement achieving machine precision) and soft constraints (loss penalties). Sturm and Wexler [41] embedded stoichiometric information directly into a neural architecture via a constraint layer with non-optimisable weights representing the stoichiometry matrix, guaranteeing atom conservation in every prediction. Hansen et al. [42] formalised three enforcement approaches: PINNs (soft), neural operators (implicit), and hard-constrained conservative models.

For Genesis-AI, these methods offer a principled path for encoding known metabolic constraints, including mass balance, thermodynamic feasibility, and stoichiometric consistency, directly into the neural components, reserving data-driven learning for what is genuinely unknown.
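As a minimal sketch of the soft-constraint variant, a mass-balance penalty can be added to a data-fitting loss. The sketch assumes a steady-state condition S·v = 0 on internal metabolites, a two-reaction toy network, and a hand-picked penalty weight; these choices, and all function names, are illustrative assumptions rather than details of Genesis-AI or of the cited systems.

```python
from typing import List, Sequence


def mass_balance_residual(S: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    """Net production rate of each internal metabolite, i.e. the product S.v."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def constraint_penalty(S: Sequence[Sequence[float]], v: Sequence[float]) -> float:
    """Squared violation of the assumed steady-state condition S.v = 0."""
    return sum(r * r for r in mass_balance_residual(S, v))


def combined_loss(
    predicted: Sequence[float],
    measured: Sequence[float],
    S: Sequence[Sequence[float]],
    lam: float = 1.0,
) -> float:
    """Mean squared data error plus a weighted soft constraint term."""
    data = sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(predicted)
    return data + lam * constraint_penalty(S, predicted)


# Toy pathway A -> B -> C: metabolite B is produced by v1 and consumed by v2.
S = [[1.0, -1.0]]
predicted_fluxes = [1.0, 0.6]   # hypothetical network output
measured_fluxes = [1.0, 0.8]    # hypothetical measurements

penalty = constraint_penalty(S, predicted_fluxes)
loss = combined_loss(predicted_fluxes, measured_fluxes, S, lam=0.5)
print(round(penalty, 2), round(loss, 2))  # 0.16 0.1
```

A hard-constrained alternative would instead build the stoichiometry into the architecture, for example through a fixed, non-trainable layer as in the approach of Sturm and Wexler [41], so that violations are impossible rather than merely penalised.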

C.1. The Dash Framework: Systematically Mapping Symbolic Knowledge to Neural Constraints

Dash's thesis [7] ("Inclusion of Symbolic Domain-Knowledge into Deep Neural Networks," BITS Pilani, 2022, supervised by Srinivasan) merits dedicated treatment because it provides the most systematic account of how ILP-derived knowledge maps to neural learning mechanisms—precisely the conceptual bridge that the Genesis-AI proposal invokes but does not elaborate.

The framework identifies the central problem as one of knowledge injection: given symbolic domain knowledge expressed as logical rules, relations, or constraints, how does one translate this knowledge into a form that modifies the behaviour of a deep neural network in a principled way? Dash structures the answer around the observation that there is no single mechanism for this translation; rather, there exist multiple injection points into the neural learning pipeline, and the appropriate choice depends on the nature of the knowledge, the properties of the data, and the requirements of the application [43].

The thesis contributes four specific techniques. The first is utility-based stochastic sampling for DRM features [10, 11], which resolves the feature explosion problem by demonstrating that carefully randomised feature sampling from the ILP hypothesis space achieves comparable performance to exhaustive search. The second is Vertex-Enriched Graph Neural Networks (VEGNNs), which augment vertex labels with domain-derived relational properties. The third is Bottom-Graph Neural Networks (BotGNNs) [44], which use ILP's mode-directed inverse entailment to construct enriched graph representations for GNNs. The fourth is a modular neuro-symbolic system for drug design combining deep generative models with ILP-derived discriminators. BotGNNs, tested across over 80 datasets [44], represent the most mature instantiation: ILP bottom clauses provide the graph structure and vertex enrichment that standard molecular graph representations lack, yielding consistent improvements in predictive accuracy for QSAR modelling.

The companion review paper by Dash, Chitlangia, Ahuja, and Srinivasan [43] surveys these techniques more broadly, presenting substantial empirical evidence, across approximately 75 drug discovery datasets with roughly 200,000 relational instances, that ILP-informed domain knowledge significantly improves neural network performance. For Genesis-AI, the Dash framework provides the architectural vocabulary (input-level injection via propositionalisation, architecture-level injection via enriched graph construction, and loss-function-level injection via constraint terms), but the framework is fundamentally static and offline. It assumes a train-once-deploy paradigm that cannot accommodate the continuous stream of new experimental data that Genesis will generate. Extending the Dash framework to support online, incremental knowledge injection remains an open challenge.

D. Online and Incremental ILP: Approaching Real-Time Operation

Genesis-AI's requirement for real-time hypothesis generation from streaming experimental data makes online and incremental ILP a critical enabler. Classical ILP is batch-mode: collect data, run learning, obtain rules. A system generating 10,000 mass spectrometry measurements per day cannot afford to halt and recompute from scratch each time new data arrives. The alternative is incremental learning. Data arrives in streaming windows; only the first window requires full theory induction. Each subsequent window triggers an incremental update that revises the current theory in place, producing successive versions (v1, v2, v3, ... vn) while an accumulating constraint store prunes the hypothesis space, enabling near-constant update time per window (Figure 4).

Figure 4: Data arrives in streaming windows. Only the first window requires full learning; subsequent windows trigger incremental updates that revise the current theory. An accumulating constraint store prunes the hypothesis space without recomputation, enabling near-constant update time per window.

IncrementalLAS by Law, Russo, and Broda [45] realises exactly this pattern, using hypothesis space expansion to update OPT-sufficient subsets when new examples arrive. It provably preserves optimality guarantees whilst achieving nearly constant time per window. IncrementalLAS builds on the ILASP family: ILASP itself [46], its conflict-driven variant CDILP [47] inspired by clause learning in SAT solvers, and FastLAS [48] which introduced customisable scoring functions for scalable learning.

For true single-pass streaming, OLED by Katzouris, Artikis, and Paliouras [49] uses the Hoeffding bound to evaluate clauses on subsets of the input stream, never revisiting past data. Its parallel extension [50] achieves super-linear speedups on activity recognition tasks. WOLED [51] extends this to weighted answer set rules with AdaGrad-based updates, and the full probabilistic version [52] combines online structure and weight learning.
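The role of the Hoeffding bound can be shown with a short sketch. After n observations of a score with range R, the true mean lies within ε = sqrt(R² ln(1/δ) / (2n)) of the observed mean with probability at least 1 − δ, so a learner can commit to the best candidate clause once its observed lead over the runner-up exceeds ε. The function names and the default values of δ and R are assumptions for illustration, not parameters taken from OLED.

```python
import math


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Deviation bound that holds with probability at least 1 - delta after n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_commit(
    best_mean: float,
    second_mean: float,
    n: int,
    delta: float = 0.05,
    value_range: float = 1.0,
) -> bool:
    """True once the observed gap between the top two candidates exceeds the bound."""
    return (best_mean - second_mean) > hoeffding_epsilon(value_range, delta, n)


# The same observed gap of 0.10 is inconclusive early on and decisive later.
print(round(hoeffding_epsilon(1.0, 0.05, 100), 4))   # 0.1224
print(can_commit(0.80, 0.70, n=100))                 # False
print(can_commit(0.80, 0.70, n=1000))                # True
```

Because the decision depends only on running means and a count, the examples that produced them can be discarded, which is what makes a single pass over the stream sufficient.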

Popper by Cropper and Morel [53] takes a different angle. Though designed for batch operation, its Learning from Failures (LFF) paradigm decomposes ILP into generate-test-constrain stages whose constraint accumulation is naturally compatible with incremental operation: new data yields new constraints that augment the existing database without full recomputation. Hocquette, Schmid, and Cropper [54] extended Popper with fine-grained SLD-tree failure analysis, dramatically reducing hypothesis space exploration, and Cropper et al. [55] introduced symmetry breaking that reduces solving times from over one hour to 17 seconds in some cases.
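How accumulated constraints support incremental operation can be sketched with a propositional toy. Hypotheses are conjunctions of Boolean literals; one that covers a negative example is too general, so it and all of its generalisations (subsets) are pruned, and one that misses a positive example is too specific, so it and all of its specialisations (supersets) are pruned. The class below, its names, and the yeast-flavoured literals are hypothetical; it assumes noise-free labels, retains past examples, and is not Popper's ASP-based implementation.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Set, Tuple

Hypothesis = FrozenSet[str]        # a conjunction of Boolean literals
Example = Tuple[Set[str], bool]    # (literals true in the example, label)


class IncrementalLearner:
    def __init__(self, literals: Set[str], max_size: int = 2) -> None:
        self.literals = sorted(literals)
        self.max_size = max_size
        self.examples: List[Example] = []
        self.too_general: List[Hypothesis] = []    # these and their subsets are pruned
        self.too_specific: List[Hypothesis] = []   # these and their supersets are pruned

    def _allowed(self, h: Hypothesis) -> bool:
        return (not any(h <= g for g in self.too_general)
                and not any(h >= s for s in self.too_specific))

    def update(self, window: List[Example]) -> Optional[Hypothesis]:
        """Generate, test, and constrain; constraints persist across windows."""
        self.examples.extend(window)
        for size in range(self.max_size + 1):
            for combo in combinations(self.literals, size):
                h = frozenset(combo)
                if not self._allowed(h):
                    continue  # pruned by an earlier failure, never re-tested
                covers_negative = any(not y and h <= x for x, y in self.examples)
                misses_positive = any(y and not h <= x for x, y in self.examples)
                if covers_negative:
                    self.too_general.append(h)
                if misses_positive:
                    self.too_specific.append(h)
                if not covers_negative and not misses_positive:
                    return h
        return None


learner = IncrementalLearner({"glucose", "oxygen", "gene_x"})
h1 = learner.update([({"glucose", "gene_x", "oxygen"}, True), ({"oxygen"}, False)])
h2 = learner.update([({"gene_x", "oxygen"}, False), ({"glucose", "gene_x"}, True)])
h3 = learner.update([({"glucose"}, False)])
print(sorted(h1), sorted(h2), sorted(h3))
# ['gene_x'] ['glucose'] ['gene_x', 'glucose']
```

Each window revises the current hypothesis while the constraint lists only grow, so later searches skip every region of the hypothesis space that earlier data already ruled out.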

For Genesis-AI, the convergence of IncrementalLAS's provably optimal streaming updates [45] with Popper's efficient constraint-driven search [53] and OLED's Hoeffding-bounded online evaluation [49] suggests that real-time ILP is approaching feasibility, at least for the symbolic reasoning component considered in isolation. The unsolved problem is integrating these online ILP mechanisms with differentiable neural components in a single closed-loop system where both the symbolic rules and the neural parameters update continuously from shared experimental data streams.

E. Abductive Reasoning: Bridging Neural Perception and Logical Hypothesis Generation

Abduction, i.e., inference to the best explanation for a set of observations, is the reasoning mode most central to scientific discovery. When the yeast systems biology model predicts growth under certain conditions but the microbioreactors observe no growth, the system must hypothesise what is missing: a previously unknown enzyme, an unmodelled regulatory interaction, a misspecified kinetic parameter. The machinery for this differs fundamentally from induction and deduction. A neural network first maps raw training data to pseudo-labels; a logic reasoning module then checks these pseudo-labels against domain knowledge (first-order logic, stoichiometric rules, conservation laws). If consistent, the labels are accepted. If not, abductive revision hypothesises missing facts and revises the labels, and the neural network retrains on these corrected labels, closing the loop (Figure 5). This cycle directly mirrors the hypothesis-experiment-revision loop that defines autonomous scientific discovery.

Figure 5: A neural perception module maps raw data to pseudo-labels, which a logic module checks against background knowledge. Consistent labels are accepted; inconsistencies trigger abductive revision (hypothesising missing facts and revising labels) which retrains the neural module. This cycle mirrors the hypothesis-experiment-revision loop in autonomous scientific discovery.

The Abductive Learning (ABL) framework by Dai, Xu, Yu, and Zhou [56] formalises exactly this architecture: a convolutional neural network produces pseudo-labels, a Prolog-based reasoning module checks consistency against background knowledge, and abduction selectively revises pseudo-labels to retrain the perception model. Zhou [57] provided the theoretical foundation, arguing that abduction is the missing link between statistical learning and logical reasoning, a position that resonates with the Genesis-AI proposal's emphasis on hypothesis generation as distinct from both pattern recognition and deductive verification.
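The revision step can be sketched on a toy task in which two images are known to show digits summing to 9. The sketch takes the argmax pseudo-labels, checks them against the background knowledge, and, if they are inconsistent, searches for the consistent labelling that changes the fewest labels, breaking ties by likelihood. The selection criterion, the function name `abduce`, and the probabilities are simplifying assumptions for illustration; ABL itself uses a Prolog-based reasoner and its own consistency optimisation.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

Labels = Tuple[int, ...]


def abduce(
    probs: Sequence[Sequence[float]],
    consistent: Callable[[Labels], bool],
) -> Tuple[Labels, Optional[Labels]]:
    """Return (pseudo-labels, revised labels consistent with the knowledge base)."""
    pseudo = tuple(max(range(len(p)), key=lambda k: p[k]) for p in probs)
    if consistent(pseudo):
        return pseudo, pseudo          # nothing to revise
    best, best_key = None, None
    for candidate in product(*(range(len(p)) for p in probs)):
        if not consistent(candidate):
            continue
        changes = sum(c != q for c, q in zip(candidate, pseudo))
        likelihood = 1.0
        for p, c in zip(probs, candidate):
            likelihood *= p[c]
        key = (changes, -likelihood)   # fewest revisions first, then most likely
        if best_key is None or key < best_key:
            best, best_key = candidate, key
    return pseudo, best                # best is None if nothing is consistent


# Hypothetical classifier outputs over the digits 0-9 for two images.
p_first = [0.0125] * 10
p_first[3], p_first[5] = 0.6, 0.3
p_second = [0.1 / 9] * 10
p_second[4] = 0.9

pseudo, revised = abduce([p_first, p_second], lambda labels: sum(labels) == 9)
print(pseudo, revised)  # (3, 4) (5, 4)
```

The revised labels would then serve as training targets for the perception model, so the classifier is corrected by the knowledge base without any image-level ground truth.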

MetaAbd by Dai and Muggleton [58] represents the most complete integration of neural learning, abductive reasoning, and ILP. It combines neural perception (mapping raw data to probabilistic symbols), meta-interpretive learning (inducing first-order theories with predicate invention and recursion via Metagol), and logical abduction (pruning the symbol-value search space). MetaAbd is the first system capable of jointly learning neural networks from scratch while inducing recursive first-order logic theories with predicate invention, a capability directly relevant to Genesis-AI's need for generating novel biological hypotheses from raw experimental data. Dai, Hallett, Muggleton, and Baldwin [59] applied MetaAbd to the Design-Build-Test-Learn cycle in synthetic biology, demonstrating applicability to biological domains.

The framework has evolved rapidly. ABL-Refl by Hu, Dai, Jiang, and Zhou [60] (AAAI 2025 Outstanding Paper Award) abduces a "reflection vector" flagging potential errors in neural outputs, invoking abduction only when needed during inference, a critical efficiency innovation for high-throughput settings. ABLkit [61] provides an open-source Python toolkit for deploying abductive learning systems.

On the theorem-proving side, Chvalovský, Korovin, Piepenbrock, and Urban [62] extended iProver, the instantiation-based first-order theorem prover that Genesis-AI uses for deductive inference on yeast metabolic models, with graph neural network-based clause scoring, doubling the number of problems solved compared to human-programmed heuristics. Genesis-AI uses iProver to reason about growth/no-growth predictions and to propose candidate hypotheses for model improvement [6]; integrating neural guidance into the prover exemplifies the broader trend of using deep learning not to replace symbolic reasoning but to make it tractable at scale.

Discussion: Toward Real-Time Neuro-Symbolic Scientific Reasoning in Genesis-AI

The synthesis across these five thematic axes reveals both the richness of the available integration mechanisms and the distance that remains before Genesis-AI's requirements can be met. What would a complete architecture look like? Raw experimental data from approximately 10,000 micro-bioreactors (mass spectrometry, transcriptomics, growth measurements) first passes through a neural perception layer that converts raw signals into structured observations. These observations enter a unified neuro-symbolic engine composed of four mechanisms reviewed in this work: differentiable reasoning (Scallop-style inference) and abductive hypothesis generation (MetaAbd-style) operate in parallel, feeding into incremental theory revision (IncrementalLAS-style streaming updates), all governed by a constraint layer enforcing mass balance, stoichiometry, and thermodynamic laws. The engine outputs interpretable hypotheses and experiment proposals, which feed back to the bioreactors, closing the autonomous discovery cycle (Figure 6).

Figure 6: Raw experimental data from ~10,000 micro-bioreactors passes through a neural perception layer into a unified neuro-symbolic engine. The engine composes four mechanisms (differentiable reasoning, abductive hypothesis generation, incremental theory revision, and hard scientific constraint enforcement) into a single closed loop. Interpretable hypotheses and experiment proposals feed back to the bioreactors, closing the autonomous discovery cycle.

No existing system achieves this. But the components are maturing along four clear trajectories.

First, the field has moved decisively from two-stage propositionalisation toward end-to-end differentiable reasoning. Scallop [26], NEUMANN [23], and NeurRL [25] represent the frontier of scalable, differentiable neuro-symbolic inference, suggesting that the latency inherent in separate ILP-then-neural pipelines can be eliminated. However, none of these systems has been demonstrated at the scale of genome-scale metabolic models (thousands of reactions, tens of thousands of parameters), and the transition from toy benchmarks to biological complexity remains open.

Second, abductive learning [56, 58] has matured into a principled framework for the hypothesis-experiment-revision cycle that defines Robot Scientist operation. MetaAbd's ability to jointly learn perception and recursive logical theories from raw data [58] closely matches what Genesis-AI needs when it must propose missing enzymes or unmodelled regulatory interactions from unexpected experimental outcomes. Its application to synthetic biology's Design-Build-Test-Learn cycle [59] provides evidence of applicability to biological domains.

Third, incremental ILP is approaching the real-time feasibility needed for closed-loop experimentation. IncrementalLAS [45] provides optimality-preserving streaming updates, OLED [49] demonstrates single-pass online learning from sensor streams, and Popper's constraint-driven search [53] with symmetry breaking [55] achieves dramatic speedups. The challenge is that these systems operate on symbolic data, whereas Genesis-AI produces raw mass spectrometry traces and sequencing reads that must first pass through neural perception.
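The general idea of incremental revision, testing each new observation against the current hypothesis and searching again only on a counterexample, can be sketched in a few lines of Python. This is a toy illustration and not the algorithm of IncrementalLAS, OLED, or Popper: the candidate rules, the gene names `geneA` and `geneB`, and the `StreamingLearner` class are all hypothetical, and the hypothesis space is a fixed, ordered list rather than a searched space of logic programs.

```python
# Hypothetical candidate rules, ordered from simplest to most specific.
# Each maps a set of knocked-out genes to a growth prediction.
CANDIDATES = [
    ("always grows", lambda ko: True),
    ("needs geneA", lambda ko: "geneA" not in ko),
    ("needs geneA or geneB", lambda ko: not ({"geneA", "geneB"} <= ko)),
    ("needs geneA and geneB", lambda ko: not ({"geneA", "geneB"} & ko)),
]

class StreamingLearner:
    def __init__(self, candidates):
        self.candidates = candidates
        self.index = 0        # position of the current hypothesis
        self.seen = []        # all observations so far (this toy keeps them all)
        self.revisions = 0

    @property
    def hypothesis(self):
        return self.candidates[self.index][0]

    @staticmethod
    def _fits(rule, examples):
        return all(rule(ko) == grew for ko, grew in examples)

    def observe(self, knockouts, grew):
        example = (frozenset(knockouts), grew)
        self.seen.append(example)
        # Cheap path: the current hypothesis already explains the new observation.
        if self._fits(self.candidates[self.index][1], [example]):
            return self.hypothesis
        # Counterexample: revise. Earlier candidates were refuted by earlier
        # observations, so the search resumes from the current position.
        self.revisions += 1
        while not self._fits(self.candidates[self.index][1], self.seen):
            self.index += 1
            if self.index == len(self.candidates):
                raise ValueError("no candidate rule explains the observations")
        return self.hypothesis

learner = StreamingLearner(CANDIDATES)
stream = [(set(), True), ({"geneA"}, True), ({"geneA", "geneB"}, False), ({"geneB"}, True)]
for knockouts, grew in stream:
    learner.observe(knockouts, grew)
print(learner.hypothesis, learner.revisions)  # needs geneA or geneB 1
```

Only one of the four observations triggers a search; the other three cost a single rule evaluation each. Note that the sketch already assumes symbolic input (sets of gene names), which is exactly the gap between such systems and raw instrument data noted above.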

Fourth, constraint injection mechanisms [35, 36, 40, 41] now offer mature methods for encoding stoichiometric, conservation, and other scientific constraints directly into neural training, either as differentiable loss terms or as hard architectural guarantees. For yeast metabolism, where mass balance and thermodynamic constraints are known with certainty, this allows the neural components to focus their learning capacity on genuinely uncertain aspects of the model.
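The difference between the two styles of constraint injection can be shown on a toy steady-state mass balance, S·v = 0. The Python sketch below is illustrative only: the two-metabolite network, the matrix `S`, and the function names are assumptions made for this example and do not reproduce the methods of [35, 36, 40, 41]. The soft variant adds a penalty that a training loss could include; the hard variant lets a model predict only the free fluxes and derives the rest, so the balance holds by construction.

```python
# Toy network (an assumption): uptake v_in produces metabolite A; A is consumed
# by reactions v1 and v2; v1 produces metabolite B; B is exported by v_out.
# Rows are metabolites, columns are fluxes [v_in, v1, v2, v_out].
S = [
    [1, -1, -1, 0],   # metabolite A: v_in - v1 - v2
    [0, 1, 0, -1],    # metabolite B: v1 - v_out
]

def residual(fluxes):
    """Net production rate of each metabolite; all zeros at steady state."""
    return [sum(s * v for s, v in zip(row, fluxes)) for row in S]

def mass_balance_penalty(fluxes, weight=1.0):
    """Soft constraint: squared mass-balance violation, usable as an extra loss term."""
    return weight * sum(r * r for r in residual(fluxes))

def constrained_fluxes(v_in, v1):
    """Hard constraint: derive the dependent fluxes from the two free fluxes."""
    v2 = v_in - v1    # closes the balance for metabolite A
    v_out = v1        # closes the balance for metabolite B
    return [v_in, v1, v2, v_out]

raw = [10.0, 6.0, 3.0, 6.5]               # unconstrained prediction
print(residual(raw))                      # [1.0, -0.5]
print(mass_balance_penalty(raw))          # 1.25

hard = constrained_fluxes(10.0, 6.0)
print(hard, residual(hard))               # [10.0, 6.0, 4.0, 6.0] [0.0, 0.0]
```

The hard variant guarantees mass balance but nothing else: it does not by itself enforce non-negative fluxes or thermodynamic feasibility, which would need further constraints.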

The theoretical foundations and component systems are now largely in place. The critical open challenge is composing them into a single engine operating at the scale of 10,000 micro-bioreactors, processing 10,000 mass spectrometry measurements per day and thousands of transcriptomics experiments per year. That integration, at that scale, defines the next frontier of autonomous scientific discovery.

References

[1] P. M. M. Duhem, The Aim and Structure of Physical Theory, vol. 13. Princeton University Press, 1991.

[2] H. Kitano, "Nobel Turing challenge: creating the engine for scientific discovery," npj Systems Biology and Applications, vol. 7, no. 1, p. 29, 2021.

[3] R. D. King et al., "Functional genomic hypothesis generation and experimentation by a robot scientist," Nature, vol. 427, pp. 247–252, 2004.

[4] R. D. King et al., "The automation of science," Science, vol. 324, no. 5923, pp. 85–89, 2009.

[5] K. Williams et al., "Cheaper faster drug development validated by the repositioning of drugs against neglected tropical diseases," J. Royal Society Interface, vol. 12, no. 104, 2015.

[6] R. D. King et al., "Genesis: Towards the automation of systems biology research," arXiv:2408.10689, 2024.

[7] T. Dash, "Inclusion of symbolic domain-knowledge into deep neural networks," Ph.D. Thesis, BITS Pilani, 2022.

[8] A. Cropper, S. Dumančić, R. Evans, and S. H. Muggleton, "Inductive logic programming at 30," Machine Learning, vol. 111, no. 1, pp. 147–172, 2022.

[9] H. Lodhi, "Deep relational machines," in Proc. ICONIP 2013, LNCS vol. 8227, pp. 212–219, Springer, 2013.

[10] T. Dash, A. Srinivasan, L. Vig, O. I. Orhobor, and R. D. King, "Large-scale assessment of deep relational machines," in Proc. ILP 2018, LNCS vol. 11105, pp. 22–37, Springer, 2018.

[11] T. Dash, A. Srinivasan, R. S. Joshi, and A. Baskar, "Discrete stochastic search and its application to feature-selection for deep relational machines," in Proc. ICANN 2019, LNCS vol. 11728, pp. 29–45, Springer, 2019.

[12] A. Srinivasan and L. Vig, "Logical explanations for deep relational machines using relevance information," JMLR, vol. 20, no. 130, pp. 1–47, 2019.

[13] M. V. M. França, G. Zaverucha, and A. S. d'Avila Garcez, "Fast relational learning using bottom clause propositionalization with artificial neural networks," Machine Learning, vol. 94, no. 1, pp. 81–104, 2014.

[14] N. Lavrač et al., "Propositionalization and embeddings: two sides of the same coin," Machine Learning, vol. 109, pp. 1465–1507, 2020.

[15] R. Evans and E. Grefenstette, "Learning explanatory rules from noisy data," JAIR, vol. 61, pp. 1–64, 2018.

[16] R. Manhaeve, S. Dumančić, A. Kimmig, T. Demeester, and L. De Raedt, "DeepProbLog: Neural probabilistic logic programming," in Proc. NeurIPS 2018, pp. 3753–3763.

[17] R. Manhaeve, S. Dumančić, A. Kimmig, T. Demeester, and L. De Raedt, "Neural probabilistic logic programming in DeepProbLog," Artificial Intelligence, vol. 298, 103504, 2021.

[18] Z. Yang, A. Ishay, and J. Lee, "NeurASP: Embracing neural networks into answer set programming," in Proc. IJCAI 2020, pp. 1755–1762.

[19] H. Dong, J. Mao, T. Lin, C. Wang, L. Li, and D. Zhou, "Neural logic machines," in Proc. ICLR 2019.

[20] A. Payani and F. Fekri, "Inductive logic programming via differentiable deep neural logic networks," arXiv:1906.03523, 2019.

[21] H. Shindo, M. Nishino, and A. Yamamoto, "Differentiable inductive logic programming for structured examples," in Proc. AAAI 2021, pp. 5034–5041.

[22] H. Shindo, V. Pfanschilling, D. S. Dhami, and K. Kersting, "αILP: Thinking visual scenes as differentiable logic programs," Machine Learning, vol. 112, pp. 1465–1497, 2023.

[23] H. Shindo, V. Pfanschilling, D. S. Dhami, and K. Kersting, "Learning differentiable logic programs for abstract visual reasoning," Machine Learning, vol. 113, pp. 8533–8584, 2024.

[24] K. Gao, K. Inoue, Y. Cao, and H. Wang, "A differentiable first-order rule learner for inductive logic programming," Artificial Intelligence, vol. 331, 104108, 2024.

[25] "Differentiable rule induction from raw data," in Proc. ICLR 2025.

[26] Z. Li, J. Huang, and M. Naik, "Scallop: A language for neurosymbolic programming," Proc. ACM Program. Lang., vol. 7, no. PLDI, Article 166, 2023.

[27] S. Badreddine, A. d'Avila Garcez, L. Serafini, and M. Spranger, "Logic tensor networks," Artificial Intelligence, vol. 303, 103649, 2022.

[28] T. Rocktäschel and S. Riedel, "End-to-end differentiable proving," in Proc. NeurIPS 2017, pp. 3788–3800.

[29] P. Minervini, S. Riedel, P. Stenetorp, E. Grefenstette, and T. Rocktäschel, "Learning reasoning strategies in end-to-end differentiable proving," in Proc. ICML 2020, PMLR vol. 119, pp. 6938–6949.

[30] G. Šourek, V. Aschenbrenner, F. Železný, S. Schockaert, and O. Kuželka, "Lifted relational neural networks: Efficient learning of latent relational structures," JAIR, vol. 62, pp. 69–100, 2018.

[31] G. Šourek, F. Železný, and O. Kuželka, "Beyond graph neural networks with lifted relational neural networks," Machine Learning, vol. 110, no. 7, pp. 1695–1738, 2021.

[32] T. Winters, G. Marra, R. Manhaeve, and L. De Raedt, "DeepStochLog: Neural stochastic logic programming," in Proc. AAAI 2022, vol. 36, no. 9, pp. 10090–10100.

[33] H. Kautz, "The third AI summer: AAAI Robert S. Engelmore Memorial Lecture," AI Magazine, vol. 43, no. 1, pp. 105–125, 2022.

[34] L. von Rueden et al., "Informed machine learning — A taxonomy and survey of integrating prior knowledge into learning systems," IEEE TKDE, vol. 35, no. 1, pp. 614–633, 2023.

[35] J. Xu, Z. Zhang, T. Friedman, Y. Liang, and G. Van den Broeck, "A semantic loss function for deep learning with symbolic knowledge," in Proc. ICML 2018, PMLR 80, pp. 5502–5511.

[36] M. Fischer, M. Balunović, D. Drachsler-Cohen, T. Gehr, C. Zhang, and M. Vechev, "DL2: Training and querying neural networks with logic," in Proc. ICML 2019.

[37] Z. Hu, X. Ma, Z. Liu, E. Hovy, and E. Xing, "Harnessing deep neural networks with logic rules," in Proc. ACL 2016, pp. 2410–2420.

[38] K. Ganchev, J. Graça, J. Gillenwater, and B. Taskar, "Posterior regularization for structured latent variable models," JMLR, vol. 11, pp. 2001–2049, 2010.

[39] M. Raissi, P. Perdikaris, and G. E. Karniadakis, "Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations," Journal of Computational Physics, vol. 378, pp. 686–707, 2019.

[40] T. Beucler, M. Pritchard, S. Rasp, J. Ott, P. Baldi, and P. Gentine, "Enforcing analytic constraints in neural networks emulating physical systems," Physical Review Letters, vol. 126, 098302, 2021.

[41] P. O. Sturm and A. S. Wexler, "Conservation laws in a neural network architecture: Enforcing the atom balance of a Julia-based photochemical model," Geoscientific Model Development, vol. 15, pp. 3417–3431, 2022.

[42] D. Hansen et al., "Learning physical models that can respect conservation laws," in Proc. ICML 2023, PMLR vol. 202.

[43] T. Dash, S. Chitlangia, A. Ahuja, and A. Srinivasan, "A review of some techniques for inclusion of domain-knowledge into deep neural networks," Scientific Reports, vol. 12, no. 1, 1040, 2022.

[44] T. Dash, A. Srinivasan, and A. Baskar, "Inclusion of domain-knowledge into GNNs using mode-directed inverse entailment," Machine Learning, vol. 111, no. 2, pp. 575–623, 2022.

[45] M. Law, A. Russo, and K. Broda, "Search space expansion for efficient incremental inductive logic programming," in Proc. IJCAI 2022, pp. 2654–2660.

[46] M. Law, A. Russo, and K. Broda, "Inductive learning of answer set programs," in Proc. JELIA 2014, LNCS 8761, pp. 311–325, 2014.

[47] M. Law, "Conflict-driven inductive logic programming," Theory and Practice of Logic Programming, 2022.

[48] M. Law, A. Russo, E. Bertino, K. Broda, and J. Lobo, "FastLAS: Scalable inductive logic programming incorporating domain-specific optimisation criteria," in Proc. AAAI 2020, vol. 34, no. 3, pp. 2877–2885.

[49] N. Katzouris, A. Artikis, and G. Paliouras, "Online learning of event definitions," Theory and Practice of Logic Programming, vol. 16, no. 5-6, pp. 817–833, 2016.

[50] N. Katzouris, A. Artikis, and G. Paliouras, "Parallel online learning of event definitions," in Proc. ILP 2017, LNCS vol. 10759, pp. 78–93, 2018.

[51] N. Katzouris and A. Artikis, "WOLED: A tool for online learning weighted answer set rules for temporal reasoning under uncertainty," in Proc. KR 2020.

[52] N. Katzouris, G. Paliouras, and A. Artikis, "Online learning probabilistic event calculus theories in answer set programming," TPLP, vol. 23, no. 2, pp. 362–386, 2023.

[53] A. Cropper and R. Morel, "Learning programs by learning from failures," Machine Learning, vol. 110, no. 4, pp. 801–856, 2021.

[54] C. Hocquette, U. Schmid, and A. Cropper, "Learning logic programs by explaining their failures," Machine Learning, 2024.

[55] A. Cropper, C. Hocquette et al., "Symmetry breaking for inductive logic programming," arXiv:2508.06263, 2025.

[56] W.-Z. Dai, Q.-L. Xu, Y. Yu, and Z.-H. Zhou, "Bridging machine learning and logical reasoning by abductive learning," in Proc. NeurIPS 2019.

[57] Z.-H. Zhou, "Abductive learning: Towards bridging machine learning and logical reasoning," Science China Information Sciences, vol. 62, 076101, 2019.

[58] W.-Z. Dai and S. H. Muggleton, "Abductive knowledge induction from raw data," in Proc. IJCAI 2021, pp. 1813–1820.

[59] W.-Z. Dai, R. Hallett, S. H. Muggleton, and G. S. Baldwin, "Automated biodesign engineering by abductive meta-interpretive learning," arXiv:2105.07758, 2021.

[60] W. Hu, W.-Z. Dai, Z. Jiang, and Z.-H. Zhou, "Efficient rectification of neuro-symbolic reasoning inconsistencies by abductive reflection," in Proc. AAAI 2025, pp. 17333–17341.

[61] Y. Huang et al., "ABLkit: A Python toolkit for abductive learning," Frontiers of Computer Science, 2024.

[62] K. Chvalovský, K. Korovin, J. Piepenbrock, and J. Urban, "Guiding an instantiation prover with graph neural networks," in Proc. LPAR 2023, EPiC Series vol. 94, pp. 112–123.

[63] A. Cropper and S. Dumančić, "Inductive logic programming at 30: A new introduction," JAIR, vol. 74, pp. 765–850, 2022.

[64] G. Marra, S. Dumančić, R. Manhaeve, and L. De Raedt, "From statistical relational to neuro-symbolic artificial intelligence: A survey," Artificial Intelligence, vol. 328, 104062, 2024.

[65] A. d'Avila Garcez and L. C. Lamb, "Neurosymbolic AI: The 3rd wave," Artificial Intelligence Review, vol. 56, pp. 1–20, 2023.

[66] A. d'Avila Garcez, M. Gori, L. C. Lamb, L. Serafini, M. Spranger, and S. N. Tran, "Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning," FLAP, vol. 6, no. 4, pp. 611–632, 2019.

[67] T. R. Besold, A. d'Avila Garcez et al. (13 authors), "Neural-symbolic learning and reasoning: A survey and interpretation," in Neuro-Symbolic Artificial Intelligence: The State of the Art, IOS Press, 2022.

[68] P. Hitzler and M. K. Sarker (eds.), Neuro-Symbolic Artificial Intelligence: The State of the Art, IOS Press, Frontiers in AI vol. 342, 2022.

[69] P. Hitzler, M. K. Sarker, and A. Eberhart (eds.), Compendium of Neurosymbolic Artificial Intelligence, IOS Press, vol. 369, 2023.

[70] G. Marcus, "The next decade in AI: Four steps towards robust artificial intelligence," arXiv:2002.06177, 2020.

[71] W. Wang, Y. Yang, and F. Wu, "Towards data-and knowledge-driven AI: A survey on neuro-symbolic computing," IEEE TPAMI, vol. 47, pp. 878–899, 2025.

[72] B. Colelough and W. C. Regli, "Neuro-symbolic AI in 2024: A systematic review," arXiv:2501.05435, 2025.

[73] L. De Smet et al., "Defining neurosymbolic AI," arXiv:2507.11127, 2025.

[74] R. Brazil, "Inside the ‘self-driving’ lab revolution," Nature, vol. 652, pp. 262–264, Mar. 2026, doi: 10.1038/d41586-026-00974-2.