| 1. Domain | | | |
| 1.1 Scope of the Domain | Boundaries | The range of phenomena the science includes and excludes. | Focuses on how genomes change over evolutionary time and how genome architecture, content, and organization differ across species. Includes mutation, substitution, genome rearrangements, duplications, deletions, transposons, synteny, orthology/paralogy, phylogenomics, and comparative sequence/structure analysis. Excludes individual-level inheritance (Mendelian genetics) and functional regulation unless directly tied to evolutionary or comparative interpretation. |
| | Scale | The spatial, temporal, or organizational level at which the science operates (e.g., quantum, cellular, social, cosmic). | Operates at nucleotide to whole-genome scales; temporal scales from recent divergence to deep evolutionary time (millions–billions of years); organizational levels include genes, gene families, chromosomes, and full genomes. |
| 1.2 Ontological Commitments | Entities | The kinds of things assumed to exist within the domain (particles, organisms, agents, fields, etc.). | Genomes, genes, gene families, orthologs, paralogs, mutations, substitutions, transposable elements, syntenic blocks, structural variants, conserved elements, ancestral genome reconstructions, phylogenetic trees. |
| | Properties | The fundamental attributes these entities possess (mass, charge, genotype, preference, etc.). | Sequence identity, divergence rates, substitution rates, dN/dS ratios, GC content, genome size, gene-copy number, synteny conservation, mutation spectrum, recombination rate, structural-variation frequency, phylogenetic distances. |
| | Categories | The basic ontological types used to classify domain elements (substances, processes, relations, structures). | Mutation classes (point, indel, structural), homology types (orthology, paralogy, xenology), genomic features (coding, noncoding, regulatory, repetitive), rearrangement types (inversions, translocations, fusions, fissions), evolutionary models (neutral, nearly neutral, adaptive). |
| 1.3 State-Variables | Variables | The measurable or definable properties that describe system conditions. | Sequence divergence levels, substitution rates, gene family sizes, copy-number variation, transposable-element (TE) composition, synteny conservation metrics, phylogenetic branch lengths, mutation rates, recombination variation across the genome. |
| | Parameterization | How variables encode and represent the system’s state. | System encoded by multiple sequence alignments, substitution matrices, phylogenetic models, genome-synteny maps, rate matrices (Q), comparative gene-family models, GC/TE landscapes, and structural-variant matrices. |
| 1.4 Admissible Idealizations | Simplifications | Conceptual reductions used to make the domain tractable (point masses, rational agents, perfect gases). | Assuming constant mutation rates, independent sites, homogeneous substitution processes, clock-like evolution, ignoring complex rearrangement histories, treating large genomes as collections of independent loci, ignoring epistasis or context-dependent mutation. |
| | Validity Conditions | The limits and contexts in which idealizations hold or break down. | Breaks down under rate heterogeneity, episodic selection, varying recombination landscapes, genome structural bursts, horizontal gene transfer, rapid genome-size shifts, or when substitution processes strongly deviate from simple models. |
| 1.5 Domain Assumptions | Structural Assumptions | Background ontological stances such as determinism, continuity, randomness, discreteness. | Sequence similarity reflects evolutionary relatedness; substitution processes are statistically modelable; phylogenies can be inferred; genome organization retains detectable patterns of homology; gene-family evolution follows definable birth–death processes. |
| | Implicit Commitments | Unstated but necessary assumptions that shape the field’s conceptual structure. | Assumes sufficient conservation to infer homology, stable phylogenetic signal, reasonably accurate genome assemblies, and that stochastic mutation models approximate real evolutionary mechanisms at broad scales. |
| 1.6 Internal Coherence Requirements | Consistency | The demand that domain concepts do not contradict one another. | Homology assignments, phylogenetic trees, substitution models, and synteny patterns must not contradict each other; inferred evolutionary histories must align with genomic data and comparative structure. |
| | Compatibility | The requirement that entities, variables, and assumptions fit together into a unified descriptive framework. | Genome sequences, substitution rates, structural-variation data, gene-family dynamics, and phylogenetic models must integrate into one coherent framework describing genome evolution over time. |
| 2. Evidence Layer | | | |
| 2.1 Observable Phenomena | Observables | The aspects of the domain that can produce detectable signals accessible to measurement. | Sequence divergence patterns, conserved motifs, ortholog/paralog relationships, gene-family expansions/contractions, genome rearrangements (inversions, fusions, fissions), synteny conservation, substitution-rate variation, TE insertions, phylogenetic branching patterns. |
| | Detection Limits | The boundaries of what can be resolved or sensed by current instruments or methods. | Limited by sequencing resolution, assembly accuracy, ability to detect structural variants, difficulty resolving repeats, low signal in highly diverged genomes, and reduced reliability of orthology detection at deep evolutionary distances. |
| 2.2 Measurement Systems | Units | Standardized quantifications (meters, seconds, volts, decibels, dollars, etc.) necessary for consistent comparison. | Sequence identity (%), substitution rates (substitutions per site per year or per generation), dN/dS ratios (dimensionless), GC content (%), copy-number counts, synteny block lengths (genes or bp), branch lengths (substitutions per site), genome size (bp), TE frequency. |
| | Instruments | Devices and tools (microscopes, spectrometers, sensors, surveys, detectors) used to produce measurements. | Sequencing platforms (Illumina, PacBio, Oxford Nanopore), genome assemblers, multiple sequence aligners, phylogenetic inference tools (RAxML, IQ-TREE), synteny detection software, structural-variant callers, comparative annotation pipelines. |
| 2.3 Operational Definitions | Definitions | Terms defined by specific measurement procedures, ensuring empirical clarity. | Orthologs defined as homologous genes that diverged through speciation; paralogs defined as homologous genes that diverged through duplication; substitution rate defined as substitutions per site per unit time; synteny defined operationally as conserved gene content and order along chromosomes; dN/dS defined as the ratio of nonsynonymous substitutions per nonsynonymous site (dN) to synonymous substitutions per synonymous site (dS). |
| | Procedures | The explicit steps required to perform a measurement in a reproducible way. | Sequence alignment, homology searches, phylogenetic tree inference, genome assembly polishing, synteny mapping, variant calling, mutation-rate estimation, annotation transfer, comparative motif identification. |
| 2.4 Data Acquisition | Protocols | Formal processes for gathering data under controlled or standardized conditions. | Standardized genome sequencing workflows, consistent depth-of-coverage targets, replicates for sequencing accuracy, controlled assembly pipelines, uniform alignment settings, validated orthology/homology procedures. |
| | Sampling | Rules determining which subset of the domain is measured and how representative it is. | Choosing representative taxa across clades, balanced sampling across phylogenetic distances, ensuring outgroup inclusion, avoiding biases from incomplete genomes, sampling multiple individuals for polymorphism-aware models. |
| 2.5 Data Character & Format | Data Types | The form raw evidence takes (time series, spectra, images, counts, qualitative records). | Genome assemblies, multiple-sequence alignments, phylogenetic trees, synteny maps, structural-variant lists, mutation spectra tables, gene-family size matrices, substitution-rate matrices, TE annotation tracks. |
| | Resolution | The granularity or precision with which data is captured. | Determined by sequencing technology (read length, accuracy), assembly contiguity, alignment precision, phylogenetic resolution at deep vs shallow divergences, and sensitivity to detecting structural variants and repeats. |
| 2.6 Reliability & Calibration | Calibration | Adjustment procedures ensuring instruments produce accurate results. | Sequencer error-rate calibration, assembly benchmarking, alignment quality scoring, substitution-model fit testing, orthology validation, reference-based correction, biological replication for mutation-rate estimates. |
| | Error Characterization | Identification and quantification of noise, uncertainty, bias, and measurement error. | Identification of sequencing and assembly errors, misaligned regions, false orthology calls, unresolved repeats, saturation effects in highly diverged sequences, phylogenetic model misfit, and quantification of systematic vs random errors in variant detection. |
| 3. Structural Layer | | | |
| 3.1 Patterns & Regularities | Laws / Relations | Stable, repeatable patterns governing how observables behave across conditions. | Substitutions accumulate in predictable patterns; conserved synteny reflects shared ancestry; gene duplication followed by divergence yields paralogs; orthologs tend to retain function more consistently than paralogs; repetitive elements expand and contract in lineage-specific patterns; structural variants follow recurring, identifiable rearrangement motifs. |
| | Invariants | Quantities or properties that remain constant under transformations (symmetries, conservation laws). | Conserved protein domains across deep evolutionary time; persistent synteny blocks; stable substitution biases (e.g., transition/transversion ratios); invariant phylogenetic branching order once resolved; conserved core gene sets across major clades. |
| 3.2 Causal Architecture | Mechanisms | Underlying processes or structures that produce the observed regularities. | Mutation introduces variation; recombination reshapes genomes; drift fixes or eliminates variants at random, most strongly in small populations; selection preserves or removes variants; duplication and neofunctionalization expand gene families; transposons mobilize and remodel genome structure; rearrangements alter chromosomal organization. |
| | Pathways | Organized sequences of interactions forming a causal chain or network. | Mutation → substitution → divergence; duplication → divergence → gene-family expansion; recombination → reshuffled haplotypes → changes in linkage disequilibrium (LD); TE insertion → structural remodeling → new regulatory or functional outcomes; speciation → lineage-specific genome trajectories. |
| 3.3 Theoretical Vocabulary | Concepts | Core terms that encode the domain’s structure (force, gene, equilibrium, field). | Orthology, paralogy, homology, synteny, substitution rate, dN/dS, mutation spectrum, recombination landscape, structural variation, genome duplication, phylogeny, molecular clock, conserved elements, evolutionary constraint. |
| | Classifications | Taxonomies, categories, or typologies that organize entities and relations. | Mutation classes (point, indel, structural variant), homology categories (ortholog, paralog, xenolog), rearrangement types (inversion, translocation, fusion, fission), evolutionary models (neutral, nearly neutral, adaptive), genomic feature types (coding, noncoding, regulatory, repetitive). |
| 3.4 Formal Representations | Equations | Mathematical constructs expressing laws, relations, or mechanisms. | Substitution models (Jukes–Cantor, Kimura, GTR); dN/dS ratio calculations; molecular-clock equations; birth–death models for gene families; rate matrices for phylogenetics; synteny conservation metrics; recombination and mutation-rate equations. |
| | Models | Structured representations—mathematical, computational, or conceptual—used to predict and explain phenomena. | Phylogenetic models, gene-family birth–death models, molecular-clock models, genome rearrangement models, coalescent-based comparative models, TE-dynamics models, ancestral genome reconstruction frameworks. |
| 3.5 Idealized Structures | Simplified Models | Purposeful abstractions that capture essential dynamics while omitting irrelevant detail. | Independent-site models; constant substitution rates; homogeneous evolutionary processes; simplified rearrangement histories; ignoring epistasis; treating duplicated genes as evolving independently; assuming clock-like evolution across entire lineages. |
| | Limit Conditions | Regimes where specific models or approximations hold (classical vs. quantum, linear vs. nonlinear). | Fail under strong rate heterogeneity, lineage-specific bursts of evolution, horizontal gene transfer, genome structural upheavals, deep-time saturation, or high TE proliferation; clock assumptions break when selection or mutation rates vary drastically. |
| 3.6 Integrative Frameworks | Unifying Theories | Higher-order structures that connect disparate laws or mechanisms under a coherent whole. | Viewing genomes as evolving mosaics shaped by mutation, selection, drift, recombination, and structural change; phylogenomics unifies sequence evolution with species relationships; genome-structure models integrate gene-content change with rearrangements; comparative genomics ties together conserved function and evolutionary divergence. |
| | Interdisciplinary Links | Points where the theory connects to adjacent sciences or larger explanatory systems. | Connects to molecular evolution (substitution theory), evolutionary biology (speciation, adaptation), structural biology (domain conservation), ecology (population structure), systems biology (network evolution), and paleogenomics (ancient-DNA reconstruction). |
| 4. Method Layer | | | |
| 4.1 Inquiry Design | Experimental Design | Structured plans for manipulating variables to test causal claims. | Manipulating evolutionary conditions in experimental populations, inducing mutation-rate changes, creating controlled recombination environments, engineering gene duplications or deletions, and applying artificial selection to test hypotheses about genomic change and divergence. |
| | Observational Design | Systematic approaches for gathering non-manipulated data (surveys, field studies, natural experiments). | Comparing genomes across species, observing naturally occurring mutations and substitutions, documenting structural variants, monitoring TE proliferation, and analyzing lineage-specific genomic patterns without experimental manipulation. |
| 4.2 Testing & Validation | Hypothesis Testing | Procedures for evaluating whether evidence supports or contradicts specific claims. | Testing molecular-clock assumptions, evaluating orthology/paralogy predictions, assessing substitution-model fit, testing selection vs neutrality using dN/dS, validating synteny-based evolutionary inferences, and comparing predicted vs observed genome structural changes. |
| | Replication | The requirement that results be independently reproducible under similar conditions. | Re-sequencing samples, re-running genome assemblies, replicating alignments under different parameters, verifying phylogenetic trees with independent datasets, and validating homology assignments across multiple tools. |
| 4.3 Inference & Evaluation | Statistical Inference | Rules for drawing conclusions from noisy or incomplete data. | Estimating substitution rates, inferring ancestral sequences, fitting phylogenetic models, estimating gene-family birth–death parameters, quantifying synteny conservation, calculating divergence times, and assessing uncertainty via bootstrapping/Bayesian posteriors. |
| | Model Comparison | Criteria (fit, simplicity, predictive accuracy, robustness) used to evaluate competing models. | Comparing substitution models (JC, K2P, HKY, GTR), evaluating strict-clock vs relaxed-clock models, comparing different phylogenetic topologies, selecting among gene-family evolution models, and evaluating genome rearrangement models for fit and parsimony. |
| 4.4 Error Management | Error Analysis | Identification and quantification of random and systematic errors. | Identifying sequencing and assembly errors, misalignments, homology misclassification, low-coverage artifacts, model misfit, long-branch attraction, undetected paralogy, and noise introduced by repetitive or structurally complex regions. |
| | Bias Control | Methods for minimizing subjective, instrumental, or procedural biases. | Using high-quality sequencing, verifying assemblies with orthogonal data, filtering low-confidence regions, controlling for GC or compositional bias, accounting for model misspecification, and validating orthology assignments through multiple criteria. |
| 4.5 Adjudication & Revision | Peer Scrutiny | Collective evaluation of claims through critique, review, and debate. | Reviewing phylogenetic trees, checking model assumptions, validating homology calls, reevaluating synteny maps, cross-comparing divergence estimates, and reconciling inconsistent results across analytical pipelines. |
| | Theory Revision | Procedures for modifying, replacing, or discarding models based on new evidence. | Updating substitution models when new mutation spectra are discovered, revising phylogenies with better sampling, correcting homology relationships with improved annotation, and adjusting genome-evolution models when structural complexity exceeds earlier assumptions. |
| 4.6 Integrity Conditions | Transparency | Requirements to disclose methods, data, assumptions, and limitations. | Disclosing assembly parameters, alignment settings, model assumptions, sequencing depth, filtering thresholds, calibration points, and all analytical pipelines; reporting uncertainty and potential artifacts. |
| | Ethical Standards | Norms ensuring responsible conduct in experimentation, data handling, and publication. | Ensuring responsible handling of genomic data, respecting access constraints (especially for human or endangered-species genomes), accurately reporting analyses, avoiding manipulation of phylogenetic or comparative results, and following ethical standards in data sharing. |
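
The substitution models and molecular-clock equations listed under 3.4 Formal Representations can be illustrated with a small sketch. It computes Jukes–Cantor (JC69) and Kimura two-parameter (K2P) distances for a pair of aligned sequences and converts a distance to a divergence time under a strict clock. The function names, the toy sequences, and the rate used in the usage note are illustrative assumptions, not part of the table; real analyses would rely on established phylogenetic software and fitted models.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def site_differences(seq_a, seq_b):
    """Count compared sites, transitions, and transversions in an aligned pair."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    return compared, transitions, transversions


def jc69_distance(seq_a, seq_b):
    """JC69 distance: d = -3/4 * ln(1 - 4p/3), p = proportion of differing sites."""
    n, ts, tv = site_differences(seq_a, seq_b)
    p = (ts + tv) / n
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


def k2p_distance(seq_a, seq_b):
    """K2P distance: d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q))."""
    n, ts, tv = site_differences(seq_a, seq_b)
    p_ts, q_tv = ts / n, tv / n
    term_a = 1 - 2 * p_ts - q_tv
    term_b = 1 - 2 * q_tv
    if term_a <= 0 or term_b <= 0:
        return math.inf  # saturated: the correction is undefined
    return -0.5 * math.log(term_a * math.sqrt(term_b))


def strict_clock_divergence_time(distance, rate_per_site_per_year):
    """Strict clock: T = d / (2r), since both lineages accumulate substitutions."""
    return distance / (2 * rate_per_site_per_year)
```

As a usage example, `jc69_distance("ACGTACGTAC", "ACGTACGTAT")` corrects an observed difference of 0.1 per site upward to account for multiple hits, and `strict_clock_divergence_time(0.02, 1e-9)` converts a distance of 0.02 substitutions per site into years under an assumed rate of 1e-9 substitutions per site per year. Both corrections return infinity once the observed divergence reaches saturation, which mirrors the limit conditions noted in 3.5.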