My research sits at the intersection of probabilistic machine learning, interpretable computer vision, computational astrophysics, and medical informatics. The unifying thread is the development of robust, interpretable methodological tools for identifying, extracting, and characterising structure within complex observational or simulated datasets — combining rigorous mathematical formalisms with domain-informed physical interpretation.


Research Themes

Probabilistic Manifold Learning

Development of probabilistic frameworks for recovering and analysing manifolds embedded in noisy, multi-structured datasets. This includes the 1-DREAM method, a general framework for multi-manifold modelling with heterogeneous noise, and probabilistic Hough Transform approaches for detecting filamentary structures in high-noise regimes.
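As a rough illustration of how a Hough-style detector can be made tolerant of heterogeneous noise, the sketch below lets every point cast a Gaussian-weighted vote whose width is set by that point's own uncertainty. This is a generic soft-voting variant written for illustration, not the probabilistic Hough Transform formulation used in this research; the function name `soft_hough`, the grid parameters, and the sample points are all hypothetical.

```python
import math

def soft_hough(points, sigmas, n_theta=4, rho_min=-5.0, rho_step=0.5, n_rho=21):
    """Return (score, theta, rho) of the best-supported line x*cos(theta) + y*sin(theta) = rho.

    Each point votes with a Gaussian weight whose width is its own noise level,
    so uncertain points spread their support over more candidate lines.
    """
    best = (0.0, None, None)
    for t in range(n_theta):
        theta = math.pi * t / n_theta
        c, s = math.cos(theta), math.sin(theta)
        for r in range(n_rho):
            rho = rho_min + r * rho_step
            score = 0.0
            for (x, y), sigma in zip(points, sigmas):
                residual = x * c + y * s - rho  # signed distance of the point to the line
                score += math.exp(-0.5 * (residual / sigma) ** 2)
            if score > best[0]:
                best = (score, theta, rho)
    return best

# Hypothetical toy data: four points on the horizontal line y = 2.
points = [(0.0, 2.0), (1.0, 2.0), (2.0, 2.0), (3.0, 2.0)]
sigmas = [0.3, 0.3, 0.3, 0.3]
score, theta, rho = soft_hough(points, sigmas)
```

With these toy values the strongest line sits at theta = pi/2 and rho = 2, which is the line the points were drawn from. A real application would use a much finer grid and measured per-point uncertainties.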

Interpretable Computer Vision

Design of vision architectures that are exactly faithful by construction — where the explanation is the computation, not an approximation of it. This includes prototype-based, part-aware architectures with sparse linear heads, competitive slot attention mechanisms, and learnable DAG hierarchies, evaluated across classification, detection, and part segmentation tasks.

Astrophysics & Cosmic Structure

Application of ML methods to map the Large-Scale Structure of the Universe, detect and characterise stellar streams (e.g. Jhelum, NGC 1261/1904 tidal tails), identify cosmic filaments around galaxy clusters, and study the Fornax-Eridanus supercluster complex. Currently conducting a systematic census of tidal structures around globular clusters using Gaia astrometric data.

Discriminative Subspace Learning

Development of probabilistic approaches for feature relevance estimation and dimensionality reduction informed by measurement error. Includes Probabilistic GMLVQ, discriminative subspace emersion methods, and graph attention networks for learning interpretable latent representations across populations.
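For readers unfamiliar with relevance learning, the sketch below shows the distance used in standard (non-probabilistic) GMLVQ, d(x, w) = (x - w)^T Ω^T Ω (x - w), and how feature relevances are read from the diagonal of Λ = Ω^T Ω. It does not implement the probabilistic variant mentioned above, and it omits the usual normalisation of the trace of Λ. The matrix `omega` and the vectors are hypothetical values chosen for illustration.

```python
def gmlvq_distance(x, w, omega):
    """Squared distance between sample x and prototype w after projecting by omega."""
    diff = [xi - wi for xi, wi in zip(x, w)]
    projected = [sum(o * d for o, d in zip(row, diff)) for row in omega]
    return sum(p * p for p in projected)

def feature_relevances(omega):
    """Diagonal of Lambda = omega^T omega: one non-negative relevance per input feature."""
    n_features = len(omega[0])
    return [sum(row[j] ** 2 for row in omega) for j in range(n_features)]

# Hypothetical 2x3 projection: the third feature is ignored entirely.
omega = [[1.0, 0.0, 0.0],
         [0.0, 0.5, 0.0]]
x = [2.0, 4.0, 9.0]
w = [1.0, 2.0, 3.0]

d = gmlvq_distance(x, w, omega)
relevances = feature_relevances(omega)
```

Here the large difference in the third feature contributes nothing to the distance, because its relevance is zero.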

Medical Informatics

Building computational tools to study multi-morbidity and disease progression in large patient populations. This includes the InciGraph tool for exploring intersectional demographic variation in disease trajectories, drug prescription sequence clustering for diabetes patients, and metabolomic analysis for Cushing’s disease diagnosis.

Materials Science & Industrial AI

Development of surrogate models for simulating the directional solidification process of metallic superalloys, with application to aerospace turbine component manufacturing (current UKRI project). Computer Vision applied to industrial images to estimate latent subspaces distinguishing texture, morphology, and structural defects.



Selected Projects

HPSNet: Hierarchical Prototypical Slot Network (in preparation)

An interpretable vision architecture designed to maintain exact linear faithfulness across classification, detection, and part segmentation. HPSNet's core components include competitive multi-object slot attention (MOSA), a learnable directed acyclic graph (DAG) hierarchy, prototype-based feature matching, and a sparse linear classification head that guarantees $z_c = \sum_m w_m a_m + b_c$ exactly. The model is evaluated on three tasks of increasing spatial complexity, using CUB-200, PASCAL VOC 2012, and PartImageNet. The central finding is that competitive slot attention transitions from redundant to essential as the task requires more spatial decomposition.
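The faithfulness property of a linear head can be made concrete with a small sketch: if a class logit is a weighted sum of prototype activations plus a bias, then the per-prototype terms are themselves the explanation, and they add up to the logit with nothing left over. The code below is a minimal illustration under that assumption and is not taken from HPSNet; the function name `linear_head` and all numeric values are hypothetical.

```python
def linear_head(activations, weights, bias):
    """Return the class logit and the per-prototype contributions that compose it."""
    contributions = [w * a for w, a in zip(weights, activations)]
    logit = sum(contributions) + bias
    return logit, contributions

# Hypothetical prototype activations and sparse weights for one class.
a = [0.5, 0.0, 2.0, 1.0]
w = [1.0, 0.0, 0.25, -0.5]  # a zero weight means that prototype has no influence
b = 0.25

z, contrib = linear_head(a, w, b)
```

Because the explanation is the list `contrib` itself, summing it and adding the bias reproduces `z` exactly, which is what distinguishes this kind of head from post-hoc attribution methods that only approximate the computation.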

ConstellationNet: Probabilistic Spatial Part Configurations (in preparation)

An interpretable fine-grained visual classifier built on three components: a slot-attention foundation backbone (DINOv2 ViT-S/14) producing geometric part descriptors (centred position, log-scale, double-angle orientation); a class-agnostic vocabulary of factorised Student-t ambient expert potentials; and an exactly faithful linear head with non-negative constellation weights. It achieves 85.04% top-1 / 97.34% top-5 accuracy on CUB-200-2011 with exact faithfulness preserved throughout. The model shows strong family-level pathway specialisation (effect sizes of up to +86.7 pp), suggesting that interpretability and high performance need not be in tension.
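The geometric part descriptor named above (centred position, log-scale, double-angle orientation) can be sketched as follows. This is an illustrative encoding under assumed conventions and not ConstellationNet's actual code: the function name `part_descriptor`, the argument layout, and the choice of reference centre are hypothetical. The double-angle terms (cos 2θ, sin 2θ) make an axis at angle θ indistinguishable from the same axis at θ + π, which suits parts whose orientation has no preferred direction.

```python
import math

def part_descriptor(x, y, scale, theta, centre=(0.0, 0.0)):
    """Encode a part as (dx, dy, log-scale, cos 2*theta, sin 2*theta)."""
    cx, cy = centre
    return (
        x - cx,                 # position relative to the reference centre
        y - cy,
        math.log(scale),        # log-scale: multiplicative size changes become additive
        math.cos(2.0 * theta),  # double-angle orientation: theta and theta + pi coincide
        math.sin(2.0 * theta),
    )

# Hypothetical part at (3, 4) with unit scale, relative to an object centre at (1, 1).
d1 = part_descriptor(3.0, 4.0, 1.0, 0.3, centre=(1.0, 1.0))
d2 = part_descriptor(3.0, 4.0, 1.0, 0.3 + math.pi, centre=(1.0, 1.0))
```

The two descriptors `d1` and `d2` agree up to floating-point rounding, even though the raw angles differ by half a turn.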

1-DREAM: 1D Recovery, Extraction and Analysis of Manifolds

A public toolbox for detecting one-dimensional manifold structures in noisy data, such as filaments, stellar streams, and the cosmic web. The method recovers an explicit parametric description of each manifold as a mapping from a latent abstract graph, enabling quantitative study of physical properties along the manifold. It has been applied to Jellyfish galaxy simulations, the Jhelum stream, and Large-Scale Structure mapping.

→ GitHub Repository

InciGraph: Intersectional Disease Progression Analysis

An interactive tool for exploring intersectional demographic variation in temporal, population-wide disease progression patterns. Developed within the OPTIMAL project (NIHR) to support epidemiological analysis of multi-morbidity across diverse patient populations.

→ GitHub Repository

Discriminative Subspace Emersion

A probabilistic method for learning feature relevances across different populations, enabling robust separation of structured signal from non-significant variation in high-dimensional datasets.

→ GitHub Repository | arXiv preprint

Globular Cluster Tidal Census (ongoing)

A systematic study of tidal structures around the full known globular cluster population using Gaia astrometric data, combining probabilistic Hough Transform and ant-colony-inspired algorithms with colour-magnitude diagram compatibility for probabilistic membership estimation.


International Collaborations

  • Millennium Nucleus for Galaxies (MINGAL), Valparaíso, Chile
  • Kapteyn Astronomical Institute, Groningen, Netherlands
  • Instituto de Astrofísica de Canarias (IAC), Spain
  • University of Ghent, Belgium
  • EDUCADO Innovative Training Network (ITN)
  • SUNDIAL MSCA-ITN-ETN Network

Future Directions

  • Scalable probabilistic frameworks for detecting faint filaments in deep surveys (LSST)
  • Automatic classification of the dynamical state of galaxy clusters
  • Interpretable tools for uncertainty quantification in identified structures
  • Integration of geometric modelling with multi-band photometric information
  • Homogenisation of stellar photosphere parameter spaces across multiple surveys via probabilistic latent representations