Santiago López Begines, PhD

Biomedical Data Science Consulting

I help neurotech and biomedical teams build reliable, reproducible analysis pipelines.

15+ years experimental neuroscience  ·  Production-grade R & Python

About

Neuroscientist and Data Scientist with 15+ years in biomedical research and 4–5 years specializing in machine learning, multi-omics analysis, and production data pipelines.

First author in Science Advances (2025); co-authored 8+ publications in EMBO Journal, eLife, and Cell Death & Disease.

Unique hybrid profile: deep experimental background in whole-cell patch-clamp electrophysiology, proteomics, and transgenic mouse models — combined with production-level data science in Python and R.

Key Results

  • 90% reduction in electrophysiology processing time via automated pipeline (IBiS, Seville)
  • 70% reduction in omics data cleaning time via automated R scripts (LCSB, University of Luxembourg)
  • ML-based biomarker discovery in neurodegeneration (Batten disease / CLN4)
  • → View full publication list

Proficient in Python (scikit-learn, XGBoost, LightGBM, TensorFlow) and R (tidyverse, tidymodels, Seurat, Shiny) for automated pipelines, statistical modeling, and data visualization on large-scale proteomics, transcriptomics, and electrophysiological datasets.

Services

Electrophysiology & Signal Analysis Pipelines

For neurotech teams working with patch-clamp, EEG, or multi-electrode array data who need to move from raw recordings to reproducible, documented outputs.

    → See our automated electrophysiology pipeline

  • Automated and semi-automated data processing pipelines (HEKA / pCLAMP / Python / R)
  • Quality control and artifact rejection frameworks
  • Statistical analysis: FI curves, input resistance, E/I balance, resting membrane potential
  • Full documentation and reproducibility standards

Deliverable: functional pipeline + technical documentation
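
As a rough illustration of one of the statistical measures listed above, input resistance is commonly estimated as the slope of steady-state membrane voltage against injected current across a series of small current steps. The sketch below is a minimal, standalone example under that assumption; the function name, argument names, and units (pA and mV) are hypothetical and are not taken from the actual pipeline.

```python
def input_resistance_mohm(currents_pa, voltages_mv):
    """Estimate input resistance (megaohms) from current-step data.

    currents_pa: injected current per step, in picoamperes (assumed unit).
    voltages_mv: steady-state membrane voltage per step, in millivolts.
    The value is the ordinary least-squares slope of V against I.
    """
    n = len(currents_pa)
    if n < 2 or n != len(voltages_mv):
        raise ValueError("need at least two paired (current, voltage) points")

    mean_i = sum(currents_pa) / n
    mean_v = sum(voltages_mv) / n

    # Sums of squares and cross-products for the least-squares slope.
    sxx = sum((i - mean_i) ** 2 for i in currents_pa)
    if sxx == 0:
        raise ValueError("current steps must not all be identical")
    sxy = sum((i - mean_i) * (v - mean_v)
              for i, v in zip(currents_pa, voltages_mv))

    slope_gohm = sxy / sxx      # mV / pA equals gigaohms
    return slope_gohm * 1000.0  # convert gigaohms to megaohms


if __name__ == "__main__":
    # Illustrative values only, not recorded data.
    steps_pa = [-40, -20, 0, 20]
    vm_mv = [-78.0, -74.0, -70.0, -66.0]
    print(f"{input_resistance_mohm(steps_pa, vm_mv):.1f} MOhm")
```

In practice the fit is usually restricted to small hyperpolarizing steps where the voltage response stays roughly linear; which steps to include is a choice made per dataset.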

Biomedical Data Pipeline Design & Audit

For MedTech and Pharma SMEs that need to validate or audit existing analysis pipelines for regulatory submissions or internal quality standards.

  • Pipeline code review and statistical validation
  • Reproducibility assessment and gap analysis
  • SOP documentation for reproducible analytical workflows
  • Recommendations report with prioritized action items

Deliverable: written audit report + remediation roadmap

Omics Data Analysis

For biomedical research teams and CROs working with proteomics, transcriptomics, or multi-omics datasets.

    → Proteomics pipeline | → scRNA-seq pipeline

  • Label-free proteomics: MaxQuant → differential expression → biological interpretation
  • scRNA-seq analysis (Seurat): clustering, cell type annotation, trajectory analysis
  • ML-based biomarker discovery: feature selection, cross-validation, model validation

Deliverable: analysis report + reproducible R/Python scripts

Projects

Automated Proteomics Pipeline for Neurodegeneration Biomarker Discovery

Built an automated R pipeline for MaxQuant LFQ output covering data cleaning, normalization, differential expression, and ML-based biomarker discovery. 70% reduction in data cleaning time — deployed at LCSB, University of Luxembourg.

Proteomics R MaxQuant Biomarker Discovery Neurodegeneration
View Project
Work in Progress

Automated Electrophysiology Analysis Pipeline for Synaptic Data

Developing an automated pipeline for whole-cell patch-clamp data (mIPSCs, mEPSCs, FI curves) integrating miniML deep learning event detection. R modules complete — Python/miniML integration in development.

Electrophysiology Python R Patch-Clamp Neurotech
View Project

snRNA-seq Analysis Pipeline

Modular R pipeline for single-nucleus RNA-seq: from 10x Genomics Cell Ranger output to SCTransform normalization, Louvain clustering, differential expression, and multi-layered functional enrichment (GO, KEGG, STRING, PANTHER).

scRNA-seq Seurat R Bioinformatics
View Project

ML Pipeline for Financial Time-Series: Rigorous Validation Framework

A case study in avoiding false discovery in predictive modeling. Across 336 model/horizon combinations, only 27% of models showed genuine predictive value when tested with McNemar, Diebold-Mariano, and bootstrap CIs. Documented negative result with a reproducible pipeline.
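
To give a sense of what one of the tests named above does, here is a minimal sketch of an exact (binomial) McNemar test comparing a model against a baseline on the same set of predictions. It assumes each prediction has been scored as correct or incorrect (for example, directional accuracy); that framing, and the function and variable names, are illustrative assumptions rather than a description of the project's actual code.

```python
from math import comb


def mcnemar_exact(correct_model, correct_baseline):
    """Exact two-sided McNemar test on paired correct/incorrect outcomes.

    Returns (b, c, p_value), where b counts cases only the model got right
    and c counts cases only the baseline got right. Cases where both agree
    carry no information for this test and are ignored.
    """
    if len(correct_model) != len(correct_baseline):
        raise ValueError("outcome lists must have the same length")

    b = sum(1 for m, s in zip(correct_model, correct_baseline) if m and not s)
    c = sum(1 for m, s in zip(correct_model, correct_baseline) if s and not m)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs, so no evidence either way

    # Under the null, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Synthetic outcomes for illustration only.
    model = [True] * 10 + [False] * 2 + [True] * 20
    baseline = [False] * 10 + [True] * 2 + [True] * 20
    b, c, p = mcnemar_exact(model, baseline)
    print(f"model-only correct={b}, baseline-only correct={c}, p={p:.4f}")
```

When many model/horizon combinations are tested this way, the resulting p-values generally need a multiple-comparison correction before any single result is read as genuine.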

Machine Learning Time Series Python R Validation
View Project
View All on GitHub

Other Projects

Legacy System Modernization: Remote Monitoring for a 1972 Diesel Engine

Custom LoRa wireless system enabling remote start/stop and real-time telemetry (voltage, oil pressure, RPM, fuel) of a 50-year-old irrigation engine from 2+ km away. 86% reduction in site visits — 99.2% uptime over 6 months. Total cost: <€150.

Embedded Systems IoT LoRa PCB Design C/C++
View Project

Let's work together

Available for consulting projects in neurotech, MedTech, and biomedical data science.