goatpy.bin

Data-driven m/z binning for MALDI imzML files.

Instead of loading a pre-defined peak list, this module reads every spectrum in the imzML file, discovers all m/z values that appear across the dataset, and bins them into a uniform grid.

Key concepts

  • bin half-width (tolerance): all m/z values within tolerance Da of a bin centre are combined into that bin using the reduce function. The default is 0.05 Da; see the tolerance parameter of bin_imzml for recommended values by instrument resolution.

  • bin centres: derived from the observed m/z range across all spectra so no prior knowledge is required.

  • reduce function: defaults to max (intensity of the tallest peak within the bin window), matching pyimzml’s getionimage convention. sum is also supported and can improve SNR on dense spectra.
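The three concepts above can be pictured with a minimal NumPy sketch for a single spectrum. This is not goatpy's implementation: the helper name bin_spectrum is hypothetical, and the sketch assumes that bins tile the m/z range without overlap, so each bin is 2 × tolerance wide.

```python
import numpy as np

def bin_spectrum(mzs, intensities, lo, hi, tolerance=0.05, reduce="max"):
    """Bin one spectrum onto a uniform grid (illustrative only, not goatpy's code)."""
    mzs = np.asarray(mzs, dtype=float)
    intensities = np.asarray(intensities, dtype=float)

    # Assumption: bins tile [lo, hi) without overlap, so full width = 2 * tolerance.
    width = 2.0 * tolerance
    n_bins = int(np.ceil((hi - lo) / width))
    centres = lo + tolerance + width * np.arange(n_bins)

    # Keep only peaks inside the requested window and find their bin index.
    keep = (mzs >= lo) & (mzs < hi)
    idx = np.floor((mzs[keep] - lo) / width).astype(int)
    idx = np.minimum(idx, n_bins - 1)  # guard against floating-point edge cases

    binned = np.zeros(n_bins)
    if reduce == "max":
        np.maximum.at(binned, idx, intensities[keep])  # tallest peak wins
    elif reduce == "sum":
        np.add.at(binned, idx, intensities[keep])      # integrate all peaks
    else:
        raise ValueError(f"unknown reduce function: {reduce!r}")
    return centres, binned

centres, binned = bin_spectrum(
    [100.1, 100.2, 100.6], [5, 3, 7], lo=100.0, hi=101.0, tolerance=0.25
)
```

With reduce="max" the two peaks that fall into the first bin contribute only the taller one; with reduce="sum" their intensities are added.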

Usage

>>> import goatpy as gp
>>> sdata = gp.bin_and_align(
...     imzml_path = "sample.imzML",
...     he_path    = "sample.svs",    # optional H&E
...     tolerance  = 0.05,
...     mz_range   = (900, 3600),     # optional subset
... )

Or just build the binned matrix without H&E registration:

>>> from goatpy.bin import bin_imzml
>>> sdata = bin_imzml("sample.imzML", tolerance=0.05)

Functions

bin_imzml(...) → spatialdata.SpatialData

Load a MALDI imzML file and bin all spectra onto a uniform m/z grid.

bin_and_align(...) → spatialdata.SpatialData

Bin all spectra from an imzML file and register against an H&E image.

Module Contents

goatpy.bin.bin_imzml(imzml_path: str, tolerance: float = 0.05, mz_range: Tuple[float, float] | None = None, reduce: Literal['max', 'sum'] = 'max', min_frequency: float = 0.0, min_intensity: float = 0.0, sample_fraction: float = 1.0, chunk_size: int = 500) → spatialdata.SpatialData

Load a MALDI imzML file and bin all spectra onto a uniform m/z grid.

This replaces the manual peak-list workflow (glyco_spatialdata / the bundled PEAKS.csv) with a data-driven approach that retains every detectable signal. The result is a SpatialData object compatible with all other goatpy functions.

Parameters:
  • imzml_path (str) – Path to the .imzML file.

  • tolerance (float, default 0.05) –

    Half-width of each m/z bin in Da. All peaks within [centre - tolerance, centre + tolerance] are collapsed to one bin.

    Recommended values:

    • Low-resolution (unit-res) MALDI: 0.1 – 0.5 Da

    • Medium-resolution: 0.02 – 0.1 Da

    • High-resolution (Orbitrap/FT): 0.002 – 0.01 Da

  • mz_range ((lo, hi) or None) – Restrict the output to this m/z window, e.g. (900.0, 3600.0). None uses the full range found in the file.

  • reduce ("max" | "sum", default "max") –

    How to combine multiple peaks within one bin.

    • "max" — tallest peak wins (matches pyimzml getionimage)

    • "sum" — integrate all peaks (better SNR for dense spectra)

  • min_frequency (float, default 0.0) – After binning, drop m/z bins detected in fewer than this fraction of pixels (0.0 = keep all). E.g. 0.01 drops bins present in < 1 % of pixels.

  • min_intensity (float, default 0.0) – Drop m/z bins whose maximum intensity across all pixels is below this value.

  • sample_fraction (float, default 1.0) – Fraction of spectra to scan when discovering the m/z range. Values < 1 speed up the range-discovery step on large files but may miss rare m/z values.

  • chunk_size (int, default 500) – Progress is logged every chunk_size spectra.

Returns:

SpatialData with

  • shapes["pixels"]: one square per MALDI pixel

  • points["centroids"]: centroid of each pixel

  • tables["maldi_adata"]: AnnData, rows = pixels, columns = m/z bins

Return type:

SpatialData

Examples

>>> from goatpy.bin import bin_imzml
>>> sdata = bin_imzml("sample.imzML", tolerance=0.05, mz_range=(900, 3600))
>>> sdata["maldi_adata"].shape
(4200, 13500)   # depends on your data

# Then normalise, reduce, cluster as usual:
>>> import goatpy as gp
>>> sdata = gp.normalize_spatialdata(sdata, table_name="maldi_adata")
>>> sdata = gp.graphpca_spatialdata(sdata, n_components=30)
>>> sdata = gp.get_kmean_clusters(sdata, n_clusters=8)

# Or pass directly to load_and_align for H&E registration: just supply
# the pre-binned sdata instead of using the imzml_path loading path.
# (H&E registration still runs on the TIC image built from the binned matrix.)
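The min_frequency and min_intensity filters can be sketched on a plain pixels × bins matrix. The helper filter_bins is hypothetical and is not part of goatpy; it also assumes that a bin counts as "detected" in a pixel when its intensity is greater than zero, which the documentation does not state explicitly.

```python
import numpy as np

def filter_bins(X, min_frequency=0.0, min_intensity=0.0):
    """Return a boolean mask over m/z bins (columns) to keep (illustrative only)."""
    X = np.asarray(X, dtype=float)

    # Assumption: a bin is "detected" in a pixel when its intensity is non-zero.
    frequency = (X > 0).mean(axis=0)  # fraction of pixels in which each bin is detected
    peak = X.max(axis=0)              # maximum intensity of each bin across all pixels

    return (frequency >= min_frequency) & (peak >= min_intensity)

# 4 pixels (rows) × 3 m/z bins (columns)
X = np.array([
    [0, 10, 1],
    [0,  0, 2],
    [0,  0, 3],
    [5,  0, 4],
])
mask = filter_bins(X, min_frequency=0.5)  # keeps only the third bin
X_filtered = X[:, mask]
```

With both thresholds at their default of 0.0 every bin passes, which matches the "keep all" behaviour described for min_frequency.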

goatpy.bin.bin_and_align(imzml_path: str, he_path: str, tolerance: float = 0.05, mz_range: Tuple[float, float] | None = None, reduce: Literal['max', 'sum'] = 'max', min_frequency: float = 0.0, min_intensity: float = 0.0, geojson_path: str | None = None, maldi_pixel_um: float | None = None, he_pixel_um: float | None = None, img_upscaling: int = 10, buffer_px: int = 150, coarse_rotation_step: int = 15, fine_rotation_range: float = 5.0, fine_rotation_step: float = 1.0, **kwargs) → spatialdata.SpatialData

Bin all spectra from an imzML file and register against an H&E image.

This is the data-driven equivalent of load_and_align: instead of loading a fixed peak list it bins the entire spectral space first, then passes the binned TIC image to the registration engine.

Parameters:
  • imzml_path (str) – Path to the .imzML file.

  • he_path (str) – Path to the H&E image file.

  • tolerance (float) – Bin half-width in Da — see bin_imzml for guidance.

  • mz_range ((lo, hi) or None) – Restrict to this m/z window before registration.

  • reduce ("max" | "sum") – Bin reduction function.

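  • min_frequency (float) – Post-binning quality filter: minimum fraction of pixels in which a bin must be detected. See bin_imzml.

  • min_intensity (float) – Post-binning quality filter: minimum value for a bin's maximum intensity across all pixels. See bin_imzml.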

  • geojson_path (str or None) – Optional QuPath annotation export.

  • maldi_pixel_um (float or None) – Physical size of one MALDI pixel in µm. Auto-detected when None.

  • he_pixel_um (float or None) – Physical size of one H&E pixel in µm. Auto-detected when None.

  • img_upscaling (int) – Upscaling factor for the output canvas.

  • buffer_px (int) – Canvas padding at registration resolution.

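  • coarse_rotation_step (int) – Registration search parameter: step size of the coarse rotation search.

  • fine_rotation_range (float) – Registration search parameter: range of the fine rotation search.

  • fine_rotation_step (float) – Registration search parameter: step size of the fine rotation search.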

Returns:

Same structure as load_and_align output.

Return type:

SpatialData
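The three rotation parameters suggest a coarse-to-fine angle search, which can be sketched as follows. This is only one plausible reading, not goatpy's registration engine: the function coarse_to_fine_angle and its score callback are hypothetical, and the sketch assumes angles in degrees with fine_range applied as ± around the best coarse angle.

```python
def coarse_to_fine_angle(score, coarse_step=15, fine_range=5.0, fine_step=1.0):
    """Return the angle maximising score(angle) via a two-stage grid search.

    Illustrative only. Assumes angles are in degrees and that the fine
    search spans [best - fine_range, best + fine_range].
    """
    # Stage 1: scan the full circle on a coarse grid.
    best_coarse = max(range(0, 360, coarse_step), key=score)

    # Stage 2: refine around the best coarse angle on a finer grid.
    n = int(round(fine_range / fine_step))
    fine_angles = [best_coarse + k * fine_step for k in range(-n, n + 1)]
    return max(fine_angles, key=score)

# Toy similarity score that peaks at 47 degrees (stand-in for an image-similarity metric).
def toy_score(angle):
    return -abs(((angle - 47 + 180) % 360) - 180)

best = coarse_to_fine_angle(toy_score)
```

With the default values the coarse pass evaluates 24 angles and the fine pass 11, instead of 360 evaluations for an exhaustive 1-step scan.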