goatpy.bin
Data-driven m/z binning for MALDI imzML files.
Instead of loading a pre-defined peak list, this module reads every spectrum in the imzML file, discovers all m/z values that appear across the dataset, and bins them into a uniform grid.
Key concepts
bin width (tolerance): all m/z values within tolerance Da of a bin centre are combined into that bin (see the reduce function). A common starting point is 0.05–0.1 Da for unit-resolution MALDI instruments; use 0.005–0.02 Da for high-resolution instruments.
bin centres: derived from the observed m/z range across all spectra so no prior knowledge is required.
reduce function: defaults to max (intensity of the tallest peak within the bin window), matching pyimzml's getionimage convention. sum is also supported and can improve SNR on dense spectra.
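As a rough illustration of the key concepts above, the following sketch bins a single spectrum onto a uniform grid with NumPy. It is not goatpy's implementation: the function name `bin_spectrum`, the grid layout (bins of width 2 × tolerance starting at `lo`), and the decision to discard values outside `[lo, hi)` are assumptions made for the example.

```python
import numpy as np


def bin_spectrum(mzs, intensities, lo, hi, tolerance=0.05, reduce="max"):
    """Bin one spectrum onto a uniform m/z grid (hypothetical helper).

    Each bin spans [centre - tolerance, centre + tolerance), so the bin
    width is 2 * tolerance.
    """
    width = 2.0 * tolerance
    n_bins = int(np.ceil((hi - lo) / width))
    centres = lo + tolerance + width * np.arange(n_bins)

    mzs = np.asarray(mzs, dtype=float)
    intensities = np.asarray(intensities, dtype=float)

    # Assumption: values outside [lo, hi) are simply dropped.
    keep = (mzs >= lo) & (mzs < hi)
    idx = np.floor((mzs[keep] - lo) / width).astype(int)
    idx = np.clip(idx, 0, n_bins - 1)  # guard against rounding at the top edge

    binned = np.zeros(n_bins)
    if reduce == "max":
        # Unbuffered ufunc: the tallest peak in each bin wins.
        np.maximum.at(binned, idx, intensities[keep])
    elif reduce == "sum":
        # Unbuffered ufunc: all peaks in each bin are added together.
        np.add.at(binned, idx, intensities[keep])
    else:
        raise ValueError(f"unknown reduce function: {reduce!r}")
    return centres, binned


centres, binned = bin_spectrum(
    [100.1, 100.2, 100.6], [5.0, 3.0, 2.0], lo=100.0, hi=101.0, tolerance=0.25
)
```

With `reduce="max"` the two peaks that fall into the first bin contribute only the taller one; with `reduce="sum"` they are added, which is why the two settings diverge most on dense spectra.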
Usage
>>> import goatpy as gp
>>> sdata = gp.bin_and_align(
...     imzml_path="sample.imzML",
...     he_path="sample.svs",      # H&E image
...     tolerance=0.05,
...     mz_range=(900, 3600),      # optional subset
... )
Or just build the binned matrix without H&E registration:
>>> from goatpy.bin import bin_imzml
>>> sdata = bin_imzml("sample.imzML", tolerance=0.05)
Functions
bin_imzml | Load a MALDI imzML file and bin all spectra onto a uniform m/z grid.
bin_and_align | Bin all spectra from an imzML file and register against an H&E image.
Module Contents
- goatpy.bin.bin_imzml(imzml_path: str, tolerance: float = 0.05, mz_range: Tuple[float, float] | None = None, reduce: Literal['max', 'sum'] = 'max', min_frequency: float = 0.0, min_intensity: float = 0.0, sample_fraction: float = 1.0, chunk_size: int = 500) -> spatialdata.SpatialData
Load a MALDI imzML file and bin all spectra onto a uniform m/z grid.
This replaces the manual peak-list workflow (glyco_spatialdata / the bundled PEAKS.csv) with a data-driven approach that retains every detectable signal. The result is a SpatialData object compatible with all other goatpy functions.
- Parameters:
imzml_path (str) – Path to the .imzML file.
tolerance (float, default 0.05) –
Half-width of each m/z bin in Da. All peaks within [centre - tolerance, centre + tolerance] are collapsed to one bin.
Recommended values:
  - Low-resolution (unit-res) MALDI: 0.1 – 0.5 Da
  - Medium-resolution: 0.02 – 0.1 Da
  - High-resolution (Orbitrap/FT): 0.002 – 0.01 Da
mz_range ((lo, hi) or None) – Restrict the output to this m/z window, e.g. (900.0, 3600.0). None uses the full range found in the file.
reduce ("max" | "sum", default "max") –
How to combine multiple peaks within one bin.
  - "max": tallest peak wins (matches pyimzml getionimage)
  - "sum": integrate all peaks (better SNR for dense spectra)
min_frequency (float, default 0.0) – After binning, drop m/z bins detected in fewer than this fraction of pixels (0.0 = keep all). E.g. 0.01 drops bins present in < 1 % of pixels.
min_intensity (float, default 0.0) – Drop m/z bins whose maximum intensity across all pixels is below this value.
sample_fraction (float, default 1.0) – Fraction of spectra to scan when discovering the m/z range. Values < 1 speed up the range-discovery step on large files but may miss rare m/z values.
chunk_size (int, default 500) – Progress is logged every chunk_size spectra.
- Returns:
SpatialData with:
  - shapes["pixels"]: one square per MALDI pixel
  - points["centroids"]: centroid of each pixel
  - tables["maldi_adata"]: AnnData, rows = pixels, columns = m/z bins
- Return type:
SpatialData
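The two post-binning filters, min_frequency and min_intensity, can be pictured as column masks over the pixels × bins matrix. The sketch below is illustrative only: `filter_bins` is a hypothetical helper, and treating "detected" as "intensity greater than zero" is an assumption, since the documentation does not define the detection criterion.

```python
import numpy as np


def filter_bins(X, min_frequency=0.0, min_intensity=0.0):
    """Return a boolean mask of m/z bins to keep (hypothetical helper).

    X is a dense (n_pixels, n_bins) intensity matrix.
    """
    X = np.asarray(X, dtype=float)
    # Assumption: a bin counts as "detected" in a pixel when its intensity is > 0.
    frequency = (X > 0).mean(axis=0)
    # Maximum intensity of each bin across all pixels.
    peak = X.max(axis=0)
    return (frequency >= min_frequency) & (peak >= min_intensity)


X = np.array([
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 5.0],
    [3.0, 0.0, 0.0],
    [4.0, 0.0, 0.0],
])
keep = filter_bins(X, min_frequency=0.5)  # only the first bin appears in >= 50 % of pixels
X_filtered = X[:, keep]
```

Note that with both thresholds at their default of 0.0 every bin passes, including bins that are empty in all pixels, which is consistent with "0.0 = keep all".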
Examples
>>> from goatpy.bin import bin_imzml
>>> sdata = bin_imzml("sample.imzML", tolerance=0.05, mz_range=(900, 3600))
>>> sdata["maldi_adata"].shape
(4200, 13500) # depends on your data
# Then normalise, reduce, cluster as usual:
>>> import goatpy as gp
>>> sdata = gp.normalize_spatialdata(sdata, table_name="maldi_adata")
>>> sdata = gp.graphpca_spatialdata(sdata, n_components=30)
>>> sdata = gp.get_kmean_clusters(sdata, n_clusters=8)
# Or pass directly to load_and_align for H&E registration: supply
# the pre-binned sdata instead of using the imzml_path loading path.
# (H&E registration still runs on the TIC image built from the binned matrix.)
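The sample_fraction parameter of bin_imzml trades completeness for speed during range discovery. A minimal sketch of that trade-off, assuming spectra are available as (mzs, intensities) pairs and that the subset is drawn uniformly at random; `discover_mz_range` and its `seed` argument are hypothetical and not part of goatpy:

```python
import numpy as np


def discover_mz_range(spectra, sample_fraction=1.0, seed=0):
    """Scan a random subset of spectra and return the observed (lo, hi) m/z range.

    `spectra` is a sequence of (mzs, intensities) pairs (hypothetical layout).
    """
    rng = np.random.default_rng(seed)
    n = len(spectra)
    n_sample = max(1, int(round(n * sample_fraction)))
    # Draw spectrum indices without replacement.
    chosen = rng.choice(n, size=n_sample, replace=False)

    lo, hi = np.inf, -np.inf
    for i in chosen:
        mzs = np.asarray(spectra[i][0], dtype=float)
        if mzs.size:  # skip empty spectra
            lo = min(lo, float(mzs.min()))
            hi = max(hi, float(mzs.max()))
    return lo, hi


spectra = [
    ([900.0, 1200.0], [1.0, 1.0]),
    ([950.0, 3600.0], [1.0, 1.0]),
]
full_range = discover_mz_range(spectra, sample_fraction=1.0)
partial_range = discover_mz_range(spectra, sample_fraction=0.5)
```

With sample_fraction below 1 the returned range can only be equal to or narrower than the full range, which is the "may miss rare m/z values" caveat in concrete form.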
- goatpy.bin.bin_and_align(imzml_path: str, he_path: str, tolerance: float = 0.05, mz_range: Tuple[float, float] | None = None, reduce: Literal['max', 'sum'] = 'max', min_frequency: float = 0.0, min_intensity: float = 0.0, geojson_path: str | None = None, maldi_pixel_um: float | None = None, he_pixel_um: float | None = None, img_upscaling: int = 10, buffer_px: int = 150, coarse_rotation_step: int = 15, fine_rotation_range: float = 5.0, fine_rotation_step: float = 1.0, **kwargs) -> spatialdata.SpatialData
Bin all spectra from an imzML file and register against an H&E image.
This is the data-driven equivalent of load_and_align: instead of loading a fixed peak list it bins the entire spectral space first, then passes the binned TIC image to the registration engine.
- Parameters:
imzml_path (str) – Path to the .imzML file.
he_path (str) – Path to the H&E image file.
tolerance (float) – Bin half-width in Da; see bin_imzml for guidance.
mz_range ((lo, hi) or None) – Restrict to this m/z window before registration.
reduce ("max" | "sum") – Bin reduction function.
min_frequency (float) – Post-binning quality filter; see bin_imzml.
min_intensity (float) – Post-binning quality filter; see bin_imzml.
geojson_path (str or None) – Optional QuPath annotation export.
maldi_pixel_um (float or None) – Physical size of a MALDI pixel in µm. Auto-detected when None.
he_pixel_um (float or None) – Physical size of an H&E pixel in µm. Auto-detected when None.
img_upscaling (int) – Upscaling factor for the output canvas.
buffer_px (int) – Canvas padding at registration resolution.
coarse_rotation_step (int) – Registration search parameters.
fine_rotation_range (float) – Registration search parameters.
fine_rotation_step (float) – Registration search parameters.
- Returns:
Same structure as load_and_align output.
- Return type:
SpatialData
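The parameter names coarse_rotation_step, fine_rotation_range and fine_rotation_step suggest a two-stage rotation search, although the documentation does not describe the registration engine itself. The sketch below shows one plausible reading under stated assumptions: angles are in degrees, the coarse stage scans a full turn, and the fine stage scans a window around the best coarse angle. The `score` callable and `coarse_to_fine_angle` are hypothetical stand-ins, not goatpy APIs.

```python
import numpy as np


def coarse_to_fine_angle(score, coarse_step=15, fine_range=5.0, fine_step=1.0):
    """Return the angle (degrees) that maximises `score` via a two-stage grid search.

    `score` maps an angle in degrees to a similarity value (higher is better).
    """
    # Stage 1: scan the full circle on a coarse grid.
    coarse = np.arange(0.0, 360.0, coarse_step)
    best_coarse = max(coarse, key=score)

    # Stage 2: scan a narrow window around the coarse optimum on a fine grid.
    # Adding half a step to the stop value keeps the upper endpoint in the grid.
    fine = np.arange(
        best_coarse - fine_range,
        best_coarse + fine_range + fine_step / 2,
        fine_step,
    )
    return float(max(fine, key=score)) % 360.0


def toy_score(angle, target=47.0):
    """Toy similarity: negative circular distance to a known target angle."""
    return -abs(((angle - target + 180.0) % 360.0) - 180.0)


best = coarse_to_fine_angle(toy_score)
```

With the default values this evaluates 24 coarse angles plus 11 fine angles instead of 360 one-degree steps, which is the usual motivation for a coarse-to-fine search.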