How to Use Profile Comparison — Classification (step-by-step)

Prerequisites

  • Make sure the comparison contains the samples and tracks you want to analyze.

  • Clean profiles as needed (bounds, smoothing, baseline correction, normalization) before modeling.

  • For supervised workflows, prepare representative labeled tracks for each class.

Layout overview

../_images/clusters_layout.svg

Step 1 — Choose classification mode (Modes)

Goal: pick the overall approach that fits your objective.

Actions:

  1. Open the Classification mode selector.

  2. Choose one of:

  • Unsupervised — discover structure and clusters without labels.

  • Supervised One-Class — model a single reference class to detect deviations.

  • Supervised Multi-Class — train models to discriminate multiple labeled classes.

Tips:

  • If you have labeled, representative examples for each desired class, prefer Multi-Class.

  • If you only have a reference ‘good’ condition, use One-Class for anomaly detection.

Visual:

../_images/modes.png

Step 2 — Preprocessing (Wavelengths, Data Mode, Profile Tools)

Goal: make data comparable and reduce noise so your model learns signal, not artifacts.

Ordered actions (follow this order):

  1. Wavelength / illumination selection

  • Choose one or several wavelengths/illuminations with usable signal across samples.

  • If some samples have no usable data at the selected wavelength, pick another wavelength or exclude those samples.

  2. Data Mode: Profiles vs Peaks

  • Profiles: use full profile shapes (recommended when shapes are informative and consistent).

  • Peaks: use peak descriptors (Rf at maximum, height, full width at half maximum) when the peaks carry the information or the full profiles are too noisy.

  3. Profile Tools (apply as needed and in roughly this order):

  • Bounds: trim out irrelevant regions (front, pre-sample areas).

  • Baseline correction: remove background offsets.

  • Smoothing: reduce high-frequency noise.

  • Normalization: scale amplitudes to make samples comparable.

  4. Binning (optional, Profiles only)

  • Use binning to reduce dimensionality and smooth noise. Pick a bin size that preserves meaningful features.
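The profile-tools sequence above can be sketched roughly as follows. This is an illustrative Python/NumPy/SciPy sketch, not the application's actual implementation; the trim bounds, Savitzky–Golay window, and bin size are placeholder values you would tune to your data.

```python
import numpy as np
from scipy.signal import savgol_filter

def preprocess(profile, bounds=(10, -10), bin_size=4):
    """Apply the steps above in order: bounds, baseline correction,
    smoothing, normalization, then optional binning."""
    # 1. Bounds: trim irrelevant regions at both ends (placeholder indices)
    p = np.asarray(profile, dtype=float)[bounds[0]:bounds[1]]
    # 2. Baseline correction: subtract a simple constant background offset
    p = p - p.min()
    # 3. Smoothing: Savitzky-Golay filter to damp high-frequency noise
    p = savgol_filter(p, window_length=11, polyorder=3)
    # 4. Normalization: scale to unit maximum so amplitudes are comparable
    p = p / p.max()
    # 5. Binning (optional): average consecutive points to reduce dimensionality
    n = (len(p) // bin_size) * bin_size
    return p[:n].reshape(-1, bin_size).mean(axis=1)
```

Order matters here: correcting the baseline before normalizing prevents background offsets from distorting the amplitude scale, and binning last keeps the earlier steps operating at full resolution.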

Visual / example:

../_images/preprocessing.png
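If the Peaks data mode is chosen in step 2 above, the descriptors it refers to (Rf at maximum, height, FWHM) can be illustrated with a small sketch. The use of `scipy.signal` and the synthetic two-peak profile are assumptions for illustration only:

```python
import numpy as np
from scipy.signal import find_peaks, peak_widths

x = np.linspace(0, 1, 500)  # Rf axis (illustrative)
# Synthetic profile with two Gaussian peaks at Rf 0.3 and 0.7
profile = (np.exp(-((x - 0.3) ** 2) / 0.0008)
           + 0.5 * np.exp(-((x - 0.7) ** 2) / 0.0008))

# Detect peaks above a placeholder height threshold
idx, props = find_peaks(profile, height=0.1)
# Width at half of each peak's height, converted from samples to Rf units
fwhm = peak_widths(profile, idx, rel_height=0.5)[0] * (x[1] - x[0])

for i, w in zip(idx, fwhm):
    print(f"Rf={x[i]:.2f}  height={profile[i]:.2f}  FWHM={w:.3f}")
```

Each track is then represented by a short vector of such descriptors instead of the full profile, which is why this mode is more robust when profiles are noisy.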

Step 3 — Dimension reduction (unsupervised classification)

Goal: reduce high-dimensional profile/peak vectors to low-dim features for clustering and visualization.

Recommended sequence:

  1. Start with PCA for a fast baseline.

  2. If separation is poor, try UMAP or t-SNE and tune their parameters (n_neighbors and min_dist for UMAP; perplexity for t-SNE).

  3. Inspect the 3D view to confirm clusters are visually separable before clustering or supervised training.
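The PCA baseline in step 1 of this sequence might look like the following sketch, using scikit-learn on random stand-in data (the real input would be your preprocessed profile or peak vectors):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in data: rows are preprocessed tracks, columns are profile points
profiles = rng.normal(size=(30, 120))
profiles[15:] += 2.0  # offset half the tracks so coarse structure exists

# Fast PCA baseline: project to 3 components for the 3D view
pca = PCA(n_components=3)
reduced = pca.fit_transform(profiles)
print(reduced.shape)                         # (30, 3)
print(pca.explained_variance_ratio_.sum())   # fraction of variance retained

# If PCA separation is poor, sklearn.manifold.TSNE (tune perplexity) or
# umap.UMAP (tune n_neighbors, min_dist) are common drop-in alternatives.
```

Checking the retained variance ratio is a quick sanity test: if three components capture very little variance, clusters seen in the 3D view may not reflect the dominant structure.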

Notes:

  • For unsupervised workflows, clustering runs on the reduced features — good DR is essential.

  • For supervised workflows, DR is primarily a visualization aid; training uses the raw/processed features.

Step 4 — Modeling

Unsupervised (clustering):

  • Choose an algorithm: K-means for compact clusters, DBSCAN for density-based discovery, Hierarchical for nested groups.

  • Run the clustering algorithm on the reduced features and inspect assignments in the 3D view.
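As a minimal sketch of the unsupervised path, here is K-means run on reduced 3-D features, using scikit-learn and synthetic well-separated groups as stand-ins; DBSCAN or hierarchical clustering would be drop-in replacements:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Two synthetic groups in a 3-D reduced feature space (stand-in for DR output)
reduced = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 3)),
    rng.normal(loc=3.0, scale=0.3, size=(20, 3)),
])

# K-means for compact clusters; sklearn.cluster.DBSCAN or
# AgglomerativeClustering cover the density-based and hierarchical cases
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
print(labels)  # one cluster assignment per track; numbering is arbitrary
```

As noted under Results below, the cluster numbers themselves carry no meaning; only the grouping does.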

Supervised:

  1. Edit Classes

  • Open Edit Classes and assign representative tracks to each class (drag/drop or selection controls).

  2. Train

  • For One-Class: provide representative samples and train the model; review jackknife/QC metrics.

  • For Multi-Class: train and review confusion/probability outputs and class separation metrics.

  3. Validate

  • Use cross-validation where available and inspect diagnostic charts (confusion matrices, Hodges–Lehmann).

Training tips:

  • Retrain after any change to wavelengths, preprocessing, or class membership.

  • Add more labeled examples when classes are heterogeneous or model performance is poor.
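The multi-class train-and-validate loop can be sketched as follows. This uses scikit-learn's logistic regression and cross-validation purely as an illustration of the workflow; the application's actual model and metrics may differ, and the labeled data here is synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(2)
# Synthetic labeled tracks for two classes (features = preprocessed profiles)
X = np.vstack([rng.normal(0, 1, (25, 40)), rng.normal(2, 1, (25, 40))])
y = np.array([0] * 25 + [1] * 25)

# Cross-validated predictions give an honest view of class separation:
# each track is predicted by a model that never saw it during training
pred = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=5)
print(confusion_matrix(y, pred))  # off-diagonal counts are misclassifications
```

Off-diagonal entries in the confusion matrix point to the confusion patterns mentioned under Results below; persistent confusion between two classes usually means they need better preprocessing or more labeled examples.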

Step 5 — Results and interpretation

Unsupervised:

  • Each track receives a cluster assignment; cluster numbering is arbitrary — compare clusters to references to interpret meaning.

Supervised One-Class:

  • Each track is scored against the reference class; set thresholds to flag deviations.

Supervised Multi-Class:

  • Per-track class probabilities are provided; inspect low-confidence tracks and confusion patterns.

Common actions:

  • Visualize clusters/classes in the 3D view with coloring and markers.

  • Export profiles or peak descriptors for offline analysis or reporting.

Result visual:

../_images/unsupervised.png

Troubleshooting quick list

  • Nothing appears in 3D (Peaks mode): some samples have no detected peaks — switch to Profiles mode.

  • All samples appear as a single blob: re-evaluate preprocessing, try alternate DR algorithms or tune DR parameters.

  • Supervised models perform poorly: improve preprocessing, increase labeled training samples, or simplify classes.

Quick checklist before concluding

  • Did you pick the correct mode (unsupervised/supervised)?

  • Did you select appropriate wavelengths and data mode?

  • Did you apply preprocessing in the recommended order?

  • Did you inspect DR results before clustering/training?

  • Did you validate model performance using available QC metrics?
