magmap.stats.clustering module

Clustering measurements.

class magmap.stats.clustering.ClusterByLabel

Bases: object

blobs = None

classmethod cluster_by_label(blobs, labels_img_np, blobs_lbl_scaling, blobs_iso_scaling, all_labels=False)

classmethod cluster_within_label(label_id, eps, minpts, n_jobs)

magmap.stats.clustering.cluster_blobs(img_path, suffix=None)

Cluster blobs and save to a NumPy archive.

Parameters:
  • img_path (str) – Base path used to find the registered labels and blobs files and to construct the save location of the output blobs file.

  • suffix (str) – Suffix for path; defaults to None.

Returns:

magmap.stats.clustering.cluster_dbscan_metrics(labels)

Calculate basic metrics for DBSCAN.

Parameters:

labels (np.ndarray) – Cluster labels.

Returns:

Tuple of number of clusters, number of noise blobs, and number of blobs contained within the largest cluster.

Return type:

int, int, int
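The three metrics can be illustrated with a minimal NumPy sketch. It assumes the scikit-learn DBSCAN convention that noise points carry the label -1; the function name dbscan_metrics_sketch is hypothetical, and this is not necessarily how MagMap computes the values.

```python
import numpy as np


def dbscan_metrics_sketch(labels):
    """Hypothetical helper: summarize DBSCAN cluster labels.

    Assumes noise points are labeled -1 (scikit-learn convention).
    """
    labels = np.asarray(labels)
    noise_mask = labels == -1
    n_noise = int(noise_mask.sum())
    # Count the members of each non-noise cluster
    cluster_ids, counts = np.unique(labels[~noise_mask], return_counts=True)
    n_clusters = int(len(cluster_ids))
    # Size of the largest cluster, or 0 if every blob is noise
    n_largest = int(counts.max()) if n_clusters > 0 else 0
    return n_clusters, n_noise, n_largest


# Two clusters (IDs 0 and 1) plus two noise blobs
example_labels = np.array([0, 0, 1, -1, 1, 1, -1])
print(dbscan_metrics_sketch(example_labels))  # (2, 2, 3)
```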

magmap.stats.clustering.knn_dist(blobs, n, max_dist=None, max_pts=None, show=True)

Measure the k-nearest-neighbors distance.

Parameters:
  • blobs (np.ndarray) – Sequence given as [n_samples, n_features], where the features are typically of the form [z, y, x, ...].

  • n (int) – Number of neighbors. The farthest neighbor will be used for sorting, filtering, and plotting.

  • max_dist (float) – Cap the maximum distance of points to plot, given as a factor of the median distance; defaults to None to show neighbors of all distances.

  • max_pts (int) – Cap the maximum number of points for the zoomed plot if the 90th percentile exceeds this number; defaults to None.

  • show (bool) – True to immediately show the plot of the distances; defaults to True. Will still plot and save in the background if config.savefig is set.

Returns:

neighbors.NearestNeighbors, np.ndarray, List[pd.DataFrame]: Tuple of NearestNeighbors object, all distances from kneighbors sorted by the nth neighbor, and a list of data frames at different zoom levels ([df_overview, df_zoomed]).
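The sorting and distance cap described above can be sketched with plain NumPy. This is a brute-force stand-in for the NearestNeighbors lookup, not MagMap's implementation: the name knn_dist_sketch is hypothetical, each point is assumed to be excluded from its own neighbor list, and the median is assumed to be taken over the farthest (nth) neighbor distances.

```python
import numpy as np


def knn_dist_sketch(blobs, n):
    """Hypothetical helper: distances to the n nearest neighbors of each
    blob, with rows sorted by the distance to the nth (farthest) neighbor.
    """
    coords = np.asarray(blobs, dtype=float)
    # Brute-force pairwise Euclidean distances, shape (n_samples, n_samples)
    diffs = coords[:, None, :] - coords[None, :, :]
    pairwise = np.sqrt((diffs ** 2).sum(axis=-1))
    # Assumption: a point does not count as its own neighbor
    np.fill_diagonal(pairwise, np.inf)
    # Keep the n smallest distances per point, in ascending order
    dists = np.sort(pairwise, axis=1)[:, :n]
    # Order the rows by the farthest of the n neighbors
    order = np.argsort(dists[:, -1], kind="stable")
    return dists[order]


# Four blobs along the x-axis, given as [z, y, x]
blobs = np.array([[0, 0, 0], [0, 0, 1], [0, 0, 3], [0, 0, 7]])
dists = knn_dist_sketch(blobs, 2)

# Assumed reading of max_dist: drop points whose nth-neighbor distance
# exceeds max_dist times the median nth-neighbor distance
max_dist = 1.5
farthest = dists[:, -1]
dists_capped = dists[farthest <= max_dist * np.median(farthest)]
```

Sorting by the farthest neighbor yields the monotonically increasing curve commonly inspected when choosing a DBSCAN eps value.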

magmap.stats.clustering.plot_knns(img_paths, suffix=None, show=False, names=None)

Plot k-nearest-neighbor distances for multiple sets of blobs, overlaying on a single plot.

Parameters:
  • img_paths (List[str]) – Base paths used to find the registered labels and blobs files and to construct the save location of the output blobs file.

  • suffix (str) – Suffix for path; defaults to None.

  • show (bool) – True to plot the distances; defaults to False.

  • names (List[str]) – Sequence of names corresponding to img_paths for the plot legend.