CellScope.cs.GraphCluster

CellScope.cs.GraphCluster(fea_selected : np.ndarray, metric : str = None, num_cell_thre : int = 100000, index : list = [], random_seed : int = 83)

GraphCluster constructs a cell similarity graph using UMAP’s fuzzy simplicial set [MMH18] and performs hierarchical clustering on it, returning cluster assignments at multiple levels of granularity. For large-scale datasets, the function clusters a sampled subset of cells to reduce computational cost.

Parameters

  • fea_selected (np.ndarray):

    A 2D array representing the selected features for clustering, where rows correspond to cells and columns to genes.

  • metric (str, optional, default=None):

    The distance metric used for clustering. If not specified, it is automatically determined based on the dataset size: Euclidean is selected when the dataset has fewer than 10,000 cells, and Jaccard is used otherwise.

  • num_cell_thre (int, optional, default=100000):

    The threshold on the number of cells. If the dataset contains more cells than this threshold, a subset of cells is selected for clustering.

  • index (list, optional, default=[]):

    An optional list of indices for selecting specific cells. If left empty and the number of cells exceeds num_cell_thre, a random subset of cells will be selected.

  • random_seed (int, optional, default=83):

    The random seed for ensuring reproducibility when constructing the graph and selecting the subset of indices.
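
The defaults described in the parameter list can be sketched as follows. This is only an illustration of the documented behavior, not CellScope's implementation: the helper name `resolve_defaults` is hypothetical, and the choices to draw exactly `num_cell_thre` cells, to sample without replacement, and to sort the sampled indices are assumptions made for the example.

```python
import numpy as np

def resolve_defaults(fea_selected, metric=None, num_cell_thre=100000,
                     index=None, random_seed=83):
    """Hypothetical helper mirroring the documented parameter defaults."""
    num_cell = fea_selected.shape[0]

    # Documented rule: Euclidean below 10,000 cells, Jaccard otherwise.
    if metric is None:
        metric = "euclidean" if num_cell < 10000 else "jaccard"

    # Documented rule: an empty index plus too many cells triggers random subsetting.
    index = list(index) if index is not None else []
    if not index and num_cell > num_cell_thre:
        rng = np.random.default_rng(random_seed)
        # Assumption: the subset size equals num_cell_thre.
        sampled = rng.choice(num_cell, size=num_cell_thre, replace=False)
        index = np.sort(sampled).tolist()

    return metric, index

# Toy input: 500 cells, 20 features, with a deliberately low threshold.
fea = np.zeros((500, 20))
metric, index = resolve_defaults(fea, num_cell_thre=200)
```

With a fixed `random_seed`, repeated calls on the same input select the same subset, which is the reproducibility guarantee the parameter is meant to provide.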

Returns

  • T_all (np.ndarray):

    A matrix of shape (num_cell, 49), where each column holds the cluster labels from a different step of the agglomerative clustering process.
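
A matrix of this kind can be inspected column by column to choose a clustering level. The sketch below uses a small hand-made stand-in for `T_all` (6 cells, 3 levels) rather than real output, and assumes each column contains integer cluster labels; the helper names `clusters_per_level` and `pick_level` are hypothetical and not part of CellScope.

```python
import numpy as np

# Stand-in for the returned matrix: rows are cells, columns are clustering levels.
# The real output is documented as having 49 columns.
T_all = np.array([
    [0, 0, 0],
    [0, 0, 1],
    [0, 1, 2],
    [0, 1, 2],
    [1, 2, 3],
    [1, 2, 4],
])

def clusters_per_level(T):
    """Number of distinct cluster labels in each column."""
    return [int(np.unique(T[:, j]).size) for j in range(T.shape[1])]

def pick_level(T, n_clusters):
    """Index of the column whose cluster count is closest to n_clusters."""
    counts = clusters_per_level(T)
    return int(np.argmin([abs(c - n_clusters) for c in counts]))

counts = clusters_per_level(T_all)   # [2, 3, 5]
level = pick_level(T_all, 3)         # 1
labels = T_all[:, level]             # labels for every cell at that level
```

Counting the labels per column avoids relying on any particular ordering of the columns from coarse to fine, which the description above does not specify.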