CellScope.cs.GraphCluster
CellScope.cs.GraphCluster(fea_selected: np.ndarray, metric: str = None, num_cell_thre: int = 100000, index: list = [])
GraphCluster constructs a cell similarity graph using UMAP’s fuzzy simplicial set [MMH18] and performs hierarchical clustering to group cells, returning cluster assignments at multiple levels of granularity. For large datasets, it samples a subset of cells to reduce computational cost.
Parameters
fea_selected (np.ndarray): A 2D array of the selected features used for clustering, where rows correspond to cells and columns to genes.
metric (str, optional, default=None): The distance metric used for clustering. If not specified, it is chosen from the dataset size: Euclidean when the dataset has fewer than 10,000 cells, and Jaccard otherwise.
num_cell_thre (int, optional, default=100000): The cell-count threshold. If the dataset contains more cells than this threshold, a subset of cells is selected for clustering.
index (list, optional, default=[]): An optional list of indices selecting specific cells. If left empty and the number of cells exceeds num_cell_thre, a random subset of cells is selected.
random_seed (int, optional, default=83): The random seed used for reproducibility when constructing the graph and selecting the subset of indices.
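The defaults described in the parameter list can be sketched as follows. This is a minimal illustration of the documented behavior, not the library's implementation: the helper names `resolve_metric` and `resolve_index` are hypothetical, and both the subset size (exactly `num_cell_thre` cells, drawn without replacement) and the use of NumPy's `default_rng` are assumptions, since neither is stated here.

```python
import numpy as np


def resolve_metric(num_cell, metric=None):
    """Hypothetical helper: choose the metric as the parameter text describes."""
    if metric is not None:
        return metric
    # Documented rule: Euclidean below 10,000 cells, Jaccard otherwise.
    return "euclidean" if num_cell < 10_000 else "jaccard"


def resolve_index(num_cell, num_cell_thre=100_000, index=None, random_seed=83):
    """Hypothetical helper: decide which cells take part in clustering."""
    if index is not None and len(index) > 0:
        # The caller supplied explicit indices, so use them unchanged.
        return np.asarray(index)
    if num_cell <= num_cell_thre:
        # Small enough: keep every cell.
        return np.arange(num_cell)
    # Assumption: sample num_cell_thre cells without replacement.
    rng = np.random.default_rng(random_seed)
    return np.sort(rng.choice(num_cell, size=num_cell_thre, replace=False))
```

With a fixed `random_seed`, repeated calls to `resolve_index` return the same subset, which is the reproducibility property the `random_seed` parameter is meant to provide.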
Return
T_all (np.ndarray): A matrix of shape (num_cell, 49), where each column holds the clustering result from a different step of the agglomerative clustering process.
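To compare granularity levels, one can count the distinct labels in each column of `T_all`. The sketch below uses a small synthetic stand-in rather than real output: the array `T_demo` and the helper `clusters_per_level` are hypothetical, and no ordering of the columns (coarse to fine or the reverse) is assumed.

```python
import numpy as np


def clusters_per_level(T):
    """Hypothetical helper: number of distinct cluster labels in each column."""
    return np.array([np.unique(T[:, j]).size for j in range(T.shape[1])])


# Synthetic stand-in for T_all: 4 cells, 3 clustering levels.
T_demo = np.array([
    [0, 0, 0],
    [0, 1, 1],
    [1, 2, 2],
    [1, 2, 3],
])

counts = clusters_per_level(T_demo)   # one count per column
labels_level_1 = T_demo[:, 1]         # labels of every cell at column 1
```

Applied to a real `T_all`, the same helper would return 49 counts, one per column, which can guide the choice of a column for downstream analysis.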