Principle
============

ScTriangulate Object
---------------------

We created a new compound data structure called the ScTriangulate Object, abbreviated as ``sctri``. ``sctri`` is designed as an expanded version of the popular ``adata`` data structure, with additional "cluster goodness" information added on top. In short, ``sctri`` stores your data and how good your cluster labels/annotations are, measured by an array of biologically meaningful metrics.

.. image:: ./_static/sctri_chop.png
   :height: 400px
   :width: 600px
   :align: center
   :target: target

Let's go through the attributes one by one:

``sctri.dir``: Python string, the path to the folder where all outputs will be written. Please specify it.

``sctri.adata``: AnnData object, storing the expression values and cell/feature metadata.

``sctri.query``: Python list, each item corresponds to a column name in ``sctri.adata.obs``. Recalling the schema of scTriangulate, this tells the program which sets of annotations you want to consider. For instance, to compare the following four annotations::

    sctri.query = ['leiden_resolution1','leiden_resolution2','azimuth_mapping_reference1','adt_annotations']

``sctri.score``: Nested Python dictionary (for internal use, users do not need to specify), storing the goodness (measured by each metric) of each cluster in each annotation. The metrics correspond to ``sctri.metrics`` (see below) and the annotations correspond to ``sctri.query``. To access a score::

    sctri.score['annotation1']['cluster_to_reassign']['cluster1']
    # result will be 0.45

.. image:: ./_static/sctri_score_chop.png
   :height: 300px
   :width: 600px
   :align: center
   :target: target

``sctri.cluster``: Nested Python dictionary (for internal use, users do not need to specify), storing the hierarchy of each annotation and its cluster names. Likewise, to access the information::

    sctri.cluster['annotation1'][0]
    # result will be 'cluster1'

.. image:: ./_static/sctri_cluster_chop.png
   :height: 300px
   :width: 600px
   :align: center
   :target: target

``sctri.invalid``: Python list containing the cluster names (e.g. annotation1@cluster3) that the program labelled as "not stable". By default, scTriangulate filters the ``raw`` clusters by ``win_fraction``, i.e. the fraction of cells in the original cluster that are retained after the "game". Invalid clusters are excluded from the final ``pruned`` result, and the cells within them are reassigned to their nearest neighbors. Users can also append a cluster name to the list::

    sctri.invalid.append('annotation1@cluster3')
    # then cells within this cluster will be reassigned in the pruning step.

``sctri.metrics``: Python list containing the metrics used to assess how good a cluster is. By default, the value of this list is::

    sctri.metrics = ['reassign','tfidf10','SCCAF','doublet']

which means each cluster will be scored using these four metrics. Each metric corresponds to an underlying function that the program has already implemented. You can add additional metrics by::

    sctri.add_new_metrics({'my_metric':callable})

Now the clusters will also be scored using the user-defined metric function.

``sctri.species``: Python string, currently only "human" or "mouse" is supported. It only affects how the program retrieves the "artifact" gene names from a small internal database (txt file), including ribosomal genes, mitochondrial genes, etc.

``sctri.criterion``: Python int, specifying how the program labels "artifact genes". We assume that cell cycle genes, ribosomal genes, mitochondrial genes, antisense genes, and predicted genes may not be what users want, but this varies by need, so the user can choose from these 6 modes. Genes labelled as "artifact" will no longer be considered in marker gene detection and downstream assessment::

    criterion1: all will be artifact.
    criterion2: all will be artifact except cellcycle (default).
    criterion3: all will be artifact except cellcycle, ribosome.
    criterion4: all will be artifact except cellcycle, ribosome, mitochondrial.
    criterion5: all will be artifact except cellcycle, ribosome, mitochondrial, antisense.
    criterion6: all will be artifact except cellcycle, ribosome, mitochondrial, antisense, predict_gene.

``sctri.verbose``: Python int. 1 means the log is written to stdout, 2 means it is written to a log file. Default is 1.

``sctri.uns``: Python dictionary, inspired by scanpy. By default it stores important information, including marker genes. To access::

    sctri.uns['marker_genes']['annotation1'].loc[['cluster1','cluster2'],:]

.. image:: ./_static/sctri_uns_chop.png
   :height: 250px
   :width: 600px
   :align: center
   :target: target

.. _reference_to_visualization:

Visualization
----------------

scTriangulate offers a powerful toolkit that allows end users to visualize hidden heterogeneity in many different ways. In addition, the ``color`` module provides functions to assist in making publication-quality figures. Here we highlight some of the plotting functions and refer users to the ``API`` part for more details.

plot_heterogeneity
~~~~~~~~~~~~~~~~~~~~~

This is the main feature of scTriangulate visualization, built on top of scanpy. Since scTriangulate can mix and match cluster boundaries from diverse annotations, it empowers users to discover further, hidden heterogeneity. The question now is how the user can visualize that heterogeneity.

.. image:: ./_static/plot_heterogeneity_chop.png
   :height: 200px
   :width: 600px
   :align: center
   :target: target

The philosophy behind this function is to first pick a viewpoint from which to look at the final result. For instance, here we choose "annotation1" as the viewpoint. As you can see, **annotation1@c1** has been suggested to be divided into two sub-populations. Now we want to know:

1. How are these two sub-populations laid out on the UMAP?
2. What are the differentially expressed features between these two sub-populations?

Let's show some of the functionalities:

**1. UMAP**::

    sctri.plot_heterogeneity(key='sctri_rna_leiden_1',cluster='6',style='umap')

.. image:: ./_static/ph_umap.png
   :height: 300px
   :width: 500px
   :align: center
   :target: target

**2. Heatmap**::

    sctri.plot_heterogeneity(key='sctri_rna_leiden_1',cluster='6',style='heatmap')

.. image:: ./_static/ph_heatmap.png
   :height: 400px
   :width: 500px
   :align: center
   :target: target

**3. dual_gene_plot**::

    sctri.plot_heterogeneity(key='sctri_rna_leiden_1',cluster='6',style='dual_gene',genes=['TRDC','SLC4A10'])

.. image:: ./_static/ph_dual_gene.png
   :height: 350px
   :width: 500px
   :align: center
   :target: target

**4. coexpression_plot**::

    sctri.plot_heterogeneity(key='leiden1',cluster='6',style='coexpression',kind='contourf',gene1='NKG7',gene2='CD8A')

.. image:: ./_static/coexpression.png
   :height: 350px
   :width: 500px
   :align: center
   :target: target

plot_two_column_sankey
~~~~~~~~~~~~~~~~~~~~~~~~

To visualize the correspondence between two annotations::

    sctri.plot_two_column_sankey('leiden1','leiden2',margin=5)

.. image:: ./_static/two_column_sankey.png
   :height: 350px
   :width: 500px
   :align: center
   :target: target

plot_concordance
~~~~~~~~~~~~~~~~~~

When we have more than two annotation sets, we want to know how they correspond to each other: what fraction of cells in one annotation flows into another annotation, and vice versa::

    sctri.plot_concordance(key1='azimuth',key2='pruned',style='3dbar')

.. image:: ./_static/3dbar.png
   :height: 400px
   :width: 500px
   :align: center
   :target: target

plot_clusterability
~~~~~~~~~~~~~~~~~~~~~~

For a specific annotation set, which cluster is most likely to be subdivided and which is the least? We refer to this as clusterability::

    sctri.plot_clusterability(key='sctri_rna_leiden_1',col='raw',fontsize=8)

.. image:: ./_static/plot_clusterability.png
   :height: 400px
   :width: 500px
   :align: center
   :target: target

plot_long_heatmap
~~~~~~~~~~~~~~~~~~~~~~

A heatmap that can be arbitrarily long and ALWAYS displays every gene::

    sctri.plot_long_heatmap(n_features=20,figsize=(20,20))

.. image:: ./_static/long_heatmap.png
   :height: 400px
   :width: 500px
   :align: center
   :target: target

plot_multi_modal_feature_rank
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In a multi-modal setting, a cluster's identity is usually defined by all modalities. This function shows by which modality a cluster is mainly defined::

    sctri.plot_multi_modal_feature_rank(cluster='sctri_rna_leiden_2@10')

.. image:: ./_static/plot_multi_modal_feature_rank.png
   :height: 500px
   :width: 500px
   :align: center
   :target: target

plot_stability
~~~~~~~~~~~~~~~~~

Plot the stability of competing clusters::

    sctri.plot_stability(clusters=['Sun@Interstitial_macrophages','Kaminsky@cDC2','Krasnow@IGSF21+_Dendritic'],broke=True,top_ylim=[5,7])

.. image:: ./_static/plot_stability.png
   :height: 300px
   :width: 400px
   :align: center
   :target: target

plot_confusion
~~~~~~~~~~~~~~~~

It allows you to visualize the stability of each cluster in one annotation::

    sctri.plot_confusion(name='confusion_reassign',key='sctri_rna_leiden_1')

.. image:: ./_static/pc.png
   :height: 400px
   :width: 500px
   :align: center
   :target: target
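
To build intuition for what a confusion-style stability view summarizes, here is a minimal sketch on toy data. It assumes (suggested only by the name ``confusion_reassign``) that the plot is based on a confusion matrix between each cell's original cluster label and the label it would be reassigned to. The function ``confusion_fractions`` and the toy labels are hypothetical illustrations, not part of the scTriangulate API, and the sketch does not reproduce how scTriangulate computes its scores.

```python
from collections import Counter, defaultdict


def confusion_fractions(original, reassigned):
    """Row-normalised confusion matrix as a nested dict (hypothetical helper).

    Entry [src][dst] is the fraction of cells from original cluster `src`
    whose reassigned label is `dst`.
    """
    counts = defaultdict(Counter)
    for src, dst in zip(original, reassigned):
        counts[src][dst] += 1
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }


# Toy labels for ten cells (made up for illustration).
original = ["c1", "c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3", "c3"]
reassigned = ["c1", "c1", "c1", "c2", "c2", "c2", "c2", "c3", "c1", "c3"]

confusion = confusion_fractions(original, reassigned)

# The diagonal is the fraction of each cluster's cells that keep their label;
# a higher value suggests a more stable cluster.
retained = {c: confusion[c].get(c, 0.0) for c in confusion}

for cluster in sorted(retained):
    print(cluster, round(retained[cluster], 2))
```

In this toy example, cluster ``c2`` keeps all of its cells, while ``c1`` and ``c3`` each lose one cell to another cluster, so they would appear less stable in such a view.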