Manage a cell type registry#
Background#
Cell types classify cells based on public and private knowledge gained from studying gene expression patterns, morphology, function, and other properties. Long-established cell types have known markers and properties, but cell subtypes and states are continuously being discovered and better understood, and existing knowledge keeps being refined.
In this notebook, we use CellTypist, a computational tool for cell type classification in scRNA-seq data. It assigns cell types based on gene expression profiles.
First, we create a cell type registry for cell types supported by CellTypist. Then, we’ll use CellTypist to classify cell types of a previously unannotated dataset and ingest the dataset with LaminDB. Finally, we will demonstrate how to fetch datasets with cell type queries using LaminDB.
Setup#
To run this notebook, you need to load a LaminDB instance that has the bionty schema mounted.
Here, we’ll create a test instance (skip if you’d like to run it using your instance):
!lamin init --storage ./celltypist --schema bionty
💡 creating schemas: core==0.46.1 bionty==0.30.0
✅ saved: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-08-28 18:24:12)
✅ saved: Storage(id='IORfHudU', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/celltypist', type='local', updated_at=2023-08-28 18:24:12, created_by_id='DzTjkKse')
✅ loaded instance: testuser1/celltypist
💡 did not register local instance on hub (if you want, call `lamin register`)
# filter warnings from celltypist
import warnings
warnings.filterwarnings("ignore", message=".*The 'nopython' keyword.*")
import lamindb as ln
import lnschema_bionty as lb
import celltypist
import pandas as pd
lb.settings.species = "human" # globally set species
✅ loaded instance: testuser1/celltypist (lamindb 0.51.0)
✅ set species: Species(id='uHJU', name='human', taxon_id=9606, scientific_name='homo_sapiens', updated_at=2023-08-28 18:24:17, bionty_source_id='RhiA', created_by_id='DzTjkKse')
ln.track()
💡 notebook imports: celltypist==1.6.0 lamindb==0.51.0 lnschema_bionty==0.30.0 pandas==2.0.3
✅ saved: Transform(id='s5mkN5NQ1ttIz8', name='Manage a cell type registry', short_name='celltypist', version='0', type=notebook, updated_at=2023-08-28 18:24:17, created_by_id='DzTjkKse')
✅ saved: Run(id='pGJ6f2eOq2WMNDPEfB5D', run_at=2023-08-28 18:24:17, transform_id='s5mkN5NQ1ttIz8', created_by_id='DzTjkKse')
For a start, let’s take a look at the public Cell Ontology.
celltype_bt = lb.CellType.bionty() # equivalent to bionty.CellType()
celltype_bt
CellType
Species: all
Source: cl, 2023-04-20
#terms: 2862
📖 CellType.df(): ontology reference table
🔎 CellType.lookup(): autocompletion of terms
🎯 CellType.search(): free text search of terms
✅ CellType.validate(): strictly validate values
🧐 CellType.inspect(): full inspection of values
👽 CellType.standardize(): convert to standardized names
🪜 CellType.diff(): difference between two versions
🔗 CellType.ontology: Pronto.Ontology object
Create an in-house registry of CellTypist terms based on the public Cell Ontology#
Fetch CellTypist’s immune cell encyclopedia#
As a first step, we read in CellTypist’s immune cell encyclopedia:
description = "CellTypist Pan Immune Atlas v2: basic cell type information"
celltypist_source_v2_url = "https://github.com/Teichlab/celltypist_wiki/raw/main/atlases/Pan_Immune_CellTypist/v2/tables/Basic_celltype_information.xlsx"
# our source data
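celltypist_file = ln.File.filter(description=description).one_or_none()
if celltypist_file is None:
    celltypist_df = pd.read_excel(celltypist_source_v2_url)
    # store the description so that the filter above finds the file on re-runs
    celltypist_file = ln.File(celltypist_df, description=description)
    celltypist_file.save()
else:
    # load the full table, not only its first rows
    celltypist_df = celltypist_file.load()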
💡 file will be copied to default storage upon `save()` with key `None` ('.lamindb/meXObXwsQzrXRVpHQ8bJ.parquet')
💡 data is a dataframe, consider using .from_df() to link column names as features
✅ storing file 'meXObXwsQzrXRVpHQ8bJ' at '.lamindb/meXObXwsQzrXRVpHQ8bJ.parquet'
It provides an ontology_id of the public Cell Ontology for the majority of records.
celltypist_df.head()
| | High-hierarchy cell types | Low-hierarchy cell types | Description | Cell Ontology ID | Curated markers |
---|---|---|---|---|---|
0 | B cells | B cells | B lymphocytes with diverse cell surface immuno... | CL:0000236 | CD79A, MS4A1, CD19 |
1 | B cells | Follicular B cells | resting mature B lymphocytes found in the prim... | CL:0000843 | CXCR5, TNFRSF13B, CD22 |
2 | B cells | Proliferative germinal center B cells | proliferating germinal center B cells | CL:0000844 | MKI67, SUGCT, AICDA |
3 | B cells | Germinal center B cells | proliferating mature B cells that undergo soma... | CL:0000844 | POU2AF1, CD40, SUGCT |
4 | B cells | Memory B cells | long-lived mature B lymphocytes which are form... | CL:0000787 | CR2, CD27, MS4A1 |
A single “Cell Ontology ID” can be associated with multiple “Low-hierarchy cell types”:
celltypist_df.set_index(["Cell Ontology ID", "Low-hierarchy cell types"]).head(10)
Cell Ontology ID | Low-hierarchy cell types | High-hierarchy cell types | Description | Curated markers |
---|---|---|---|---|
CL:0000236 | B cells | B cells | B lymphocytes with diverse cell surface immuno... | CD79A, MS4A1, CD19 |
CL:0000843 | Follicular B cells | B cells | resting mature B lymphocytes found in the prim... | CXCR5, TNFRSF13B, CD22 |
CL:0000844 | Proliferative germinal center B cells | B cells | proliferating germinal center B cells | MKI67, SUGCT, AICDA |
CL:0000844 | Germinal center B cells | B cells | proliferating mature B cells that undergo soma... | POU2AF1, CD40, SUGCT |
CL:0000787 | Memory B cells | B cells | long-lived mature B lymphocytes which are form... | CR2, CD27, MS4A1 |
CL:0000787 | Age-associated B cells | B cells | CD11c+ T-bet+ memory B cells associated with a... | FCRL2, ITGAX, TBX21 |
CL:0000788 | Naive B cells | B cells | mature B lymphocytes which express cell-surfac... | IGHM, IGHD, TCL1A |
CL:0000818 | Transitional B cells | B cells | immature B cell precursors in the bone marrow ... | CD24, MYO1C, MS4A1 |
CL:0000817 | Large pre-B cells | B-cell lineage | proliferative B lymphocyte precursors derived ... | MME, CD24, MKI67 |
CL:0000817 | Small pre-B cells | B-cell lineage | non-proliferative B lymphocyte precursors deri... | MME, CD24, IGLL5 |
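The one-to-many relationship between ontology IDs and CellTypist names can also be made explicit by grouping names per ID. The following is a minimal sketch in plain Python; the `pairs` list is a hand-copied subset of the table above, and the variable names are illustrative, not part of the notebook.

```python
from collections import defaultdict

# (ontology_id, low-hierarchy name) pairs copied by hand from the table above
pairs = [
    ("CL:0000236", "B cells"),
    ("CL:0000843", "Follicular B cells"),
    ("CL:0000844", "Proliferative germinal center B cells"),
    ("CL:0000844", "Germinal center B cells"),
    ("CL:0000787", "Memory B cells"),
    ("CL:0000787", "Age-associated B cells"),
]

# collect all CellTypist names that point to the same ontology ID
names_by_id = defaultdict(list)
for ontology_id, name in pairs:
    names_by_id[ontology_id].append(name)

# keep only the IDs shared by more than one CellTypist name
shared = {k: v for k, v in names_by_id.items() if len(v) > 1}
for ontology_id, names in shared.items():
    print(ontology_id, "->", names)
```

IDs that appear in `shared` are the ones where a CellTypist name cannot simply be replaced by the public name without losing a distinction.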
Inspect mappability with the public Cell Ontology#
For any cell type record that can be mapped against the public Cell Ontology, we’d like to ensure that it’s actually mapped.
This prevents us from referring to the same cell type with different identifiers.
Let’s see how well the CellTypist reference data can be mapped.
All ontology IDs in the CellTypist table are mappable to the public Cell Ontology:
celltype_bt.inspect(celltypist_df["Cell Ontology ID"], celltype_bt.ontology_id);
✅ 68 terms (100.00%) are validated for ontology_id
However, when inspecting the names, most of them don’t match:
celltype_bt.inspect(celltypist_df["Low-hierarchy cell types"], celltype_bt.name);
✅ 1 term (1.00%) is validated for name
❗ 97 terms (99.00%) are not validated for name: B cells, Follicular B cells, Proliferative germinal center B cells, Germinal center B cells, Memory B cells, Age-associated B cells, Naive B cells, Transitional B cells, Large pre-B cells, Small pre-B cells, Pre-pro-B cells, Pro-B cells, Cycling B cells, Cycling DCs, Cycling gamma-delta T cells, Cycling monocytes, Cycling NK cells, Cycling T cells, DC, DC1, ...
💡 detected 9 terms with synonyms: DC1, DC2, ETP, CMP, ELP, GMP, ILC2, ILC3, pDC
💡 → standardize terms via .standardize()
A search tells us that terms named in the plural in CellTypist occur with a singular name in the Cell Ontology:
celltypist_df["Low-hierarchy cell types"][0]
'B cells'
celltype_bt.search(celltypist_df["Low-hierarchy cell types"][0]).head()
name | ontology_id | definition | synonyms | parents | __agg__ | __ratio__ |
---|---|---|---|---|---|---|
B cell | CL:0000236 | A Lymphocyte Of B Lineage That Is Capable Of B... | B-cell|B lymphocyte|B-lymphocyte | [CL:0000945] | b cell | 92.307692 |
B-1 B cell | CL:0000819 | A B Cell Of Distinct Lineage And Surface Marke... | B1 B cell|B-1 B lymphocyte|B1 cell|B-1 B-cell|... | [CL:0000785] | b-1 b cell | 85.714286 |
B-2 B cell | CL:0000822 | A Conventional B Cell Subject To Antigenic Sti... | B2 B cell|B2 B-lymphocyte|B2 B lymphocyte|B-2 ... | [CL:0000785] | b-2 b cell | 85.714286 |
arachnoid barrier cell | CL:4023097 | A Mesothelial Fibroblast Of The Arachnoid Barr... | ABC|AB cell | [CL:4023054] | arachnoid barrier cell | 85.714286 |
Be cell | CL:0000968 | A Mature B Cell That Produces Cytokines That C... | B effector cell|effector B-cell|effector B cel... | [CL:0000785] | be cell | 85.714286 |
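To get a feel for how a similarity ratio ranks candidate names, here is a small sketch that uses `difflib.SequenceMatcher` from the Python standard library. This is a stand-in chosen for illustration: the scorer behind `search()` is not specified here and will generally give different numbers, and the `similarity` helper and the candidate list are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(query: str, candidate: str) -> float:
    """Return a 0-100 similarity score between two strings, ignoring case."""
    return 100 * SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


# a few candidate names taken from the search result above, plus "T cell"
candidates = ["B cell", "B-1 B cell", "Be cell", "T cell"]
query = "B cells"

# rank candidates from most to least similar to the query
ranked = sorted(candidates, key=lambda c: similarity(query, c), reverse=True)
for candidate in ranked:
    print(f"{candidate}: {similarity(query, candidate):.1f}")
```

The plural query still ranks its singular counterpart first, which is why a free-text search is a convenient way to spot plural/singular mismatches.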
Let’s strip the trailing "s" and inspect whether more names become mappable. A few more do:
celltype_bt.inspect(
    [i.rstrip("s") for i in celltypist_df["Low-hierarchy cell types"]],
    celltype_bt.name,
);
✅ 5 terms (5.10%) are validated for name
❗ 93 terms (94.90%) are not validated for name: Follicular B cell, Proliferative germinal center B cell, Germinal center B cell, Memory B cell, Age-associated B cell, Naive B cell, Transitional B cell, Large pre-B cell, Small pre-B cell, Pre-pro-B cell, Pro-B cell, Cycling B cell, Cycling DC, Cycling gamma-delta T cell, Cycling monocyte, Cycling NK cell, Cycling T cell, DC, DC1, DC2, ...
💡 detected 34 terms with inconsistent casing/synonyms: Follicular B cell, Germinal center B cell, Memory B cell, Naive B cell, Transitional B cell, Small pre-B cell, Pro-B cell, DC1, DC2, Endothelial cell, Epithelial cell, Erythrocyte, ETP, Fibroblast, Granulocyte, Neutrophil, CMP, ELP, GMP, ILC2, ...
💡 → standardize terms via .standardize()
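One detail worth knowing about `str.rstrip("s")` is that it removes every trailing "s", not just one. For the names in this table that makes no difference, but a stricter helper can be sketched as follows; `singularize` and the example word "Class" are hypothetical and only serve to show the difference.

```python
def singularize(name: str) -> str:
    """Drop exactly one trailing "s" if present; leave other names unchanged."""
    return name[:-1] if name.endswith("s") else name


# both approaches agree on typical CellTypist names
print("B cells".rstrip("s"), "|", singularize("B cells"))

# they differ when a name ends in more than one "s"
print("Class".rstrip("s"), "|", singularize("Class"))
```

Neither variant handles irregular plurals or abbreviations, so the remaining unmatched names still need synonym matching or manual curation.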
Every “low-hierarchy cell type” has an ontology ID and most “high-hierarchy cell types” also appear as “low-hierarchy cell types” in the CellTypist table. Four, however, don’t, and therefore don’t have an ontology ID.
high_terms = celltypist_df["High-hierarchy cell types"].unique()
low_terms = celltypist_df["Low-hierarchy cell types"].unique()
high_terms_umapped = set(high_terms).difference(low_terms)
high_terms_umapped
{'B-cell lineage', 'Cycling cells', 'Erythroid', 'T cells'}
Register CellTypist records in LaminDB#
Let’s first add the “High-hierarchy cell types” as a column "parent". This enables LaminDB to populate the parents and children fields, which will enable you to query for hierarchical relationships.
celltypist_df["parent"] = celltypist_df.pop("High-hierarchy cell types")
# if high and low terms are the same, no parents
celltypist_df.loc[
    (celltypist_df["parent"] == celltypist_df["Low-hierarchy cell types"]), "parent"
] = None
# rename columns, drop markers
celltypist_df.drop(columns=["Curated markers"], inplace=True)
celltypist_df.rename(
    columns={"Low-hierarchy cell types": "name", "Cell Ontology ID": "ontology_id"},
    inplace=True,
)
celltypist_df.columns = celltypist_df.columns.str.lower()
celltypist_df.head(2)
| | name | description | ontology_id | parent |
---|---|---|---|---|
0 | B cells | B lymphocytes with diverse cell surface immuno... | CL:0000236 | None |
1 | Follicular B cells | resting mature B lymphocytes found in the prim... | CL:0000843 | B cells |
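With a parent column in place, looking up children amounts to inverting the parent mapping. The sketch below does this in plain Python for a few rows taken from the tables above. It is only an illustration of the idea, not how LaminDB stores or queries the hierarchy, and `records` and `children` are hypothetical names.

```python
from collections import defaultdict

# name/parent pairs in the shape of the table above (parent is None for top-level terms)
records = [
    {"name": "B cells", "parent": None},
    {"name": "Follicular B cells", "parent": "B cells"},
    {"name": "Memory B cells", "parent": "B cells"},
    {"name": "Large pre-B cells", "parent": "B-cell lineage"},
]

# invert the parent column into a parent -> children mapping
children = defaultdict(list)
for record in records:
    if record["parent"] is not None:
        children[record["parent"]].append(record["name"])

print(children["B cells"])
print(children["B-cell lineage"])
```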
Now, let’s create records from the public ontology:
public_records = lb.CellType.from_values(
    celltypist_df.ontology_id, lb.CellType.ontology_id
)
✅ created 68 CellType records from Bionty matching ontology_id: CL:0000236, CL:0000843, CL:0000844, CL:0000787, CL:0000788, CL:0000818, CL:0000817, CL:0002046, CL:0000826, CL:0001056, CL:0000798, CL:0000576, CL:0000623, CL:0000084, CL:0000990, CL:0000840, CL:0001029, CL:0002489, CL:0000809, CL:0000553, ...
Let’s now amend the public ontology records so that they retain additional annotations that CellTypist might have.
records_names = {}
public_records_dict = {r.ontology_id: r for r in public_records}
for _, row in celltypist_df.iterrows():
    name = row["name"]
    ontology_id = row["ontology_id"]
    public_record = public_records_dict[ontology_id]
    # if both name and ontology_id match the public record, use the public record
    if name.lower() == public_record.name.lower():
        records_names[name] = public_record
        continue
    else:  # when ontology_id matches the public record and name doesn't match
        # if the singular form of the CellTypist name matches the public name
        if name.lower().rstrip("s") == public_record.name.lower():
            # add the CellTypist name to the synonyms of the public ontology record
            public_record.add_synonym(name)
            records_names[name] = public_record
            continue
        if public_record.synonyms is not None:
            synonyms = [s.lower() for s in public_record.synonyms.split("|")]
            # if any public synonym matches the CellTypist name
            if any(
                [
                    i.lower() in {name.lower(), name.lower().rstrip("s")}
                    for i in synonyms
                ]
            ):
                # add the CellTypist name to the synonyms of the public ontology record
                public_record.add_synonym(name)
                records_names[name] = public_record
                continue
        # create a record only based on CellTypist metadata
        records_names[name] = lb.CellType(
            name=name, ontology_id=ontology_id, description=row.description
        )
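The decision logic of the loop above can be isolated into a small pure function, which makes the four possible outcomes easier to see and to test. This is a sketch under assumptions: `match_kind` is a hypothetical helper, and the public names and synonym strings passed to it below are abbreviated or assumed examples rather than values read from the registry.

```python
from typing import Optional


def match_kind(name: str, public_name: str, public_synonyms: Optional[str]) -> str:
    """Classify how a CellTypist name relates to the public record with the same ontology ID.

    Returns "exact", "singular", "synonym", or "custom".
    """
    lowered = name.lower()
    singular = lowered.rstrip("s")
    if lowered == public_name.lower():
        return "exact"  # reuse the public record as is
    if singular == public_name.lower():
        return "singular"  # reuse the public record, add the plural as a synonym
    # synonyms are stored as one "|"-separated string, or None if there are none
    synonyms = {s.lower() for s in (public_synonyms or "").split("|") if s}
    if synonyms & {lowered, singular}:
        return "synonym"  # reuse the public record, add the CellTypist name as a synonym
    return "custom"  # create a new record from CellTypist metadata


# synonym strings are abbreviated; the memory B cell entries are assumed for illustration
print(match_kind("B cells", "B cell", "B-cell|B lymphocyte|B-lymphocyte"))
print(match_kind("GMP", "granulocyte monocyte progenitor cell", "CFU-GM|GMP"))
print(match_kind("Age-associated B cells", "memory B cell", "memory B lymphocyte|memory B-cell"))
```

Only the "custom" outcome leads to a record that carries the CellTypist name and description; the other three reuse the public record.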
You can see that certain records are created by adding the CellTypist name to the synonyms of the public record:
records_names["GMP"]
CellType(id='f5eAsw0p', name='granulocyte monocyte progenitor cell', ontology_id='CL:0000557', synonyms='CFU-GM|granulocyte/monocyte precursor|granulocyte/monocyte progenitor|GMP|colony forming unit granulocyte macrophage|granulocyte-macrophage progenitor', description='A Hematopoietic Progenitor Cell That Is Committed To The Granulocyte And Monocyte Lineages. These Cells Are Cd123-Positive, And Do Not Express Gata1 Or Gata2 But Do Express C/Ebpa, And Pu.1.', bionty_source_id='Eukz', created_by_id='DzTjkKse')
Other records are created based on CellTypist metadata:
records_names["Age-associated B cells"]
CellType(id='00ieV0IG', name='Age-associated B cells', ontology_id='CL:0000787', description='CD11c+ T-bet+ memory B cells associated with autoimmunity and aging', created_by_id='DzTjkKse')
Let’s save them to our database:
records = set(records_names.values())
ln.save(records)
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='b5k0suF0', name='erythrocyte', ontology_id='CL:0000232', synonyms='red blood cell|Erythrocytes|RBC', description='A Red Blood Cell. In Mammals, Mature Erythrocytes Are Biconcave Disks Containing Hemoglobin Whose Function Is To Transport Oxygen.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='Q2BH279Q', name='classical monocyte', ontology_id='CL:0000860', synonyms='Classical monocytes|inflammatory monocyte', description='A Monocyte That Responds Rapidly To Microbial Stimuli By Secreting Cytokines And Antimicrobial Factors And Which Is Characterized By High Expression Of Ccr2 In Both Rodents And Humans, Negative For The Lineage Markers Cd3, Cd19, And Cd20, And Of Larger Size Than Non-Classical Monocytes.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='3rJgLble', name='conventional dendritic cell', ontology_id='CL:0000990', synonyms='DC1|dendritic reticular cell|type 1 DC|cDC', description='Conventional Dendritic Cell Is A Dendritic Cell That Is Cd11C-High.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000451
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='9JGbXeUA', name='dendritic cell', ontology_id='CL:0000451', description='A Cell Of Hematopoietic Origin, Typically Resident In Particular Tissues, Specialized In The Uptake, Processing, And Transport Of Antigens To Lymph Nodes For The Purpose Of Stimulating An Immune Response Via T Cell Activation. These Cells Are Lineage Negative (Cd3-Negative, Cd19-Negative, Cd34-Negative, And Cd56-Negative).', updated_at=2023-08-28 18:24:21, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000738
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='MkrH0gsX', name='leukocyte', ontology_id='CL:0000738', synonyms='white blood cell|leucocyte', description='An Achromatic Cell Of The Myeloid Or Lymphoid Lineages Capable Of Ameboid Movement, Found In Blood Or Other Tissue.', updated_at=2023-08-28 18:24:22, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000988
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='Q0aQr5JB', name='hematopoietic cell', ontology_id='CL:0000988', synonyms='haematopoietic cell|hemopoietic cell|haemopoietic cell', description='A Cell Of A Hematopoietic Lineage.', updated_at=2023-08-28 18:24:23, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 2 CellType records from Bionty matching ontology_id: CL:0002371, CL:0000548
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='QMAH6IlS', name='somatic cell', ontology_id='CL:0002371', description='A Cell Of An Organism That Does Not Pass On Its Genetic Material To The Organism'S Offspring (I.E. A Non-Germ Line Cell).', updated_at=2023-08-28 18:24:24, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ loaded 1 CellType record matching ontology_id: CL:0000548
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000003
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='VT73gpK2', name='native cell', ontology_id='CL:0000003', description='A Cell That Is Found In A Natural Setting, Which Includes Multicellular Organism Cells 'In Vivo' (I.E. Part Of An Organism), And Unicellular Organisms 'In Environment' (I.E. Part Of A Natural Environment).', updated_at=2023-08-28 18:24:25, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000000
💡 also saving parents of CellType(id='H0taCt24', name='animal cell', ontology_id='CL:0000548', synonyms='metazoan cell', description='A Native Cell That Is Part Of Some Metazoa.', updated_at=2023-08-28 18:24:24, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000255
💡 also saving parents of CellType(id='uMLhrmbZ', name='germinal center B cell', ontology_id='CL:0000844', synonyms='GC B cell|germinal center B lymphocyte|germinal center B-lymphocyte|GC B-cell|GC B lymphocyte|Germinal center B cells|germinal center B-cell|GC B-lymphocyte', description='A Rapidly Cycling Mature B Cell That Has Distinct Phenotypic Characteristics And Is Involved In T-Dependent Immune Responses And Located Typically In The Germinal Centers Of Lymph Nodes. This Cell Type Expresses Ly77 After Activation.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000785
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='0I51jgPp', name='mature B cell', ontology_id='CL:0000785', synonyms='mature B lymphocyte|mature B-cell|mature B-lymphocyte', description='A B Cell That Is Mature, Having Left The Bone Marrow. Initially, These Cells Are Igm-Positive And Igd-Positive, And They Can Be Activated By Antigen.', updated_at=2023-08-28 18:24:28, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0001201
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='CIS4VJI0', name='B cell, CD19-positive', ontology_id='CL:0001201', synonyms='CD19+ B cell|B lymphocyte, CD19-positive|B-lymphocyte, CD19-positive|CD19-positive B cell|B-cell, CD19-positive', description='A B Cell That Is Cd19-Positive.', updated_at=2023-08-28 18:24:29, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='nd6Qaf38', name='Hofbauer cell', ontology_id='CL:3000001', synonyms='Hofbauer cells', description='Oval Eosinophilic Histiocytes With Granules And Vacuoles Found In Placenta, Which Are Of Mesenchymal Origin, In Mesoderm Of The Chorionic Villus, Particularly Numerous In Early Pregnancy.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='NJ07Q1hX', name='plasmablast', ontology_id='CL:0000980', synonyms='CD27-positive, CD38-positive, CD20-negative B cell|Plasmablasts', description='An Activated Mature (Naive Or Memory) B Cell That Is Secreting Immunoglobulin, Typified By Being Cd27-Positive, Cd38-Positive, Cd138-Negative.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='RSjQq98q', name='pro-B cell', ontology_id='CL:0000826', synonyms='progenitor B lymphocyte|progenitor B cell|pro-B lymphocyte|progenitor B-lymphocyte|Pro-B cells|progenitor B-cell|pro-B-cell|pro-B-lymphocyte', description='A Progenitor Cell Of The B Cell Lineage, With Some Lineage Specific Activity Such As Early Stages Of Recombination Of B Cell Receptor Genes, But Not Yet Fully Committed To The B Cell Lineage Until The Expression Of Pax5 Occurs.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000838
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='vQ9N8BKH', name='lymphoid lineage restricted progenitor cell', ontology_id='CL:0000838', description='A Progenitor Cell Restricted To The Lymphoid Lineage.', updated_at=2023-08-28 18:24:30, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0002031
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='xl6RtpG9', name='hematopoietic lineage restricted progenitor cell', ontology_id='CL:0002031', description='A Hematopoietic Progenitor Cell That Is Capable Of Developing Into Only One Lineage Of Hematopoietic Cells.', updated_at=2023-08-28 18:24:31, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ loaded 1 CellType record matching ontology_id: CL:0000988
✅ created 1 CellType record from Bionty matching ontology_id: CL:0008001
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='0d3ym06W', name='hematopoietic precursor cell', ontology_id='CL:0008001', description='Any Hematopoietic Cell That Is A Precursor Of Some Other Hematopoietic Cell Type.', updated_at=2023-08-28 18:24:32, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='f5eAsw0p', name='granulocyte monocyte progenitor cell', ontology_id='CL:0000557', synonyms='CFU-GM|granulocyte/monocyte precursor|granulocyte/monocyte progenitor|GMP|colony forming unit granulocyte macrophage|granulocyte-macrophage progenitor', description='A Hematopoietic Progenitor Cell That Is Committed To The Granulocyte And Monocyte Lineages. These Cells Are Cd123-Positive, And Do Not Express Gata1 Or Gata2 But Do Express C/Ebpa, And Pu.1.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0002032
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='JYQl3RX8', name='hematopoietic oligopotent progenitor cell', ontology_id='CL:0002032', description='A Hematopoietic Oligopotent Progenitor Cell That Has The Ability To Differentiate Into Limited Cell Types But Lacks Lineage Cell Markers And Self Renewal Capabilities.', updated_at=2023-08-28 18:24:33, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='xfxkvliE', name='granulocyte', ontology_id='CL:0000094', synonyms='granular leukocyte|polymorphonuclear leukocyte|granular leucocyte|Granulocytes', description='A Leukocyte With Abundant Granules In The Cytoplasm.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000766
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='40onq0tm', name='myeloid leukocyte', ontology_id='CL:0000766', description='A Cell Of The Monocyte, Granulocyte, Or Mast Cell Lineage.', updated_at=2023-08-28 18:24:34, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='l0R9X3Bs', name='promyelocyte', ontology_id='CL:0000836', synonyms='Promyelocytes', description='A Precursor In The Granulocytic Series, Being A Cell Intermediate In Development Between A Myeloblast And Myelocyte, That Has Distinct Nucleoli, A Nuclear-To-Cytoplasmic Ratio Of 5:1 To 3:1, And Containing A Few Primary Cytoplasmic Granules. Markers For This Cell Are Fucosyltransferase Fut4-Positive, Cd33-Positive, Integrin Alpha-M-Negative, Low Affinity Immunoglobulin Gamma Fc Region Receptor Iii-Negative, And Cd24-Negative.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0002191
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='odstSt5D', name='granulocytopoietic cell', ontology_id='CL:0002191', description='A Cell Involved In The Formation Of A Granulocyte.', updated_at=2023-08-28 18:24:35, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000839
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='Q0v4wWyZ', name='myeloid lineage restricted progenitor cell', ontology_id='CL:0000839', description='A Progenitor Cell Restricted To The Myeloid Lineage.', updated_at=2023-08-28 18:24:36, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='loo3Xanl', name='common myeloid progenitor', ontology_id='CL:0000049', synonyms='CMP|common myeloid precursor', description='A Progenitor Cell Committed To Myeloid Lineage, Including The Megakaryocyte And Erythroid Lineages.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='0JvRwfVm', name='plasma cell', ontology_id='CL:0000786', synonyms='plasma B cell|plasmacyte|plasma B-cell|plasmocyte|Plasma cells', description='A Terminally Differentiated, Post-Mitotic, Antibody Secreting Cell Of The B Cell Lineage With The Phenotype Cd138-Positive, Surface Immunonoglobulin-Negative, And Mhc Class Ii-Negative. Plasma Cells Are Oval Or Round With Extensive Rough Endoplasmic Reticulum, A Well-Developed Golgi Apparatus, And A Round Nucleus Having A Characteristic Cartwheel Heterochromatin Pattern And Are Devoted To Producing Large Amounts Of Immunoglobulin.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000946
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='hFMWJcWc', name='antibody secreting cell', ontology_id='CL:0000946', description='A Lymphocyte Of B Lineage That Is Devoted To Secreting Large Amounts Of Immunoglobulin.', updated_at=2023-08-28 18:24:37, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000945
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='Z0yFV7vU', name='lymphocyte of B lineage', ontology_id='CL:0000945', description='A Lymphocyte Of B Lineage With The Commitment To Express An Immunoglobulin Complex.', updated_at=2023-08-28 18:24:38, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000542
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='g8slxY8X', name='lymphocyte', ontology_id='CL:0000542', description='A Lymphocyte Is A Leukocyte Commonly Found In The Blood And Lymph That Has The Characteristics Of A Large Nucleus, A Neutral Staining Cytoplasm, And Prominent Heterochromatin.', updated_at=2023-08-28 18:24:39, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='4fOuOYtl', name='endothelial cell', ontology_id='CL:0000115', synonyms='Endothelial cells|endotheliocyte', description='An Endothelial Cell Comprises The Outermost Layer Or Lining Of Anatomical Structures And Can Be Squamous Or Cuboidal. In Mammals, Endothelial Cell Has Vimentin Filaments And Is Derived From The Mesoderm.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 2 CellType records from Bionty matching ontology_id: CL:0000213, CL:0002078
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='nGEtVlKq', name='meso-epithelial cell', ontology_id='CL:0002078', synonyms='epithelial mesenchymal cell', description='Epithelial Cell Derived From Mesoderm Or Mesenchyme.', updated_at=2023-08-28 18:24:40, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='AHV57RuN', name='lining cell', ontology_id='CL:0000213', synonyms='boundary cell', description='A Cell Within An Epithelial Cell Sheet Whose Main Function Is To Act As An Internal Or External Covering For A Tissue Or An Organism.', updated_at=2023-08-28 18:24:40, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000215
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='gON03kRx', name='barrier cell', ontology_id='CL:0000215', description='A Cell Whose Primary Function Is To Prevent The Transport Of Stuff Across Compartments.', updated_at=2023-08-28 18:24:41, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='YzV7Qgmj', name='monocyte', ontology_id='CL:0000576', synonyms='Monocytes', description='Myeloid Mononuclear Recirculating Leukocyte That Can Act As A Precursor Of Tissue Macrophages, Osteoclasts And Some Populations Of Tissue Dendritic Cells.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='ppLUhJWx', name='non-classical monocyte', ontology_id='CL:0000875', synonyms='resident monocyte|Non-classical monocytes|patrolling monocyte', description='A Type Of Monocyte Characterized By Low Expression Of Ccr2, Low Responsiveness To Monocyte Chemoattractant Ccl2/Mcp1, Low Phagocytic Activity, And Decrease Size Relative To Classical Monocytes, But Increased Co-Stimulatory Activity. May Also Play A Role In Tissue Repair.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='YN0gzDt3', name='Kupffer cell', ontology_id='CL:0000091', synonyms='von Kupffer cell|stellate cell of von Kupffer|macrophagocytus stellatus|hepatic macrophage|Kupffer cells|littoral cell of hepatic sinusoid|liver macrophage', description='A Tissue-Resident Macrophage Of The Reticuloendothelial System Found On The Luminal Surface Of The Hepatic Sinusoids Involved In Erythrocyte Clearance. Markers Include F4/80+, Cd11B-Low, Cd68-Positive, Sialoadhesin-Positive, Cd163/Srcr-Positive. Irregular, With Long Processes Including Lamellipodia Extending Into The Sinusoid Lumen, Have Flattened Nucleus With Cytoplasm Containing Characteristic Invaginations Of The Plasma Membrane (Vermiform Bodies); Lie Within The Sinusoid Lumen Attached To The Endothelial Surface; Derived From The Bone Marrow, Form A Major Part Of The Body'S Mononuclear Phagocyte System.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000864
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='pXnRaLwJ', name='tissue-resident macrophage', ontology_id='CL:0000864', synonyms='resting histiocyte|fixed macrophage', description='A Macrophage Constitutively Resident In A Particular Tissue Under Non-Inflammatory Conditions, And Capable Of Phagocytosing A Variety Of Extracellular Particulate Material, Including Immune Complexes, Microorganisms, And Dead Cells.', updated_at=2023-08-28 18:24:42, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='67zMsufW', name='memory B cell', ontology_id='CL:0000787', synonyms='memory B-lymphocyte|memory B lymphocyte|memory B-cell|Memory B cells', description='A Memory B Cell Is A Mature B Cell That Is Long-Lived, Readily Activated Upon Re-Encounter Of Its Antigenic Determinant, And Has Been Selected For Expression Of Higher Affinity Immunoglobulin. This Cell Type Has The Phenotype Cd19-Positive, Cd20-Positive, Mhc Class Ii-Positive, And Cd138-Negative.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='i20ionW5', name='mast cell', ontology_id='CL:0000097', synonyms='Mast cells|mastocyte|histaminocyte|labrocyte', description='A Cell That Is Found In Almost All Tissues Containing Numerous Basophilic Granules And Capable Of Releasing Large Amounts Of Histamine And Heparin Upon Activation. Progenitors Leave Bone Marrow And Mature In Connective And Mucosal Tissue. Mature Mast Cells Are Found In All Tissues, Except The Bloodstream. Their Phenotype Is Cd117-High, Cd123-Negative, Cd193-Positive, Cd200R3-Positive, And Fceri-High. Stem-Cell Factor (Kit-Ligand; Scf) Is The Main Controlling Signal Of Their Survival And Development.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ loaded 1 CellType record matching ontology_id: CL:0000766
✅ created 1 CellType record from Bionty matching ontology_id: CL:0002274
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='70wMh2r7', name='histamine secreting cell', ontology_id='CL:0002274', description='A Cell Type That Secretes Histamine.', updated_at=2023-08-28 18:24:43, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000457
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='TpKvGjqi', name='biogenic amine secreting cell', ontology_id='CL:0000457', updated_at=2023-08-28 18:24:44, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000151
💡 also saving parents of CellType(id='64kIG7So', name='gamma-delta T cell', ontology_id='CL:0000798', synonyms='gamma-delta T lymphocyte|gamma-delta T-cell|gamma-delta T cells|gamma-delta T-lymphocyte|gammadelta T cell', description='A T Cell That Expresses A Gamma-Delta T Cell Receptor Complex.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='ePsFBu6n', name='transitional stage B cell', ontology_id='CL:0000818', synonyms='transitional stage B-lymphocyte|transitional B cell|transitional stage B lymphocyte|transitional stage B-cell|Transitional B cells', description='An Immature B Cell Of An Intermediate Stage Between The Pre-B Cell Stage And The Mature Naive Stage With The Phenotype Surface Igm-Positive And Cd19-Positive, And Are Subject To The Process Of B Cell Selection. A Transitional B Cell Migrates From The Bone Marrow Into The Peripheral Circulation, And Then To The Spleen.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='TENASE93', name='alveolar macrophage', ontology_id='CL:0000583', synonyms='dust cell|Alveolar macrophages', description='A Tissue-Resident Macrophage Found In The Alveoli Of The Lungs. Ingests Small Inhaled Particles Resulting In Degradation And Presentation Of The Antigen To Immunocompetent Cells. Markers Include F4/80-Positive, Cd11B-/Low, Cd11C-Positive, Cd68-Positive, Sialoadhesin-Positive, Dectin-1-Positive, Mr-Positive, Cx3Cr1-Negative.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ loaded 1 CellType record matching ontology_id: CL:0000864
✅ created 1 CellType record from Bionty matching ontology_id: CL:1001603
💡 also saving parents of CellType(id='X458vtJX', name='naive B cell', ontology_id='CL:0000788', synonyms='naive B-lymphocyte|naive B lymphocyte|naive B-cell|Naive B cells', description='A Naive B Cell Is A Mature B Cell That Has The Phenotype Surface Igd-Positive, Surface Igm-Positive, Cd20-Positive, Cd27-Negative And That Has Not Yet Been Activated By Antigen In The Periphery.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='XjG8T0GY', name='fibroblast', ontology_id='CL:0000057', synonyms='Fibroblasts', description='A Connective Tissue Cell Which Secretes An Extracellular Matrix Rich In Collagen And Other Macromolecules. Flattened And Irregular In Outline With Branching Processes; Appear Fusiform Or Spindle-Shaped.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0002320
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='4zAzIMBQ', name='connective tissue cell', ontology_id='CL:0002320', description='A Cell Of The Supporting Or Framework Tissue Of The Body, Arising Chiefly From The Embryonic Mesoderm And Including Adipose Tissue, Cartilage, And Bone.', updated_at=2023-08-28 18:24:47, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='P6E7yrc7', name='epithelial cell', ontology_id='CL:0000066', synonyms='Epithelial cells|epitheliocyte', description='A Cell That Is Usually Found In A Two-Dimensional Sheet With A Free Surface. The Cell Has A Cytoskeleton That Allows For Tight Cell To Cell Contact And For Cell Polarity Where Apical Part Is Directed Towards The Lumen And The Basal Part To The Basal Lamina.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='8lTrmDbK', name='neutrophil', ontology_id='CL:0000775', synonyms='neutrocyte|neutrophilic leukocyte|neutrophil leukocyte|Neutrophils|neutrophil leucocyte|neutrophilic leucocyte', description='Any Of The Immature Or Mature Forms Of A Granular Leukocyte That In Its Mature Form Has A Nucleus With Three To Five Lobes Connected By Slender Threads Of Chromatin, And Cytoplasm Containing Fine Inconspicuous Granules And Stainable By Neutral Dyes.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='S4Urkinl', name='macrophage', ontology_id='CL:0000235', synonyms='histiocyte|Macrophages', description='A Mononuclear Phagocyte Present In Variety Of Tissues, Typically Differentiated From Monocytes, Capable Of Phagocytosing A Variety Of Extracellular Particulate Material, Including Immune Complexes, Microorganisms, And Dead Cells.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='FMTngXKK', name='follicular B cell', ontology_id='CL:0000843', synonyms='follicular B lymphocyte|Fo B cell|follicular B-lymphocyte|Follicular B cells|Fo B-cell|follicular B-cell', description='A Resting Mature B Cell That Has The Phenotype Igm-Positive, Igd-Positive, Cd23-Positive And Cd21-Positive, And Found In The B Cell Follicles Of The White Pulp Of The Spleen Or The Corticol Areas Of The Peripheral Lymph Nodes. This Cell Type Is Also Described As Being Cd19-Positive, B220-Positive, Aa4-Negative, Cd43-Negative, And Cd5-Negative.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='Iywg7lUq', name='early lymphoid progenitor', ontology_id='CL:0000936', synonyms='LMPP|lymphoid-primed multipotent progenitor|ELP', description='A Lymphoid Progenitor Cell That Is Found In Bone Marrow, Gives Rise To B Cells, T Cells, Natural Killer Cells And Dendritic Cells, And Has The Phenotype Lin-Negative, Kit-Positive, Sca-1-Positive, Flt3-Positive, Cd34-Positive, Cd150 Negative, And Glya-Negative.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='cx8VcggA', name='B cell', ontology_id='CL:0000236', synonyms='B-cell|B lymphocyte|B cells|B-lymphocyte', description='A Lymphocyte Of B Lineage That Is Capable Of B Cell Mediated Immunity.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='Z7uMAWUF', name='regulatory T cell', ontology_id='CL:0000815', synonyms='regulatory T-lymphocyte|regulatory T lymphocyte|Regulatory T cells|regulatory T-cell|Treg', description='A T Cell Which Regulates Overall Immune Responses As Well As The Responses Of Other T Cell Subsets Through Direct Cell-Cell Contact And Cytokine Release.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0002419
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='2C5PhwrW', name='mature T cell', ontology_id='CL:0002419', synonyms='mature T-cell|CD3e-positive T cell', description='A T Cell That Expresses A T Cell Receptor Complex And Has Completed T Cell Selection.', updated_at=2023-08-28 18:24:48, bionty_source_id='Eukz', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='g2Rk2xkb', name='myelocyte', ontology_id='CL:0002193', synonyms='Myelocytes', description='A Cell Type That Is The First Of The Maturation Stages Of The Granulocytic Leukocytes Normally Found In The Bone Marrow. Granules Are Seen In The Cytoplasm. The Nuclear Material Of The Myelocyte Is Denser Than That Of The Myeloblast But Lacks A Definable Membrane. The Cell Is Flat And Contains Increasing Numbers Of Granules As Maturation Progresses.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
Add parent-child relationships of the records from CellTypist#
We still need to add the remaining 4 high-hierarchy terms:
list(high_terms_umapped)
['B-cell lineage', 'Cycling cells', 'Erythroid', 'T cells']
Let’s get the top hits from a search:
for term in list(high_terms_umapped):
    print(f"Term: {term}")
    display(celltype_bt.search(term).head(1))
Term: B-cell lineage
name | ontology_id | definition | synonyms | parents | __agg__ | __ratio__ |
---|---|---|---|---|---|---|
obsolete cell by lineage | CL:0000220 | None | None | [] | obsolete cell by lineage | 73.684211 |
Term: Cycling cells
name | ontology_id | definition | synonyms | parents | __agg__ | __ratio__ |
---|---|---|---|---|---|---|
circulating cell | CL:0000080 | A Cell Which Moves Among Different Tissues Of ... | None | [] | circulating cell | 75.862069 |
Term: Erythroid
name | ontology_id | definition | synonyms | parents | __agg__ | __ratio__ |
---|---|---|---|---|---|---|
erythrocyte | CL:0000232 | A Red Blood Cell. In Mammals, Mature Erythrocy... | RBC|red blood cell | [CL:0000764] | erythrocyte | 70.0 |
Term: T cells
name | ontology_id | definition | synonyms | parents | __agg__ | __ratio__ |
---|---|---|---|---|---|---|
T cell | CL:0000084 | A Type Of Lymphocyte Whose Defining Characteri... | T-lymphocyte|T-cell|T lymphocyte | [CL:0000542] | t cell | 92.307692 |
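The `__ratio__` column appears to be a percentage string-similarity score between the query and each candidate name. As a rough illustration (not the implementation behind `celltype_bt.search`), the sketch below scores candidates with the standard library's `difflib` and accepts the top hit only above a cutoff. The helper `top_hit`, the candidate list, and the cutoff of 90 are assumptions made for this example.

```python
from difflib import SequenceMatcher


def top_hit(query, candidates):
    """Return (best_candidate, score) with a case-insensitive score in [0, 100]."""
    def similarity(candidate):
        return 100 * SequenceMatcher(None, query.lower(), candidate.lower()).ratio()

    best = max(candidates, key=similarity)
    return best, similarity(best)


# Hypothetical candidate names, taken from the search results above.
candidates = ["T cell", "erythrocyte", "circulating cell", "obsolete cell by lineage"]
THRESHOLD = 90  # assumed cutoff for treating a hit as the same concept

for term in ["T cells", "Erythroid"]:
    best, pct = top_hit(term, candidates)
    decision = "map to public record" if pct >= THRESHOLD else "create a new record"
    print(f"{term!r} -> {best!r} ({pct:.1f}): {decision}")
```

With these inputs only "T cells" clears the cutoff, while "Erythroid" scores 70.0 against "erythrocyte" and would be created as a new record. A fixed cutoff is a blunt instrument, so borderline hits still deserve a manual look.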
So we decide to:
Add “T cells” to the synonyms of the public “T cell” record
Create the remaining 3 terms using only their names (we think “B-cell lineage” shouldn’t be identified with “B cell”)
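for name in high_terms_umapped:
    if name == "T cells":
        # reuse the public "T cell" record and register the CellTypist name as a synonym
        record = lb.CellType.from_bionty(name="T cell")
        record.add_synonym(name)
        record.save()
    else:
        # no suitable public term: create a record from the name alone
        record = lb.CellType(name=name)
        record.save()
    records_names[name] = record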
❗ records with similar names exist! did you mean to load one of them?
name | id | synonyms | __ratio__ |
---|---|---|---|
Cycling T cells | TTziQpub | | 92.857143 |
Cycling B cells | ibzfn1zQ | | 92.857143 |
Cycling NK cells | rC47wc9h | | 89.655172 |
✅ created 1 CellType record from Bionty matching name: T cell
💡 also saving parents of CellType(id='BxNjby0x', name='T cell', ontology_id='CL:0000084', synonyms='T-cell|T-lymphocyte|T cells|T lymphocyte', description='A Type Of Lymphocyte Whose Defining Characteristic Is The Expression Of A T Cell Receptor Complex.', updated_at=2023-08-28 18:24:50, bionty_source_id='Eukz', created_by_id='DzTjkKse')
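The synonyms field in the output above is a single string with entries separated by `|`. A minimal sketch of adding a synonym to such a field without creating duplicates follows. Note that `add_synonym` here is a hypothetical standalone helper written for illustration, not the `record.add_synonym` method of the registry.

```python
def add_synonym(synonyms, new):
    """Append `new` to a pipe-delimited synonyms string, skipping duplicates."""
    existing = synonyms.split("|") if synonyms else []
    if new not in existing:
        existing.append(new)
    return "|".join(existing)


# Synonyms of the public "T cell" record as listed in the search result above.
public = "T-lymphocyte|T-cell|T lymphocyte"
updated = add_synonym(public, "T cells")
print(updated)

# Adding the same synonym again leaves the string unchanged.
print(add_synonym(updated, "T cells") == updated)
```

Keeping the operation idempotent means the cell can be re-run without accumulating repeated entries.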
Now let’s add the parent records:
for _, row in celltypist_df.iterrows():
    record = records_names[row["name"]]
    if row["parent"] is not None:
        parent_record = records_names[row["parent"]]
        record.parents.add(parent_record)
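The loop above assumes that a missing parent is stored as `None`. If a table like this is loaded from a file with pandas, missing values may be `NaN` instead, in which case `is not None` would not skip them. The sketch below uses plain dictionaries and hypothetical rows (not the actual `celltypist_df`) to show a check that treats both as missing while building a child-to-parents mapping.

```python
import math
from collections import defaultdict


def is_missing(value):
    """True for None and for float NaN, the two common 'no parent' markers."""
    return value is None or (isinstance(value, float) and math.isnan(value))


# Hypothetical (name, parent) rows; the real hierarchy lives in celltypist_df.
rows = [
    {"name": "T cells", "parent": None},
    {"name": "Regulatory T cells", "parent": "T cells"},
    {"name": "Cycling cells", "parent": float("nan")},
    {"name": "Cycling T cells", "parent": "Cycling cells"},
]

parents = defaultdict(set)
for row in rows:
    if not is_missing(row["parent"]):
        parents[row["name"]].add(row["parent"])

print(dict(parents))
```

Only the two rows with a real parent name produce an entry; the top-level terms are skipped regardless of which missing-value marker they carry.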
Access the in-house CellType registry#
The previously added CellTypist ontology registry is now available in LaminDB.
To retrieve the full ontology table as a pandas DataFrame, we can use .filter:
lb.CellType.filter().df()
id | name | ontology_id | abbr | synonyms | description | bionty_source_id | updated_at | created_by_id |
---|---|---|---|---|---|---|---|---|
S5aADT0B | Cycling DCs | CL:0001056 | None | None | proliferating dendritic cells | None | 2023-08-28 18:24:20 | DzTjkKse |
b5k0suF0 | erythrocyte | CL:0000232 | None | red blood cell|Erythrocytes|RBC | A Red Blood Cell. In Mammals, Mature Erythrocy... | Eukz | 2023-08-28 18:24:20 | DzTjkKse |
6KPuL5ry | Tcm/Naive cytotoxic T cells | CL:0000907 | None | None | CD8+ cytotoxic T lymphocytes mainly localized ... | None | 2023-08-28 18:24:20 | DzTjkKse |
iAssryci | Neutrophil-myeloid progenitor | CL:0000834 | None | None | progenitors of neutrophils and myeloid cells w... | None | 2023-08-28 18:24:20 | DzTjkKse |
3rJgLble | conventional dendritic cell | CL:0000990 | None | DC1|dendritic reticular cell|type 1 DC|cDC | Conventional Dendritic Cell Is A Dendritic Cel... | Eukz | 2023-08-28 18:24:20 | DzTjkKse |
... | ... | ... | ... | ... | ... | ... | ... | ... |
2C5PhwrW | mature T cell | CL:0002419 | None | mature T-cell|CD3e-positive T cell | A T Cell That Expresses A T Cell Receptor Comp... | Eukz | 2023-08-28 18:24:48 | DzTjkKse |
gjDWjE5c | B-cell lineage | None | None | None | None | None | 2023-08-28 18:24:48 | DzTjkKse |
TCaRmxoM | Cycling cells | None | None | None | None | None | 2023-08-28 18:24:48 | DzTjkKse |
pYLD8U0A | Erythroid | None | None | None | None | None | 2023-08-28 18:24:48 | DzTjkKse |
BxNjby0x | T cell | CL:0000084 | None | T-cell|T-lymphocyte|T cells|T lymphocyte | A Type Of Lymphocyte Whose Defining Characteri... | Eukz | 2023-08-28 18:24:50 | DzTjkKse |
132 rows × 8 columns
This enables us to look for cell types by creating a lookup object from our new CellType registry.
db_lookup = lb.CellType.lookup()
db_lookup.memory_b_cell
CellType(id='67zMsufW', name='memory B cell', ontology_id='CL:0000787', synonyms='memory B-lymphocyte|memory B lymphocyte|memory B-cell|Memory B cells', description='A Memory B Cell Is A Mature B Cell That Is Long-Lived, Readily Activated Upon Re-Encounter Of Its Antigenic Determinant, And Has Been Selected For Expression Of Higher Affinity Immunoglobulin. This Cell Type Has The Phenotype Cd19-Positive, Cd20-Positive, Mhc Class Ii-Positive, And Cd138-Negative.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')
See cell type hierarchy:
db_lookup.memory_b_cell.view_parents()
Access parents of a record:
db_lookup.memory_b_cell.parents.all()
<QuerySet [CellType(id='0I51jgPp', name='mature B cell', ontology_id='CL:0000785', synonyms='mature B lymphocyte|mature B-cell|mature B-lymphocyte', description='A B Cell That Is Mature, Having Left The Bone Marrow. Initially, These Cells Are Igm-Positive And Igd-Positive, And They Can Be Activated By Antigen.', updated_at=2023-08-28 18:24:28, bionty_source_id='Eukz', created_by_id='DzTjkKse'), CellType(id='cx8VcggA', name='B cell', ontology_id='CL:0000236', synonyms='B-cell|B lymphocyte|B cells|B-lymphocyte', description='A Lymphocyte Of B Lineage That Is Capable Of B Cell Mediated Immunity.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')]>
db_lookup.memory_b_cell.parents.all()[1].parents.all()
<QuerySet [CellType(id='Z0yFV7vU', name='lymphocyte of B lineage', ontology_id='CL:0000945', description='A Lymphocyte Of B Lineage With The Commitment To Express An Immunoglobulin Complex.', updated_at=2023-08-28 18:24:38, bionty_source_id='Eukz', created_by_id='DzTjkKse')]>
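Chaining `.parents.all()` by hand becomes tedious for deep hierarchies. The traversal can be sketched on a plain dictionary as below. The mapping contains only the parent links visible in the two outputs above, and `ancestors` is a hypothetical helper written for this example rather than a registry method.

```python
def ancestors(term, parents):
    """Collect all ancestors of `term` by walking parent links breadth-first."""
    seen = []
    queue = list(parents.get(term, []))
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(parents.get(current, []))
    return seen


# Parent links taken from the two query results above (deliberately incomplete).
parents = {
    "memory B cell": ["mature B cell", "B cell"],
    "B cell": ["lymphocyte of B lineage"],
}

print(ancestors("memory B cell", parents))
```

The `seen` check matters because an ontology is a graph rather than a tree: a term reachable through several parents should be reported only once.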
Annotate a dataset with cell types using CellTypist#
Annotate cell types predicted with CellTypist#
We now demonstrate how to predict cell types with CellTypist and add them to LaminDB. We use the sample dataset that ships with CellTypist together with the Immune_All_Low model.
input_file = celltypist.samples.get_sample_csv()
input_file
'/opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/celltypist/data/samples/sample_cell_by_gene.csv'
predictions = celltypist.annotate(
    input_file, model="Immune_All_Low.pkl", majority_voting=True
)
🔎 No available models. Downloading...
📜 Retrieving model list from server https://celltypist.cog.sanger.ac.uk/models/models.json
📚 Total models in list: 31
📂 Storing models in /home/runner/.celltypist/data/models
💾 Downloading model [1/31]: Immune_All_Low.pkl
💾 Downloading model [2/31]: Immune_All_High.pkl
💾 Downloading model [3/31]: Adult_CynomolgusMacaque_Hippocampus.pkl
💾 Downloading model [4/31]: Adult_Mouse_Gut.pkl
💾 Downloading model [5/31]: Adult_Mouse_OlfactoryBulb.pkl
💾 Downloading model [6/31]: Adult_Pig_Hippocampus.pkl
💾 Downloading model [7/31]: Adult_RhesusMacaque_Hippocampus.pkl
💾 Downloading model [8/31]: Autopsy_COVID19_Lung.pkl
💾 Downloading model [9/31]: COVID19_HumanChallenge_Blood.pkl
💾 Downloading model [10/31]: COVID19_Immune_Landscape.pkl
💾 Downloading model [11/31]: Cells_Fetal_Lung.pkl
💾 Downloading model [12/31]: Cells_Intestinal_Tract.pkl
💾 Downloading model [13/31]: Cells_Lung_Airway.pkl
💾 Downloading model [14/31]: Developing_Human_Brain.pkl
💾 Downloading model [15/31]: Developing_Human_Hippocampus.pkl
💾 Downloading model [16/31]: Developing_Human_Thymus.pkl
💾 Downloading model [17/31]: Developing_Mouse_Brain.pkl
💾 Downloading model [18/31]: Developing_Mouse_Hippocampus.pkl
💾 Downloading model [19/31]: Healthy_COVID19_PBMC.pkl
💾 Downloading model [20/31]: Healthy_Mouse_Liver.pkl
💾 Downloading model [21/31]: Human_AdultAged_Hippocampus.pkl
💾 Downloading model [22/31]: Human_IPF_Lung.pkl
💾 Downloading model [23/31]: Human_Longitudinal_Hippocampus.pkl
💾 Downloading model [24/31]: Human_Lung_Atlas.pkl
💾 Downloading model [25/31]: Human_PF_Lung.pkl
💾 Downloading model [26/31]: Lethal_COVID19_Lung.pkl
💾 Downloading model [27/31]: Mouse_Dentate_Gyrus.pkl
💾 Downloading model [28/31]: Mouse_Isocortex_Hippocampus.pkl
💾 Downloading model [29/31]: Mouse_Postnatal_DentateGyrus.pkl
💾 Downloading model [30/31]: Nuclei_Lung_Airway.pkl
💾 Downloading model [31/31]: Pan_Fetal_Human.pkl
📁 Input file is '/opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/celltypist/data/samples/sample_cell_by_gene.csv'
⏳ Loading data
🔬 Input data has 559 cells and 32786 genes
🔗 Matching reference genes in the model
🧬 5313 features used for prediction
⚖️ Scaling input data
🖋️ Predicting labels
✅ Prediction done!
👀 Can not detect a neighborhood graph, will construct one before the over-clustering
⛓️ Over-clustering input data with resolution set to 5
🗳️ Majority voting the predictions
✅ Majority voting done!
Now that we’ve predicted all cell types, we create an AnnData object that we will eventually track with LaminDB.
adata_annotated = predictions.to_adata()
adata_annotated.obs
 | predicted_labels | over_clustering | majority_voting | conf_score |
---|---|---|---|---|
Cell_1 | Intermediate macrophages | 3 | Age-associated B cells | 0.979577 |
Cell_2 | Trm cytotoxic T cells | 3 | Age-associated B cells | 0.073008 |
Cell_3 | pDC | 9 | Macrophages | 0.020744 |
Cell_4 | Follicular B cells | 36 | Age-associated B cells | 0.167273 |
Cell_5 | Trm cytotoxic T cells | 36 | Age-associated B cells | 0.430877 |
... | ... | ... | ... | ... |
Cell_555 | Alveolar macrophages | 5 | Alveolar macrophages | 0.152075 |
Cell_556 | Alveolar macrophages | 0 | Alveolar macrophages | 0.901491 |
Cell_557 | Tcm/Naive helper T cells | 5 | Alveolar macrophages | 0.092006 |
Cell_558 | Alveolar macrophages | 5 | Alveolar macrophages | 0.747148 |
Cell_559 | Alveolar macrophages | 0 | Alveolar macrophages | 0.060108 |
559 rows × 4 columns
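In the table above, `majority_voting` can differ from `predicted_labels` because, with `majority_voting=True`, CellTypist refines the per-cell predictions within the over-clustered groups. The sketch below illustrates only the core idea, assigning each cluster its most frequent predicted label. It uses toy data and leaves out the details of the actual CellTypist procedure.

```python
from collections import Counter, defaultdict


def majority_vote(predicted_labels, clusters):
    """Assign every cell the most common predicted label of its cluster."""
    labels_by_cluster = defaultdict(list)
    for label, cluster in zip(predicted_labels, clusters):
        labels_by_cluster[cluster].append(label)
    winner = {
        cluster: Counter(labels).most_common(1)[0][0]
        for cluster, labels in labels_by_cluster.items()
    }
    return [winner[cluster] for cluster in clusters]


# Toy data (not taken from the sample dataset).
predicted = [
    "Alveolar macrophages",
    "Alveolar macrophages",
    "Tcm/Naive helper T cells",
    "pDC",
]
clusters = [5, 5, 5, 9]
print(majority_vote(predicted, clusters))
```

Here the single helper T cell call in cluster 5 is overruled by the two macrophage calls, which is the kind of smoothing visible in the rows for Cell_555 to Cell_559.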
Let’s rename the column with predictions:
adata_annotated.obs.rename(
    columns={"predicted_labels": "cell_type_cell_typist"}, inplace=True
)
Create cell type records:
celltypes = lb.CellType.from_values(
    adata_annotated.obs.cell_type_cell_typist, lb.CellType.name
)
✅ loaded 19 CellType records matching name: Intermediate macrophages, Trm cytotoxic T cells, pDC, Tcm/Naive helper T cells, T(agonist), Age-associated B cells, DC, Tem/Temra cytotoxic T cells, CD16- NK cells, Double-positive thymocytes, Tem/Effector helper T cells, CD16+ NK cells, NKT cells, Mono-mac, Type 17 helper T cells, MNP, Erythrophagocytic macrophages, DC2, Cycling T cells
✅ loaded 11 CellType records matching synonyms: Follicular B cells, Macrophages, B cells, Classical monocytes, Alveolar macrophages, Memory B cells, Regulatory T cells, Myelocytes, NK cells, Non-classical monocytes, Monocytes
celltypes[:2]
[CellType(id='00ieV0IG', name='Age-associated B cells', ontology_id='CL:0000787', description='CD11c+ T-bet+ memory B cells associated with autoimmunity and aging', updated_at=2023-08-28 18:24:20, created_by_id='DzTjkKse'),
CellType(id='cx8VcggA', name='B cell', ontology_id='CL:0000236', synonyms='B-cell|B lymphocyte|B cells|B-lymphocyte', description='A Lymphocyte Of B Lineage That Is Capable Of B Cell Mediated Immunity.', updated_at=2023-08-28 18:24:20, bionty_source_id='Eukz', created_by_id='DzTjkKse')]
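The log above reports two kinds of matches: 19 values matched a record name directly and 11 matched a synonym, which is why "B cells" resolves to the "B cell" record. A simplified sketch of such a two-step lookup is shown below. The `resolve` helper and the tiny registry are illustrative assumptions, not how `from_values` is implemented.

```python
def resolve(values, registry):
    """Map each unique value to a record name: exact name first, then synonym."""
    by_synonym = {
        synonym: name
        for name, synonyms in registry.items()
        for synonym in synonyms.split("|")
        if synonym
    }
    resolved, unmatched = {}, []
    for value in dict.fromkeys(values):  # unique values, order preserved
        if value in registry:
            resolved[value] = value
        elif value in by_synonym:
            resolved[value] = by_synonym[value]
        else:
            unmatched.append(value)
    return resolved, unmatched


# Tiny registry: name -> pipe-delimited synonyms, copied from records shown above.
registry = {
    "B cell": "B-cell|B lymphocyte|B cells|B-lymphocyte",
    "Age-associated B cells": "",
}
values = ["B cells", "Age-associated B cells", "B cells", "Unknown label"]
resolved, unmatched = resolve(values, registry)
print(resolved)
print(unmatched)
```

Checking names before synonyms keeps an exact name match from being shadowed by a synonym of another record.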
Track the annotated dataset in LaminDB#
Create a file record from the AnnData object. We also give it a description, which makes the dataset easier to identify and can be queried later.
file_annotated = ln.File.from_anndata(
    adata_annotated, description="Examplary CellTypist file", var_ref=lb.Gene.symbol
)
💡 file will be copied to default storage upon `save()` with key `None` ('.lamindb/Jzzlx0VuDYft2m57Yvyh.h5ad')
💡 parsing feature names of X stored in slot 'var'
💡 using global setting species = human
❗ 32786 terms (100.00%) are not validated for symbol: MIR1302-10, FAM138A, OR4F5, RP11-34P13.7, RP11-34P13.8, AL627309.1, RP11-34P13.14, RP11-34P13.9, AP006222.2, RP4-669L17.10, OR4F29, RP4-669L17.2, RP5-857K21.15, RP5-857K21.1, RP5-857K21.2, RP5-857K21.3, RP5-857K21.4, RP5-857K21.5, OR4F16, RP11-206L10.3, ...
❗ no validated features, skip creating feature set
💡 parsing feature names of slot 'obs'
❗ 4 terms (100.00%) are not validated for name: cell_type_cell_typist, over_clustering, majority_voting, conf_score
❗ no validated features, skip creating feature set
file_annotated.save()
✅ storing file 'Jzzlx0VuDYft2m57Yvyh' at '.lamindb/Jzzlx0VuDYft2m57Yvyh.h5ad'
ln.save(celltypes)
Add cell types as labels for a feature "cell_type_cell_typist":
ln.Feature(name="cell_type_cell_typist", type="category").save()
file_annotated.add_labels(celltypes, feature="cell_type_cell_typist")
✅ linked feature 'cell_type_cell_typist' to registry 'bionty.CellType'
✅ linked new feature 'cell_type_cell_typist' together with new feature set FeatureSet(id='S63NlSZTgdiWLlg9PYCi', n=1, registry='core.Feature', hash='PJUgerWI40pFa7h9KU-g', updated_at=2023-08-28 18:25:59, modality_id='5bi8vRdi', created_by_id='DzTjkKse')
file_annotated.describe()
💡 File(id='Jzzlx0VuDYft2m57Yvyh', key=None, suffix='.h5ad', accessor='AnnData', description='Examplary CellTypist file', version=None, size=75080752, hash='QUkIkv81c3AqS-wQB_qOZU', hash_type='sha1-fl', created_at=2023-08-28 18:25:59, updated_at=2023-08-28 18:25:59)
Provenance:
🗃️ storage: Storage(id='IORfHudU', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/celltypist', type='local', updated_at=2023-08-28 18:24:12, created_by_id='DzTjkKse')
💫 transform: Transform(id='s5mkN5NQ1ttIz8', name='Manage a cell type registry', short_name='celltypist', version='0', type=notebook, updated_at=2023-08-28 18:25:59, created_by_id='DzTjkKse')
👣 run: Run(id='pGJ6f2eOq2WMNDPEfB5D', run_at=2023-08-28 18:24:17, transform_id='s5mkN5NQ1ttIz8', created_by_id='DzTjkKse')
👤 created_by: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-08-28 18:24:12)
Features:
external:
🔗 cell_type_cell_typist (30, bionty.CellType): ['pDC', 'classical monocyte', 'Tem/Effector helper T cells', 'myelocyte', 'Cycling T cells']
file_annotated.view_lineage()
The file is now tracked, and we can find it by querying for a specific cell type.
ln.File.filter(cell_types=db_lookup.tcm_naive_helper_t_cells).df()
id | storage_id | key | suffix | accessor | description | version | initial_version_id | size | hash | hash_type | transform_id | run_id | updated_at | created_by_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Jzzlx0VuDYft2m57Yvyh | IORfHudU | None | .h5ad | AnnData | Examplary CellTypist file | None | None | 75080752 | QUkIkv81c3AqS-wQB_qOZU | sha1-fl | s5mkN5NQ1ttIz8 | pGJ6f2eOq2WMNDPEfB5D | 2023-08-28 18:25:59 | DzTjkKse |
Or find the notebook in which the file was annotated by CellTypist:
ln.Transform.filter(files__description__icontains="CellTypist").df()
id | name | short_name | version | initial_version_id | type | reference | updated_at | created_by_id |
---|---|---|---|---|---|---|---|---|
s5mkN5NQ1ttIz8 | Manage a cell type registry | celltypist | 0 | None | notebook | None | 2023-08-28 18:25:59 | DzTjkKse |
Try it yourself#
This notebook is available at laminlabs/lamin-usecases.
!lamin delete --force celltypist
!rm -r ./celltypist
💡 deleting instance testuser1/celltypist
✅ deleted instance settings file: /home/runner/.lamin/instance--testuser1--celltypist.env
✅ instance cache deleted
✅ deleted '.lndb' sqlite file
❗ consider manually deleting your stored data: /home/runner/work/lamin-usecases/lamin-usecases/docs/celltypist