Samples

The samples module provides functions for the Samples section of the cBioPortal Web Public API.

pybioportal.samples.fetch_samples(sample_identifiers=None, sample_list_ids=None, unique_sample_keys=None, projection='SUMMARY')

Fetch samples by ID.

Parameters:
  • sample_identifiers (list of dict) –

    List of dicts, each grouping one or more sample IDs under a single study ID.

    Each dict should have the following format:

    sample_identifiers=[
        {"sample_ids": ["TCGA-AR-A1AR-01", "TCGA-BH-A1EO-01", "TCGA-BH-A1ES-01"],
         "study_id": "brca_tcga"},
        {"sample_ids": ["TCGA-A2-A0T2-01", "TCGA-A2-A04P-01"],
         "study_id": "brca_tcga_pub"}
    ]

  • sample_list_ids (list of str) – List of Sample List IDs (e.g., ["brca_tcga_cna", "brca_tcga_mrna", "brca_tcga_pub_cna"]).

  • unique_sample_keys (list of str) – List of Unique Sample Keys (e.g., ["VENHQS1BUi1BMUFSLTAxOmJyY2FfdGNnYQ", "VENHQS1CNi1BMElRLTAxOmJyY2FfdGNnYV9wdWI", "VENHQS1CSC1BMUZELTAxOmJyY2FfdGNnYQ"]).

  • projection (str) –

    Level of detail of the response.

    Possible values:

    • "DETAILED": Detailed information.

    • "ID": Information with only IDs.

    • "META": Metadata information.

    • "SUMMARY": Summary information (default).

Returns:

A DataFrame containing the samples that match the given identifiers, sample list IDs, or unique sample keys.

Return type:

pandas.DataFrame
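
Sample and study IDs often arrive as flat pairs rather than in the grouped format described for sample_identifiers. The following is a minimal sketch of one way to group them; build_sample_identifiers is a hypothetical helper written for illustration and is not part of pybioportal.

```python
from collections import defaultdict


def build_sample_identifiers(pairs):
    """Group (study_id, sample_id) pairs into a list of dicts.

    Hypothetical helper: the output follows the format described for
    the sample_identifiers parameter of fetch_samples.
    """
    grouped = defaultdict(list)
    for study_id, sample_id in pairs:
        # Skip duplicates while keeping first-seen order.
        if sample_id not in grouped[study_id]:
            grouped[study_id].append(sample_id)
    return [{"sample_ids": ids, "study_id": study} for study, ids in grouped.items()]


pairs = [
    ("brca_tcga", "TCGA-AR-A1AR-01"),
    ("brca_tcga", "TCGA-BH-A1EO-01"),
    ("brca_tcga_pub", "TCGA-A2-A0T2-01"),
]
sample_identifiers = build_sample_identifiers(pairs)
```

The resulting list can then be passed as fetch_samples(sample_identifiers=sample_identifiers).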

pybioportal.samples.get_all_samples_in_study(study_id, direction='ASC', pageNumber=0, pageSize=10000000, projection='SUMMARY', sortBy=None)

Get all samples in a study.

Parameters:
  • study_id (str) – Study ID (e.g., "acc_tcga").

  • direction (str) –

    Direction of the sort.

    Possible values:

    • "ASC": Ascending (default).

    • "DESC": Descending.

  • pageNumber (int) –

    Page number of the result list.

    • Minimum value is 0.

  • pageSize (int) –

    Page size of the result list.

    • Minimum value is 1, maximum value is 10000000.

  • projection (str) –

    Level of detail of the response.

    Possible values:

    • "DETAILED": Detailed information.

    • "ID": Information with only IDs.

    • "META": Metadata information.

    • "SUMMARY": Summary information (default).

  • sortBy (str) –

    Name of the property that the result list is sorted by.

    Possible values:

    • "sampleId": Sort by sample ID.

    • "sampleType": Sort by sample type.

Returns:

A DataFrame containing samples in the specified study.

Return type:

pandas.DataFrame
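
The default pageSize returns everything in one request. To read a large study in smaller chunks, pageNumber and pageSize can be stepped through in a loop. The sketch below shows the idea with a stand-in fetch function so that it runs without network access; iter_pages and fake_fetch are hypothetical names, and the stopping rule assumes that a page shorter than pageSize (or an empty one) marks the end of the results.

```python
def iter_pages(fetch_page, page_size=500, **kwargs):
    """Yield successive pages until one comes back shorter than page_size.

    fetch_page is any callable that accepts pageNumber and pageSize
    keyword arguments and returns an object supporting len().
    """
    page_number = 0
    while True:
        page = fetch_page(pageNumber=page_number, pageSize=page_size, **kwargs)
        if len(page) > 0:
            yield page
        if len(page) < page_size:
            break
        page_number += 1


def fake_fetch(study_id, pageNumber=0, pageSize=10000000):
    """Stand-in for a paged endpoint: seven made-up rows, sliced by page."""
    rows = [f"{study_id}-sample-{i}" for i in range(7)]
    start = pageNumber * pageSize
    return rows[start:start + pageSize]


pages = list(iter_pages(fake_fetch, page_size=3, study_id="demo_study"))
page_lengths = [len(page) for page in pages]
```

With pybioportal, get_all_samples_in_study could be passed as fetch_page and the resulting DataFrames combined with pandas.concat, assuming the function returns an empty or short DataFrame once the pages run out.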

pybioportal.samples.get_all_samples_of_patient_in_study(study_id, patient_id, direction='ASC', pageNumber=0, pageSize=10000000, projection='SUMMARY', sortBy=None)

Get all samples of a patient in a study.

Parameters:
  • study_id (str) – Study ID (e.g., "acc_tcga").

  • patient_id (str) – Patient ID (e.g., "TCGA-OR-A5J2").

  • direction (str) –

    Direction of the sort.

    Possible values:

    • "ASC": Ascending (default).

    • "DESC": Descending.

  • pageNumber (int) –

    Page number of the result list.

    • Minimum value is 0.

  • pageSize (int) –

    Page size of the result list.

    • Minimum value is 1, maximum value is 10000000.

  • projection (str) –

    Level of detail of the response.

    Possible values:

    • "DETAILED": Detailed information.

    • "ID": Information with only IDs.

    • "META": Metadata information.

    • "SUMMARY": Summary information (default).

  • sortBy (str) –

    Name of the property that the result list is sorted by.

    Possible values:

    • "sampleId": Sort by sample ID.

    • "sampleType": Sort by sample type.

Returns:

A DataFrame containing samples of the specified patient in the study.

Return type:

pandas.DataFrame
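
The paged functions in this module share the direction, pageNumber, pageSize, projection, and sortBy parameters, each with a documented set of accepted values. As a rough sketch, these constraints can be checked locally before a request is made. check_query_args is a hypothetical helper, not a pybioportal function, and it encodes only the values listed on this page.

```python
# Allowed values as listed in the parameter descriptions on this page.
ALLOWED = {
    "direction": {"ASC", "DESC"},
    "projection": {"DETAILED", "ID", "META", "SUMMARY"},
    "sortBy": {None, "sampleId", "sampleType"},
}
MAX_PAGE_SIZE = 10000000


def check_query_args(direction="ASC", pageNumber=0, pageSize=MAX_PAGE_SIZE,
                     projection="SUMMARY", sortBy=None):
    """Hypothetical helper: validate the arguments and return them as a dict."""
    args = {"direction": direction, "pageNumber": pageNumber, "pageSize": pageSize,
            "projection": projection, "sortBy": sortBy}
    for name, allowed in ALLOWED.items():
        if args[name] not in allowed:
            choices = sorted(allowed, key=str)
            raise ValueError(f"{name}={args[name]!r} is not one of {choices}")
    if pageNumber < 0:
        raise ValueError("pageNumber must be 0 or greater")
    if not 1 <= pageSize <= MAX_PAGE_SIZE:
        raise ValueError(f"pageSize must be between 1 and {MAX_PAGE_SIZE}")
    return args


query = check_query_args(direction="DESC", pageSize=100, sortBy="sampleId")
```

The returned dict can be unpacked into a call, for example get_all_samples_of_patient_in_study(study_id="brca_tcga", patient_id="TCGA-AR-A1AR", **query).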

pybioportal.samples.get_sample_in_study(study_id, sample_id)

Get information about a specific sample in a study.

Parameters:
  • study_id (str) – Study ID (e.g., "acc_tcga").

  • sample_id (str) – Sample ID (e.g., "TCGA-OR-A5J2-01").

Returns:

A DataFrame containing information about the specified sample.

Return type:

pandas.DataFrame

pybioportal.samples.get_samples_by_keyword(keyword=None, direction='ASC', pageNumber=0, pageSize=10000000, projection='SUMMARY', sortBy=None)

Get all samples matching a keyword.

Parameters:
  • keyword (str) – Search keyword that applies to the study ID.

  • direction (str) –

    Direction of the sort.

    Possible values:

    • "ASC": Ascending (default).

    • "DESC": Descending.

  • pageNumber (int) –

    Page number of the result list.

    • Minimum value is 0.

  • pageSize (int) –

    Page size of the result list.

    • Minimum value is 1, maximum value is 10000000.

  • projection (str) –

    Level of detail of the response.

    Possible values:

    • "DETAILED": Detailed information.

    • "ID": Information with only IDs.

    • "META": Metadata information.

    • "SUMMARY": Summary information (default).

  • sortBy (str) –

    Name of the property that the result list is sorted by.

    Possible values:

    • "sampleId": Sort by sample ID.

    • "sampleType": Sort by sample type.

Returns:

A DataFrame containing samples matching the keyword.

Return type:

pandas.DataFrame


Examples

from pybioportal import samples as sp
df1 = sp.get_samples_by_keyword(keyword="TCGA")
df1
uniqueSampleKey uniquePatientKey sampleType sampleId patientId studyId
0 VENHQS0wMi0wMDAxLTAxOmdibV90Y2dhX3Bhbl9jYW5fYX... VENHQS0wMi0wMDAxOmdibV90Y2dhX3Bhbl9jYW5fYXRsYX... Primary Solid Tumor TCGA-02-0001-01 TCGA-02-0001 gbm_tcga_pan_can_atlas_2018
1 VENHQS0wMi0wMDAxLTAxOmxnZ2dibV90Y2dhX3B1Yg VENHQS0wMi0wMDAxOmxnZ2dibV90Y2dhX3B1Yg Primary Solid Tumor TCGA-02-0001-01 TCGA-02-0001 lgggbm_tcga_pub
2 VENHQS0wMi0wMDAxLTAxOmdibV90Y2dhX3B1Yg VENHQS0wMi0wMDAxOmdibV90Y2dhX3B1Yg Primary Solid Tumor TCGA-02-0001-01 TCGA-02-0001 gbm_tcga_pub
3 VENHQS0wMi0wMDAxLTAxOmdibV90Y2dhX3B1YjIwMTM VENHQS0wMi0wMDAxOmdibV90Y2dhX3B1YjIwMTM Primary Solid Tumor TCGA-02-0001-01 TCGA-02-0001 gbm_tcga_pub2013
4 VENHQS0wMi0wMDAxLTAxOmdibV90Y2dh VENHQS0wMi0wMDAxOmdibV90Y2dh Primary Solid Tumor TCGA-02-0001-01 TCGA-02-0001 gbm_tcga
... ... ... ... ... ... ...
33581 SURUQ0dBLTAyOm1peGVkX21za190Y2dhXzIwMjE SURUQ0dBLTAyOm1peGVkX21za190Y2dhXzIwMjE Primary Solid Tumor IDTCGA-02 IDTCGA-02 mixed_msk_tcga_2021
33582 SURUQ0dBLTAzOm1peGVkX21za190Y2dhXzIwMjE SURUQ0dBLTAzOm1peGVkX21za190Y2dhXzIwMjE Primary Solid Tumor IDTCGA-03 IDTCGA-03 mixed_msk_tcga_2021
33583 SURUQ0dBLTA0Om1peGVkX21za190Y2dhXzIwMjE SURUQ0dBLTA0Om1peGVkX21za190Y2dhXzIwMjE Primary Solid Tumor IDTCGA-04 IDTCGA-04 mixed_msk_tcga_2021
33584 SURUQ0dBLTA1Om1peGVkX21za190Y2dhXzIwMjE SURUQ0dBLTA1Om1peGVkX21za190Y2dhXzIwMjE Primary Solid Tumor IDTCGA-05 IDTCGA-05 mixed_msk_tcga_2021
33585 SURUQ0dBLTA2Om1peGVkX21za190Y2dhXzIwMjE SURUQ0dBLTA2Om1peGVkX21za190Y2dhXzIwMjE Primary Solid Tumor IDTCGA-06 IDTCGA-06 mixed_msk_tcga_2021

33586 rows × 6 columns

df2a = sp.fetch_samples(sample_identifiers=[
    {"sample_ids": ["TCGA-AR-A1AR-01", "TCGA-BH-A1EO-01", "TCGA-BH-A1ES-01"],
     "study_id": "brca_tcga"},
    {"sample_ids": ["TCGA-A2-A0T2-01", "TCGA-A2-A04P-01"],
     "study_id": "brca_tcga_pub"},
])
df2a
uniqueSampleKey uniquePatientKey sampleType sampleId patientId studyId
0 VENHQS1BUi1BMUFSLTAxOmJyY2FfdGNnYQ VENHQS1BUi1BMUFSOmJyY2FfdGNnYQ Primary Solid Tumor TCGA-AR-A1AR-01 TCGA-AR-A1AR brca_tcga
1 VENHQS1CSC1BMUVPLTAxOmJyY2FfdGNnYQ VENHQS1CSC1BMUVPOmJyY2FfdGNnYQ Primary Solid Tumor TCGA-BH-A1EO-01 TCGA-BH-A1EO brca_tcga
2 VENHQS1CSC1BMUVTLTAxOmJyY2FfdGNnYQ VENHQS1CSC1BMUVTOmJyY2FfdGNnYQ Primary Solid Tumor TCGA-BH-A1ES-01 TCGA-BH-A1ES brca_tcga
3 VENHQS1BMi1BMFQyLTAxOmJyY2FfdGNnYV9wdWI VENHQS1BMi1BMFQyOmJyY2FfdGNnYV9wdWI Primary Solid Tumor TCGA-A2-A0T2-01 TCGA-A2-A0T2 brca_tcga_pub
4 VENHQS1BMi1BMDRQLTAxOmJyY2FfdGNnYV9wdWI VENHQS1BMi1BMDRQOmJyY2FfdGNnYV9wdWI Primary Solid Tumor TCGA-A2-A04P-01 TCGA-A2-A04P brca_tcga_pub

df2b = sp.fetch_samples(sample_list_ids=["brca_tcga_cna", "brca_tcga_mrna", "brca_tcga_pub_cna"])
df2b
uniqueSampleKey uniquePatientKey sampleType sampleId patientId studyId
0 VENHQS1BUi1BMUFSLTAxOmJyY2FfdGNnYQ VENHQS1BUi1BMUFSOmJyY2FfdGNnYQ Primary Solid Tumor TCGA-AR-A1AR-01 TCGA-AR-A1AR brca_tcga
1 VENHQS1CSC1BMUVPLTAxOmJyY2FfdGNnYQ VENHQS1CSC1BMUVPOmJyY2FfdGNnYQ Primary Solid Tumor TCGA-BH-A1EO-01 TCGA-BH-A1EO brca_tcga
2 VENHQS1CSC1BMUVTLTAxOmJyY2FfdGNnYQ VENHQS1CSC1BMUVTOmJyY2FfdGNnYQ Primary Solid Tumor TCGA-BH-A1ES-01 TCGA-BH-A1ES brca_tcga
3 VENHQS1CSC1BMUVULTAxOmJyY2FfdGNnYQ VENHQS1CSC1BMUVUOmJyY2FfdGNnYQ Primary Solid Tumor TCGA-BH-A1ET-01 TCGA-BH-A1ET brca_tcga
4 VENHQS1CSC1BMUVVLTAxOmJyY2FfdGNnYQ VENHQS1CSC1BMUVVOmJyY2FfdGNnYQ Primary Solid Tumor TCGA-BH-A1EU-01 TCGA-BH-A1EU brca_tcga
... ... ... ... ... ... ...
2382 VENHQS1BQy1BMkZGLTAxOmJyY2FfdGNnYV9wdWI VENHQS1BQy1BMkZGOmJyY2FfdGNnYV9wdWI Primary Solid Tumor TCGA-AC-A2FF-01 TCGA-AC-A2FF brca_tcga_pub
2383 VENHQS1BQy1BMkZCLTAxOmJyY2FfdGNnYV9wdWI VENHQS1BQy1BMkZCOmJyY2FfdGNnYV9wdWI Primary Solid Tumor TCGA-AC-A2FB-01 TCGA-AC-A2FB brca_tcga_pub
2384 VENHQS1BQy1BMkZHLTAxOmJyY2FfdGNnYV9wdWI VENHQS1BQy1BMkZHOmJyY2FfdGNnYV9wdWI Primary Solid Tumor TCGA-AC-A2FG-01 TCGA-AC-A2FG brca_tcga_pub
2385 VENHQS1HSS1BMkM4LTAxOmJyY2FfdGNnYV9wdWI VENHQS1HSS1BMkM4OmJyY2FfdGNnYV9wdWI Primary Solid Tumor TCGA-GI-A2C8-01 TCGA-GI-A2C8 brca_tcga_pub
2386 VENHQS1FOS1BMjk1LTAxOmJyY2FfdGNnYV9wdWI VENHQS1FOS1BMjk1OmJyY2FfdGNnYV9wdWI Primary Solid Tumor TCGA-E9-A295-01 TCGA-E9-A295 brca_tcga_pub

2387 rows × 6 columns

df2c = sp.fetch_samples(unique_sample_keys=[
    "VENHQS1BUi1BMUFSLTAxOmJyY2FfdGNnYQ",
    "VENHQS1CNi1BMElRLTAxOmJyY2FfdGNnYV9wdWI",
    "VENHQS1CSC1BMUZELTAxOmJyY2FfdGNnYQ",
])
df2c
uniqueSampleKey uniquePatientKey sampleType sampleId patientId studyId
0 VENHQS1BUi1BMUFSLTAxOmJyY2FfdGNnYQ VENHQS1BUi1BMUFSOmJyY2FfdGNnYQ Primary Solid Tumor TCGA-AR-A1AR-01 TCGA-AR-A1AR brca_tcga
1 VENHQS1CSC1BMUZELTAxOmJyY2FfdGNnYQ VENHQS1CSC1BMUZEOmJyY2FfdGNnYQ Primary Solid Tumor TCGA-BH-A1FD-01 TCGA-BH-A1FD brca_tcga
2 VENHQS1CNi1BMElRLTAxOmJyY2FfdGNnYV9wdWI VENHQS1CNi1BMElROmJyY2FfdGNnYV9wdWI Primary Solid Tumor TCGA-B6-A0IQ-01 TCGA-B6-A0IQ brca_tcga_pub

df3 = sp.get_all_samples_of_patient_in_study(study_id="brca_tcga", patient_id="TCGA-AR-A1AR")
df3
uniqueSampleKey uniquePatientKey sampleType sampleId patientId studyId
0 VENHQS1BUi1BMUFSLTAxOmJyY2FfdGNnYQ VENHQS1BUi1BMUFSOmJyY2FfdGNnYQ Primary Solid Tumor TCGA-AR-A1AR-01 TCGA-AR-A1AR brca_tcga

df4 = sp.get_all_samples_in_study(study_id="brca_tcga")
df4
uniqueSampleKey uniquePatientKey sampleType sampleId patientId studyId
0 VENHQS1BUi1BMUFSLTAxOmJyY2FfdGNnYQ VENHQS1BUi1BMUFSOmJyY2FfdGNnYQ Primary Solid Tumor TCGA-AR-A1AR-01 TCGA-AR-A1AR brca_tcga
1 VENHQS1CSC1BMUVPLTAxOmJyY2FfdGNnYQ VENHQS1CSC1BMUVPOmJyY2FfdGNnYQ Primary Solid Tumor TCGA-BH-A1EO-01 TCGA-BH-A1EO brca_tcga
2 VENHQS1CSC1BMUVTLTAxOmJyY2FfdGNnYQ VENHQS1CSC1BMUVTOmJyY2FfdGNnYQ Primary Solid Tumor TCGA-BH-A1ES-01 TCGA-BH-A1ES brca_tcga
3 VENHQS1CSC1BMUVTLTA2OmJyY2FfdGNnYQ VENHQS1CSC1BMUVTOmJyY2FfdGNnYQ Metastatic TCGA-BH-A1ES-06 TCGA-BH-A1ES brca_tcga
4 VENHQS1CSC1BMUVULTAxOmJyY2FfdGNnYQ VENHQS1CSC1BMUVUOmJyY2FfdGNnYQ Primary Solid Tumor TCGA-BH-A1ET-01 TCGA-BH-A1ET brca_tcga
... ... ... ... ... ... ...
1103 VENHQS1FMi1BMUI0LTAxOmJyY2FfdGNnYQ VENHQS1FMi1BMUI0OmJyY2FfdGNnYQ Primary Solid Tumor TCGA-E2-A1B4-01 TCGA-E2-A1B4 brca_tcga
1104 VENHQS1FMi1BMUI1LTAxOmJyY2FfdGNnYQ VENHQS1FMi1BMUI1OmJyY2FfdGNnYQ Primary Solid Tumor TCGA-E2-A1B5-01 TCGA-E2-A1B5 brca_tcga
1105 VENHQS1FMi1BMUI2LTAxOmJyY2FfdGNnYQ VENHQS1FMi1BMUI2OmJyY2FfdGNnYQ Primary Solid Tumor TCGA-E2-A1B6-01 TCGA-E2-A1B6 brca_tcga
1106 VENHQS1FMi1BMUJDLTAxOmJyY2FfdGNnYQ VENHQS1FMi1BMUJDOmJyY2FfdGNnYQ Primary Solid Tumor TCGA-E2-A1BC-01 TCGA-E2-A1BC brca_tcga
1107 VENHQS1FMi1BMUJELTAxOmJyY2FfdGNnYQ VENHQS1FMi1BMUJEOmJyY2FfdGNnYQ Primary Solid Tumor TCGA-E2-A1BD-01 TCGA-E2-A1BD brca_tcga

1108 rows × 6 columns

df5 = sp.get_sample_in_study(study_id="brca_tcga", sample_id="TCGA-AR-A1AR-01")
df5
sampleType sequenced copyNumberSegmentPresent sampleId patientId studyId
0 Primary Solid Tumor True True TCGA-AR-A1AR-01 TCGA-AR-A1AR brca_tcga
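
The uniqueSampleKey and uniquePatientKey values in the outputs above appear to be unpadded, URL-safe base64 encodings of "sampleId:studyId" and "patientId:studyId". This is an observation from the example data, not a documented guarantee of the API. A minimal sketch for decoding such a key follows; decode_unique_key is a hypothetical helper and not part of pybioportal.

```python
import base64


def decode_unique_key(key):
    """Decode a unique key into (identifier, study_id).

    Assumes the key is unpadded URL-safe base64 of "identifier:studyId",
    which matches the example keys on this page.
    """
    padded = key + "=" * (-len(key) % 4)  # restore the stripped padding
    decoded = base64.urlsafe_b64decode(padded).decode("utf-8")
    identifier, _, study_id = decoded.rpartition(":")
    return identifier, study_id


sample_id, study_id = decode_unique_key("VENHQS1BUi1BMUFSLTAxOmJyY2FfdGNnYQ")
```

For the first key used in the examples, this yields the sample ID TCGA-AR-A1AR-01 and the study ID brca_tcga, matching the sampleId and studyId columns of the corresponding row.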