Iterate over two dictionaries, match keys, and return the concatenated values

Time: 2020-10-19 14:07:07

Tags: python parsing

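I am iterating over two dictionaries. When a key matches, I concatenate the values from both dictionaries to build a URL and store the result in a new list.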

database_dict is static and never changes. The second dictionary, cross_ref_dict, is built from values in the file I am parsing. In short, database_dict will always have more entries than cross_ref_dict.

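Below is my current solution. It works when both dictionaries have the same number of elements, but when they differ I get an empty list. How can I handle this case and return the concatenated values only for keys that are found in database_dict? I only want to concatenate values when the keys match; if a key does not match or no value is found, nothing should be returned for it.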

database_dict = { 'CLO' : 'https://www.ebi.ac.uk/ols/ontologies/clo/terms?iri=http://purl.obolibrary.org/obo/',
             'EFO' : ' https://www.ebi.ac.uk/efo/',
             'ArrayExpress' : 'https://www.ebi.ac.uk/arrayexpress/experiments/',
             'ATCC' : 'https://www.atcc.org/Products/All/', # + .aspx
             'BioSample' : 'https://www.ncbi.nlm.nih.gov/biosample/?term=',
             'CCLE' : 'https://portals.broadinstitute.org/ccle/page?cell_line=',
             'Cell_Model_Passport' : 'https://cellmodelpassports.sanger.ac.uk/passports/',
             'ChEMBL-Cells' : 'https://www.ebi.ac.uk/chembldb/cell/inspect/',
             'ChEMBL-Targets' : 'https://www.ebi.ac.uk/chembldb/target/inspect/',
             'Cosmic' : 'https://cancer.sanger.ac.uk/cosmic/sample/overview?id=',
             'Cosmic-CLP' : 'https://cancer.sanger.ac.uk/cell_lines/sample/overview?id=',
             'DepMap' : 'https://depmap.org/portal/cell_line/ACH-000830',
             'GDSC' : 'https://www.cancerrxgene.org/translation/CellLine/',
             'GEO' : 'https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=',
             'IARC_TP53' : 'https://p53.iarc.fr/CellLines.aspx',
             'IGRhCellID' : 'http://igrcid.ibms.sinica.edu.tw/cgi-bin/cell_line_view.cgi?cl_name=',
             'KCLB' : 'https://cellbank.snu.ac.kr/english/sub/catalog.php?page=detail&CatNo=59&strQ=', # + &submit1=Find+it
             'LiGeA' : 'http://hpc-bioinformatics.cineca.it/fusion/cell_line/',
             'LINCS_LDP' : 'http://lincsportal.ccs.miami.edu/cells/#/view/',
             'PharmacoDB' : 'https://pharmacodb.ca/cell_lines/',
             'PRIDE' : 'https://www.ebi.ac.uk/pride/archive/projects/',
             'Wikidata' : 'https://www.wikidata.org/wiki/',
             'test' : 'test',
             'test2' : 'test2'
             }

cross_ref_dict = {'CLO': 'CLO_0007996', 'EFO': 'EFO_0002253', 'ArrayExpress': 'E-MTAB-3610', 'ATCC': 'CRL-5871', 'BioSample': 'SAMN10988000', 'CCLE': 'NCIH1436_LUNG', 'Cell_Model_Passport': 'SIDM00697', 'ChEMBL-Cells': 'CHEMBL3308893', 'ChEMBL-Targets': 'CHEMBL2366205', 'Cosmic': '2125229', 'Cosmic-CLP': '908469', 'DepMap': 'ACH-000830', 'GDSC': '908469', 'GEO': 'GSM1682805', 'IARC_TP53': '21539', 'IGRhCellID': 'NCIH1436', 'KCLB': '91436', 'LiGeA': 'CCLE_790', 'LINCS_LDP': 'LCL-1838', 'PharmacoDB': 'NCIH1436_1017_2019', 'PRIDE': 'PXD011896', 'Wikidata': 'Q54907807'}

# If the keys in the dicts match, concatenate the base link from `database_dict`
# with the appropriate value from `cross_ref_dict`.
key_values['Cross-ref'] = [(str(database_dict.get(k, 0)) + str(cross_ref_dict.get(k, 0)))
                           for k in set(database_dict.keys()) | set(cross_ref_dict.keys())
                           if database_dict.keys() == cross_ref_dict.keys()]

2 Answers:

Answer 0 (score: 1)

You can find the intersection of the keys and then simply iterate over it.

Something like:

set(cross_ref_dict.keys()).intersection(set(database_dict.keys()))

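This keeps only the keys that both dictionaries have in common, regardless of the size of each dictionary. You can then iterate over the intersected keys without worrying about missing entries.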
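To make the difference concrete, here is a minimal sketch of why the comprehension in the question returns an empty list and how the intersection avoids that. The dictionaries `base` and `refs` and their example.org URLs are made-up stand-ins for `database_dict` and `cross_ref_dict`. Dictionary key views support `&` directly, so the `set(...)` calls are optional.

```python
# Hypothetical miniature stand-ins for database_dict and cross_ref_dict.
base = {'CLO': 'https://example.org/clo/',
        'EFO': 'https://example.org/efo/',
        'test': 'test'}
refs = {'CLO': 'CLO_0007996', 'EFO': 'EFO_0002253'}

# The filter from the question compares the two key sets as a whole, so it is
# either True for every k or False for every k. Here 'test' exists only in
# base, so the condition is always False and nothing is kept.
original = [str(base.get(k, 0)) + str(refs.get(k, 0))
            for k in set(base.keys()) | set(refs.keys())
            if base.keys() == refs.keys()]

# Key views behave like sets: & yields the keys present in both dicts.
common = base.keys() & refs.keys()

# Sets are unordered, so sort here only to get a stable result.
urls = sorted(base[k] + refs[k] for k in common)
```

With these inputs `original` is empty, while `urls` holds one link per shared key.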

Using your logic:

[database_dict.get(k) + cross_ref_dict.get(k) for k in set(cross_ref_dict.keys()).intersection(set(database_dict.keys()))]



# ['https://www.ncbi.nlm.nih.gov/biosample/?term=SAMN10988000', 'https://www.ebi.ac.uk/ols/ontologies/clo/terms?iri=http://purl.obolibrary.org/obo/CLO_0007996', 'https://www.ebi.ac.uk/chembldb/target/inspect/CHEMBL2366205', 'https://cancer.sanger.ac.uk/cell_lines/sample/overview?id=908469', 'https://www.ebi.ac.uk/pride/archive/projects/PXD011896', 'https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-3610', 'https://cancer.sanger.ac.uk/cosmic/sample/overview?id=2125229', 'http://lincsportal.ccs.miami.edu/cells/#/view/LCL-1838', 'https://www.wikidata.org/wiki/Q54907807', 'https://depmap.org/portal/cell_line/ACH-000830ACH-000830', 'https://www.atcc.org/Products/All/CRL-5871', 'http://hpc-bioinformatics.cineca.it/fusion/cell_line/CCLE_790', 'https://p53.iarc.fr/CellLines.aspx21539', 'http://igrcid.ibms.sinica.edu.tw/cgi-bin/cell_line_view.cgi?cl_name=NCIH1436', 'https://cellmodelpassports.sanger.ac.uk/passports/SIDM00697', 'https://www.cancerrxgene.org/translation/CellLine/908469', 'https://cellbank.snu.ac.kr/english/sub/catalog.php?page=detail&CatNo=59&strQ=91436', 'https://portals.broadinstitute.org/ccle/page?cell_line=NCIH1436_LUNG', 'https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1682805', 'https://www.ebi.ac.uk/chembldb/cell/inspect/CHEMBL3308893', 'https://pharmacodb.ca/cell_lines/NCIH1436_1017_2019', ' https://www.ebi.ac.uk/efo/EFO_0002253']
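The result above comes back in arbitrary order because sets are unordered. If the order of the parsed file matters (an assumption about what is wanted, not something the question states), one option is to iterate over `cross_ref_dict` itself and test membership per key. The helper name `build_urls` and the `'Unknown'` key below are hypothetical and used only for illustration.

```python
def build_urls(base_urls, cross_refs):
    """Return base URL + identifier for every key found in both dicts,
    in the insertion order of cross_refs (Python 3.7+)."""
    return [base_urls[k] + cross_refs[k] for k in cross_refs if k in base_urls]


# Small excerpts of the dictionaries from the question, plus one made-up key.
base = {'GEO': 'https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=',
        'Wikidata': 'https://www.wikidata.org/wiki/',
        'test': 'test'}
refs = {'Wikidata': 'Q54907807',
        'GEO': 'GSM1682805',
        'Unknown': 'X1'}   # hypothetical key with no base URL; it is skipped

urls = build_urls(base, refs)
```

Keys that appear in only one of the two dictionaries (`'test'` and `'Unknown'` here) contribute nothing to the list.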

Answer 1 (score: 1)

Use a dictionary comprehension:

results = {key: data for key, data in database_dict.items() if key in cross_ref_dict}
print(results)
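As written, this comprehension keeps the matching keys but stores only the base URL from `database_dict`; it does not append the identifier. A possible variant that also concatenates the value from `cross_ref_dict` is sketched below, using small stand-in dictionaries (`base` and `refs` are illustrative names, not from the question).

```python
# Small excerpt of the dictionaries from the question.
base = {'PRIDE': 'https://www.ebi.ac.uk/pride/archive/projects/',
        'test': 'test'}
refs = {'PRIDE': 'PXD011896'}

# Keep only keys present in both dicts and join base URL + identifier.
results = {key: url + refs[key] for key, url in base.items() if key in refs}

# If a plain list is needed, as in the question, take the values.
links = list(results.values())
```

Keeping the dictionary form has the side benefit that each link stays associated with the database it belongs to.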