Mapping information from a dictionary to a DataFrame when there are null values

Time: 2017-05-09 17:29:12

Tags: python pandas dictionary mapping

This is the first data frame:

Umls                                    Snomed
C0027497/Nausea /Sign or Symptom    Nausea (finding)[FN/422587007] 
C0151786 / Muscle/Sign or Symptom   Muscle weakness [(finding) /FN/26544005]
C2127305 /bitter/ Sign or Symptom    ?
NA                                   NA

I created a dictionary from it using the following code:

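df_dic_1 = df_dic_1[['UMLS', 'snomed']].copy()

# assign the result back instead of calling fillna(inplace=True) on a column selection
df_dic_1['UMLS'] = df_dic_1['UMLS'].fillna(0)
df_dic_1['snomed'] = df_dic_1['snomed'].fillna(0)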

equiv_snomed=df_dic_1.set_index('UMLS')['snomed'].to_dict()

Now, for data frame B:

id     symptom      UMLS                               
1      nausea    C0027497/Nausea /Sign or Symptom
2      muscle     C2127305 /bitter/ Sign or Symptom 
3      headache     
4      pain 
5      bitter     C2127305 /bitter/ Sign or Symptom 

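For any value in the "UMLS" column that is available in the dictionary, I want to create another column, "Snomed", containing the corresponding "snomed" value from the dictionary. So data frame C should look like this: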

id     symptom      UMLS                                 Snomed
1      nausea       C0027497/Nausea /Sign or Symptom     Nausea (finding)[FN/422]
2      muscle       C0151786 / Muscle/Sign or Symptom    Muscle [(fi)/FN/25]
3      headache
4      pain
5      bitter       C2127305 /bitter/ Sign or Symptom    ?

Any help? Thanks.

2 answers:

Answer 0: (score: 2)

See EdChum's answer to this Stack Overflow question.

Applied to your case, it would look like this:

import pandas as pd

# create dictionary
d = {'umls1':'snomed1','umls2':'snomed2','umls3':'snomed3'}

# create empty dataframe
columns = ['symptom','umls','snomed']
df = pd.DataFrame(columns = columns)

# fill it with symptoms and with umls, with some umls NULL
df['symptom'] = ['nausea','muscle','headache','pain','bitter']
# .ix has been removed from pandas; .loc does the same label-based assignment here
df.loc[0, 'umls'] = 'umls1'
df.loc[1, 'umls'] = 'umls2'
df.loc[4, 'umls'] = 'umls3'

# add a third column with snomed values from dictionary
df['snomed'] = df['umls'].map(d)

This gives the following output:

df.head()
Out[21]: 
    symptom   umls   snomed
0    nausea  umls1  snomed1
1    muscle  umls2  snomed2
2  headache    NaN      NaN
3      pain    NaN      NaN
4    bitter  umls3  snomed3
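
As a rough sketch, the same idea can be written with the names from the question (`equiv_snomed`, `UMLS`, `Snomed`). The frame `df_b` below is a made-up stand-in for data frame B with only three rows, and the dictionary holds only two of the question's entries. The final `fillna('')` step is an assumption on my part: it is only there to reproduce the blank cells shown in the desired output and can be dropped if NaN is acceptable.

```python
import numpy as np
import pandas as pd

# Illustrative subset of the dictionary built in the question
equiv_snomed = {
    'C0027497/Nausea /Sign or Symptom': 'Nausea (finding)[FN/422587007]',
    'C2127305 /bitter/ Sign or Symptom': '?',
}

# Hypothetical stand-in for data frame B; the missing UMLS value is NaN
df_b = pd.DataFrame({
    'id': [1, 3, 5],
    'symptom': ['nausea', 'headache', 'bitter'],
    'UMLS': ['C0027497/Nausea /Sign or Symptom', np.nan,
             'C2127305 /bitter/ Sign or Symptom'],
})

# map() looks each UMLS value up in the dictionary;
# values that are NaN or not present as keys become NaN
df_b['Snomed'] = df_b['UMLS'].map(equiv_snomed)

# Optional: display blanks instead of NaN, as in the desired data frame C
df_c = df_b.fillna('')
```

Note that the keys must match the `UMLS` strings exactly, including the spaces around the slashes, for the lookup to succeed.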

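Answer 1: (score: 1)

You can use the apply function on each element of the UMLS column and fetch the value from the dictionary equiv_snomed. If the key is not in the dictionary, you can return np.nan.

If your data frame B is named df2, then: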

import numpy as np

df2['Snomed'] = df2['UMLS'].apply(lambda x: equiv_snomed.get(x, np.nan))
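
For comparison, here is a small sketch that runs the `apply` / `dict.get` lookup next to `Series.map` on the same column. The keys and values (`umls1`, `snomed1`, and so on) are placeholders rather than real codes, and `df2` is a made-up four-row frame. Under these assumptions both approaches leave NaN wherever the UMLS value is missing or has no entry in the dictionary.

```python
import numpy as np
import pandas as pd

# Placeholder dictionary; 'umls2' is deliberately absent
equiv_snomed = {'umls1': 'snomed1', 'umls3': 'snomed3'}

# Hypothetical data frame B with one missing value and one unknown key
df2 = pd.DataFrame({'UMLS': ['umls1', np.nan, 'umls2', 'umls3']})

# Lookup via apply and dict.get, falling back to NaN for unknown keys
via_apply = df2['UMLS'].apply(lambda x: equiv_snomed.get(x, np.nan))

# Lookup via Series.map, which also yields NaN for unknown keys
via_map = df2['UMLS'].map(equiv_snomed)

df2['Snomed'] = via_apply
```

The `dict.get` form is useful when a fallback other than NaN is wanted, since the second argument can be any default value.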