How do I append a dictionary to a dictionary in a for loop?

Time: 2020-08-11 14:40:13

Tags: python dictionary for-loop

I am trying to create a dictionary in which the value for each key is itself a dictionary with two entries.

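I have two lists of patient barcodes (normal tissue, disease tissue), which correspond to value columns in a data frame. My goal is to match the patients that appear in both lists and then, for each of those patients, add their normal-tissue and disease-tissue values to a dictionary. The dictionary keys will be the patient barcodes, and each value will be another dictionary of the form normal tissue: values pulled from the data frame, disease tissue: values pulled from the data frame.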

So, starting from:

In [3]: df = pd.DataFrame({'Patient1_Normal':['nan', 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, 'nan'],
                 'Patient1_Disease':[0.12, 0.06, 0.19, 0.34, 'nan', 'nan', 0.73, 0.91],
                 'Patient2_Disease':['nan', 'nan', 'nan', 1.0, 0.24, 0.67, 0.97, 0.98],
                 'Patient3_Normal': [0.21, 0.25,0.63,0.92,0.3, 0.56, 0.78, 0.9],
                 'Patient3_Disease':[0.11, 0.45, 'nan', 0.45, 0.22, 0.89, 0.17, 0.12],
                 'Patient4_Normal':['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91],
                 'Patient4_Disease':['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
                 'Patient5_Disease': [0.34, 0.27, 'nan', 0.16, 0.32, 0.27, 0.55, 0.51]})


In [4]: df                                                                                                                                 
Out[4]: Patient1_Normal Patient1_Disease Patient2_Disease  Patient3_Normal Patient3_Disease Patient4_Normal Patient4_Disease Patient5_Disease
    0             nan             0.12              nan             0.21             0.11             nan              nan             0.34
    1            0.01             0.06              nan             0.25             0.45            0.35              nan             0.27
    2             0.1             0.19              nan             0.63              nan             nan             0.56              nan
    3            0.16             0.34                1             0.92             0.45            0.22             0.72             0.16
    4            0.88              nan             0.24             0.30             0.22            0.45              nan             0.32
    5            0.83              nan             0.67             0.56             0.89            0.66             0.97             0.27
    6            0.82             0.73             0.97             0.78             0.17            0.21             0.91             0.55
    7             nan             0.91             0.98             0.90             0.12            0.91             0.79             0.51

This is what I have so far:

D_col = [col for col in df if '_Disease' in col]
N_col = [col for col in df if '_Normal' in col]

paired_patients = {}
psi_sets = {}
psi_sets['d'] = []
psi_sets['n'] = []

for patient in N_col:
       patient_id = patient[0:8]

       n_id = patient
       d_id = [i for i in D_col if patient_id in i]

       if len(d_id) > 0:
           psi_sets['n'] = df[n_id].to_list()
           for d in d_id:
               psi_sets['d'] = df[d].to_list()

       paired_patients[patient_id] = psi_sets

However, the values in my paired_patients dictionary are being overwritten rather than appended, so the output of paired_patients looks like this:

{'Patient1': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]},
 'Patient3': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
  'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]},
 'Patient4': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
  'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]}}

How can I fix the last part of my code so that the values are stored correctly for each patient and the paired_patients dictionary looks like this?

{'Patient1': {'d': [0.12, 0.06, 0.19, 0.34, 'nan', 'nan', 0.73, 0.91],
  'n': ['nan', 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, 'nan']},
 'Patient3': {'d': [0.11, 0.45, 'nan', 0.45, 0.22, 0.89, 0.17, 0.12],
  'n': [0.21, 0.25,0.63,0.92,0.3, 0.56, 0.78, 0.9]},
 'Patient4': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
  'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]}}

2 answers:

Answer 0: (score: 1)

D_col = [col for col in df if '_Disease' in col]
N_col = [col for col in df if '_Normal' in col]
paired_patients = {}


for patient in N_col:
    psi_sets = {}  # create a new dict for every patient so entries are not shared
    patient_id = patient[0:8]
    n_id = patient
    d_id = [i for i in D_col if patient_id in i]

    if len(d_id) > 0:
        psi_sets['n'] = df[n_id].to_list()
        for d in d_id:
            psi_sets['d'] = df[d].to_list()
 
    paired_patients[patient_id] = psi_sets
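
The original loop overwrote earlier patients because psi_sets was created once, outside the loop, so every key in paired_patients pointed at the same dictionary object. The sketch below reproduces that aliasing with made-up values (the `normal` and `disease` dicts are hypothetical stand-ins for the data frame columns, not the question's data). It also moves the assignment inside the membership check, which is an assumption about the intended behaviour: unpaired patients are skipped rather than stored with an empty dict.

```python
# Hypothetical toy data standing in for the data frame columns.
normal = {"Patient1": [0.01, 0.10], "Patient3": [0.21, 0.25]}
disease = {"Patient1": [0.12, 0.06], "Patient3": [0.11, 0.45]}

# Buggy pattern: one dict is created outside the loop and reused.
shared = {}
aliased = {}
for pid in normal:
    shared["n"] = normal[pid]
    shared["d"] = disease[pid]
    aliased[pid] = shared  # stores a reference to the same dict, not a copy

# Fixed pattern: a new dict is built on every iteration.
paired = {}
for pid in normal:
    if pid in disease:  # keep only patients present in both groups
        paired[pid] = {"n": normal[pid], "d": disease[pid]}

print(aliased["Patient1"] is aliased["Patient3"])  # True: same object
print(paired["Patient1"] is paired["Patient3"])    # False: separate objects
```

Because `aliased` holds the same object under every key, all entries show whatever was written during the last iteration, which matches the symptom described in the question.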

Answer 1: (score: 0)

You can use df.melt, pd.concat, Series.str.split, df.replace, df.groupby, df.xs, and finally df.to_dict. Check the following:

>>> df2 = (pd.concat([
                      df.melt().variable.str.split('_', expand=True),
                      df.melt().drop(columns='variable')
                    ], axis=1)
                       .replace({'Normal':'n', 'Disease':'d'})
                       .groupby([0,1]).agg(list))
>>> paired_patients = {k: v for k, v in
                       df2.groupby(level=0)
                          .apply(lambda df: df.xs(df.name).value.to_dict())
                          .to_dict().items()
                       if not ({'d', 'n'} ^ v.keys())}
>>> paired_patients
{'Patient1': {'d': [0.12, 0.06, 0.19, 0.34, 'nan', 'nan', 0.73, 0.91],
  'n': ['nan', 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, 'nan']},
 'Patient3': {'d': [0.11, 0.45, 'nan', 0.45, 0.22, 0.89, 0.17, 0.12],
  'n': [0.21, 0.25,0.63,0.92,0.3, 0.56, 0.78, 0.9]},
 'Patient4': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
  'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]}}

EXPLANATION

>>> df.melt()
            variable  value
0    Patient1_Normal    NaN
1    Patient1_Normal   0.01
2    Patient1_Normal   0.10
..               ...    ...
62  Patient5_Disease   0.55
63  Patient5_Disease   0.51

>>> df.melt().variable.str.split('_', expand=True)
 
           0        1
0   Patient1   Normal
1   Patient1   Normal
2   Patient1   Normal
..       ...      ...
62  Patient5  Disease
63  Patient5  Disease

[64 rows x 2 columns]
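
The same split can be sketched on plain column names without pandas. Splitting on the underscore is also more robust than slicing a fixed number of characters, as patient[0:8] does in the question, because a slice would truncate a longer ID. The `Patient10_Disease` column below is hypothetical and only there to show that case.

```python
# Column names as in the question, plus one hypothetical longer ID.
columns = ["Patient1_Normal", "Patient1_Disease", "Patient10_Disease"]

# Split each name once, from the right, into (patient_id, tissue).
parsed = [tuple(col.rsplit("_", 1)) for col in columns]

# Fixed-width slicing cannot tell Patient1 and Patient10 apart.
sliced = [col[0:8] for col in columns]

print(parsed)
print(sliced)
```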

# then concat these two, replace 'Normal' and 'Disease' with 'n' and 'd' and drop
# the 'variable' column
>>> pd.concat([
                      df.melt().variable.str.split('_', expand=True),
                      df.melt().drop(columns='variable')
                    ], axis=1).replace({'Normal':'n', 'Disease':'d'})
           0  1  value
0   Patient1  n    NaN
1   Patient1  n   0.01
2   Patient1  n   0.10
..       ... ..    ...
62  Patient5  d   0.55
63  Patient5  d   0.51

[64 rows x 3 columns]

# then groupby column [0, 1] and aggregate into list:
>>> df2 = _.groupby([0,1]).agg(list)
>>> df2
                                                      value
0        1                                                 
Patient1 d   [0.12, 0.06, 0.19, 0.34, nan, nan, 0.73, 0.91]
         n    [nan, 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, nan]
Patient2 d     [nan, nan, nan, 1.0, 0.24, 0.67, 0.97, 0.98]
Patient3 d  [0.11, 0.45, nan, 0.45, 0.22, 0.89, 0.17, 0.12]
         n   [0.21, 0.25, 0.63, 0.92, 0.3, 0.56, 0.78, 0.9]
Patient4 d    [nan, nan, 0.56, 0.72, nan, 0.97, 0.91, 0.79]
         n   [nan, 0.35, nan, 0.22, 0.45, 0.66, 0.21, 0.91]
Patient5 d  [0.34, 0.27, nan, 0.16, 0.32, 0.27, 0.55, 0.51]

# Now group by level=0, convert each group into a dict, and finally keep only
# the patients that have both 'n' and 'd' as keys (the full snippet above does
# the same check with a symmetric set difference on the dict_keys object)

>>> paired_patients = {k: v for k, v in
                       df2.groupby(level=0)
                          .apply(lambda df: df.xs(df.name).value.to_dict())
                          .to_dict().items()
                       if ('n' in v) and ('d' in v)}
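
The two filters used in this answer (the symmetric difference on the keys and the plain membership tests) can be compared on a small example. This is only a sketch: the `candidates` dict is hypothetical and merely shaped like the per-patient dicts built above.

```python
# Hypothetical per-patient dicts shaped like the ones built above.
candidates = {
    "Patient1": {"d": [0.12], "n": [0.01]},
    "Patient2": {"d": [1.0]},  # disease only, so it should be filtered out
}

required = {"d", "n"}

# dict.keys() returns a set-like view, so ^ (symmetric difference) works.
# An empty result means the keys are exactly 'd' and 'n'.
by_xor = {k: v for k, v in candidates.items() if not (v.keys() ^ required)}

# Plain membership tests give the same result when no other keys exist.
by_in = {k: v for k, v in candidates.items() if "n" in v and "d" in v}

print(by_xor == by_in)
```

The symmetric difference is the stricter of the two: it would also reject a dict carrying an extra key, whereas the membership tests only require that 'n' and 'd' are present.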