Append data to a pandas DataFrame using dictionary data

Time: 2020-05-26 16:09:34

Tags: python pandas dictionary

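I have a DataFrame and 2 separate dictionaries. Both dictionaries have the same keys but different values. dict_1 holds key-value pairs whose values are unique IDs that correspond to the DataFrame df. I want to use the 2 dictionaries, together with the unique IDs in dict_1, to append the values of dict_2 to the DataFrame df.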

Example of the DataFrame df:

col_1    col_2    id   col_3
 100      500     a1    478
 785      400     a1    490
 ...      ...     a1    ...
 ...      ...     a2    ...
 ...      ...     a2    ...
 ...      ...     a2    ...
 ...      ...     a3    ...
 ...      ...     a3    ...
 ...      ...     a3    ...
 ...      ...     a4    ...
 ...      ...     a4    ...
 ...      ...     a4    ...

Example of dict_1:

1:['a1', 'a3'],
2:['a2', 'a4'],
3:[...],
4:[...],
5:[...],
.

Example of dict_2:

1:[0, 1],
2:[1, 1],
3:[...],
4:[...],
5:[...],
.

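I am trying to append the data in dict_2 to the main df using the ids in dict_1. In other words, the 2 values (or n values) in each dict_2 list should be added to df as 2 columns (or n columns).

Resulting df: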

col_1    col_2    id   col_3   new_col_1   new_col_2 
 100      500     a1    478        0           1
 785      400     a1    490        0           1
 ...      ...     a1    ...        0           1
 ...      ...     a2    ...        1           1
 ...      ...     a2    ...        1           1
 ...      ...     a2    ...        1           1
 ...      ...     a3    ...        0           1
 ...      ...     a3    ...        0           1
 ...      ...     a3    ...        0           1
 ...      ...     a4    ...        1           1
 ...      ...     a4    ...        1           1
 ...      ...     a4    ...        1           1

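4 answers:

Answer 0: (score: 6)

IIUC, the keys in the two dictionaries are aligned. One approach is to create a DataFrame with an id column holding the values from dict_1 and, aligned on the same key, 2 columns (in this case, but it can be more) holding the values from dict_2. Then use merge on id to bring the result back into df.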

import pandas as pd

# The two dictionaries. Note that dict_2 has an extra element in the list for key 2
# to show that this works for any number of columns.
dict_1 = {1: ['a1', 'a3'], 2: ['a2', 'a4']}
dict_2 = {1: [0, 1], 2: [1, 1, 2]}

#create a dataframe from dict_2, here it might be something easier but can't find it
df_2 = pd.concat([pd.Series(vals, name=key) 
                  for key, vals in dict_2.items()], axis=1).T
print(df_2) #index are the keys, and columns are the future new_col_x
     0    1    2
1  0.0  1.0  NaN
2  1.0  1.0  2.0

#concat with the dict_1 once explode the values in the list, 
# here just a print to see what it's doing
print (pd.concat([pd.Series(dict_1, name='id').explode(),df_2], axis=1))
   id    0    1    2
1  a1  0.0  1.0  NaN
1  a3  0.0  1.0  NaN
2  a2  1.0  1.0  2.0
2  a4  1.0  1.0  2.0

# use previous concat, with a rename to change column names and merge to df
df = df.merge(pd.concat([pd.Series(dict_1, name='id').explode(),df_2], axis=1)
                .rename(columns=lambda x: f'new_col_{x+1}' 
                                          if isinstance(x, int) else x), 
              on='id', how='left')

You get

print (df)
   col_1 col_2  id col_3  new_col_1  new_col_2  new_col_3
0    100   500  a1   478        0.0        1.0        NaN
1    785   400  a1   490        0.0        1.0        NaN
2    ...   ...  a1   ...        0.0        1.0        NaN
3    ...   ...  a2   ...        1.0        1.0        2.0
4    ...   ...  a2   ...        1.0        1.0        2.0
5    ...   ...  a2   ...        1.0        1.0        2.0
6    ...   ...  a3   ...        0.0        1.0        NaN
7    ...   ...  a3   ...        0.0        1.0        NaN
8    ...   ...  a3   ...        0.0        1.0        NaN
9    ...   ...  a4   ...        1.0        1.0        2.0
10   ...   ...  a4   ...        1.0        1.0        2.0
11   ...   ...  a4   ...        1.0        1.0        2.0
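The comment in the code above wonders whether there is an easier way to build the frame from dict_2. One option that seems to fit is DataFrame.from_dict with orient='index', which pads shorter lists with NaN. The following is only a sketch of the same explode-and-merge idea under that assumption; the small df, its values, and the names lookup and result are made up for illustration.

```python
import pandas as pd

# Made-up stand-in for the question's df (only the columns needed here)
df = pd.DataFrame({
    'col_1': [100, 785, 10, 20],
    'id': ['a1', 'a1', 'a2', 'a3'],
})

dict_1 = {1: ['a1', 'a3'], 2: ['a2', 'a4']}
dict_2 = {1: [0, 1], 2: [1, 1, 2]}

# One row per key, one column per list position; shorter lists are padded with NaN
df_2 = pd.DataFrame.from_dict(dict_2, orient='index')
df_2.columns = [f'new_col_{i + 1}' for i in range(df_2.shape[1])]

# One row per (key, id) pair; the index still holds the dictionary key
ids = pd.Series(dict_1, name='id').explode()

# Align the ids with the new columns on the shared key, then merge into df on 'id'
lookup = ids.to_frame().join(df_2)
result = df.merge(lookup, on='id', how='left')
print(result)
```

Because the merge uses how='left', rows of df whose id appears in neither dictionary are kept and simply get NaN in the new columns.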

Answer 1: (score: 4)

Let's try explode and map

s=pd.Series(dict_1).explode().reset_index()
s.columns=[1,2]
df['new_1']=df.id.map(dict(zip(s[2],s[1])))

#s=pd.Series(dict_2).explode().reset_index()
#s.columns=[1,2]
#df['new_2']=df.id.map(dict(zip(s[2],s[1])))
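As written, the snippet above fills new_1 with the dict_1 key that each id belongs to. A possible next step, sketched here rather than taken from the answer, is to map that key onto each position of the dict_2 lists. The sample df and the names id_to_key, keys, and pos_map are assumptions for illustration.

```python
import pandas as pd

# Made-up stand-in for the question's df
df = pd.DataFrame({'id': ['a1', 'a1', 'a2', 'a3', 'a4']})

dict_1 = {1: ['a1', 'a3'], 2: ['a2', 'a4']}
dict_2 = {1: [0, 1], 2: [1, 1, 2]}

# Invert dict_1 so that every id points at its dictionary key
id_to_key = {id_: key for key, ids in dict_1.items() for id_ in ids}
keys = df['id'].map(id_to_key)

# One new column per list position; keys with shorter lists get NaN
n_cols = max(len(values) for values in dict_2.values())
for pos in range(n_cols):
    pos_map = {key: values[pos] for key, values in dict_2.items() if len(values) > pos}
    df[f'new_{pos + 1}'] = keys.map(pos_map)

print(df)
```

This stays with plain map calls, at the cost of one pass over the column per list position.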

Answer 2: (score: 3)

Suppose the lists in dict_2 hold n values and you want to construct n new columns in df, for example

dict_2 = {1: [0, 1], 2: [1, 1, 6, 9]}

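Use a dict comprehension to build a new dictionary from dict_2 and dict_1, and use it to construct a new DataFrame with orient='index'. Chain rename and add_prefix. Finally, merge it back into df with the options left_on='id', right_index=True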

key_dict = {x: v for k, v in dict_2.items() for x in dict_1[k]}

df_add = (pd.DataFrame.from_dict(key_dict, orient='index')
                      .rename(lambda x: int(x)+1, axis=1).add_prefix('newcol_'))
    
df_final = df.merge(df_add, left_on='id', right_index=True)

Out[33]:
   col_1 col_2  id col_3  newcol_1  newcol_2  newcol_3  newcol_4
0    100   500  a1   478         0         1       NaN       NaN
1    785   400  a1   490         0         1       NaN       NaN
2    ...   ...  a1   ...         0         1       NaN       NaN
3    ...   ...  a2   ...         1         1       6.0       9.0
4    ...   ...  a2   ...         1         1       6.0       9.0
5    ...   ...  a2   ...         1         1       6.0       9.0
6    ...   ...  a3   ...         0         1       NaN       NaN
7    ...   ...  a3   ...         0         1       NaN       NaN
8    ...   ...  a3   ...         0         1       NaN       NaN
9    ...   ...  a4   ...         1         1       6.0       9.0
10   ...   ...  a4   ...         1         1       6.0       9.0
11   ...   ...  a4   ...         1         1       6.0       9.0
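One detail worth checking is that merge defaults to an inner join, so rows of df whose id is missing from dict_1 would be dropped. The sketch below compares the default with how='left'; the three-row df and the id 'a9' are invented for the demonstration.

```python
import pandas as pd

# Made-up df; 'a9' does not appear in dict_1
df = pd.DataFrame({'id': ['a1', 'a2', 'a9']})

dict_1 = {1: ['a1', 'a3'], 2: ['a2', 'a4']}
dict_2 = {1: [0, 1], 2: [1, 1, 6, 9]}

# id -> list of values, as in the answer
key_dict = {x: v for k, v in dict_2.items() for x in dict_1[k]}
df_add = (pd.DataFrame.from_dict(key_dict, orient='index')
            .rename(lambda x: int(x) + 1, axis=1)
            .add_prefix('newcol_'))

# Default inner join: the 'a9' row disappears
inner = df.merge(df_add, left_on='id', right_index=True)

# Left join: the 'a9' row is kept with NaN in the new columns
left = df.merge(df_add, left_on='id', right_index=True, how='left')

print(len(inner), len(left))
```

If every id in df is guaranteed to be covered by dict_1, both calls should give the same rows.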

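Answer 3: (score: 2)

Construct a DataFrame that combines the two dictionaries along their keys. With the DataFrame.from_dict constructor, pandas handles the alignment of the keys.

Then reshape it with wide_to_long so that each 'id' in dict_1 is linked to all the columns from dict_2. After that, a simple merge joins it back to the original.

Sample data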

dict_1 = {1: ['a1', 'a3'], 2: ['a2', 'a4']}
dict_2 = {1: [0, 1], 2: [1, 1, 2]} 

Code

df1 = pd.concat([pd.DataFrame.from_dict(dict_1, orient='index').add_prefix('id'),
                 pd.DataFrame.from_dict(dict_2, orient='index').add_prefix('new_col')], axis=1)
#  id0 id1  new_col0  new_col1  new_col2
#1  a1  a3         0         1       NaN
#2  a2  a4         1         1       2.0

df1 = (pd.wide_to_long(df1, i=[x for x in df1.columns if 'new_col' in x],
                       j='will_drop', stubnames=['id'])
         .reset_index().drop(columns='will_drop'))
#   new_col0  new_col1  new_col2  id
#0         0         1       NaN  a1
#1         0         1       NaN  a3
#2         1         1       2.0  a2
#3         1         1       2.0  a4

df = df.merge(df1, how='left')

   col_1 col_2  id col_3  new_col0  new_col1  new_col2
0    100   500  a1   478         0         1       NaN
1    785   400  a1   490         0         1       NaN
2    ...   ...  a1   ...         0         1       NaN
3    ...   ...  a2   ...         1         1       2.0
4    ...   ...  a2   ...         1         1       2.0
5    ...   ...  a2   ...         1         1       2.0
6    ...   ...  a3   ...         0         1       NaN
7    ...   ...  a3   ...         0         1       NaN
8    ...   ...  a3   ...         0         1       NaN
9    ...   ...  a4   ...         1         1       2.0
10   ...   ...  a4   ...         1         1       2.0
11   ...   ...  a4   ...         1         1       2.0
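A caveat, as far as I can tell: wide_to_long expects the i columns to uniquely identify each row, so the reshape above may raise an error when two keys share identical dict_2 lists. Below is a sketch of an alternative that uses melt on the id columns and joins on the dictionary key instead. It swaps in a different technique from the answer, and the modified dict_2 and the names ids, vals, long_ids, and lookup are assumptions for illustration.

```python
import pandas as pd

# Hypothetical variation of the sample data: both keys share the same dict_2 list
dict_1 = {1: ['a1', 'a3'], 2: ['a2', 'a4']}
dict_2 = {1: [0, 1], 2: [0, 1]}

ids = pd.DataFrame.from_dict(dict_1, orient='index').add_prefix('id')
vals = pd.DataFrame.from_dict(dict_2, orient='index').add_prefix('new_col')

# Stack the id columns into a single 'id' column, keeping the dictionary key as index
long_ids = (ids.rename_axis('key')
               .reset_index()
               .melt(id_vars='key', value_name='id')
               .dropna(subset=['id'])
               .set_index('key')[['id']])

# Join on the dictionary key, then discard it
lookup = long_ids.join(vals).reset_index(drop=True)
print(lookup)
```

The resulting lookup can then be merged back the same way, for example with df.merge(lookup, on='id', how='left').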