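I have a dataframe df and 2 separate dictionaries, dict_1 and dict_2. Both dictionaries have the same keys but different values. The values of dict_1 are lists of unique IDs that correspond to the id column of the dataframe df. I want to use the 2 dictionaries, together with the unique IDs in dict_1, to append the values of dict_2 to df.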
Example of dataframe df:
col_1 col_2 id col_3
100 500 a1 478
785 400 a1 490
... ... a1 ...
... ... a2 ...
... ... a2 ...
... ... a2 ...
... ... a3 ...
... ... a3 ...
... ... a3 ...
... ... a4 ...
... ... a4 ...
... ... a4 ...
Example of dict_1:
1:['a1', 'a3'],
2:['a2', 'a4'],
3:[...],
4:[...],
5:[...],
...
Example of dict_2:
1:[0, 1],
2:[1, 1],
3:[...],
4:[...],
5:[...],
...
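I'm trying to append the data in dict_2 to the main df, using the ids in dict_1.
In other words, the 2 values (or n values) in each dict_2 list should be added to df as 2 columns (or n columns).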
Resulting df:
col_1 col_2 id col_3 new_col_1 new_col_2
100 500 a1 478 0 1
785 400 a1 490 0 1
... ... a1 ... 0 1
... ... a2 ... 1 1
... ... a2 ... 1 1
... ... a2 ... 1 1
... ... a3 ... 0 1
... ... a3 ... 0 1
... ... a3 ... 0 1
... ... a4 ... 1 1
... ... a4 ... 1 1
... ... a4 ... 1 1
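The lookup described above amounts to inverting dict_1 into an id → key map and then following that key into dict_2. A minimal sketch in plain Python, assuming (as the question says) that each id appears in exactly one dict_1 list, and using only the two sample keys; the names id_to_key and id_to_values are illustrative:

```python
dict_1 = {1: ['a1', 'a3'], 2: ['a2', 'a4']}
dict_2 = {1: [0, 1], 2: [1, 1]}

# both dictionaries are assumed to share the same keys
assert dict_1.keys() == dict_2.keys()

# invert dict_1: each id points at the key of the list that contains it
id_to_key = {id_: key for key, ids in dict_1.items() for id_ in ids}
# every id should belong to exactly one list (no id lost to a duplicate)
assert len(id_to_key) == sum(len(ids) for ids in dict_1.values())

# follow the key into dict_2 to get the values destined for the new columns
id_to_values = {id_: dict_2[key] for id_, key in id_to_key.items()}
print(id_to_values)
```

Each answer below builds some version of this id → values table and then attaches it to df.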
Answer 0 (score: 6)
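IIUC, the keys in the two dictionaries are aligned. One approach is to build a dataframe with an id column holding the values from dict_1, plus 2 columns (in this case; it works for more) holding the dict_2 values aligned on the same key. Then merge it back into df on id.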
import pandas as pd

# the two dictionaries. note in dict_2 I added an element for the list in key 2
# to show it works for any number of columns
dict_1 = {1: ['a1', 'a3'], 2: ['a2', 'a4']}
dict_2 = {1: [0, 1], 2: [1, 1, 2]}
# create a dataframe from dict_2, here it might be something easier but can't find it
df_2 = pd.concat([pd.Series(vals, name=key)
                  for key, vals in dict_2.items()], axis=1).T
print(df_2)  # the index holds the keys, the columns are the future new_col_x
     0    1    2
1  0.0  1.0  NaN
2  1.0  1.0  2.0
# explode the dict_1 lists into one id per row, then join df_2 on the shared key index
# (join handles the duplicated keys of the exploded side); here just a print to see what it's doing
print(pd.Series(dict_1, name='id').explode().to_frame().join(df_2))
   id    0    1    2
1  a1  0.0  1.0  NaN
1  a3  0.0  1.0  NaN
2  a2  1.0  1.0  2.0
2  a4  1.0  1.0  2.0
# same join, with the numeric columns renamed to new_col_1, new_col_2, ..., then merged into df
lookup = (pd.Series(dict_1, name='id').explode().to_frame()
          .join(df_2.rename(columns=lambda x: f'new_col_{x+1}')))
df = df.merge(lookup, on='id', how='left')
You get:
print (df)
col_1 col_2 id col_3 new_col_1 new_col_2 new_col_3
0 100 500 a1 478 0.0 1.0 NaN
1 785 400 a1 490 0.0 1.0 NaN
2 ... ... a1 ... 0.0 1.0 NaN
3 ... ... a2 ... 1.0 1.0 2.0
4 ... ... a2 ... 1.0 1.0 2.0
5 ... ... a2 ... 1.0 1.0 2.0
6 ... ... a3 ... 0.0 1.0 NaN
7 ... ... a3 ... 0.0 1.0 NaN
8 ... ... a3 ... 0.0 1.0 NaN
9 ... ... a4 ... 1.0 1.0 2.0
10 ... ... a4 ... 1.0 1.0 2.0
11 ... ... a4 ... 1.0 1.0 2.0
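Because the dict_2 lists have different lengths here, the padded positions are NaN and every new column comes out as float. If integer columns are preferred, one option (an addition, not part of the answer above) is pandas' nullable Int64 dtype. The small frame below is a hand-made stand-in for the merged result, not real output:

```python
import pandas as pd

# stand-in for the merged result above (only id and the new columns)
merged = pd.DataFrame({
    'id': ['a1', 'a2', 'a3', 'a4'],
    'new_col_1': [0.0, 1.0, 0.0, 1.0],
    'new_col_2': [1.0, 1.0, 1.0, 1.0],
    'new_col_3': [float('nan'), 2.0, float('nan'), 2.0],
})

# cast the new float columns to nullable integers; NaN becomes <NA>
new_cols = [c for c in merged.columns if c.startswith('new_col_')]
merged = merged.astype({c: 'Int64' for c in new_cols})
print(merged.dtypes)
```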
Answer 1 (score: 4)
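Let's try explode with map: map each id to its dict_1 key, then map that key to its dict_2 list and spread the list over new columns.
import pandas as pd

s = pd.Series(dict_1).explode().reset_index()
s.columns = ['key', 'id']
# map each id in df to the dict_1 key whose list contains it
df['key'] = df['id'].map(dict(zip(s['id'], s['key'])))
# map each key to its dict_2 list and spread the list over new columns
new_cols = pd.DataFrame(df['key'].map(dict_2).tolist(), index=df.index)
df = df.join(new_cols.rename(columns=lambda c: f'new_col_{c + 1}')).drop(columns='key')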
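To see why the explode step gives map what it needs, here is what it produces on the sample dict_1. This is a small illustrative sketch; id_to_key is a name chosen here:

```python
import pandas as pd

dict_1 = {1: ['a1', 'a3'], 2: ['a2', 'a4']}

# explode gives each list element its own row; the dict key is repeated in the index
s = pd.Series(dict_1).explode()
print(s)

# pairing each id with its index entry is the lookup map uses to turn an id into its key
id_to_key = dict(zip(s, s.index))
print(id_to_key)
```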
Answer 2 (score: 3)
Suppose the lists in dict_2 hold n values and you want to build n new columns in df, e.g.
dict_2 = {1: [0, 1], 2: [1, 1, 6, 9]}
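Use a dict comprehension to build a new dictionary from dict_2 and dict_1 that maps each id to its list, and construct a new dataframe from it with from_dict and orient='index'. Chain rename and add_prefix. Finally, merge it back into df with the options left_on='id', right_index=True.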
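import pandas as pd

# map every id in dict_1 to the dict_2 list stored under the same key
key_dict = {x: v for k, v in dict_2.items() for x in dict_1[k]}
# one row per id; shorter lists are padded with NaN
df_add = (pd.DataFrame.from_dict(key_dict, orient='index')
          .rename(lambda x: int(x)+1, axis=1).add_prefix('newcol_'))
df_final = df.merge(df_add, left_on='id', right_index=True)
print(df_final)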
col_1 col_2 id col_3 newcol_1 newcol_2 newcol_3 newcol_4
0 100 500 a1 478 0 1 NaN NaN
1 785 400 a1 490 0 1 NaN NaN
2 ... ... a1 ... 0 1 NaN NaN
3 ... ... a2 ... 1 1 6.0 9.0
4 ... ... a2 ... 1 1 6.0 9.0
5 ... ... a2 ... 1 1 6.0 9.0
6 ... ... a3 ... 0 1 NaN NaN
7 ... ... a3 ... 0 1 NaN NaN
8 ... ... a3 ... 0 1 NaN NaN
9 ... ... a4 ... 1 1 6.0 9.0
10 ... ... a4 ... 1 1 6.0 9.0
11 ... ... a4 ... 1 1 6.0 9.0
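One detail worth knowing: merge defaults to how='inner', so any row of df whose id does not appear in dict_1 would be silently dropped from df_final. A small sketch with a hypothetical id 'a5' (not in the question's data) shows the difference; passing how='left' keeps such rows with NaN in the new columns:

```python
import pandas as pd

dict_1 = {1: ['a1', 'a3'], 2: ['a2', 'a4']}
dict_2 = {1: [0, 1], 2: [1, 1, 6, 9]}
# 'a5' is a hypothetical id that no dict_1 list contains
df = pd.DataFrame({'id': ['a1', 'a2', 'a5']})

key_dict = {x: v for k, v in dict_2.items() for x in dict_1[k]}
df_add = (pd.DataFrame.from_dict(key_dict, orient='index')
          .rename(lambda x: int(x) + 1, axis=1).add_prefix('newcol_'))

inner = df.merge(df_add, left_on='id', right_index=True)              # drops 'a5'
left = df.merge(df_add, left_on='id', right_index=True, how='left')   # keeps 'a5'
print(len(inner), len(left))
```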
Answer 3 (score: 2)
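Construct a DataFrame that combines the two dictionaries along their keys. With the DataFrame.from_dict constructor, pandas handles aligning the keys.
Then reshape it with wide_to_long so that each 'id' from dict_1 is linked to all of the columns from dict_2. After that, it's a simple merge to join back to the original df.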
import pandas as pd

dict_1 = {1: ['a1', 'a3'], 2: ['a2', 'a4']}
dict_2 = {1: [0, 1], 2: [1, 1, 2]}
df1 = pd.concat([pd.DataFrame.from_dict(dict_1, orient='index').add_prefix('id'),
pd.DataFrame.from_dict(dict_2, orient='index').add_prefix('new_col')], axis=1)
# id0 id1 new_col0 new_col1 new_col2
#1 a1 a3 0 1 NaN
#2 a2 a4 1 1 2.0
df1 = (pd.wide_to_long(df1, i=[x for x in df1.columns if 'new_col' in x],
j='will_drop', stubnames=['id'])
.reset_index().drop(columns='will_drop'))
# new_col0 new_col1 new_col2 id
#0 0 1 NaN a1
#1 0 1 NaN a3
#2 1 1 2.0 a2
#3 1 1 2.0 a4
df = df.merge(df1, how='left')
col_1 col_2 id col_3 new_col0 new_col1 new_col2
0 100 500 a1 478 0 1 NaN
1 785 400 a1 490 0 1 NaN
2 ... ... a1 ... 0 1 NaN
3 ... ... a2 ... 1 1 2.0
4 ... ... a2 ... 1 1 2.0
5 ... ... a2 ... 1 1 2.0
6 ... ... a3 ... 0 1 NaN
7 ... ... a3 ... 0 1 NaN
8 ... ... a3 ... 0 1 NaN
9 ... ... a4 ... 1 1 2.0
10 ... ... a4 ... 1 1 2.0
11 ... ... a4 ... 1 1 2.0
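wide_to_long requires the i columns to identify each row uniquely, so using the new_col columns as i raises a ValueError whenever two dict_2 lists happen to be identical. A variant sketch (not the answer's code) moves the dictionary key into a column and uses it as i instead; the column name 'key' and the identical dict_2 lists are choices made here to demonstrate the case:

```python
import pandas as pd

dict_1 = {1: ['a1', 'a3'], 2: ['a2', 'a4']}
# identical lists for both keys: the new_col columns alone no longer identify a row
dict_2 = {1: [0, 1], 2: [0, 1]}

df1 = pd.concat([pd.DataFrame.from_dict(dict_1, orient='index').add_prefix('id'),
                 pd.DataFrame.from_dict(dict_2, orient='index').add_prefix('new_col')],
                axis=1)

# keying on the new_col columns fails for this input
failed = False
try:
    pd.wide_to_long(df1, i=['new_col0', 'new_col1'], j='will_drop', stubnames=['id'])
except ValueError:
    failed = True

# move the dict key into a column so it can serve as the unique row identifier
df1 = df1.rename_axis('key').reset_index()
df_long = (pd.wide_to_long(df1, stubnames='id', i='key', j='will_drop')
           .reset_index()
           .drop(columns=['key', 'will_drop']))
print(df_long)
```

The resulting df_long has the same shape as the answer's intermediate frame, so the final df.merge(df_long, how='left') step is unchanged.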