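I have two DataFrames, df1 and df2. Using a unique identifier (id), I want to fill the null values in df2 from the corresponding entries in df1. Here is the code: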
import pandas as pd
import numpy as np
df1 = pd.DataFrame({"id": [3,4,5,6,7,8,9],
"col1": ['mike', 'matt', 'mertha', 'peter', 'tabby', 'carl', 'brian'],
"col2": ['645-345', '645-333', '324-543', '123-432', '563-654', '324-123', '902-342'],
"col3": ['cat', 'cat','dog', 'none', 't-rex', 'goat', 'snake']})
df2 = pd.DataFrame({"id": [6, 6, 7, 7, 7, 8, 8, 9],
"col1": ['peter', 'peter', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
"col2": ['324-123','324-123', '902-342', '902-332', '902-123', '556-786', '113-786', '901-345'],
"col3": ['none', 'none', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]})
I'm not kidding when I say I've tried everything under the sun on this site, and I can't seem to find an answer. Any help would be greatly appreciated!
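I only want to fill the np.nan values in col1 and col3. The string 'none' is just another valid option, not a missing value. My expected output is as follows: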
df_merged = pd.DataFrame({"id": [6, 6, 7, 7, 7, 8, 8, 9],
"col1": ['peter', 'peter', 'tabby','tabby', 'tabby', 'carl','carl','brian'],
"col2": ['324-123','324-123', '902-342', '902-332', '902-123', '556-786', '113-786', '901-345'],
"col3": ['none', 'none', 't-rex', 't-rex', 't-rex', 'goat', 'goat', 'snake']})
Answer 0 (score: 1)
If id is the index in both DataFrames, Erfan's comment should work. Otherwise:
(df2.set_index('id')
.fillna(df1.set_index('id'))
.reset_index()
)
Output:
   id   col1     col2   col3
0   6  peter  324-123   none
1   6  peter  324-123   none
2   7  tabby  902-342  t-rex
3   7  tabby  902-332  t-rex
4   7  tabby  902-123  t-rex
5   8   carl  556-786   goat
6   8   carl  113-786   goat
7   9  brian  901-345  snake
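Since the question asks to fill only col1 and col3, here is an alternative sketch (not part of the original answer) that looks each id up in df1 with Series.map and fills only the missing cells of the chosen columns. The names lookup, filled, and cols_to_fill are illustrative, and the approach assumes id is unique in df1.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"id": [3, 4, 5, 6, 7, 8, 9],
                    "col1": ['mike', 'matt', 'mertha', 'peter', 'tabby', 'carl', 'brian'],
                    "col2": ['645-345', '645-333', '324-543', '123-432', '563-654', '324-123', '902-342'],
                    "col3": ['cat', 'cat', 'dog', 'none', 't-rex', 'goat', 'snake']})
df2 = pd.DataFrame({"id": [6, 6, 7, 7, 7, 8, 8, 9],
                    "col1": ['peter', 'peter', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
                    "col2": ['324-123', '324-123', '902-342', '902-332', '902-123', '556-786', '113-786', '901-345'],
                    "col3": ['none', 'none', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]})

# Assumption: id is unique in df1, so it can serve as a lookup index.
lookup = df1.set_index("id")

# Hypothetical choice of columns, taken from the question's stated goal.
cols_to_fill = ["col1", "col3"]

filled = df2.copy()
for col in cols_to_fill:
    # Map each id in df2 to df1's value, then use it only where df2 is NaN.
    filled[col] = filled[col].fillna(filled["id"].map(lookup[col]))

print(filled)
```

Because only the listed columns are touched, col2 in df2 is left exactly as it was, even where it differs from df1.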