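I have seen various versions of this question, but none of them seem to fit what I am trying to do. Here is my data.
Here is the df with NaN values: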
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["10023", "10040", np.nan, "12345", np.nan, np.nan, "10033", np.nan, np.nan],
                   "B": [",", "17,-6", "19,-2", "17,-5", "37,-5", ",", "9,-10", "19,-2", "2,-5"],
                   "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"]})
       A      B      C
0  10023      ,  small
1  10040  17,-6  large
2    NaN  19,-2  large
3  12345  17,-5  small
4    NaN  37,-5  small
5    NaN      ,  large
6  10033  9,-10  small
7    NaN  19,-2  small
8    NaN   2,-5  large
Next, I have a lookup df called df2:
df2 = pd.DataFrame({"B": ['17,-5', '19,-2', '37,-5', '9,-10'],
                    "A": ["10040", "54321", "12345", "10033"]})
       B      A
0  17,-5  10040
1  19,-2  54321
2  37,-5  12345
3  9,-10  10033
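I want to fill the NaN values in column A of df by looking up df.B in df2.B and returning df2.A, so that the resulting dfr looks like this:
       A      B      C
0  10023      ,  small
1  10040  17,-6  large
2  54321  19,-2  large
3  10040  17,-5  small
4  12345  37,-5  small
5    NaN      ,  large
6  10033  9,-10  small
7  54321  19,-2  small
8    NaN   2,-5  large
Important caveats:
- The DataFrames do not have matching indexes.
- The contents of df.A and df2.A are not unique.
- The rows of df2 do form unique pairs.
- There are more columns, not shown, that also contain NaNs.
Using pandas, I think the rows of interest in df are the ones where A is NaN. This answer seemed promising, but it is not clear to me where the df in that example comes from. My actual dataset is much larger than this, and I will have to replace several columns in this way.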
Answer 0 (score: 1)
Just use np.where:
df.A = np.where(df.A.isnull(), df.B.map(df2.set_index('B').A), df.A)
df
Out[149]:
       A      B      C
0  10023      ,  small
1  10040  17,-6  large
2  54321  19,-2  large
3  12345  17,-5  small
4  12345  37,-5  small
5    NaN      ,  large
6  10033  9,-10  small
7  54321  19,-2  small
8    NaN   2,-5  large
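A roughly equivalent way to write the same lookup, shown only as a minimal sketch and not as the answerer's own method, is to combine map with fillna instead of np.where. The block rebuilds the question's df and df2 so that it runs on its own; the name lookup is introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["10023", "10040", np.nan, "12345", np.nan, np.nan, "10033", np.nan, np.nan],
    "B": [",", "17,-6", "19,-2", "17,-5", "37,-5", ",", "9,-10", "19,-2", "2,-5"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
})
df2 = pd.DataFrame({
    "B": ["17,-5", "19,-2", "37,-5", "9,-10"],
    "A": ["10040", "54321", "12345", "10033"],
})

# Series indexed by the lookup key B, holding the replacement values for A.
# "lookup" is an illustrative name, not something from the original post.
lookup = df2.set_index("B")["A"]

# map() translates every df.B value through the lookup (NaN when there is no match);
# fillna() then takes those values only in the rows where df.A is missing.
df["A"] = df["A"].fillna(df["B"].map(lookup))
print(df)
```

Both map and fillna operate on df's own index, so it should not matter that df and df2 have no matching indexes. The sketch does rely on each B value appearing only once in df2, which holds for the sample data.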
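Answer 1 (score: 1)
Wen-Ben's map approach will be faster, but here is another way to solve your problem, for your convenience and knowledge.
You can use pd.merge, since this is basically a join problem.
After merging, we fill the NaN values and drop the columns we no longer need.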
df_final = pd.merge(df, df2, on='B', how='left', suffixes=['_1','_2'])
df_final['A'] = df_final.A_1.fillna(df_final.A_2)
df_final.drop(['A_1', 'A_2'], axis=1, inplace=True)
print(df_final)
       B      C      A
0      ,  small  10023
1  17,-6  large  10040
2  19,-2  large  54321
3  17,-5  small  12345
4  37,-5  small  12345
5      ,  large    NaN
6  9,-10  small  10033
7  19,-2  small  54321
8   2,-5  large    NaN
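Since the question mentions having to fill several columns this way, the merge idea could be wrapped up along the following lines. This is a sketch under assumptions, not a definitive implementation: fill_from_lookup and the _lookup suffix are hypothetical names made up for this example, and the lookup key is assumed to be unique in the lookup frame (merge's validate argument checks that). Unlike the output above, it keeps the original column order.

```python
import numpy as np
import pandas as pd


def fill_from_lookup(df, lookup, key, cols):
    """Fill NaN in cols of df with values from lookup, matched on key.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    # validate="m:1" makes merge raise if key is duplicated in lookup,
    # which would otherwise silently add rows to the result.
    merged = df.merge(lookup[[key] + cols], on=key, how="left",
                      suffixes=("", "_lookup"), validate="m:1")
    # A left merge returns a fresh 0..n-1 index; restore df's index so
    # that fillna lines up row by row.
    merged.index = df.index
    out = df.copy()
    for col in cols:
        out[col] = df[col].fillna(merged[col + "_lookup"])
    return out


df = pd.DataFrame({
    "A": ["10023", "10040", np.nan, "12345", np.nan, np.nan, "10033", np.nan, np.nan],
    "B": [",", "17,-6", "19,-2", "17,-5", "37,-5", ",", "9,-10", "19,-2", "2,-5"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
})
df2 = pd.DataFrame({
    "B": ["17,-5", "19,-2", "37,-5", "9,-10"],
    "A": ["10040", "54321", "12345", "10033"],
})

dfr = fill_from_lookup(df, df2, key="B", cols=["A"])
print(dfr)
```

To fill more than one column, pass them all in cols, provided each of them exists in both frames. The input df is left unmodified because the helper works on a copy.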