Question

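I have a data frame to which I want to add a column with a certain status if a matching value exists in another data frame. I have code that currently works: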

df1['NewColumn'] = df1['ComparisonColumn'].apply(lambda x: 'Match' if any(df2.ComparisonColumn == x) else ('' if x is None else 'Missing'))

I know this line is ugly, and I suspect it is inefficient as well. Can you suggest a better way to make this comparison?

Answer 1

You can use np.where, isin and isnull.
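As a quick illustration of what each of those three pieces does on its own, here is a minimal sketch. The Series `left` and `right` and their values are made up for this example and are not part of the answer's data.

```python
import numpy as np
import pandas as pd

# Toy data, invented purely for illustration
left = pd.Series([1.0, 2.0, np.nan, 4.0])
right = pd.Series([2.0, 4.0, 6.0])

in_right = left.isin(right)   # element-wise membership test against `right`
is_null = left.isnull()       # True where the value is NaN/None

# np.where(condition, value_if_true, value_if_false) chooses per element
labels = np.where(in_right, 'in right', 'not in right')

print(in_right.tolist())  # [False, True, False, True]
print(is_null.tolist())   # [False, False, True, False]
print(labels.tolist())
```

All three operate on the whole column at once, which is what lets them replace a row-by-row apply with a lambda.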

Create some dummy data:

import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({'ComparisonColumn': np.random.randint(10, 20, 20)})
df.iloc[4] = np.nan  # Create missing data
df2 = pd.DataFrame({'ComparisonColumn': np.random.randint(15, 30, 20)})

Match with np.where:

df['NewColumn'] = np.where(df.ComparisonColumn.isin(df2.ComparisonColumn), 'Matched',
                           np.where(df.ComparisonColumn.isnull(), 'Missing', ''))

Output:

    ComparisonColumn NewColumn
0               12.0          
1               12.0          
2               16.0   Matched
3               11.0          
4                NaN   Missing
5               19.0   Matched
6               16.0   Matched
7               11.0          
8               10.0          
9               11.0          
10              19.0   Matched
11              10.0          
12              10.0          
13              19.0   Matched
14              13.0          
15              14.0          
16              10.0          
17              10.0          
18              14.0          
19              11.0
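Note that the labels above differ from the ones in the question, which uses 'Match' for a hit, an empty string for a missing value, and 'Missing' for a value not found in df2. If those original labels are wanted, one possible variant is np.select, which avoids nesting np.where calls. This is only a sketch: the two small frames below are hypothetical, and isnull is assumed to be an acceptable stand-in for the question's `x is None` check (it also treats NaN as missing).

```python
import numpy as np
import pandas as pd

# Hypothetical small frames, not the data from the answer above
df1 = pd.DataFrame({'ComparisonColumn': [10.0, 15.0, np.nan, 19.0]})
df2 = pd.DataFrame({'ComparisonColumn': [15.0, 19.0, 25.0]})

col = df1['ComparisonColumn']
conditions = [
    col.isin(df2['ComparisonColumn']),  # value found in df2
    col.isnull(),                       # value is missing in df1
]
choices = ['Match', '']

# Conditions are checked in order; rows matching none of them get the default
df1['NewColumn'] = np.select(conditions, choices, default='Missing')
print(df1)
```

The order of the conditions matters: the first condition that is true for a row decides its label, which mirrors the if/else chain in the question's lambda.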

Pandas DataFrame apply efficiency
