Pandas DataFrame apply efficiency

Time: 2017-05-30 13:30:25

Tags: python-3.x pandas apply

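I have a DataFrame to which I want to add a column holding a status that depends on whether a matching value exists in another DataFrame. My current code works: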

df1['NewColumn'] = df1['ComparisonColumn'].apply(lambda x: 'Match' if any(df2.ComparisonColumn == x) else ('' if x is None else 'Missing'))

I know this line is ugly, and I suspect it is also inefficient. Can you suggest a better way to do the comparison?

1 answer:

Answer 0 (score: 1)

You can use np.where, isin, and isnull.
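As a quick illustration of what each of these three pieces does on its own, here is a minimal sketch on toy values. The Series `s` and the lookup list are made up for this example and do not come from the question.

```python
import numpy as np
import pandas as pd

# Toy values, assumed purely for illustration
s = pd.Series([1.0, 2.0, np.nan])

# isin: element-wise membership test against a collection of values
in_other = s.isin([2.0, 5.0])

# isnull: True where the value is missing (NaN or None)
is_missing = s.isnull()

# np.where: vectorized if/else over a boolean mask
labels = np.where(in_other, 'yes', 'no')
```

All three operate on the whole column at once, which avoids calling a Python lambda once per row.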

Create some dummy data:

import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({'ComparisonColumn':np.random.randint(10,20,20)})
df.iloc[4] = np.nan  # Create missing data
df2 = pd.DataFrame({'ComparisonColumn':np.random.randint(15,30,20)})

Match with np.where:

df['NewColumn']  = np.where(df.ComparisonColumn.isin(df2.ComparisonColumn),'Matched',np.where(df.ComparisonColumn.isnull(),'Missing',''))

Output:

    ComparisonColumn NewColumn
0               12.0          
1               12.0          
2               16.0   Matched
3               11.0          
4                NaN   Missing
5               19.0   Matched
6               16.0   Matched
7               11.0          
8               10.0          
9               11.0          
10              19.0   Matched
11              10.0          
12              10.0          
13              19.0   Matched
14              13.0          
15              14.0          
16              10.0          
17              10.0          
18              14.0          
19              11.0
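
Note that the answer labels hits 'Matched' and NaN rows 'Missing', whereas the question's lambda uses 'Match' for hits, '' for a missing value, and 'Missing' for values not found in the other frame. If the question's labels are what you need, one possible variant is np.select, which takes a list of conditions instead of nested np.where calls. This is only a sketch: the helper name `label_matches` and the small sample frames are hypothetical, and it treats NaN as the missing value (the original lambda checks `x is None`).

```python
import numpy as np
import pandas as pd

def label_matches(df1, df2, col='ComparisonColumn'):
    """Hypothetical helper: label each value of df1[col] using the question's labels."""
    values = df1[col]
    # Conditions are checked in order; the first one that is True wins.
    # The null check comes first so a NaN is never reported as a match.
    conditions = [values.isnull(), values.isin(df2[col])]
    choices = ['', 'Match']
    # Anything that is neither missing nor found in df2 gets 'Missing'
    return np.select(conditions, choices, default='Missing')

# Small made-up frames for demonstration
df1 = pd.DataFrame({'ComparisonColumn': [12.0, 16.0, np.nan, 19.0]})
df2 = pd.DataFrame({'ComparisonColumn': [16, 19, 25]})
df1['NewColumn'] = label_matches(df1, df2)
```

With more than two or three outcomes, a flat list of conditions tends to stay easier to read than deeply nested np.where calls.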