Question

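When applying several conditions to a DataFrame, I have been using np.where, and it does the job. However, the same condition is repeated in every set of conditions I pass to np.where, and I would like to improve my code. I don't know the simplest (clearest) way to do this with (1) loc or (2) IF "condition" DO "apply other conditions".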

Example:

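I need to select only the rows where "Date" meets a condition (e.g. > 20200201) and, for those rows only, apply a different set of conditions to compute a new column (e.g. condition 1: A > 20 and B > 20; condition 2: A == 30 and B == 10; condition 3: A == 20 and B >= 10; etc.).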

My question is: what is the best way to make the first selection (Date > 20200201) once, without repeating Date > 20200201 on every line, so as to avoid this:

import pandas as pd
import numpy as np

df = pd.DataFrame({"ID": [1,3,2,2,3,1,3,2],
           "Date": [20200109, 20200204, 20200307, 20200216, 20200107, 20200108, 20200214, 20200314],
           "A": [20,10,40,40,10,20, 40,30], 
           "B": [20,30,40,50,20, 30, 20, 10]})

df['new']=np.nan
df['new']=np.where((df['Date']>20200201) & (df['A']>20) & (df['B']>20), 'value', df['new'])
df['new']=np.where((df['Date']>20200201) & (df['A']==30) & (df['B']==10), 'value', df['new'])
df['new']=np.where((df['Date']>20200201) & (df['A']==20) & (df['B']>=10), 'value', df['new'])

Answer 1

It looks like you can use np.select:

s1 = df.Date <= 20200201
s2 = (df['A'] > 20) & df['B'].gt(20)
s3 = df['A'].eq(30) & df['B'].eq(10)
s4 = df['A'].eq(20) & df['B'].ge(10)

df['new'] = np.select( (s1,s2|s3|s4), (np.nan, 'value'), np.nan)

Output:

   ID      Date   A   B    new
0   1  20200109  20  20    nan
1   3  20200204  10  30    nan
2   2  20200307  40  40  value
3   2  20200216  40  50  value
4   3  20200107  10  20    nan
5   1  20200108  20  30    nan
6   3  20200214  40  20    nan
7   2  20200314  30  10  value
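
Note that the output above prints the text nan rather than a true missing value, because the choices mix np.nan with a string; some newer NumPy releases may reject that mix altogether. A minimal sketch of one way around this, assuming it is acceptable to label each rule separately: keep every choice a string and apply the date mask once at the end with Series.where. The labels "rule1", "rule2", "rule3" and "none" are hypothetical names chosen for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 3, 2, 2, 3, 1, 3, 2],
    "Date": [20200109, 20200204, 20200307, 20200216,
             20200107, 20200108, 20200214, 20200314],
    "A": [20, 10, 40, 40, 10, 20, 40, 30],
    "B": [20, 30, 40, 50, 20, 30, 20, 10],
})

date_ok = df["Date"] > 20200201           # the shared condition, written once
c1 = (df["A"] > 20) & (df["B"] > 20)
c2 = (df["A"] == 30) & (df["B"] == 10)
c3 = (df["A"] == 20) & (df["B"] >= 10)

# Hypothetical labels: all choices and the default are strings,
# so NumPy never has to combine a float NaN with text.
# np.select picks the first condition that is True for each row.
labels = np.select([c1, c2, c3], ["rule1", "rule2", "rule3"], default="none")

# Keep the label where the date test passes; leave a real missing value elsewhere.
df["new"] = pd.Series(labels, index=df.index).where(date_ok)
```

With this layout, rows that fail the date test can be found with df["new"].isna(), and rows that pass the date test but match no rule carry the "none" label.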

Answer 2

It may not be the fastest solution, but its advantage is readability and ease of (future) maintenance.

Use query to find the rows in question and take their index:

ind = df.query('Date > 20200201 and (A > 20 and B > 20 or '
    'A == 30 and B == 10 or A == 20 and B >= 10)').index

Save the new value in the new column for the indicated rows:
```
df.loc[ind, 'new'] = 'value'; df
```

The other values in this column remain NaN.

If something in the conditions above changes in the future, it is easy and intuitive to correct.

So unless your data volume is very large and the execution time becomes too long, this solution is worth considering.
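
For the IF "condition" DO "apply other conditions" pattern asked about in the question, here is a rough sketch using plain boolean masks and loc instead of query. It assumes the DataFrame index is unique (as with the default index here), since the matching rows are written back by index label; the names recent and hit are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 3, 2, 2, 3, 1, 3, 2],
    "Date": [20200109, 20200204, 20200307, 20200216,
             20200107, 20200108, 20200214, 20200314],
    "A": [20, 10, 40, 40, 10, 20, 40, 30],
    "B": [20, 30, 40, 50, 20, 30, 20, 10],
})

# IF: narrow down to the rows that pass the date test (stated once).
recent = df[df["Date"] > 20200201]

# DO: evaluate the remaining conditions on that subset only.
hit = recent[
    ((recent["A"] > 20) & (recent["B"] > 20))
    | ((recent["A"] == 30) & (recent["B"] == 10))
    | ((recent["A"] == 20) & (recent["B"] >= 10))
]

# Start from an all-missing object column, then fill only the matching rows.
df["new"] = pd.Series(index=df.index, dtype=object)
df.loc[hit.index, "new"] = "value"
```

Filtering first also means the later comparisons run on fewer rows, which might help a little when the date test excludes most of the data.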

Applying a series of conditions to a DataFrame (pandas)
