Pandas - update a column if multiple conditions are met

Time: 2021-04-30 07:07:48

Tags: python pandas dataframe

My goal is the output below.

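A     B    C    D    E     F
0000  ZZZ  987  QW1  8     first three-four col and offset
0000  ZZZ  987  QW1  -8    first three-four col and offset
1111  AAA  123  AB1  1     first three-four col and offset
1111  AAA  123  CD1  -1    first three-four col and offset
2222  BBB  456  EF1  -4    first three-four col and offset
2222  BBB  456  GH1  -1    first three-four col and offset
2222  BBB  456  IL1  5     first three-four col and offset
3333  CCC  789  MN1  2     first two col and offset
3333  CCC  101  MN1  -2    first two col and offset
4444  DDD  121  UYT  6     first two col and offset
4444  DDD  131  FB1  -5    first two col and offset
4444  DDD  141  UYT  -1    first two col and offset
5555  EE   151  CB1  3     first two col and offset
5555  EE   161  CR1  -3    first two col and offset
6666  FFF  111  CB1  4     first or no match
7777  GGG  222  ZB1  10.5  first three-four col and small offset
7777  GGG  222  ZB1  -10   first three-four col and small offset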

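Rule 1) The first three columns must be equal across the records. The fourth column does not matter; it may or may not be equal. The associated numbers (col E) of each combination must offset to zero (a combination can contain 2 to X records).

Rule 2) The first two columns must be equal across the records. The fourth column does not matter; it may or may not be equal. The associated numbers (col E) of each combination must offset to zero (a combination can contain 2 to X records).

Rule 3) No match.

Rule 4) The first three columns must be equal across the records. The fourth column does not matter; it may or may not be equal. The associated numbers (col E) of each combination may differ by AT MOST 0.5 and do not offset to zero (a combination can contain 2 to X records).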
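To make the rules concrete, the sketch below rebuilds a few rows of the sample table and computes the per-group sums of col E that the rules test. The row subset and the column names `sum_abc` and `sum_ab` are illustrative assumptions, not part of the original post.

```python
import pandas as pd

# A subset of the sample rows from the table above (illustrative only).
df = pd.DataFrame({
    "A": ["0000", "0000", "3333", "3333", "6666", "7777", "7777"],
    "B": ["ZZZ", "ZZZ", "CCC", "CCC", "FFF", "GGG", "GGG"],
    "C": [987, 987, 789, 101, 111, 222, 222],
    "E": [8, -8, 2, -2, 4, 10.5, -10],
})

# Sum of E within each (A, B, C) group, broadcast back to every row.
df["sum_abc"] = df.groupby(["A", "B", "C"])["E"].transform("sum")
# Sum of E within each (A, B) group, broadcast back to every row.
df["sum_ab"] = df.groupby(["A", "B"])["E"].transform("sum")

print(df)
```

In this subset, the 0000 rows have `sum_abc` equal to 0 (Rule 1), the 3333 rows only reach 0 in `sum_ab` (Rule 2), the 6666 row matches nothing (Rule 3), and the 7777 rows have `sum_abc` equal to 0.5 (Rule 4).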

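Please see my code below.

I am fully aware that I have not written the code in the most efficient way. Could you suggest a more efficient way to achieve this?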

for i in range(0, len(df)-1):
    for j in range(i+1, len(df)):
        if (df['A'][i] == df['A'][j]) & (df['B'][i] == df['B'][j]) & (df['C'][i] == df['C'][j]) & (df['E'][i] + df['E'][j] == 0) :
            df['E'][i] = 'first three-four col and offset'
            df['E'][j] = 'first three-four col and offset'


for i in range(0, len(df)-1):
    for j in range(i+1, len(df)):
        if (df['A'][i] == df['A'][j]) & (df['B'][i] == df['B'][j]) & (df['E'][i] + df['E'][j] == 0) & (df['E'][i] != 'first three-four col and offset') & (df['E'][j] != 'first three-four col and offset'):
            df['E'][i] = 'first two col and offset'
            df['E'][j] = 'first two col and offset'


for i in range(0, len(df)-1):
    for j in range(i+1, len(df)):
        if (df['A'][i] == df['A'][j]) & (df['B'][i] == df['B'][j]) & (df['C'][i] == df['C'][j]) & (df['E'][i] + df['E'][j] != 0) & (df['E'][i] + df['E'][j] <= 0.5) & (df['E'][i] + df['E'][j] >= -0.5) & (df['E'][i] != 'first three-four col and offset') & (df['E'][j] != 'first three-four col and offset') & (df['E'][i] != 'first two col and offset') & (df['E'][j] != 'first two col and offset'):
            df['E'][i] = 'first three-four col and small offset'
            df['E'][j] = 'first three-four col and small offset'

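Is there a way to get the expected result more efficiently?

I also know that the following code does not work. I tried to update the record with the right comment, but in vain.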

for ... :
  if.... :
     df['col'][index] = 'comment'

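Let us further assume that I want to keep my code written in this "not very efficient way", which seems to work (except for the last line of code). How should I change the last line so that my script works?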
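One possible fix, offered as a sketch rather than a definitive answer: `df['col'][index] = 'comment'` is chained indexing, which pandas may apply to a temporary copy (this is what `SettingWithCopyWarning` warns about). Assigning through `df.at[index, 'col']` or `df.loc[index, 'col']` in a single indexing step avoids that. The sketch also writes the label into a separate column `F` so that `E` stays numeric for later comparisons. The three-row sample data and the default label are assumptions for illustration.

```python
import pandas as pd

# Tiny illustrative frame with the default RangeIndex (0, 1, 2).
df = pd.DataFrame({
    "A": ["1111", "1111", "6666"],
    "B": ["AAA", "AAA", "FFF"],
    "C": [123, 123, 111],
    "E": [1, -1, 4],
})
df["F"] = "first or no match"  # default label, kept apart from numeric E

label = "first three-four col and offset"
for i in range(len(df) - 1):
    for j in range(i + 1, len(df)):
        same_keys = (
            df.at[i, "A"] == df.at[j, "A"]
            and df.at[i, "B"] == df.at[j, "B"]
            and df.at[i, "C"] == df.at[j, "C"]
        )
        if same_keys and df.at[i, "E"] + df.at[j, "E"] == 0:
            # Single-step assignment instead of df['F'][i] = ...
            df.at[i, "F"] = label
            df.at[j, "F"] = label

print(df)
```

Like the loops in the question, this only detects pairs of rows. Combinations of three or more records, such as the 2222 rows in the sample, would need a group-wise sum.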

1 answer:

Answer 0: (score: 3)

Use groupby + transform to build the conditions, then np.select to pick the label:

import numpy as np

m1 = df.groupby(['A', 'B', 'C'])['E'].transform('sum').eq(0)  # Rule 1
m2 = df.groupby(['A', 'B'])['E'].transform('sum').eq(0)  # Rule 2
m3 = df.groupby(['A', 'B', 'C'])['E'].transform('sum').abs().le(0.5)  # Rule 4

# np.select applies the first condition that is True; the default covers Rule 3.
df['new'] = np.select([m1, m2, m3], ['first three-four col and offset',
                      'first two col and offset', 'first three-four col and small offset'], 'first or no match')

       A    B    C    D     E                                      F                                    new
0   0000  ZZZ  987  QW1   8.0        first three-four col and offset        first three-four col and offset
1   0000  ZZZ  987  QW1  -8.0        first three-four col and offset        first three-four col and offset
2   1111  AAA  123  AB1   1.0        first three-four col and offset        first three-four col and offset
3   1111  AAA  123  CD1  -1.0        first three-four col and offset        first three-four col and offset
4   2222  BBB  456  EF1  -4.0        first three-four col and offset        first three-four col and offset
5   2222  BBB  456  GH1  -1.0        first three-four col and offset        first three-four col and offset
6   2222  BBB  456  IL1   5.0        first three-four col and offset        first three-four col and offset
7   3333  CCC  789  MN1   2.0               first two col and offset               first two col and offset
8   3333  CCC  101  MN1  -2.0               first two col and offset               first two col and offset
9   4444  DDD  121  UYT   6.0               first two col and offset               first two col and offset
10  4444  DDD  131  FB1  -5.0               first two col and offset               first two col and offset
11  4444  DDD  141  UYT  -1.0               first two col and offset               first two col and offset
12  5555  EEE  151  CB1   3.0               first two col and offset               first two col and offset
13  5555  EEE  161  CR1  -3.0               first two col and offset               first two col and offset
14  6666  FFF  111  CB1   4.0                      first or no match                      first or no match
15  7777  GGG  222  ZB1  10.5  first three-four col and small offset  first three-four col and small offset
16  7777  GGG  222  ZB1 -10.0  first three-four col and small offset  first three-four col and small offset
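
A note on why the order of the conditions matters: for each row, `np.select` takes the choice belonging to the first condition that is True. Rows whose (A, B, C) group sums to exactly zero satisfy both `m1` and `m3`, and they still get the Rule 1 label because `m1` is listed first. A minimal sketch with made-up boolean masks and shortened labels (both are assumptions for illustration):

```python
import numpy as np

# Made-up masks: row 0 satisfies all three conditions, row 3 satisfies none.
m1 = np.array([True, False, False, False])
m2 = np.array([True, True, False, False])
m3 = np.array([True, False, True, False])

# The first True condition wins; rows with no True condition get the default.
labels = np.select([m1, m2, m3], ["rule 1", "rule 2", "rule 4"], default="no match")
print(labels.tolist())
```

Swapping `m1` and `m3` in the condition list would relabel the exact-offset rows as small-offset rows, so the strictest rule should come first.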