Modify a column's values based on multiple conditions on another column

Time: 2019-02-11 06:57:09

Tags: python pandas

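Suppose I have the following DataFrame. I want to return a value for column B based on multiple conditions on column A. The rules are: if the value in column A is >= 0 and < 50, return the original value of column B; if the value in column A is >= 50 and < 70, return the value of column B divided by 3; if the value in column A is >= 70 and < 100, return the value of column B divided by column C and then by 3.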

import pandas as pd
import numpy as np
np.random.seed(5)
df = pd.DataFrame(np.random.randint(100, size=(100, 3)), columns=list('ABC'))

My pseudocode in Python:

def Standard():
    if (df['A'] >= 0) and (df['A'] < 50):
        return df['B'] 
    if (df['A'] >= 50) and (df['A'] < 70):
        return df['B']/3
    if (df['A'] >= 70) and (df['A'] <= 100):
        return df['B']/df['C']/3

df['B'] = df.apply(Standard, axis = 1)

It returns: TypeError: ('Standard() takes 0 positional arguments but 1 was given', 'occurred at index 0')

How can I correct my code, or is there a better way to do this in Python? Thanks for your help.

2 Answers:

Answer 0 (score: 3)

For better performance, use numpy.select instead of apply. You can also set a default value for rows that match none of the conditions:

masks = [(df['A'] >= 0) & (df['A'] < 50),
         (df['A'] >= 50) & (df['A'] < 70),
         (df['A'] >= 70) & (df['A'] <= 100)]

vals = [df['B'], df['B'] / 3, df['B']/df['C']/3]

df['B'] = np.select(masks, vals, default=0)
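To make the behaviour easy to check by eye, here is a minimal sketch on a tiny hand-made frame. The frame `small` and its values are hypothetical and chosen only so that each of the three ranges of A is hit once. Note that every entry of `vals` is computed for all rows before np.select picks from them, so a zero in C would still produce an infinite value in the third choice even for rows that do not use it.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame: one row per range of A.
small = pd.DataFrame({"A": [10, 60, 80], "B": [30, 30, 30], "C": [5, 5, 5]})

# One boolean mask per rule; & combines the conditions element-wise.
masks = [
    (small["A"] >= 0) & (small["A"] < 50),
    (small["A"] >= 50) & (small["A"] < 70),
    (small["A"] >= 70) & (small["A"] <= 100),
]

# The value to use where the mask at the same position is True.
vals = [small["B"], small["B"] / 3, small["B"] / small["C"] / 3]

result = np.select(masks, vals, default=0)
print(result.tolist())  # [30.0, 10.0, 2.0]
```

The first row keeps B, the second is 30 / 3, and the third is 30 / 5 / 3.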

Performance: roughly 900 times faster on 10,000 rows (424 ms vs. 468 µs):

np.random.seed(5)
df = pd.DataFrame(np.random.randint(100, size=(10000, 3)), columns=list('ABC'))

#Jeril solution
In [74]: %timeit df['B1'] = df.apply(Standard, axis=1)
__main__:18: RuntimeWarning: divide by zero encountered in double_scalars
424 ms ± 16.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [75]: %timeit df['B'] = np.select(masks, vals, default=0)
468 µs ± 4.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

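Answer 1 (score: 2)

Could you try the following? With axis=1, apply passes each row to the function, so the function has to accept the row as an argument and read the values from it: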

def Standard(row):
    if (row['A'] >= 0) and (row['A'] < 50):
        return row['B']
    if (row['A'] >= 50) and (row['A'] < 70):
        return row['B']/3
    if (row['A'] >= 70) and (row['A'] <= 100):
        return row['B']/row['C']/3


df['B'] = df.apply(Standard, axis=1)
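As a rough cross-check, the sketch below runs the row-wise function and the numpy.select version on the same data and compares the results. The lowercase name `standard` and the replacement of zeros in C are assumptions made only for this sketch, so that neither approach divides by zero. They are not part of either answer.

```python
import numpy as np
import pandas as pd

np.random.seed(5)
df = pd.DataFrame(np.random.randint(100, size=(100, 3)), columns=list("ABC"))
# Assumption for this sketch only: replace zeros in C to avoid division by zero.
df["C"] = df["C"].replace(0, 1)

def standard(row):
    # row is one DataFrame row, so row["A"] is a scalar and plain comparisons work.
    if 0 <= row["A"] < 50:
        return row["B"]
    if 50 <= row["A"] < 70:
        return row["B"] / 3
    if 70 <= row["A"] <= 100:
        return row["B"] / row["C"] / 3

row_wise = df.apply(standard, axis=1)

masks = [
    (df["A"] >= 0) & (df["A"] < 50),
    (df["A"] >= 50) & (df["A"] < 70),
    (df["A"] >= 70) & (df["A"] <= 100),
]
vals = [df["B"], df["B"] / 3, df["B"] / df["C"] / 3]
vectorized = np.select(masks, vals, default=0)

# Both approaches should agree on every row.
same = np.allclose(row_wise.to_numpy(dtype=float), vectorized)
print(same)
```

If the two agree, the choice between them comes down to readability versus speed.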