Python: How do I apply a function with conditions to a pandas DataFrame?

Time: 2020-05-18 14:29:43

Tags: python pandas

I have a pandas DataFrame that looks like this:

df
     time   case1   case2   case3
0     5     house   bank     atm
1     3     bank    house  pharmacy
2     10    bank    bank     atm
3     20    house  pharmacy  house

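I want to associate a probability with each case, based on the time and the case. For each category we have a mean and a standard deviation. For example, for p_house the probability is 1 if time is between 20 - 10 = 10 and 20 + 10 = 30.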

p_house = [20, 10]
p_bank =  [5, 1]
p_atm  =  [3, 1]
p_pharmacy = [10, 5]

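I want to apply a function that says whether time falls inside the range for each case (p = 1) or outside it (p = 0). I would like to apply something like this: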

def assignP(df):
    if ((df.time < p.case1 + mu.case) and (df.time > p.case1-mu.case)):
        df.time1 = 1
    else:
        df.time1 = 0
    if ((df.time < p.case2 + mu.case) and (df.time > p.case2-mu.case)):
        df.time2 = 1
    else:
        df.time2 = 0
    if ((df.time < p.case3 + mu.case) and (df.time > p.case3-mu.case)):
        df.time3 = 1
    else:
        df.time3 = 0
    return df

I would like to end up with a DataFrame that looks like this:

df
     time   case1   case2   case3          p1      p2     p3
0     5     house   bank     atm           0       1      0
1     3     bank    house  pharmacy        0       0      0
2     10    bank    bank     atm           0       0      0
3     15    house  pharmacy  house         1       1      1

1 Answer:

Answer 0 (score: 0)

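I believe you should write a function for the individual columns. With apply you can run a simple function over rows or columns. Inside that function I determine the output value from the mean and deviation. Maybe the following gives you a start: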

from io import StringIO

import pandas as pd

# Create DataFrame
csvstring = StringIO("""
time case1 case2 case3
0 5 house bank atm
1 3 bank house pharmacy
2 10 bank bank atm
3 20 house pharmacy house
""")
df = pd.read_csv(csvstring, sep=" ")

p_house = [20, 10]

def get_phouse(value):
    # Unpack p_house into its mean and deviation
    mean, dev = p_house
    # Return 1 if the value lies strictly between mean - dev (10) and mean + dev (30), otherwise 0
    return 1 if mean - dev < value < mean + dev else 0

df['phouse'] = df['time'].apply(get_phouse)
#    time  case1     case2     case3  phouse
# 0     5  house      bank       atm       0
# 1     3   bank     house  pharmacy       0
# 2    10   bank      bank       atm       0
# 3    20  house  pharmacy     house       1
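
The snippet above only covers p_house applied to the time column. To get one indicator per case column, as the question asks, one possible extension is to look up each row's category parameters and compare them against time. The following is a minimal sketch, not a definitive solution: the names params and add_indicators are hypothetical helpers introduced here, and it assumes the bounds are exclusive, as in get_phouse.

```python
from io import StringIO

import pandas as pd

# Same sample data as above; the first column becomes the index.
df = pd.read_csv(StringIO("""time case1 case2 case3
0 5 house bank atm
1 3 bank house pharmacy
2 10 bank bank atm
3 20 house pharmacy house
"""), sep=" ")

# (mean, deviation) per category, taken from the question.
# "params" is a hypothetical name used for this sketch.
params = {
    "house": (20, 10),
    "bank": (5, 1),
    "atm": (3, 1),
    "pharmacy": (10, 5),
}
means = {name: mean for name, (mean, dev) in params.items()}
devs = {name: dev for name, (mean, dev) in params.items()}


def add_indicators(frame, case_cols=("case1", "case2", "case3")):
    """Return a copy of frame with columns p1..pN added.

    pK is 1 when time lies strictly between mean - dev and mean + dev
    of the category found in the K-th case column, otherwise 0.
    """
    out = frame.copy()
    for i, col in enumerate(case_cols, start=1):
        mean = out[col].map(means)  # per-row mean for this case column
        dev = out[col].map(devs)    # per-row deviation for this case column
        inside = (out["time"] > mean - dev) & (out["time"] < mean + dev)
        out[f"p{i}"] = inside.astype(int)
    return out


result = add_indicators(df)
print(result)
```

With time = 20 in the last row, pharmacy falls outside its range of 5 to 15, so p2 is 0 for that row. The expected table in the question lists time 15 there, which sits exactly on the upper bound; counting it as inside would require inclusive comparisons (>= and <=) instead of the strict ones assumed here.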