I have a pandas DataFrame like the following:
df
time case1 case2 case3
0 5 house bank atm
1 3 bank house pharmacy
2 10 bank bank atm
3 20 house pharmacy house
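I want to associate a probability with each case, based on time and on the case itself. For each category we have a mean and a standard deviation. So, for example, for p_house, if time is between 20-10=10 and 20+10=30, the probability is 1.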
p_house = [20, 10]
p_bank = [5, 1]
p_atm = [3, 1]
p_pharmacy = [10, 5]
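To make the rule concrete, here is a minimal sketch that reads each pair above as [mean, standard deviation] and checks whether a time falls inside (mean - std, mean + std). The dictionary params and the helpers bounds and in_range are hypothetical names used only for illustration, and strict inequalities are an assumption.

```python
# Each entry is [mean, standard deviation], copied from the lists above.
params = {
    "house": [20, 10],
    "bank": [5, 1],
    "atm": [3, 1],
    "pharmacy": [10, 5],
}


def bounds(category):
    """Return (lower, upper) = (mean - std, mean + std) for a category."""
    mean, std = params[category]
    return mean - std, mean + std


def in_range(time, category):
    """Return 1 if time lies strictly inside the category's range, else 0."""
    lower, upper = bounds(category)
    return int(lower < time < upper)


print(bounds("house"))      # (10, 30)
print(in_range(5, "bank"))  # 1, because 4 < 5 < 6
```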
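I want to apply a function that reports, for each case, whether time falls inside that range (p = 1) or not (p = 0). I would like to apply something like this: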
def assignP(df):
    if ((df.time < p.case1 + mu.case) and (df.time > p.case1-mu.case)):
        df.time1 = 1
    else:
        df.time1 = 0
    if ((df.time < p.case2 + mu.case) and (df.time > p.case2-mu.case)):
        df.time2 = 1
    else:
        df.time2 = 0
    if ((df.time < p.case3 + mu.case) and (df.time > p.case3-mu.case)):
        df.time3 = 1
    else:
        df.time3 = 0
    return df
I want a DataFrame that looks like the following:
df
time case1 case2 case3 p1 p2 p3
0 5 house bank atm 0 1 0
1 3 bank house pharmacy 0 0 0
2 10 bank bank atm 0 0 0
3 15 house pharmacy house 1 1 1
Answer 0 (score: 0)
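I believe you should write a function for each of the different columns. With apply you can run a simple function over rows or columns. Inside that function, I would determine the output value from the mean and deviation. Perhaps the following gives you a start: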
from io import StringIO

import pandas as pd

# Create DataFrame
csvstring = StringIO("""
time case1 case2 case3
0 5 house bank atm
1 3 bank house pharmacy
2 10 bank bank atm
3 20 house pharmacy house
""")
# Data rows have one more field than the header, so pandas uses the first field as the index
df = pd.read_csv(csvstring, sep=" ")

p_house = [20, 10]

def get_phouse(col):
    # Split the p_house value into mean (a) and deviation (b)
    (a, b) = p_house
    # If the value is strictly between 20 - 10 and 20 + 10, return 1, otherwise 0
    return 1 if a - b < col < a + b else 0

df['phouse'] = df['time'].apply(get_phouse)
# time case1 case2 case3 phouse
# 0 5 house bank atm 0
# 1 3 bank house pharmacy 0
# 2 10 bank bank atm 0
# 3 20 house pharmacy house 1
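The same idea can be extended to all three case columns at once. The following is only a sketch under stated assumptions: the lookup dictionaries params, means and stds are hypothetical names, the output columns p1 to p3 follow the layout requested in the question, and the comparison is strict as in get_phouse above. It looks up each row's category with Series.map and compares time against that category's range.

```python
import pandas as pd

# Input table from the question (last row uses time = 20).
df = pd.DataFrame({
    "time": [5, 3, 10, 20],
    "case1": ["house", "bank", "bank", "house"],
    "case2": ["bank", "house", "bank", "pharmacy"],
    "case3": ["atm", "pharmacy", "atm", "house"],
})

# (mean, deviation) per category, taken from p_house, p_bank, p_atm, p_pharmacy.
params = {"house": (20, 10), "bank": (5, 1), "atm": (3, 1), "pharmacy": (10, 5)}
means = {name: mean for name, (mean, _) in params.items()}
stds = {name: std for name, (_, std) in params.items()}

for i in (1, 2, 3):
    # Look up the mean and deviation of the category found in each row.
    mean = df[f"case{i}"].map(means)
    std = df[f"case{i}"].map(stds)
    # 1 if time is strictly inside (mean - std, mean + std), else 0.
    inside = (df["time"] > mean - std) & (df["time"] < mean + std)
    df[f"p{i}"] = inside.astype(int)

print(df)
```

With time = 20 in the last row and strict inequalities, the pharmacy check in case2 gives 0, since 20 is outside (5, 15). If the bounds should be inclusive, the comparisons could be changed to >= and <=.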