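I've been playing around with this for a while, but I can't get even the simplest case to work, so I'm asking for help.
I have a large DataFrame and I'm trying to add four new columns to it. The value of each column depends on the data in the row, according to the if statements below.
Here is what I have sketched out so far: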
import pandas as pd
d = {'Signal': [0,1,1,0],
'Win': [False,True,False,False],
'Odds': [1.1, 1.2, 1.3, 1.4],
'Helper': [True,False,False,False],
'before': ['','','',''],
'stake':['','','',''],
'result':['','','',''],
'after':['','','','']
}
df = pd.DataFrame(d)
def function(df, start, stake_size):
    '''
    takes in three arguments: a dataframe, a start number as int and
    stake_size as int
    the function fills up before, stake, result, after columns row by row
    using the IF statements below
    '''
    #if df['Helper']:
    #    df['before'] = start
    #else:
    #    df['before'] = df['after'].shift(1)
    df['before'] = start #This is so I can replicate the example
    if df['Signal'] == 0:
        df['stake'] = 0
        df['result'] = 0
    elif df['Signal'] == 1:
        df['stake'] = df['before'] * (stake_size/100)
        if (df['Signal'] == 1 & df['Win'] == True):
            df['result'] = (df['stake'] * df['odds']) - df['stake']
        else:
            df['result'] = df['stake'] * -1
    df['after'] = df['before'] + df['result']
    return df

df.apply(function, args=(100,5), axis=1)
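Needless to say, this gets me nowhere.
I'm used to creating columns with .apply(function, axis=1), but that doesn't work here, because the new values depend on one another: each row has to be filled in sequentially. That's why I'm trying to handle it as a function that takes in a row and computes the values of the four new columns.
Any help or similar examples that could get me unstuck would be appreciated. Thanks.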
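Edit: I took HakunaMaData's suggestion and added a Helper column to the df, to make sure the first if statement is applied as intended. I initially thought .shift would work here, but it doesn't, because I can't shift the whole DataFrame while applying row by row, right?
Is there another way to solve this?
The expected output I'm looking for is: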
answer = {'Signal': [0,1,1,0],
'Win': [False,True,False,False],
'Odds': [1.1, 1.2, 1.3, 1.4],
'Helper': [True,False,False,False],
'before': [100,100,101,95.95],
'stake':[0,5,5.05,0],
'result':[0,1,-5.05,0],
'after':[100,101,95.95,95.95]
}
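The rules above amount to a running-balance recurrence: each row's `before` is the previous row's `after`. A minimal plain-Python sketch of that recurrence follows. The name `simulate` and its parameters are hypothetical, and it assumes the stake is `stake_pct` percent of the current `before` value.

```python
def simulate(signals, wins, odds, start, stake_pct):
    """Return one (before, stake, result, after) tuple per input row."""
    rows = []
    bankroll = start
    for signal, win, price in zip(signals, wins, odds):
        before = bankroll
        if signal == 1:
            # Stake a percentage of the balance available before this row.
            stake = before * stake_pct / 100
            result = stake * price - stake if win else -stake
        else:
            stake = 0.0
            result = 0.0
        bankroll = before + result  # becomes 'before' for the next row
        rows.append((before, stake, result, bankroll))
    return rows


rows = simulate([0, 1, 1, 0], [False, True, False, False],
                [1.1, 1.2, 1.3, 1.4], start=100, stake_pct=5)
for row in rows:
    print([round(value, 2) for value in row])
```

Because every row reads the balance left by the row before it, the calculation cannot be expressed as an independent per-row function without carrying some state along.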
Answer 0 (score: 2)
There are a few problems here:
before, stake, result, and after should be numeric types, not strings. So change them, like this:
d = {'Signal': [0,1,1,0],
'Win': [False,True,False,False],
'Odds': [1.1, 1.2, 1.3, 1.4],
'before': [0]*4,
'stake':[0]*4,
'result':[0]*4,
'after':[0]*4
}
Now the rest of the code mostly works as is:
df = pd.DataFrame(d)

def function(df, start, stake_size):
    '''
    takes in three arguments: a row of the dataframe (passed in by apply),
    a start number as int and stake_size as int
    the function fills up before, stake, result, after columns row by row
    using the IF statements below
    '''
    global after  # Global variable that tracks the 'after' value of the previous row
    if df.name == 0:  # First row; assumes the default 0-based index
        df['before'] = start
    else:
        df['before'] = after
    if df['Signal'] == 0:
        df['stake'] = 0
        df['result'] = 0
    elif df['Signal'] == 1:
        df['stake'] = df['before'] * (stake_size/100)
        if df['Win']:  # Signal is already known to be 1 in this branch
            df['result'] = (df['stake'] * df['Odds']) - df['stake']  # column is 'Odds', not 'odds'
        else:
            df['result'] = df['stake'] * -1
    df['after'] = df['before'] + df['result']
    after = df['after']  # Assign the value to the global variable at the end
    return df
Finally, apply the function along the row axis rather than the column axis:
df.apply(function, args=(100,5), axis=1)
Here is the output: [output not preserved]
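The global variable works, but the same sequential fill can also be done with an ordinary loop that keeps the running balance in a local variable. The following is a minimal sketch of that alternative, not the answer's own method. The name `add_bankroll_columns` is hypothetical, and it assumes the columns are named `Signal`, `Win`, and `Odds` as in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Signal': [0, 1, 1, 0],
    'Win': [False, True, False, False],
    'Odds': [1.1, 1.2, 1.3, 1.4],
})


def add_bankroll_columns(frame, start, stake_size):
    """Return a copy of frame with before, stake, result and after added."""
    before, stake, result, after = [], [], [], []
    bankroll = float(start)
    # itertuples(name=None) yields one plain tuple per row, in index order.
    rows = frame[['Signal', 'Win', 'Odds']].itertuples(index=False, name=None)
    for signal, win, odds in rows:
        before.append(bankroll)
        if signal == 1:
            bet = bankroll * stake_size / 100
            profit = bet * odds - bet if win else -bet
        else:
            bet = 0.0
            profit = 0.0
        bankroll += profit  # carried into the next row as 'before'
        stake.append(bet)
        result.append(profit)
        after.append(bankroll)
    return frame.assign(before=before, stake=stake, result=result, after=after)


out = add_bankroll_columns(df, start=100, stake_size=5)
print(out.round(2))
```

Keeping the state local means the function can be called repeatedly without resetting anything, and it does not depend on the index starting at 0.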
Answer 1 (score: 1)
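First, you need to change how you call your function, since you will be applying it row-wise:
df.apply(lambda x: function(x,100,5), axis=1)
The signature of your function will then be:
def function(row, start, stake_size):
Note! In this case you are not manipulating the DataFrame but a single row of it, so you have to adapt the code inside the function accordingly.
Hope this helps!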
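To illustrate the row-wise signature, here is a minimal sketch for the simplified case in which `before` equals `start` on every row, so no state has to carry over between rows. The name `stake_and_result` is hypothetical; the function receives one row as a pandas Series and returns only the new values.

```python
import pandas as pd

df = pd.DataFrame({
    'Signal': [0, 1, 1, 0],
    'Win': [False, True, False, False],
    'Odds': [1.1, 1.2, 1.3, 1.4],
})


def stake_and_result(row, start, stake_size):
    """Compute stake and result for a single row (a pandas Series)."""
    if row['Signal'] != 1:
        return pd.Series({'stake': 0.0, 'result': 0.0})
    stake = start * stake_size / 100
    result = stake * row['Odds'] - stake if row['Win'] else -stake
    return pd.Series({'stake': stake, 'result': result})


# apply(axis=1) calls the function once per row; returning a Series
# from each call produces a DataFrame with one column per key.
new_cols = df.apply(lambda row: stake_and_result(row, 100, 5), axis=1)
out = df.join(new_cols)
print(out)
```

This form only fits when each row can be computed on its own. Once `before` has to come from the previous row's `after`, the rows must be processed in order with some state carried along.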