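Apply a function row by row to a Pandas DataFrame (axis=0) to create four new columns

Date: 2019-01-08 13:06:38

Tags: python pandas dataframe

I've been playing around with this, but I can't even get the simplest case to work, so I'm asking for help.

I have a large dataframe to which I'm trying to add four new columns. The value of each column depends on the data in the row, according to the if statements below.

Here's what I've sketched out so far: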

import pandas as pd

d = {'Signal': [0,1,1,0],
   'Win': [False,True,False,False],
   'Odds': [1.1, 1.2, 1.3, 1.4],
   'Helper': [True,False,False,False],
   'before': ['','','',''],
   'stake':['','','',''],
   'result':['','','',''],
   'after':['','','','']
}

df = pd.DataFrame(d)

def function(df, start, stake_size):
   '''
   takes in three arguments: a dataframe, a start number as int and 
   stake_size as int
   the function fills up before, stake, result, after columns row by row 
   using the IF statements below
   '''
   #if df['Helper']:
   #    df['before'] = start
   #else:
   #    df['before'] = df['after'].shift(1)

   df['before'] = start #This is so I can replicate the example

   if df['Signal'] == 0:
       df['stake'] = 0
       df['result'] = 0
   elif df['Signal'] == 1:
       df['stake'] = df['before'] * (stake_size/100)

   if (df['Signal'] == 1 & df['Win'] == True):
       df['result'] = (df['stake'] * df['odds']) - df['stake']
   else:
       df['result'] = df['stake'] * -1

   df['after'] = df['before'] + df['result']

   return df

df.apply(function, args=(100,5), axis=1)

Needless to say, this doesn't get me anywhere.

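I'm used to creating columns with .apply(function, axis=1), but that doesn't work here, because to calculate 'before' I need the result from the previous row. In other words, the rows have to be filled in sequentially. That's why I'm trying to handle this as a function that takes in a row and calculates the values of the four new columns.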
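The sequential dependency described above, where each row's 'before' is the previous row's 'after', can also be written as a plain Python loop that carries a running balance from row to row. This is a minimal sketch, not the asker's method. It assumes the column names from the question. fill_sequentially is a hypothetical helper name.

```python
import pandas as pd


def fill_sequentially(df, start, stake_size):
    """Hypothetical helper: fill before/stake/result/after one row at a time,
    carrying the previous row's 'after' forward as the next row's 'before'."""
    before_col, stake_col, result_col, after_col = [], [], [], []
    balance = start  # running value carried from one row to the next
    for signal, win, odds in zip(df['Signal'], df['Win'], df['Odds']):
        before = balance
        if signal == 1:
            stake = before * stake_size / 100
            # win: profit is stake * odds - stake; loss: the stake is lost
            result = stake * odds - stake if win else -stake
        else:
            stake = 0.0
            result = 0.0
        after = before + result
        before_col.append(before)
        stake_col.append(stake)
        result_col.append(result)
        after_col.append(after)
        balance = after
    out = df.copy()
    out['before'] = before_col
    out['stake'] = stake_col
    out['result'] = result_col
    out['after'] = after_col
    return out


df = pd.DataFrame({'Signal': [0, 1, 1, 0],
                   'Win': [False, True, False, False],
                   'Odds': [1.1, 1.2, 1.3, 1.4]})
out = fill_sequentially(df, start=100, stake_size=5)
print(out)
```

The loop avoids hidden state inside apply, and it makes the row order explicit.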

I'd appreciate any help, or similar examples that could help me solve this. Thanks.

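EDIT: I took HakunaMaData's advice and added a Helper column to df to make sure the first if statement is applied as intended. At first I thought .shift would work here, but it doesn't, because I can't shift the whole dataframe while applying the function row by row, right?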

Is there another way to approach this?
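One alternative works under a specific assumption: the stake is always a fixed percentage of 'before', as in the question's formula. Under that assumption, each row multiplies the balance by a factor:

- 1 if there is no bet;
- 1 + stake_size/100 * (Odds - 1) on a win;
- 1 - stake_size/100 on a loss.

A cumulative product then gives every 'after' at once. After that, .shift does work for 'before'. The sketch below relies entirely on that proportional-stake assumption and would not apply to other staking rules. fill_vectorized is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def fill_vectorized(df, start, stake_size):
    """Hypothetical helper: vectorized fill, valid only because the stake is a
    fixed percentage of 'before', so each row scales the balance by a factor."""
    frac = stake_size / 100
    bet = df['Signal'] == 1
    # Per-row growth factor: win -> 1 + frac*(odds - 1), loss -> 1 - frac, no bet -> 1
    factor = np.where(bet & df['Win'], 1 + frac * (df['Odds'] - 1),
                      np.where(bet, 1 - frac, 1.0))
    after = pd.Series(start * np.cumprod(factor), index=df.index)
    # 'before' is the previous row's 'after'; the first row starts at `start`
    before = after.shift(1, fill_value=start)
    out = df.copy()
    out['before'] = before
    out['stake'] = np.where(bet, before * frac, 0.0)
    out['result'] = after - before
    out['after'] = after
    return out


df = pd.DataFrame({'Signal': [0, 1, 1, 0],
                   'Win': [False, True, False, False],
                   'Odds': [1.1, 1.2, 1.3, 1.4]})
out = fill_vectorized(df, start=100, stake_size=5)
print(out)
```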

The expected output I'm looking for is:

answer = {'Signal': [0,1,1,0],
          'Win': [False,True,False,False],
          'Odds': [1.1, 1.2, 1.3, 1.4],
          'Helper': [True,False,False,False],
          'before': [100,100,101,95.95],
          'stake':[0,5,5.05,0],
          'result':[0,1,-5.05,0],
          'after':[100,101,95.95,95.95]
          }

2 answers:

Answer 0 (score: 2)

There are a few issues here:

before, stake, result and after should be numeric types, not strings. So change them, for example like this:

d = {'Signal': [0,1,1,0],
   'Win': [False,True,False,False],
   'Odds': [1.1, 1.2, 1.3, 1.4],
   'before': [0]*4,
   'stake':[0]*4,
   'result':[0]*4,
   'after':[0]*4
}
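The question mentions a large existing dataframe. In that case the same fix can be applied without rebuilding it from a dict: add the four output columns as float zeros. This is a minimal sketch that assumes the column names from the question. The small frame below stands in for the real data.

```python
import pandas as pd

df = pd.DataFrame({'Signal': [0, 1, 1, 0],
                   'Win': [False, True, False, False],
                   'Odds': [1.1, 1.2, 1.3, 1.4]})

# Add the four output columns as numeric zeros instead of empty strings,
# so later arithmetic on them never mixes strings and numbers.
for col in ['before', 'stake', 'result', 'after']:
    df[col] = 0.0

print(df.dtypes)
```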

Now the rest of the code mostly works:

import pandas as pd

df = pd.DataFrame(d)

def function(df, start, stake_size):
    '''
    takes in three arguments: one row of the dataframe (passed in by apply),
    a start number as int and stake_size as int
    the function fills up before, stake, result, after columns row by row
    using the IF statements below
    '''

    global after  # global variable that tracks 'after' from the previous row

    if df.name == 0:  # df.name is the row's index label; 0 is the first row
        df['before'] = start
    else:
        df['before'] = after

    if df['Signal'] == 0:
        df['stake'] = 0
        df['result'] = 0
    elif df['Signal'] == 1:
        df['stake'] = df['before'] * (stake_size / 100)

    # use `and` on scalars: `&` binds tighter than `==`, which broke the original test
    if df['Signal'] == 1 and df['Win']:
        df['result'] = (df['stake'] * df['Odds']) - df['stake']  # column is 'Odds', not 'odds'
    else:
        df['result'] = df['stake'] * -1

    df['after'] = df['before'] + df['result']

    after = df['after']  # assign the value to the global variable at the end

    return df

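Finally, apply the function row-wise (axis=1) rather than column-wise:

df = df.apply(function, args=(100, 5), axis=1)  # apply returns a new DataFrame; it does not modify df in place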

Here is the output:

[Image: output DataFrame]
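The global variable can be replaced by state held in a closure, so repeated runs don't leak state through module globals. The sketch below is a variant of the answer above, not the answer's own code. make_row_function is a hypothetical name.

It keeps the row.name == 0 reset. Some older pandas versions documented that apply may evaluate the first row twice. With this reset, the first row always starts from start, so a repeated first call does no harm.

```python
import pandas as pd


def make_row_function(start, stake_size):
    """Hypothetical factory: returns a row function that remembers the
    previous row's 'after' in a closure instead of a global variable."""
    state = {'after': start}

    def function(row):
        row = row.copy()  # work on a copy of the row pandas passes in
        # first row (index label 0) starts from `start`, later rows from the stored 'after'
        row['before'] = start if row.name == 0 else state['after']
        if row['Signal'] == 1:
            row['stake'] = row['before'] * stake_size / 100
        else:
            row['stake'] = 0.0
        if row['Signal'] == 1 and row['Win']:
            row['result'] = row['stake'] * row['Odds'] - row['stake']
        else:
            row['result'] = -row['stake']
        row['after'] = row['before'] + row['result']
        state['after'] = row['after']  # remember for the next row
        return row

    return function


df = pd.DataFrame({'Signal': [0, 1, 1, 0],
                   'Win': [False, True, False, False],
                   'Odds': [1.1, 1.2, 1.3, 1.4],
                   'before': [0.0] * 4, 'stake': [0.0] * 4,
                   'result': [0.0] * 4, 'after': [0.0] * 4})
out = df.apply(make_row_function(100, 5), axis=1)
print(out)
```

Like the global version, this still depends on apply visiting rows in index order. It also assumes the default 0-based integer index.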

Answer 1 (score: 1)

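First, you need to change the way you call the function, since you are applying it row by row:

df.apply(lambda x: function(x,100,5), axis=1)

Then the signature of your function will be:

def function(row, start, stake_size):

Note! In this case you are not manipulating the dataframe (df) but a single row, so you have to adjust the code inside the function accordingly.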
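Passing extra arguments through args= (as in the question) and wrapping the call in a lambda (as in this answer) are two spellings of the same row-wise apply. In both cases the function receives one row as a Series. The small sketch below compares them using a hypothetical helper, stake_for_row, that only reads the row.

```python
import pandas as pd


def stake_for_row(row, start, stake_size):
    # Hypothetical per-row helper: 'row' is a Series holding one row's values
    return start * stake_size / 100 if row['Signal'] == 1 else 0.0


df = pd.DataFrame({'Signal': [0, 1, 1, 0]})

via_args = df.apply(stake_for_row, args=(100, 5), axis=1)
via_lambda = df.apply(lambda x: stake_for_row(x, 100, 5), axis=1)
print(via_args.tolist(), via_lambda.tolist())
```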

Hope this helps!