Question

So I have created a pandas DataFrame from the following CSV:

id  age00   education   marital gender  ethnic  industry    income00
0   51.965         17         0      1       0         5    76110
1   41.807         12         1      0       0         1    43216
2   36.331         12         1      0       1         3    52118
3   56.758          9         1      1       2         2    47770
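To make the example reproducible, the sample rows can be loaded directly from the text above. This is a minimal sketch. It assumes the data is whitespace-separated. The actual file is not shown, so if it uses commas, a plain `pd.read_csv(path)` applies instead.

```python
import io

import pandas as pd

# Sample rows copied from the question; columns are separated by runs of spaces.
raw = """id  age00   education   marital gender  ethnic  industry    income00
0   51.965         17         0      1       0         5    76110
1   41.807         12         1      0       0         1    43216
2   36.331         12         1      0       1         3    52118
3   56.758          9         1      1       2         2    47770
"""

# sep=r"\s+" treats any run of whitespace as a single delimiter.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df.columns.tolist())
# ['id', 'age00', 'education', 'marital', 'gender', 'ethnic', 'industry', 'income00']
```

Note that the income column in this data is named `income00`, not `income`.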

My goal is to create a new column called future_income that takes each row and computes that person's future income using my model.

This is done by the predictFinalIncome method of the class I created below:

class myModel:
  def __init__(self, bias) :
    self.bias = bias # bias is a dictionary with info to set bias on the gender function and the ethnic function


  def b_gender(self, gender):
    effect = 0
    if (self.bias["gender"]): # if there is gender bias in this model/world (from the constructor) 
      effect = -0.0005 if (gender<1) else 0.0005  # This amounts to a 1.2% difference annually
    return self.scale * effect

  def b_ethnic(self, ethnic):
    effect = 0
    if (self.bias["ethnic"]): # if there is ethnic bias in this model/world (from the constructor) 
      effect = -0.0007 if (ethnic < 1) else -0.0003 if (ethnic < 2) else 0.0005 
    return self.scale * effect


  # other methods/functions
  def predictGrowthFactor( self, person ): # edited
    factor = 1 + person['education'] + person['marital'] + person['income'] + person['industry']
    return factor

  def predictIncome( self, person ): # predict the new income one MONTH later (at least on average, income grows each month)
    return person['income']*self.predictGrowthFactor( person )

  def predictFinalIncome( self, n, person ): 
    n_income = self.predictIncome( person )
    for i in range(n):
       n_income = n_income * i
    return n_income
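`b_gender` and `b_ethnic` also have a problem that is independent of the KeyError reported in the edit. Both methods read `self.scale`, but `__init__` never sets it, so calling either method on this class raises `AttributeError`. The sketch below shows one way to fix that. It assumes `scale` is meant to be passed to the constructor. The `scale` parameter and its default of 1.0 are hypothetical, since the question never defines them.

```python
class myModel:
    def __init__(self, bias, scale=1.0):
        # bias: dict such as {"gender": False, "ethnic": False}
        # scale: hypothetical multiplier for the bias effects (not defined in the question)
        self.bias = bias
        self.scale = scale

    def b_gender(self, gender):
        effect = 0
        if self.bias["gender"]:  # gender bias switched on in this model/world
            effect = -0.0005 if gender < 1 else 0.0005
        return self.scale * effect

    def b_ethnic(self, ethnic):
        effect = 0
        if self.bias["ethnic"]:  # ethnic bias switched on in this model/world
            effect = -0.0007 if ethnic < 1 else -0.0003 if ethnic < 2 else 0.0005
        return self.scale * effect


biased = myModel({"gender": True, "ethnic": True})
print(biased.b_gender(0))  # -0.0005
```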

In this case, n is 120.

In short: I want to pass each row to the class method predictFinalIncome and add a new column named future_income to my df, holding each person's income after 120 months.

Edit:

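I don't actually need the human class. I accidentally deleted the __init__ in the class that sets the "bias" parameter. Instead, I based it on @Cavin Dsouza's code, but it doesn't work.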

The code now reads as follows:

utopModel = myModel( { "gender": False, "ethnic": False } ) # no bias


n =120
#Utopia
u = utopModel
world1['incomeFinal_utop'] = world1.apply(lambda row: u.predictFinalIncome(n, row), axis=1)

So when it gets to predictFinalIncome, this is the error:

TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

KeyError        

KeyError: 'income'
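The `KeyError: 'income'` is caused by the column name. The DataFrame built from the CSV has `income00`, not `income`, so `person['income']` fails on every row. The `TypeError` that precedes it in the traceback appears to be pandas trying, as a fallback, to use the string as a position. The sketch below reproduces the error on a slice of the question's data. It then fixes it by renaming the column once before calling `apply`. The helper `current_income` is hypothetical and only mimics the lookup the model performs. Changing the model to read `person['income00']` would work equally well.

```python
import pandas as pd

# A slice of the question's data: the income column is "income00".
df = pd.DataFrame({"education": [17, 12], "marital": [0, 1],
                   "industry": [5, 1], "income00": [76110, 43216]})

def current_income(person):
    # Same kind of lookup as the question's predictGrowthFactor/predictIncome.
    return person["income"]

try:
    df.apply(current_income, axis=1)
except KeyError as err:
    print("KeyError:", err)  # there is no 'income' label in each row

# Fix: rename once so every person["income"] lookup resolves.
df = df.rename(columns={"income00": "income"})
df["income_check"] = df.apply(current_income, axis=1)
print(df["income_check"].tolist())  # [76110, 43216]
```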

Answer 1

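I think you're just making this very complicated. All the calculations you do can really be done in a single function, unless you need the intermediate results for something else.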

You can create a function that can be applied to each row of the DataFrame:

def predictFinalIncome(row, n):
    factor = 1 + row['education'] + row['marital'] + row['income'] + row['industry']
    n_income = row['income'] * factor
    for i in range(n):
        n_income = n_income * i
    return n_income

Then use df.apply:

df['future_income'] = df.apply(lambda r: predictFinalIncome(r, 120), axis=1)

This returns 0 because `for i in range(n)` starts at 0. The first iteration multiplies the income by 0, so the result is always 0. You need to fix that.
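The question doesn't state which growth model is intended. One plausible reading is that income should compound by the same monthly growth factor for n months. Under that assumption, the fix is to multiply by the factor once per month instead of by the loop index. In this sketch, `growth_factor` and `predict_final_income` are hypothetical names. The factor formula is copied from the question's `predictGrowthFactor`.

```python
def growth_factor(row):
    # Same formula as predictGrowthFactor in the question.
    return 1 + row["education"] + row["marital"] + row["income"] + row["industry"]

def predict_final_income(row, n):
    # Compound monthly: apply the growth factor once for each of the n months.
    income = row["income"]
    factor = growth_factor(row)
    for _ in range(n):
        income = income * factor
    return income

# Toy row (a plain dict) with small values so the result is easy to check by hand.
toy = {"education": 0, "marital": 0, "income": 1, "industry": 0}
print(predict_final_income(toy, 3))  # factor = 2, so 1 * 2**3 = 8
```

With the question's formula, the factor includes the income itself (about 76,000 for the first row). Compounding that for 120 months overflows 64-bit numbers, so if that happens, it is the growth formula that needs revisiting, not the loop.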

Update: putting the function inside the model class

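From your post, I don't see an obvious reason for this function to live in the model, especially since it doesn't use any of the other methods or the bias attribute you created. But here it is: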

class myModel:
    def __init__(self, bias) :
        self.bias = bias

    def predictFinalIncome(self, row, n):
        factor = 1 + row['education'] + row['marital'] + row['income'] + row['industry']
        n_income = row['income'] * factor
        for i in range(n):
            n_income = n_income * i
        return n_income

# to use:
model = myModel({"gender": False, "ethnic": False})  # no bias, as in the question's edit
df['future_income'] = df.apply(lambda r: model.predictFinalIncome(r, 120), axis=1)
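In this model the factor doesn't change from month to month. Under the compounding reading, the monthly loop is therefore equivalent to a single power, `income * factor ** n`, and that can be computed on whole columns without `apply`. This is a minimal sketch. It assumes the columns are named as in the answer (`income` rather than the CSV's `income00`). The data are hypothetical toy values, kept small so the results stay finite.

```python
import pandas as pd

# Hypothetical toy data with small values; realistic incomes would overflow at n=120.
df = pd.DataFrame({"education": [0, 1], "marital": [0, 0],
                   "industry": [0, 1], "income": [1.0, 2.0]})

n = 3
# Same factor as the row-wise version, computed for all rows at once.
factor = 1 + df["education"] + df["marital"] + df["income"] + df["industry"]
# Compounding n times by a constant factor equals a single exponentiation.
df["future_income"] = df["income"] * factor ** n
print(df["future_income"].tolist())  # [8.0, 250.0]
```

This replaces the per-row Python loop with a few column operations, which is typically much faster on large frames.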

Creating a new column by turning each row of a pandas DataFrame into a dictionary
