Question

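When writing functions in pandas to be used with groupby.apply or groupby.transform, if the function takes multiple arguments, then when it is called as part of groupby the arguments follow the function name separated by commas rather than being placed in parentheses. An example would be: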

def Transfunc(df, arg1, arg2, arg3):
    return something

GroupedData.transform(Transfunc, arg1, arg2, arg3)

The df argument is passed automatically as the first argument.
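To make the calling convention concrete, here is a minimal sketch with made-up data. The function name scale_and_shift, the column names, and the values are assumptions for illustration only; the point is just that transform forwards the extra positional arguments after the group itself.

```python
import pandas as pd

# Hypothetical helper: receives each group first, then the extra arguments.
def scale_and_shift(values, factor, offset):
    return values * factor + offset

df = pd.DataFrame({"key": ["a", "a", "b"], "x": [1.0, 2.0, 3.0]})

# factor=10 and offset=1 are passed after the function, separated by commas.
out = df.groupby("key")["x"].transform(scale_and_shift, 10, 1)
print(out.tolist())
```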

However, the same syntax does not seem to be possible when using a function to group the data. Take the following example:

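import numpy as np
from pandas import DataFrame

people = DataFrame(np.random.randn(5, 5), columns=['a', 'b', 'c', 'd', 'e'], index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
# .ix was removed in pandas 1.0; select the third row by position via the index instead
people.loc[people.index[2:3], ['b', 'c']] = np.nan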

def MeanPosition(Ind, df, Column):
    if df[Column][Ind] >= np.mean(df[Column]):
        return 'Greater Group'
    else:
        return 'Lesser Group'
# This function compares each data point in column 'a' to the mean of column 'a' and returns a group name based on whether it is greater than or equal to the mean, or less than it

people.groupby(lambda x: MeanPosition(x, people, 'a')).mean()

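The above works fine, but I can't figure out why I have to wrap the function in a lambda. Based on the syntax used with transform and apply, it seems to me that the following should work just as well: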

people.groupby(MeanPosition, people, 'a').mean()

Can anyone tell me why, or how I can call the function without wrapping it in a lambda?

Thanks

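EDIT: I don't think it is possible to group the data by passing a function as the key without wrapping that function in a lambda. One possible workaround is, rather than passing the function as the key, to pass an array that the function creates. This works as follows: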

def MeanPositionList(df, Column):
    return ['Greater Group' if df[Column][row] >= np.mean(df[Column]) else 'Lesser Group' for row in df.index]

Grouped = people.groupby(np.array(MeanPositionList(people, 'a')))
Grouped.mean()

But then of course it's probably better to cut out the middleman function altogether and simply use an array built with a list comprehension...
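As a rough sketch of that last idea, the labels can be built directly and passed to groupby as an array. This version swaps the list comprehension for np.where, which is a vectorized alternative and not what the question used; the data values are made up so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random `people` frame (values are assumptions).
people = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0, 4.0, 5.0], "b": [10.0, 20.0, 30.0, 40.0, 50.0]},
    index=["Joe", "Steve", "Wes", "Jim", "Travis"],
)

# One label per row, computed without a helper function.
labels = np.where(people["a"] >= people["a"].mean(), "Greater Group", "Lesser Group")

grouped_mean = people.groupby(labels).mean()
print(grouped_mean)
```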

Answer 1

Passing arguments to apply just happens to work, because apply passes all of the extra arguments along to the target function.

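However, groupby itself takes several arguments of its own (see the groupby documentation), so it cannot tell which arguments would be meant for the key function; passing a lambda or a named function is more explicit and is the way to go.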
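If the goal is only to avoid writing the lambda, one option that is not from the original answer is to bind the extra arguments with functools.partial, which yields a single-argument callable that groupby can apply to each index label. This is a sketch under that assumption, using the question's MeanPosition function and made-up data.

```python
from functools import partial

import numpy as np
import pandas as pd

def MeanPosition(Ind, df, Column):
    # Compare the value at index label Ind with the mean of the column.
    if df[Column][Ind] >= np.mean(df[Column]):
        return 'Greater Group'
    else:
        return 'Lesser Group'

# Deterministic stand-in for the random `people` frame (values are assumptions).
people = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0, 4.0, 5.0], "b": [10.0, 20.0, 30.0, 40.0, 50.0]},
    index=["Joe", "Steve", "Wes", "Jim", "Travis"],
)

# Bind df and Column by keyword; groupby supplies each index label as Ind.
key_func = partial(MeanPosition, df=people, Column="a")
result = people.groupby(key_func).mean()
print(result)
```

This is equivalent to the lambda in the question; it only moves the argument binding out of the groupby call.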

Here is how to do what I think you want (slightly modified, since in your example every row falls in its own distinct group):

In [22]: def f(x):
   ....:     result = Series('Greater',index=x.index)
   ....:     result[x<x.mean()] = 'Lesser'
   ....:     return result
   ....: 

In [25]: df = DataFrame(np.random.randn(5, 5), columns=['a', 'b', 'c', 'd', 'e'], index=['Joe', 'Joe', 'Wes', 'Wes', 'Travis'])

In [26]: df
Out[26]: 
               a         b         c         d         e
Joe    -0.293926  1.006531  0.289749 -0.186993 -0.009843
Joe    -0.228721 -0.071503  0.293486  1.126972 -0.808444
Wes     0.022887 -1.813960  1.195457  0.216040  0.287745
Wes    -1.520738 -0.303487  0.484829  1.644879  1.253210
Travis -0.061281 -0.517140  0.504645 -1.844633  0.683103

In [27]: df.groupby(df.index.values).transform(f)
Out[27]: 
              a        b        c        d        e
Joe      Lesser  Greater   Lesser   Lesser  Greater
Joe     Greater   Lesser  Greater  Greater   Lesser
Travis  Greater  Greater  Greater  Greater  Greater
Wes     Greater   Lesser  Greater   Lesser   Lesser
Wes      Lesser  Greater   Lesser  Greater  Greater

Calling a function with multiple arguments when using groupby
