Question

With R's ddply function you can compute any new column by group and append the result to the original data frame, for example:

ddply(mtcars, .(cyl), transform, n=length(cyl)) # n is appended to the df

In Python/pandas, I compute it first and then merge, for example:

df1 = mtcars.groupby("cyl").apply(lambda x: pd.Series(x["cyl"].count(), index=["n"])).reset_index()
mtcars = pd.merge(mtcars, df1, on=["cyl"])

or something similar.
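For reference, here is a self-contained sketch of that two-step approach (compute per group, then merge back). It hard-codes only the five mtcars rows printed in the output further down, so the counts differ from the full dataset. It uses `groupby(...).size()` instead of the `apply`/`Series` construction, and the `model` index name and `how="left"` (which keeps the original row order) are illustrative choices, not part of the original code.

```python
import pandas as pd

# Five mtcars rows (cyl and wt only); "model" is an assumed index name
mtcars = pd.DataFrame(
    {"cyl": [6, 6, 4, 6, 8], "wt": [2.620, 2.875, 2.320, 3.215, 3.440]},
    index=pd.Index(
        ["Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", "Hornet Sportabout"],
        name="model",
    ),
)

# Step 1: one row per group holding the group size
counts = mtcars.groupby("cyl").size().rename("n").reset_index()

# Step 2: merge the per-group value back onto every row, then restore the index
merged = pd.merge(mtcars.reset_index(), counts, on="cyl", how="left").set_index("model")
print(merged)
```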

However, this always feels clumsy to me. Is it possible to do it in one step?

Thanks.

Answer 1

You can add a column by assigning the result of a groupby/transform operation to the DataFrame:

mtcars['n'] = mtcars.groupby("cyl")['cyl'].transform('count')

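import pandas as pd
import statsmodels.api as sm

# pandas.rpy.common has been removed from pandas; one option is to fetch
# mtcars from the Rdatasets collection via statsmodels (needs network access).
mtcars = sm.datasets.get_rdataset('mtcars').data
mtcars['n'] = mtcars.groupby("cyl")['cyl'].transform('count')
print(mtcars.head())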

yields

                    mpg  cyl  disp   hp  drat     wt   qsec  vs  am  gear  carb   n
Mazda RX4          21.0    6   160  110  3.90  2.620  16.46   0   1     4     4   7
Mazda RX4 Wag      21.0    6   160  110  3.90  2.875  17.02   0   1     4     4   7
Datsun 710         22.8    4   108   93  3.85  2.320  18.61   1   1     4     1  11
Hornet 4 Drive     21.4    6   258  110  3.08  3.215  19.44   1   0     3     1   7
Hornet Sportabout  18.7    8   360  175  3.15  3.440  17.02   0   0     3     2  14
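The one-step assignment works because `transform` returns a Series indexed exactly like the original frame (one value per row, repeated within each group), so no merge is needed. A minimal offline sketch, assuming only the five rows printed above are hard-coded (so the counts are 3/1/1 rather than 7/11/14):

```python
import pandas as pd

# Five mtcars rows (cyl and wt only), hard-coded so no download is needed
mtcars = pd.DataFrame(
    {"cyl": [6, 6, 4, 6, 8], "wt": [2.620, 2.875, 2.320, 3.215, 3.440]},
    index=["Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", "Hornet Sportabout"],
)

# transform broadcasts each group's count back to every row of that group
n = mtcars.groupby("cyl")["cyl"].transform("count")
assert n.index.equals(mtcars.index)  # same index as the input, so assignment aligns

mtcars["n"] = n
print(mtcars)
```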

To add multiple columns, you can use groupby/apply. Make sure the function you apply returns a DataFrame with the same index as its input. For example,

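# group_keys=False keeps the original row index on the result (pandas >= 2.0
# otherwise prepends the group key), so the columns align when assigned back.
# len(x) counts the rows in each group.
mtcars[['n','total_wt']] = mtcars.groupby("cyl", group_keys=False).apply(
    lambda x: pd.DataFrame({'n': len(x), 'total_wt': x['wt'].sum()},
                           index=x.index))
print(mtcars.head())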

yields

                    mpg  cyl  disp   hp  drat     wt   qsec  vs  am  gear  carb   n  total_wt
Mazda RX4          21.0    6   160  110  3.90  2.620  16.46   0   1     4     4   7    21.820
Mazda RX4 Wag      21.0    6   160  110  3.90  2.875  17.02   0   1     4     4   7    21.820
Datsun 710         22.8    4   108   93  3.85  2.320  18.61   1   1     4     1  11    25.143
Hornet 4 Drive     21.4    6   258  110  3.08  3.215  19.44   1   0     3     1   7    21.820
Hornet Sportabout  18.7    8   360  175  3.15  3.440  17.02   0   0     3     2  14    55.989
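A different technique (not the groupby/apply shown above) that also adds several columns in one expression is to call `transform` once per statistic and attach the results with `DataFrame.assign`, which is close in spirit to `ddply(..., transform, ...)`. A minimal sketch, again assuming only the five printed rows are hard-coded, so `total_wt` is the sum over that subset (8.71 for `cyl == 6`) rather than the 21.820 shown above:

```python
import pandas as pd

# Five mtcars rows (cyl and wt only), hard-coded so no download is needed
mtcars = pd.DataFrame(
    {"cyl": [6, 6, 4, 6, 8], "wt": [2.620, 2.875, 2.320, 3.215, 3.440]},
    index=["Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", "Hornet Sportabout"],
)

g = mtcars.groupby("cyl")

# Each transform broadcasts one per-group statistic back to every row;
# assign() returns a new DataFrame with both columns added.
mtcars = mtcars.assign(
    n=g["cyl"].transform("count"),
    total_wt=g["wt"].transform("sum"),
)
print(mtcars)
```

Because each statistic is computed by a built-in aggregation name, this avoids building a DataFrame per group inside `apply`.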

Equivalent of R/ddply transform in Python/pandas?