dplyr groupby percentage and rename column

Date: 2017-08-13 04:23:52

Tags: r python-2.7 pandas group-by

I want to group my data frame by the promotion offered and calculate a percentage. The data frame has the following format:

Promotion name             days rented
nan                        577
first month half off       88
nan                        22
second month free          55
nan                        60
first month half off       20

If my data frame is called df, how would I group by promotion name, calculate the percentage of days, and rename that column? My first column would be "rentals less than 1 month". In R, I would say:

df %>% group_by(`Promotion Name`) %>% 
  summarise("# Rentals < 1 month" = sum(`Days rented` <= 30)/length(`Days rented`))

Can someone help me do this in Python? What I have tried so far does not give me what I want, because I want to sum the days < 30, calculate the length, and finally rename the column. Thanks.

1 answer:

Answer 0 (score: 2)

I think you need groupby with a custom function and boolean indexing:

# Parentheses let the method chain continue across lines.
df = (rented_df.groupby('Promotion name')['days rented']
               .apply(lambda x: x[x<=30].sum()/len(x))
               .reset_index(name='# Rentals < 1 month'))
print (df)
         Promotion name  # Rentals < 1 month
0  first month half off            10.000000
1     second month free             0.000000
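
The lambda above divides the sum of the short rental durations by the group size. If the goal is instead the share of rentals lasting 30 days or less, which is what the R expression in the question computes, the mean of the boolean mask gives it directly. A minimal sketch, assuming the sample rows from the question rebuilt by hand as `rented_df`:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question (an assumption about the real frame).
rented_df = pd.DataFrame({
    'Promotion name': [np.nan, 'first month half off', np.nan,
                       'second month free', np.nan, 'first month half off'],
    'days rented': [577, 88, 22, 55, 60, 20],
})

# (x <= 30) is a boolean Series; its mean is the fraction of True values,
# i.e. the share of rentals in the group that lasted 30 days or less.
share = (
    rented_df.groupby('Promotion name')['days rented']
    .apply(lambda x: (x <= 30).mean())
    .reset_index(name='# Rentals < 1 month')
)
print(share)
```

Multiply the result by 100 if a percentage rather than a fraction is wanted.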

However, groupby drops NaN keys by default, so if you need them, first use fillna to replace NaN with a string that does not already occur in the column:

rented_df['Promotion name'] = rented_df['Promotion name'].fillna('NANS strings')
# Parentheses let the method chain continue across lines.
df = (rented_df.groupby('Promotion name')['days rented']
               .apply(lambda x: x[x<=30].sum()/len(x))
               .reset_index(name='# Rentals < 1 month'))
print (df)
         Promotion name  # Rentals < 1 month
0          NANS strings             7.333333
1  first month half off            10.000000
2     second month free             0.000000
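
In pandas 1.1 and later, `groupby` also accepts `dropna=False`, which keeps NaN as its own group without a placeholder string. A minimal sketch, assuming that pandas version and the same hand-built sample data:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question (an assumption about the real frame).
rented_df = pd.DataFrame({
    'Promotion name': [np.nan, 'first month half off', np.nan,
                       'second month free', np.nan, 'first month half off'],
    'days rented': [577, 88, 22, 55, 60, 20],
})

# dropna=False (pandas >= 1.1) keeps rows whose key is NaN as a separate group.
df = (
    rented_df.groupby('Promotion name', dropna=False)['days rented']
    .apply(lambda x: x[x <= 30].sum() / len(x))
    .reset_index(name='# Rentals < 1 month')
)
print(df)
```

The NaN key stays a real missing value in the result, so later code can still find it with `isna()`.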

To add the result as a separate column of the original frame, use transform:

rented_df['Promotion name'] = rented_df['Promotion name'].fillna('NANS strings')
# transform returns one value per original row, so it can be assigned as a column.
rented_df['# Rentals < 1 month'] = (rented_df.groupby('Promotion name')['days rented']
                                             .transform(lambda x: x[x<=30].sum()/len(x)))
print (rented_df)
         Promotion name  days rented  # Rentals < 1 month
0          NANS strings          577             7.333333
1  first month half off           88            10.000000
2          NANS strings           22             7.333333
3     second month free           55             0.000000
4          NANS strings           60             7.333333
5  first month half off           20            10.000000

EDIT:

import pandas as pd  # needed for pd.concat below

rented_df['Promotion name'] = rented_df['Promotion name'].fillna('NANS strings')
g = rented_df.groupby('Promotion name')['days rented']
s1 = g.apply(lambda x: x[x<=30].sum()/len(x)).rename('# Rentals < 1 month')
s2 = g.apply(lambda x: x[x<=60].sum()/len(x)).rename('# Rentals < 2 month')
s3 = g.apply(lambda x: x[x<=90].sum()/len(x)).rename('# Rentals < 3 month')
df = pd.concat([s1,s2,s3], axis=1).reset_index()
print (df)
         Promotion name  # Rentals < 1 month  # Rentals < 2 month  \
0          NANS strings             7.333333            27.333333   
1  first month half off            10.000000            10.000000   
2     second month free             0.000000            55.000000   

   # Rentals < 3 month  
0            27.333333  
1            54.000000  
2            55.000000
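
The three near-identical `apply` calls can also be generated from a mapping of column name to day limit. A minimal sketch, assuming the hand-built sample data from the question; the `limits` dictionary is a hypothetical helper, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question (an assumption about the real frame).
rented_df = pd.DataFrame({
    'Promotion name': [np.nan, 'first month half off', np.nan,
                       'second month free', np.nan, 'first month half off'],
    'days rented': [577, 88, 22, 55, 60, 20],
})
rented_df['Promotion name'] = rented_df['Promotion name'].fillna('NANS strings')
g = rented_df.groupby('Promotion name')['days rented']

# Hypothetical mapping: output column name -> upper limit in days.
limits = {
    '# Rentals < 1 month': 30,
    '# Rentals < 2 month': 60,
    '# Rentals < 3 month': 90,
}

# One Series per limit; limit=limit binds the current value inside the lambda.
columns = {
    name: g.apply(lambda x, limit=limit: x[x <= limit].sum() / len(x))
    for name, limit in limits.items()
}

# The dictionary keys become the column names of the combined frame.
df = pd.concat(columns, axis=1).reset_index()
print(df)
```

Adding another limit then only requires a new entry in `limits`.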