I want to group my data frame by the promotion offered and calculate a percentage. The data frame has the following format:
Promotion name          days rented
nan                     577
first month half off    88
nan                     22
second month free       55
nan                     60
first month half off    20
If my data frame is called df, how would I group by promotion name, calculate the percentage of days, and rename that column? My first column would be "Rentals less than 1 month". In R, I would write:
df %>% group_by(`Promotion Name`) %>%
  summarise("# Rentals < 1 month" = sum(`Days rented` <= 30)/length(`Days rented`))
Can someone help with Python? What I tried did not give me what I want, because I want to sum the days < 30, count the length, and finally rename the column. Thanks.
Answer 0 (score: 2)
I think you need groupby with a custom function that uses boolean indexing:
# Parentheses let the method chain continue across lines
df = (rented_df.groupby('Promotion name')['days rented']
      .apply(lambda x: x[x<=30].sum()/len(x))
      .reset_index(name='# Rentals < 1 month'))
print (df)
Promotion name # Rentals < 1 month
0 first month half off 10.000000
1 second month free 0.000000
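The expression above divides the sum of the short rentals' days by the group size, which matches the last sentence of the question. If the goal is instead the share of rentals that lasted 30 days or fewer (which is what the R snippet computes), a minimal sketch could look like the following. The sample frame is rebuilt from the question's data, and the name `share_df` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; NaN marks rows without a promotion.
rented_df = pd.DataFrame({
    'Promotion name': [np.nan, 'first month half off', np.nan,
                       'second month free', np.nan, 'first month half off'],
    'days rented': [577, 88, 22, 55, 60, 20],
})

# (x <= 30) is a boolean Series; its mean is the fraction of True values,
# i.e. the share of rentals in the group that lasted 30 days or fewer.
share_df = (
    rented_df.groupby('Promotion name')['days rented']
    .apply(lambda x: (x <= 30).mean())
    .reset_index(name='# Rentals < 1 month')
)
print(share_df)
```

Multiplying the column by 100 would turn the fraction into a percentage.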
However, groupby drops NaN keys by default, so if you need them, first use fillna to replace NaN with a string that does not already occur in the column:
rented_df['Promotion name'] = rented_df['Promotion name'].fillna('NANS strings')
df = (rented_df.groupby('Promotion name')['days rented']
      .apply(lambda x: x[x<=30].sum()/len(x))
      .reset_index(name='# Rentals < 1 month'))
print (df)
Promotion name # Rentals < 1 month
0 NANS strings 7.333333
1 first month half off 10.000000
2 second month free 0.000000
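As an alternative to a placeholder string, pandas 1.1 and later accept `dropna=False` in groupby, which keeps NaN as its own group. The sketch below assumes such a pandas version and the sample data from the question. It also swaps the lambda for a vectorized sum divided by the group size, which should give the same numbers; the names `short_days` and `ratio` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
rented_df = pd.DataFrame({
    'Promotion name': [np.nan, 'first month half off', np.nan,
                       'second month free', np.nan, 'first month half off'],
    'days rented': [577, 88, 22, 55, 60, 20],
})

# Keep the day counts of rentals up to 30 days, replace the rest with 0.
short_days = rented_df['days rented'].where(rented_df['days rented'] <= 30, 0)

# dropna=False keeps the NaN promotion as a group of its own.
g = short_days.groupby(rented_df['Promotion name'], dropna=False)

# Sum of short-rental days divided by the number of rows in each group.
ratio = (g.sum() / g.size()).rename('# Rentals < 1 month')
print(ratio)
```

This avoids having to pick a placeholder that might collide with a real promotion name.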
To write the result back as a new column, use transform:
rented_df['Promotion name'] = rented_df['Promotion name'].fillna('NANS strings')
rented_df['# Rentals < 1 month'] = (rented_df.groupby('Promotion name')['days rented']
                                    .transform(lambda x: x[x<=30].sum()/len(x)))
print (rented_df)
Promotion name days rented # Rentals < 1 month
0 NANS strings 577 7.333333
1 first month half off 88 10.000000
2 NANS strings 22 7.333333
3 second month free 55 0.000000
4 NANS strings 60 7.333333
5 first month half off 20 10.000000
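The difference between the two calls is the shape of the result: apply returns one value per group, while transform broadcasts that value back to every row of the group. A small self-contained sketch, with the sample data rebuilt from the question and a hypothetical helper name `short_rental_ratio`:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
rented_df = pd.DataFrame({
    'Promotion name': [np.nan, 'first month half off', np.nan,
                       'second month free', np.nan, 'first month half off'],
    'days rented': [577, 88, 22, 55, 60, 20],
})
rented_df['Promotion name'] = rented_df['Promotion name'].fillna('NANS strings')


def short_rental_ratio(x):
    """Sum of the day counts up to 30, divided by the group size."""
    return x[x <= 30].sum() / len(x)


g = rented_df.groupby('Promotion name')['days rented']
per_group = g.apply(short_rental_ratio)    # one value per promotion
per_row = g.transform(short_rental_ratio)  # one value per original row

print(per_group.shape)
print(per_row.shape)
```

Because `per_row` shares the index of `rented_df`, it can be assigned directly as a new column.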
Edit:
import pandas as pd  # needed for pd.concat below
rented_df['Promotion name'] = rented_df['Promotion name'].fillna('NANS strings')
g = rented_df.groupby('Promotion name')['days rented']
s1 = g.apply(lambda x: x[x<=30].sum()/len(x)).rename('# Rentals < 1 month')
s2 = g.apply(lambda x: x[x<=60].sum()/len(x)).rename('# Rentals < 2 month')
s3 = g.apply(lambda x: x[x<=90].sum()/len(x)).rename('# Rentals < 3 month')
df = pd.concat([s1,s2,s3], axis=1).reset_index()
print (df)
Promotion name # Rentals < 1 month # Rentals < 2 month \
0 NANS strings 7.333333 27.333333
1 first month half off 10.000000 10.000000
2 second month free 0.000000 55.000000
# Rentals < 3 month
0 27.333333
1 54.000000
2 55.000000
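If more thresholds are needed, the three nearly identical lines can be generated in a loop. This is only a sketch of one way to do it, using the same data and column names as above; the `thresholds` mapping is an illustrative name.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
rented_df = pd.DataFrame({
    'Promotion name': [np.nan, 'first month half off', np.nan,
                       'second month free', np.nan, 'first month half off'],
    'days rented': [577, 88, 22, 55, 60, 20],
})
rented_df['Promotion name'] = rented_df['Promotion name'].fillna('NANS strings')

# Output column name -> upper limit in days.
thresholds = {
    '# Rentals < 1 month': 30,
    '# Rentals < 2 month': 60,
    '# Rentals < 3 month': 90,
}

g = rented_df.groupby('Promotion name')['days rented']

# limit=limit binds the current threshold to each lambda.
columns = [
    g.apply(lambda x, limit=limit: x[x <= limit].sum() / len(x)).rename(name)
    for name, limit in thresholds.items()
]

df = pd.concat(columns, axis=1).reset_index()
print(df)
```

Adding another cutoff then only requires one more entry in the mapping.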