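I have a pandas DataFrame with 40 columns and 400,000 rows. I created a rolled-up dataset grouped on 3 columns.
Now I need to compute a percentage metric from two of the columns. Python throws this error: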
unsupported operand type(s) for /: 'SeriesGroupBy' and 'SeriesGroupBy'
Here is the sample code:
print(sample_data)
   date  part  receipt  bad_dollars  total_dollars  bad_percent
0     1   123       22           40            100          NaN
1     2   456       44           80            120          NaN
2     3   134       33           30            150          NaN
3     1   123       22           80            100          NaN
4     5   456       45           40             90          NaN
5     3   134       33           85            150          NaN
6     7   123       24           70            120          NaN
7     5   456       45           20             85          NaN
8     9   134       35           50            300          NaN
9     7   123       24          300            600          NaN
sample_data_group = sample_data.groupby(['date','part','receipt'])
sample_data_group['bad_percents']=sample_data_group['bad_dollars']/sample_data_group['total_dollars']
TypeError: unsupported operand type(s) for /: 'SeriesGroupBy' and 'SeriesGroupBy'
Please help!
Answer 0 (score: 3)
You can do this by using apply on the groupby object:
import pandas as pd
import numpy as np

cols = ['index', 'date', 'part', 'receipt', 'bad_dollars', 'total_dollars',
        'bad_percent']

# Rebuild the sample data from the question, using the first column as the index.
sample_data = pd.DataFrame([
    [0, 1, 123, 22, 40, 100, np.nan],
    [1, 2, 456, 44, 80, 120, np.nan],
    [2, 3, 134, 33, 30, 150, np.nan],
    [3, 1, 123, 22, 80, 100, np.nan],
    [4, 5, 456, 45, 40, 90, np.nan],
    [5, 3, 134, 33, 85, 150, np.nan],
    [6, 7, 123, 24, 70, 120, np.nan],
    [7, 5, 456, 45, 20, 85, np.nan],
    [8, 9, 134, 35, 50, 300, np.nan],
    [9, 7, 123, 24, 300, 600, np.nan]],
    columns = cols).set_index('index', drop = True)

sample_data_group = sample_data.groupby(['date','part','receipt'])

# Within each group, compute bad_percent row by row, then drop the group keys
# that apply added to the index so the original row index is restored.
xx = sample_data_group.apply(
    lambda x: x.assign(bad_percent = x.bad_dollars/x.total_dollars))\
    .reset_index(['date','part', 'receipt'], drop = True)
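The TypeError arises because selecting a column from a groupby object yields a SeriesGroupBy, which only describes how rows are grouped and holds no computed values to divide. If what is wanted is one percentage per rolled-up group, a possible alternative is to aggregate first and divide afterwards. The sketch below assumes that summing the dollar columns within each (date, part, receipt) group is the intended roll-up; the names rolled_up and row_level are illustrative and do not come from the question.

```python
import pandas as pd

# Sample data copied from the question (bad_percent is left out and recomputed).
sample_data = pd.DataFrame({
    "date": [1, 2, 3, 1, 5, 3, 7, 5, 9, 7],
    "part": [123, 456, 134, 123, 456, 134, 123, 456, 134, 123],
    "receipt": [22, 44, 33, 22, 45, 33, 24, 45, 35, 24],
    "bad_dollars": [40, 80, 30, 80, 40, 85, 70, 20, 50, 300],
    "total_dollars": [100, 120, 150, 100, 90, 150, 120, 85, 300, 600],
})

keys = ["date", "part", "receipt"]

# Assumption: the roll-up is a per-group sum of the two dollar columns.
rolled_up = (
    sample_data
    .groupby(keys, as_index=False)[["bad_dollars", "total_dollars"]]
    .sum()
)

# After aggregation these are ordinary Series, so element-wise division works.
rolled_up["bad_percent"] = rolled_up["bad_dollars"] / rolled_up["total_dollars"]

# If a per-row ratio is enough, no groupby is needed at all.
row_level = sample_data.assign(
    bad_percent=sample_data["bad_dollars"] / sample_data["total_dollars"]
)

print(rolled_up)
```

The aggregated form gives one row per group, whereas the apply-based version keeps every original row, so which one fits depends on whether the metric should be reported per group or per record.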