Given the following df, I want to group by column A and divide column D by the maximum D within each A group.
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})
I tried something like this:
max_by_id = df.groupby('A')['D'].max()
df = df.set_index('A')
df['D'] /= max_by_id.reset_index()['D']
But this gives me:
ValueError: cannot reindex from a duplicate axis
Answer 0 (score: 2)
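The max computed as an aggregation on the groupby object has a reduced index (one entry per group), hence the error. If you want to divide the original df column by the aggregate, you can call transform on the groupby object so that the index aligns:
In [192]:
df['D'].div(df.groupby('A')['D'].transform('max'))
Out[192]:
0 -0.601098
1 -0.553823
2 -0.408006
3 1.000000
4 0.312029
5 0.709397
6 1.000000
7 0.140932
Name: D, dtype: float64
You can see the difference:
In [193]: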
df.groupby('A')['D'].transform('max')
Out[193]:
0 1.508660
1 1.378085
2 1.508660
3 1.378085
4 1.508660
5 1.378085
6 1.508660
7 1.508660
Name: D, dtype: float64
In [194]:
df.groupby('A')['D'].max()
Out[194]:
A
bar 1.378085
foo 1.508660
Name: D, dtype: float64
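To make the index difference easy to check by hand, here is a minimal sketch on a small deterministic frame. The values and the variable names group_max, row_max and scaled are made up for illustration and are not the random data shown above.

```python
import pandas as pd

# Illustrative data (assumed values, chosen so the result is easy to verify).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'D': [1.0, 2.0, 4.0, 8.0],
})

# Aggregation: one value per group, indexed by the group labels (bar, foo).
group_max = df.groupby('A')['D'].max()

# transform: the group max broadcast back to every row, indexed like df (0..3).
row_max = df.groupby('A')['D'].transform('max')

# row_max shares df's index, so the division aligns row by row.
scaled = df['D'].div(row_max)
print(scaled.tolist())  # [0.25, 0.25, 1.0, 1.0]
```

Since row_max has the same index as df, the result can also be assigned straight back to a column of df.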
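Additionally, when you call reset_index, it removes the original group labels from the index:
In [198]:
max_by_id.reset_index()['D']
Out[198]:
0 0.215997
1 0.962928
Name: D, dtype: float64
But before this you set the index of df to column 'A', so the two indexes no longer match and this fails:
df['D'] /= max_by_id.reset_index()['D']
You can also do this within the same groupby by using apply with a lambda.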
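A minimal sketch of the apply-with-lambda approach might look like the following. The sample values and the column name D_scaled are hypothetical, and passing group_keys=False is an assumption made here so that the result keeps the original row index on recent pandas versions.

```python
import pandas as pd

# Illustrative data (assumed values, not the random data from the question).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'D': [1.0, 2.0, 4.0, 8.0],
})

# Divide each group's D values by that group's max inside a single groupby.
# group_keys=False (an assumption) avoids prepending the group label to the index;
# the column assignment then aligns the result on df's row index.
df['D_scaled'] = (
    df.groupby('A', group_keys=False)['D']
      .apply(lambda s: s / s.max())
)

# For comparison, the transform-based version from earlier in the answer.
expected = df['D'] / df.groupby('A')['D'].transform('max')
print(df['D_scaled'].equals(expected))
```

The transform('max') version is usually the simpler choice, since it avoids calling a Python lambda once per group.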