Question

Given this simple dataset:

import pandas as pd

df = pd.DataFrame({'one':   ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two':   ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1,   2,    3,   4,   5,   6]})

Grouping on one / two and applying .max() returns a Series indexed on the groupby vars, as expected...

df.groupby(['one', 'two'])['three'].max()

Output:

one  two
a    c      3
b    c      4
     d      6
Name: three, dtype: int64

...In my case, I want to shift() my records by group. But for some reason, when I apply .shift() to the groupby object, my results don't include the groupby variables:

df.groupby(['one', 'two'])['three'].shift()

Output:

0    NaN
1    1.0
2    2.0
3    NaN
4    NaN
5    5.0
Name: three, dtype: float64

Is there a way to keep those groupby variables in the result, either as columns or as a multi-indexed Series (as with .max())? Thanks!

Answer 1

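There is a difference between max and shift: max aggregates the values (it returns an aggregated Series), whereas shift, like diff, does not aggregate and returns a Series of the same size as the original.

So the output can be appended as a new column:

df['shifted'] = df.groupby(['one', 'two'])['three'].shift()

Theoretically you can use agg, but it raises an error in pandas 0.20.3: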

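df1 = df.groupby(['one', 'two'])['three'].agg(['max', lambda x: x.shift()])
print (df1)

ValueError: Function does not reduce

If you need max together with the shifted values, transform is one possible solution, because it also returns a Series of the same size as the original.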
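The transform idea can be sketched roughly as follows. This is only an illustration, assuming the goal is to keep each row's group maximum next to its shifted value; the column names group_max and shifted are made up for the example and are not from the original answer.

```python
import pandas as pd

df = pd.DataFrame({'one':   ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two':   ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1,   2,    3,   4,   5,   6]})

grouped = df.groupby(['one', 'two'])['three']

# transform('max') broadcasts each group's max back onto every row of that
# group, so it has the same length as df, just like shift() does.
# 'group_max' and 'shifted' are hypothetical column names.
out = df.assign(group_max=grouped.transform('max'),
                shifted=grouped.shift())
print(out)
```

Because both results line up with the original rows, the groupby variables one and two stay available as ordinary columns.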

Answer 2

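As Jez explained, shift returns a Series that keeps the same length as the DataFrame, so if you use it the way you would use max(), you will get the error

Function does not reduce

Instead, assign the shifted values as a new column and set the groupby variables as the index: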

df.assign(shifted=df.groupby(['one', 'two'])['three'].shift()).set_index(['one','two'])
Out[57]: 
         three  shifted
one two                
a   c        1      NaN
    c        2      1.0
    c        3      2.0
b   c        4      NaN
    d        5      NaN
    d        6      5.0
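If a multi-indexed Series is wanted rather than a DataFrame, one possible variation (a sketch, not part of the original answer) is to move the groupby variables into the index first and then group by those index levels. Since shift() keeps the index of its input, the result stays indexed by one / two.

```python
import pandas as pd

df = pd.DataFrame({'one':   ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two':   ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1,   2,    3,   4,   5,   6]})

# Put the grouping columns into the index, leaving a Series of 'three'.
s = df.set_index(['one', 'two'])['three']

# Group by the index levels; shift() returns a Series with the same
# MultiIndex as s, so the groupby variables are preserved.
shifted = s.groupby(level=['one', 'two']).shift()
print(shifted)
```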

Or, using max as the key, slice the shifted values at the row holding each group's max:

df.groupby(['one', 'two'])['three'].apply(lambda x : x.shift()[x==x.max()])
Out[58]: 
one  two   
a    c    2    2.0
b    c    3    NaN
     d    5    5.0
Name: three, dtype: float64
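The same values can probably be obtained without apply. The sketch below is an alternative under the assumption that each group's maximum is unique: it uses idxmax to find the row label of the max in every group and then looks up the shifted value at that label. The variable names idx and at_max are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'one':   ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two':   ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1,   2,    3,   4,   5,   6]})

grouped = df.groupby(['one', 'two'])['three']
shifted = grouped.shift()

# Row label of the maximum within each (one, two) group.
idx = grouped.idxmax()

# Look up the shifted value at those rows, then relabel the rows by group.
at_max = shifted.loc[idx.to_numpy()]
at_max.index = idx.index
print(at_max)
```

Unlike the apply version, this result is indexed only by one / two, without the original row number as a third level.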

Get Pandas.groupby.shift() results with groupbyvars as cols / index?
