groupby.col.diff raises an unexpected error

Date: 2018-05-02 09:44:26

Tags: python pandas

import pandas as pd

df = pd.DataFrame(
    {'ts':[1,2,3,4,60,61,62,63,64,150,155,156,
           1,2,3,4,60,61,62,63,64,150,155,156,
           1,2,3,4,60,61,62,63,64,150,155,156],
    'id': [1,2,3,4,60,61,62,63,64,150,155,156,
           71,72,73,74,80,81,82,83,64,160,165,166,
           21,22,23,24,90,91,92,93,94,180,185,186],
    'other':['x','x','x','x','x','x','x','x','x','x','x','x',
             'y','y','y','y','y','y','y','y','y','y','y','y',
             'z','z','z','z','z','z','z','z','z','z','z','z'],
    'user':['x','x','x','x','y','x','x','x','x','x','x','x',
            'y','y','y','y','x','y','y','y','y','y','y','y',
            'z','z','z','z','z','z','z','z','z','z','z','z']
    })


df.set_index('id', inplace=True)
df.sort_values('ts',inplace=True)


for x, g in df.groupby('user'):
    # call 1
    print(g.ts.diff())

# call 2
df.groupby('user').ts.diff()

I'm not sure why call 2 raises an error. I also noticed that call 2 succeeds when I remove the sort_values call.

Can someone explain this behavior?
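The thread does not pin down the cause of the error. One property of the data worth checking, offered here as an assumption rather than a confirmed diagnosis, is that the 'id' index is not unique: 64 appears once in the x block and once in the y block. The sketch below rebuilds the question's data (dropping the unused 'other' column), confirms the duplicate, and then computes the per-group difference on a default RangeIndex via reset_index, so the result does not depend on the 'id' labels at all. The column name group_diff is borrowed from the answer; the reset_index workaround is this sketch's own choice, not something the thread proposes.

```python
import pandas as pd

# Rebuild the question's data ('other' column omitted; it is not used).
ts = [1, 2, 3, 4, 60, 61, 62, 63, 64, 150, 155, 156] * 3
ids = [1, 2, 3, 4, 60, 61, 62, 63, 64, 150, 155, 156,
       71, 72, 73, 74, 80, 81, 82, 83, 64, 160, 165, 166,
       21, 22, 23, 24, 90, 91, 92, 93, 94, 180, 185, 186]
user = list('xxxxyxxxxxxx') + list('yyyyxyyyyyyy') + ['z'] * 12
df = pd.DataFrame({'ts': ts, 'id': ids, 'user': user}).set_index('id')
df = df.sort_values('ts')

# id 64 occurs twice, so the index is not unique.
print(df.index.is_unique)  # False

# Workaround sketch: group on a fresh RangeIndex (row order is unchanged),
# then copy the resulting values back into df by position.
tmp = df.reset_index()
df['group_diff'] = tmp.groupby('user')['ts'].diff().to_numpy()
```

Because reset_index keeps the row order, the positional copy lines each difference up with the row it was computed from.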

1 answer:

Answer 0 (score: 0)

I get the error whether or not sort_values is called. In any case, I think what you are looking for is:

>>> df['group_diff'] = df.ts.groupby(df.user).transform(pd.Series.diff)
>>> df.head()
   other  ts user  group_diff
id
1      x   1    x         NaN
2      x   2    x         1.0
3      x   3    x         1.0
4      x   4    x         1.0
60     x  60    y         NaN

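After the groupby, transform calls a function on each group and returns one value for every row in that group, aligned to the original index, so the result can be assigned straight back as a column. Here the function is simply pd.Series.diff.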
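To see the alignment on something smaller, here is a sketch on a hypothetical five-row frame where the two users' rows are interleaved. It shows that transform(pd.Series.diff) yields one value per original row, in the original order, and that it matches calling diff directly on the grouped Series.

```python
import pandas as pd

# Hypothetical data: rows of users x and y are interleaved.
df = pd.DataFrame({'user': ['x', 'y', 'x', 'y', 'x'],
                   'ts':   [1, 10, 3, 15, 7]})

# transform applies pd.Series.diff to each group's 'ts' values and returns
# a Series indexed like df, so it lines up row by row.
df['group_diff'] = df['ts'].groupby(df['user']).transform(pd.Series.diff)
print(df['group_diff'].tolist())  # [nan, nan, 2.0, 5.0, 4.0]

# The same values via the grouped Series' own diff method.
direct = df['ts'].groupby(df['user']).diff()
```

Each user's first row gets NaN, and every later row holds the gap to that user's previous row, even though the rows of the two users are mixed together.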

Note the NaN values in rows 0 and 4 (id 1 and id 60): they correspond to the start of the x and y groups, respectively.
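The NaN at the start of each group follows from what a within-group difference means: the current value minus the previous value in the same group. The sketch below, on hypothetical data, spells that out with groupby(...).shift(), which has no previous value for a group's first row and therefore returns NaN there. It is an equivalent formulation for illustration, not how pandas implements diff internally.

```python
import pandas as pd

# Hypothetical data: two groups, already in time order.
df = pd.DataFrame({'user': ['x', 'x', 'x', 'y', 'y'],
                   'ts':   [1, 2, 4, 60, 61]})

# Previous 'ts' within the same user; NaN for the first row of each group.
prev = df.groupby('user')['ts'].shift()

# Current minus previous: NaN propagates to each group's first row.
manual = df['ts'] - prev
builtin = df.groupby('user')['ts'].diff()
print(manual.tolist())  # [nan, 1.0, 2.0, nan, 1.0]
```

If a 0 is preferred over NaN for the first row of each group, calling fillna(0) on the result is one option.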