pandas .diff by the MultiIndex values of rows

Time: 2018-11-05 07:15:53

Tags: python pandas pandas-groupby

In [66]: t1
Out[69]: 
job_date  branch_id
2018-05   1            0.618980
          2            0.600590
          3            0.603486
          4            0.043931
          5            0.588168
          6            0.381518
          7            0.357035
2018-06   1            0.690575
          2            0.700900
          3            0.571556
          4            0.351935
          5            0.626428
          6            0.461813
          7            0.329663
Name: utilization, dtype: float64

In [86]: t1.index
Out[86]: 
MultiIndex(levels=[[2018-05, 2018-06], [1, 2, 3, 4, 5, 6, 7]],
           labels=[[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1], [0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6]],
           names=['job_date', 'branch_id'])

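How can I diff the rows by index value rather than by position?

For example, (2018-05, 1) and (2018-06, 1) should give 0.690575 - 0.618980 = 0.071595.

If I run t1.diff(), I get a row-by-row comparison, which is not what I want: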

In [87]: t1.diff()
Out[87]: 
job_date  branch_id
2018-05   1                 NaN
          2           -0.018390
          3            0.002895
          4           -0.559554
          5            0.544237
          6           -0.206651
          7           -0.024483
2018-06   1            0.333540
          2            0.010325
          3           -0.129345
          4           -0.219621
          5            0.274494
          6           -0.164615
          7           -0.132150

This is what I am doing at the moment:

In [49]: t1.unstack(level=0)['utilization'].diff(axis=1)
Out[49]: 
job_date   2018-05   2018-06
branch_id                   
1              NaN  0.071595
2              NaN  0.100310
3              NaN -0.031930
4              NaN  0.308003
5              NaN  0.038260
6              NaN  0.080295
7              NaN -0.027372

Is there a way to do this without unstacking?

2 answers:

Answer 0: (score: 1)

One possible solution is to shift the Period values by one month and subtract. This works as long as the gap between consecutive values of that MultiIndex level is always the same (here, one month).
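The shift-and-subtract idea can be sketched as follows. This is only a minimal illustration, not the answerer's original code: it assumes that job_date is a monthly Period level (rebuilt here by hand from the question's data), and the names `months`, `shifted` and `result` are made up for the example.

```python
import pandas as pd

# Rebuild the question's data, assuming job_date holds monthly Periods.
months = pd.PeriodIndex(['2018-05', '2018-06'], freq='M')
ix = pd.MultiIndex.from_product([months, range(1, 8)],
                                names=['job_date', 'branch_id'])
t1 = pd.Series(
    [0.618980, 0.600590, 0.603486, 0.043931, 0.588168, 0.381518,
     0.357035, 0.690575, 0.700900, 0.571556, 0.351935, 0.626428,
     0.461813, 0.329663],
    index=ix, name='utilization')

# Relabel every row one month later, so May's values sit under June's labels.
shifted = t1.copy()
shifted.index = pd.MultiIndex.from_arrays(
    [t1.index.get_level_values('job_date') + 1,
     t1.index.get_level_values('branch_id')],
    names=t1.index.names)

# Align the shifted copy on t1's own index and subtract.
# Rows without a previous month (here, all of 2018-05) become NaN.
result = t1 - shifted.reindex(t1.index)
print(result)
```

Because the subtraction is aligned on index labels rather than on row position, this should not depend on the order of the rows, but it does rely on the months being evenly spaced.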

Answer 1: (score: 0)

You can use groupby, with no unstacking needed, like this:

import pandas as pd

# Rebuild the question's MultiIndex.
# Note: the `labels` argument was renamed to `codes` in pandas 0.24.
ix = pd.MultiIndex(
    levels=[['2018-05', '2018-06'], [1, 2, 3, 4, 5, 6, 7]],
    codes=[[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
           [0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6]],
    names=['job_date', 'branch_id'])

series = pd.Series(
    [0.618980, 0.600590, 0.603486, 0.043931, 0.588168, 0.381518,
     0.357035, 0.690575, 0.700900, 0.571556, 0.351935, 0.626428,
     0.461813, 0.329663],
    index=ix)

# Diff within each branch_id group, i.e. month over month per branch.
series.groupby(by='branch_id').diff()

Output:

job_date  branch_id
2018-05   1                nan
          2                nan
          3                nan
          4                nan
          5                nan
          6                nan
          7                nan
2018-06   1            0.07160
          2            0.10031
          3           -0.03193
          4            0.30800
          5            0.03826
          6            0.08029
          7           -0.02737
dtype: float64
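
One caveat: a grouped diff subtracts consecutive rows within each group in the order they appear, so it only gives the month-over-month change when job_date is sorted within each branch. The sketch below assumes the same data arriving in reversed order; the names `shuffled`, `result` and `changes` are illustrative only.

```python
import pandas as pd

ix = pd.MultiIndex.from_product(
    [['2018-05', '2018-06'], range(1, 8)],
    names=['job_date', 'branch_id'])
series = pd.Series(
    [0.618980, 0.600590, 0.603486, 0.043931, 0.588168, 0.381518,
     0.357035, 0.690575, 0.700900, 0.571556, 0.351935, 0.626428,
     0.461813, 0.329663],
    index=ix, name='utilization')

# Reverse the rows to simulate unsorted input: June now precedes May.
shuffled = series.iloc[::-1]

# Sort by the index first so each branch's months are in chronological
# order, then diff within each branch.
result = shuffled.sort_index().groupby(level='branch_id').diff()

# Keep only the rows that have a previous month to compare against.
changes = result.dropna()
print(changes)
```

Grouping with `level='branch_id'` makes it explicit that the key is an index level rather than a column.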