Pandas: iterate to get the max of a series of variable-length slices

Time: 2017-05-19 19:05:06

Tags: pandas python-3.5

Suppose I have a Pandas DataFrame like this:

import pandas as pd
idx = ['2003-01-02', '2003-01-03', '2003-01-06', '2003-01-07',
       '2003-01-08', '2003-01-09', '2003-01-10', '2003-01-13',
       '2003-01-14', '2003-01-15', '2003-01-16', '2003-01-17',
       '2003-01-21', '2003-01-22', '2003-01-23', '2003-01-24',
       '2003-01-27']

a = pd.DataFrame([1,2,0,0,1,2,3,0,0,0,1,2,3,4,5,0,1],
                  columns = ['original'], index = pd.to_datetime(idx))

I am trying to get the maximum of each slice of this DataFrame that lies between two zeros. For the example above, I would get:

a['result'] = [0,2,0,0,0,0,3,0,0,0,0,0,0,0,5,0,1]

1 answer:

Answer 0: (score: 4)

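  • Find the zeros with eq(0)
  • Use cumsum on that mask to make groups: every zero starts a new group number
  • Use mask to put the zeros themselves into their own group, -1
  • Find the position of the maximum in each group with idxmax
  • Drop the entry for group -1, since its value is zero anyway
  • Take a.original at the positions found, then reindex to the full index and fill the rest with zeros

m = a.original.eq(0)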
g = a.original.groupby(m.cumsum().mask(m, -1))
i = g.idxmax().drop(-1)
a.assign(result=a.loc[i, 'original'].reindex(a.index, fill_value=0))

            original  result
2003-01-02         1       0
2003-01-03         2       2
2003-01-06         0       0
2003-01-07         0       0
2003-01-08         1       0
2003-01-09         2       0
2003-01-10         3       3
2003-01-13         0       0
2003-01-14         0       0
2003-01-15         0       0
2003-01-16         1       0
2003-01-17         2       0
2003-01-21         3       0
2003-01-22         4       0
2003-01-23         5       5
2003-01-24         0       0
2003-01-27         1       1
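
The steps in the answer can also be wrapped in a small function, which makes the intermediate group labels easier to inspect. This is only a sketch: the function name `mark_run_maxima` and the `errors='ignore'` argument are additions that are not in the original answer. The latter is there on the assumption that the input might contain no zeros at all, in which case there is no group -1 to drop. If a slice contains its maximum more than once, `idxmax` keeps only the first occurrence.

```python
import pandas as pd

idx = pd.to_datetime([
    '2003-01-02', '2003-01-03', '2003-01-06', '2003-01-07',
    '2003-01-08', '2003-01-09', '2003-01-10', '2003-01-13',
    '2003-01-14', '2003-01-15', '2003-01-16', '2003-01-17',
    '2003-01-21', '2003-01-22', '2003-01-23', '2003-01-24',
    '2003-01-27'])
a = pd.DataFrame({'original': [1, 2, 0, 0, 1, 2, 3, 0, 0, 0, 1, 2, 3, 4, 5, 0, 1]},
                 index=idx)


def mark_run_maxima(s):
    """Keep the maximum of each zero-delimited slice at its own position, 0 elsewhere.

    Hypothetical helper name; the logic follows the steps of the answer.
    """
    is_zero = s.eq(0)
    # Each zero increments the counter, so the values between two zeros
    # share one label; the zeros themselves are relabelled -1.
    labels = is_zero.cumsum().mask(is_zero, -1)
    # Index label of the first maximum inside each group.
    peaks = s.groupby(labels).idxmax()
    # Assumption: ignore a missing -1 group (input without any zeros).
    peaks = peaks.drop(-1, errors='ignore')
    # Values at the peak positions, spread back over the full index.
    return s.loc[peaks].reindex(s.index, fill_value=0)


# Group labels for the sample data, recomputed here only for inspection.
is_zero = a['original'].eq(0)
labels = is_zero.cumsum().mask(is_zero, -1)
print(labels.tolist())

result = mark_run_maxima(a['original'])
print(a.assign(result=result))
```

Printing `labels` shows that the label values themselves are arbitrary (0, 2, 5, 6 for this data); all that matters is that each slice gets a label of its own and that every zero lands in group -1.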