Question

I have a dataframe that shows 1) dates, 2) prices and 3) the difference between consecutive prices.

dates | data | result     | change
24-09    24      0           none
25-09    26      2           pos
26-09    27      1           pos
27-09    28      1           pos
28-09    26     -2           neg

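I want to create a summary of the above data in a new dataframe. The summary will have 4 columns: 1) start date, 2) end date, 3) number of days and 4) run.

For example, using the data above, there is a positive run of +4 from 25-09 to 27-09, so I would like a dataframe like the one below.

In the new dataframe there should be a new row every time the result value switches from positive to negative (or vice versa). A run of 0 means the price did not change from the previous day, and it also needs its own row in the dataframe.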

start date | end date | num days | run 
 25-09        27-09        3        4         
 27-09        28-09        1        -2
 23-09        24-09        1        0

I think the first step is to create a new column "change" based on the value of `result`, showing one of "positive", "negative" or "no change". Then maybe I can group by this column.
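The grouping idea works with one caveat: grouping on the label alone would lump every positive day into a single group, even when the positive days are not adjacent. A minimal sketch, assuming pandas >= 0.25 for named aggregation, that labels each day and then numbers each block of consecutive identical labels with `shift()` and `cumsum()`. The `block` column and the snake_case output names are illustrative choices, not part of the question:

```python
import numpy as np
import pandas as pd

# Sample data from the question (dates, price, day-over-day difference)
df = pd.DataFrame({
    'dates': ['24-09', '25-09', '26-09', '27-09', '28-09'],
    'data': [24, 26, 27, 28, 26],
    'result': [0, 2, 1, 1, -2],
})

# Label each day by the sign of its price change
df['change'] = np.select(
    [df['result'] > 0, df['result'] < 0],
    ['positive', 'negative'],
    default='no change',
)

# A new block starts whenever the label differs from the previous row's label
# (the first row compares against NaN, so it always starts a block).
df['block'] = df['change'].ne(df['change'].shift()).cumsum()

summary = df.groupby('block').agg(
    start_date=('dates', 'first'),
    end_date=('dates', 'last'),
    num_days=('dates', 'count'),
    run=('result', 'sum'),
).reset_index(drop=True)
print(summary)
```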

Answer 1

Two functions that are useful for this kind of problem are `diff()` and `cumsum()`.
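To see how the two combine, here is a small sketch on the signs of the question's first five `result` values: `diff()` marks where the sign changes, and `cumsum()` turns those marks into a run id that can be passed to `groupby()`. The variable names are illustrative:

```python
import pandas as pd

# Sign of each day's change: 1 = up, -1 = down, 0 = flat
signs = pd.Series([0, 1, 1, 1, -1])

# diff() is non-zero exactly where the sign differs from the previous row.
# The first row has no predecessor, so diff() yields NaN there, and
# NaN compared with ne(0) is True, so row 0 starts a run as well.
changed = signs.diff().ne(0)

# Summing the True/False flags cumulatively gives each run its own id.
run_id = changed.cumsum()
print(run_id.tolist())  # [1, 2, 2, 2, 3]
```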

I have added a few extra data points to your sample data to exercise more cases.

Being able to assign different (and multiple) aggregation functions to different columns is one of pandas' best features.
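As a sketch of that feature: passing a dict of lists to `agg()` applies several functions per column and returns two-level column labels, which can be flattened by joining the levels. The `group` column here stands in for run ids like the ones `cumsum()` produces and is hypothetical. Because the output column order depends on the dict (and has varied across pandas versions), selecting or renaming columns by name is safer than renaming them by position:

```python
import pandas as pd

df = pd.DataFrame({
    'dates': ['24-09', '25-09', '26-09', '27-09', '28-09'],
    'result': [0, 2, 1, 1, -2],
    'group': [1, 2, 2, 2, 3],  # hypothetical run ids
})

# One function for 'result', three for 'dates'; the columns come back as
# pairs such as ('dates', 'first').
agg = df.groupby('group').agg({'result': ['sum'],
                               'dates': ['first', 'last', 'count']})
print(agg.columns.tolist())

# Flatten the two-level labels into single strings, e.g. 'dates_first'.
agg.columns = ['_'.join(col) for col in agg.columns]
print(agg[['dates_first', 'dates_last', 'dates_count', 'result_sum']])
```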

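import numpy as np
import pandas as pd

df = pd.DataFrame({'dates': ['24-09', '25-09', '26-09', '27-09', '28-09', '29-09', '30-09', '01-10', '02-10', '03-10', '04-10'],
                   'data': [24, 26, 27, 28, 26, 25, 30, 30, 30, 28, 25],
                   'result': [0, 2, 1, 1, -2, 0, 5, 0, 0, -2, -3]})

# Direction of each day's change: 1 = up, -1 = down, 0 = no change
df['cat'] = np.sign(df['result'])

# A new run starts wherever the direction differs from the previous row.
# diff() is NaN on the first row, and NaN != 0 is True, so that row starts a run too.
df['change'] = df['cat'].diff()
df['change_flag'] = (df['change'] != 0).astype(int)
df['change_cum_sum'] = df['change_flag'].cumsum()  # which gives us our groupings

# Named aggregation (pandas >= 0.25) ties each output column to its source
# column, so the labels cannot end up in the wrong order. 'first'/'last' rely
# on the rows being in date order; string min/max would fail across a month
# boundary (e.g. '01-10' sorts before '30-09').
foo = df.groupby('change_cum_sum').agg(**{'start date': ('dates', 'first'),
                                          'end date': ('dates', 'last'),
                                          'num days': ('dates', 'count'),
                                          'run': ('result', 'sum')})
foo.index.name = 'id'
foo.reset_index(inplace=True)
print(foo)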

Output:

   id start date end date  num days  run
0   1      24-09    24-09         1    0
1   2      25-09    27-09         3    4
2   3      28-09    28-09         1   -2
3   4      29-09    29-09         1    0
4   5      30-09    30-09         1    5
5   6      01-10    02-10         2    0
6   7      03-10    04-10         2   -5
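To cross-check the pandas result, or where pandas is not needed at all, the same runs can be built with the standard library's `itertools.groupby`, which groups only adjacent items with equal keys, which is exactly the definition of a run here. A sketch; `sign` and `summarize_runs` are hypothetical helpers, and the rows come back as plain dicts:

```python
from itertools import groupby


def sign(x):
    """1 for a rise, -1 for a fall, 0 for no change."""
    return (x > 0) - (x < 0)


def summarize_runs(dates, results):
    """Collapse consecutive days with the same direction of change into one row each."""
    rows = []
    # groupby() starts a new group whenever the key (the sign) changes.
    for _, group in groupby(zip(dates, results), key=lambda pair: sign(pair[1])):
        group = list(group)
        rows.append({
            'start date': group[0][0],
            'end date': group[-1][0],
            'num days': len(group),
            'run': sum(r for _, r in group),
        })
    return rows


dates = ['24-09', '25-09', '26-09', '27-09', '28-09', '29-09',
         '30-09', '01-10', '02-10', '03-10', '04-10']
results = [0, 2, 1, 1, -2, 0, 5, 0, 0, -2, -3]
for row in summarize_runs(dates, results):
    print(row)
```

Passing the returned list to `pd.DataFrame(...)` should give the same table as above, without the `id` column.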

Creating a summary of price movements by date in a pandas dataframe