我用两列感兴趣的时间戳数据:a'标签'和一个计数。我想创建一个时间序列,每个标签的总和,例如,每天。我可以使用resample
来实现这一目标吗?
具体例子:
import pandas as pd
import numpy as np
from itertools import cycle
idx = pd.date_range('2016-01-01', '2016-01-07', freq='H')
n = np.random.randint(10, size=24*6+1)
lst = [(l,c) for l,c in zip(cycle(['foo', 'bar']), n)]
df = pd.DataFrame(lst, index=idx, columns=['label', 'n'])
df.resample(???).sum()
对于此示例,目标数据框应包含时间索引和两列(foo
和bar
),其中包含每个间隔的总计数。
答案 0 :(得分:2)
我认为groupby
需要DataFrameGroupBy.resample
:
print (df.groupby('label')
.resample('1D')
.sum()
.reset_index()
.rename(columns={'level_1':'date'}))
label date n
0 bar 2016-01-01 44
1 bar 2016-01-02 60
2 bar 2016-01-03 65
3 bar 2016-01-04 51
4 bar 2016-01-05 37
5 bar 2016-01-06 59
6 foo 2016-01-01 40
7 foo 2016-01-02 69
8 foo 2016-01-03 58
9 foo 2016-01-04 55
10 foo 2016-01-05 67
11 foo 2016-01-06 59
12 foo 2016-01-07 5
使用stack
和unstack
处理datetimeindex
的另一个解决方案:
print (df.set_index('label', append=True)
.unstack(1)
.resample('1D')
.sum()
.stack()
.reset_index()
.rename(columns={'level_0':'date'}))
date label n
0 2016-01-01 bar 44.0
1 2016-01-01 foo 40.0
2 2016-01-02 bar 60.0
3 2016-01-02 foo 69.0
4 2016-01-03 bar 65.0
5 2016-01-03 foo 58.0
6 2016-01-04 bar 51.0
7 2016-01-04 foo 55.0
8 2016-01-05 bar 37.0
9 2016-01-05 foo 67.0
10 2016-01-06 bar 59.0
11 2016-01-06 foo 59.0
12 2016-01-07 foo 5.0
如果需要两列:
df1 = df.set_index('label', append=True).unstack(1).resample('1D').sum()
df1.columns = df1.columns.droplevel(0)
print (df1)
label bar foo
2016-01-01 61.0 65.0
2016-01-02 54.0 56.0
2016-01-03 70.0 53.0
2016-01-04 46.0 49.0
2016-01-05 61.0 49.0
2016-01-06 50.0 55.0
2016-01-07 NaN 6.0