Create a weekday/weekend time series DataFrame from a daily time series DataFrame

Date: 2018-08-13 04:20:46

Tags: python python-3.x pandas numpy

For example, I have created a DataFrame with time series information:

Time        daily-bill
2012-01-01         200
2012-01-02         300
2012-01-03         100
2012-01-04         500
….

I want to create another time series DataFrame from the one above, shaped like this. How can I do this in pandas?

Time(weekday-and-weekend)                       total-bill
Monday-Friday
Weekend
Monday-Friday
Weekend
Monday-Friday
Weekend

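In other words, the time steps should be consecutive periods alternating between weekday and weekend. A weekday period consists of Monday to Friday; a weekend consists of Saturday and Sunday. The total-bill column stores the sum of the bills incurred on the days of each period, taken from the existing time series.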
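The weekday/weekend rule described above can be expressed directly in pandas. A minimal sketch, assuming Time is a datetime64 column and using hypothetical sample values (the nine days from the answer's example); the helper column name period is an illustrative assumption:

```python
import pandas as pd

# Hypothetical sample input: nine consecutive days starting on Sunday 2012-01-01.
df = pd.DataFrame({
    'Time': pd.date_range('2012-01-01', periods=9, freq='D'),
    'daily-bill': [200, 300, 100, 500, 200, 300, 100, 500, 500],
})

# dayofweek runs Monday=0 ... Sunday=6, so Saturday and Sunday are 5 and 6.
is_weekend = df['Time'].dt.dayofweek.isin([5, 6])
df['period'] = is_weekend.map({True: 'Weekend', False: 'Monday-Friday'})
print(df[['Time', 'period', 'daily-bill']])
```

Each day now carries its period label; the remaining step is to sum the bills over every consecutive run of equal labels.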

1 answer:

Answer 0 (score: 1)

Use:

print (df)
        Time  daily-bill
0 2012-01-01         200
1 2012-01-02         300
2 2012-01-03         100
3 2012-01-04         500
4 2012-01-05         200
5 2012-01-06         300
6 2012-01-07         100
7 2012-01-08         500
8 2012-01-09         500

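import numpy as np
import pandas as pd

# weekday runs Monday=0 ... Sunday=6, so weekday > 4 means Saturday or Sunday
arr = np.where(df['Time'].dt.weekday > 4, 'Weekend', 'Monday-Friday')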

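s = pd.Series(arr)
# increment a counter each time the label changes, so every run of
# consecutive equal labels gets its own group ID
s1 = s.ne(s.shift()).cumsum()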

df = (df['daily-bill'].groupby([s1,s.rename('Time')])
                     .sum()
                     .reset_index(level=0, drop=True)
                     .reset_index())
print (df)
            Time  daily-bill
0        Weekend         200
1  Monday-Friday        1400
2        Weekend         600
3  Monday-Friday         500
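If the result should keep more of a time-series flavour, one possible extension is to record the first and last date of each period next to the sum, and to name the sum column total-bill as in the question. A sketch assuming pandas 0.25 or later (for named aggregation) and the same hypothetical nine-day sample; the names run, period, start and end are illustrative choices, not part of the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Time': pd.date_range('2012-01-01', periods=9, freq='D'),
    'daily-bill': [200, 300, 100, 500, 200, 300, 100, 500, 500],
})

# Same labelling and run IDs as in the answer.
s = pd.Series(np.where(df['Time'].dt.weekday > 4, 'Weekend', 'Monday-Friday'),
              index=df.index)
run = s.ne(s.shift()).cumsum()

# Named aggregation: one row per run with its date range and bill total.
# 'total-bill' is not a valid Python keyword, so it is passed via **{...}.
out = (df.groupby([run.rename('run'), s.rename('period')])
         .agg(start=('Time', 'min'),
              end=('Time', 'max'),
              **{'total-bill': ('daily-bill', 'sum')})
         .reset_index(level='run', drop=True)
         .reset_index())
print(out)
```

The helper label is named period rather than Time here to avoid clashing with the existing Time column of the DataFrame.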

Explanation

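  1. First create a Series of labels from the weekday numbers with numpy.where (weekday > 4 means Saturday or Sunday).
  2. Then create another Series by comparing s with its shifted values (shift) and taking the cumsum, so that each run of consecutive equal labels gets a distinct ID.
  3. Aggregate with sum, then remove the first (run ID) level of the MultiIndex with reset_index(level=0, drop=True).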
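The shift/ne/cumsum trick in step 2 can be seen in isolation on a short label sequence. A minimal sketch with a hypothetical label Series shaped like s in the answer; the names changed and run_id are illustrative:

```python
import pandas as pd

# Hypothetical label sequence, shaped like the s Series in the answer.
s = pd.Series(['Weekend', 'Monday-Friday', 'Monday-Friday',
               'Weekend', 'Weekend', 'Monday-Friday'])

# True on the first row (compared with NaN) and wherever the label
# differs from the previous row.
changed = s.ne(s.shift())
# Cumulative count of the True values: one integer ID per run of equal labels.
run_id = changed.cumsum()

# Grouping by the label alone would merge all weekends into one group;
# grouping by run_id keeps each consecutive run separate.
print(s.groupby(s).size())       # 2 groups
print(s.groupby(run_id).size())  # 4 groups
```

This is why the answer groups by both s1 and the label: s1 separates the runs, and the label is kept only so it can appear in the output.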

Details

print (s)
0          Weekend
1    Monday-Friday
2    Monday-Friday
3    Monday-Friday
4    Monday-Friday
5    Monday-Friday
6          Weekend
7          Weekend
8    Monday-Friday
dtype: object

print (s1)
0    1
1    2
2    2
3    2
4    2
5    2
6    3
7    3
8    4
dtype: int32

Edit:

If the dates in the input DataFrame are a DatetimeIndex rather than a Time column:

print (df)
            daily-bill
Time                  
2012-01-01         200
2012-01-02         300
2012-01-03         100
2012-01-04         500
2012-01-05         200
2012-01-06         300
2012-01-07         100
2012-01-08         500
2012-01-09         500

arr = np.where(df.index.weekday > 4, 'Weekend','Monday-Friday')

# index=df.index aligns the labels with the DatetimeIndex rows for groupby
s = pd.Series(arr, index=df.index)
s1 = s.ne(s.shift()).cumsum()

df = (df['daily-bill'].groupby([s1,s.rename('Time')])
                     .sum()
                     .reset_index(level=0, drop=True)
                     .reset_index())
print (df)
            Time  daily-bill
0        Weekend         200
1  Monday-Friday        1400
2        Weekend         600
3  Monday-Friday         500
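A self-contained, runnable version of the DatetimeIndex variant, assuming the same hypothetical nine-day sample with the dates moved into an index named Time:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with the dates as a DatetimeIndex named Time.
df = pd.DataFrame(
    {'daily-bill': [200, 300, 100, 500, 200, 300, 100, 500, 500]},
    index=pd.date_range('2012-01-01', periods=9, freq='D', name='Time'),
)

# Label each day; weekday > 4 means Saturday or Sunday.
arr = np.where(df.index.weekday > 4, 'Weekend', 'Monday-Friday')
# Build the labels on df.index so groupby can align them with the rows.
s = pd.Series(arr, index=df.index)
# New run ID whenever the label changes.
s1 = s.ne(s.shift()).cumsum()

out = (df['daily-bill'].groupby([s1, s.rename('Time')])
                       .sum()
                       .reset_index(level=0, drop=True)
                       .reset_index())
print(out)
```

Passing index=df.index matters here: with a default RangeIndex the label Series would share no index labels with the DatetimeIndex, so the grouping keys would not line up with the bills.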