For example, I have created a DataFrame containing time-series information:
Time daily-bill
2012-01-01 200
2012-01-02 300
2012-01-03 100
2012-01-04 500
….
I want to create another time-series DataFrame from the one above. How can I do this in pandas?
Time(weekday-and-weekend) total-bill
Monday-Friday
Weekend
Monday-Friday
Weekend
Monday-Friday
Weekend
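In other words, the time steps should be a consecutive, alternating sequence of weekday and weekend blocks. A weekday block consists of Monday to Friday, and a weekend block consists of Saturday and Sunday. The total-bill column should hold the sum of the bills for the days in each block, taken from the existing time series.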
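To make the question reproducible, here is a minimal sketch of the input frame. It assumes the Time values arrive as strings (for example from a CSV) and borrows the rows for 2012-01-05 to 2012-01-09 from the answer's sample, since the question elides them. Converting with pd.to_datetime matters because the .dt accessor used in the answer only works on datetime columns.

```python
import pandas as pd

# Sample data: first four rows from the question, the rest assumed
# (taken from the answer's example) to fill the "...." in the question.
df = pd.DataFrame({
    'Time': ['2012-01-01', '2012-01-02', '2012-01-03', '2012-01-04',
             '2012-01-05', '2012-01-06', '2012-01-07', '2012-01-08',
             '2012-01-09'],
    'daily-bill': [200, 300, 100, 500, 200, 300, 100, 500, 500],
})

# Assumption: Time was loaded as strings, so convert it to datetime64
df['Time'] = pd.to_datetime(df['Time'])

# weekday: Monday=0 ... Sunday=6
print(df.assign(weekday=df['Time'].dt.weekday))
```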
Answer (score: 1)
Use:
print (df)
Time daily-bill
0 2012-01-01 200
1 2012-01-02 300
2 2012-01-03 100
3 2012-01-04 500
4 2012-01-05 200
5 2012-01-06 300
6 2012-01-07 100
7 2012-01-08 500
8 2012-01-09 500
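import numpy as np
import pandas as pd

# Label each day; Time must be datetime64 (convert with pd.to_datetime if it holds strings)
arr = np.where(df['Time'].dt.weekday > 4, 'Weekend','Monday-Friday')
s = pd.Series(arr, index=df.index)
# Start a new group id every time the label differs from the previous row
s1 = s.ne(s.shift()).cumsum()
df = (df['daily-bill'].groupby([s1,s.rename('Time')])
        .sum()
        .reset_index(level=0, drop=True)
        .reset_index())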
print (df)
Time daily-bill
0 Weekend 200
1 Monday-Friday 1400
2 Weekend 600
3 Monday-Friday 500
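For reference, a self-contained sketch of the same approach, with the imports and sample data included. The only change from the answer is reset_index(name='total-bill'), which names the summed column total-bill as the question asked; the variable names block and out are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Time': pd.date_range('2012-01-01', periods=9, freq='D'),
    'daily-bill': [200, 300, 100, 500, 200, 300, 100, 500, 500],
})

# weekday is 0 (Monday) .. 6 (Sunday), so > 4 means Saturday or Sunday
s = pd.Series(np.where(df['Time'].dt.weekday > 4, 'Weekend', 'Monday-Friday'),
              index=df.index)

# Consecutive equal labels share one block id
block = s.ne(s.shift()).cumsum()

out = (df['daily-bill']
       .groupby([block, s.rename('Time')])
       .sum()
       .reset_index(level=0, drop=True)   # drop the helper block id level
       .reset_index(name='total-bill'))   # Series -> DataFrame, rename the sum

print(out)
```

Because block ids increase monotonically, the default sort=True in groupby keeps the blocks in chronological order.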
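Explanation:
Create a Series of labels with weekday and numpy.where.
Create the Series s1 with cumsum of s compared (ne) with its shifted values, so that each run of consecutive equal labels gets its own id.
Aggregate with sum, then remove the first level of the MultiIndex (the helper ids from s1) with reset_index(level=0, drop=True).
Details: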
print (s)
0 Weekend
1 Monday-Friday
2 Monday-Friday
3 Monday-Friday
4 Monday-Friday
5 Monday-Friday
6 Weekend
7 Weekend
8 Monday-Friday
dtype: object
print (s1)
0 1
1 2
2 2
3 2
4 2
5 2
6 3
7 3
8 4
dtype: int32
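The shift/ne/cumsum step is the core of the answer, so here it is in isolation on a hypothetical label sequence. Each position where the value differs from the previous one contributes a True, and the running sum of those Trues becomes a run id.

```python
import pandas as pd

# Hypothetical labels, only to demonstrate the run-id trick
s = pd.Series(['A', 'B', 'B', 'A', 'A', 'B'])

changed = s.ne(s.shift())   # True where a value differs from the one before it
run_id = changed.cumsum()   # each True starts a new run

print(pd.DataFrame({'s': s, 'changed': changed, 'run_id': run_id}))
```

Grouping by s alone would merge all 'A' rows (and, in the answer, all weekends) into a single row; the run id keeps separate blocks apart.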
Edit:
If Time is the DatetimeIndex of the input DataFrame instead of a column:
print (df)
daily-bill
Time
2012-01-01 200
2012-01-02 300
2012-01-03 100
2012-01-04 500
2012-01-05 200
2012-01-06 300
2012-01-07 100
2012-01-08 500
2012-01-09 500
arr = np.where(df.index.weekday > 4, 'Weekend','Monday-Friday')
s = pd.Series(arr, index=df.index)
s1 = s.ne(s.shift()).cumsum()
df = (df['daily-bill'].groupby([s1,s.rename('Time')])
.sum()
.reset_index(level=0, drop=True)
.reset_index())
print (df)
Time daily-bill
0 Weekend 200
1 Monday-Friday 1400
2 Weekend 600
3 Monday-Friday 500
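Since the question asks for another time series, a possible variation (not part of the original answer) keeps the first and last date of each block alongside its label and total, using groupby named aggregation. The column names period, start, end and total_bill are illustrative choices; total_bill uses an underscore because named-aggregation keywords must be valid Python identifiers.

```python
import numpy as np
import pandas as pd

# Same sample data, with Time as a DatetimeIndex as in the edit
df = pd.DataFrame(
    {'daily-bill': [200, 300, 100, 500, 200, 300, 100, 500, 500]},
    index=pd.date_range('2012-01-01', periods=9, freq='D', name='Time'),
)

label = pd.Series(np.where(df.index.weekday > 4, 'Weekend', 'Monday-Friday'),
                  index=df.index)
block = label.ne(label.shift()).cumsum()

# One row per block: its label, first and last date, and summed bill
out = (df.assign(period=label, start=df.index, end=df.index)
         .groupby(block)
         .agg(period=('period', 'first'),
              start=('start', 'first'),
              end=('end', 'last'),
              total_bill=('daily-bill', 'sum'))
         .reset_index(drop=True))

print(out)
```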