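I am computing a climatology, i.e. averaging by calendar day over a DataFrame that holds several years of daily data and has a datetime index: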
df.groupby([df.index.month, df.index.day]).mean()
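Once I run the groupby, the datetime index disappears. That makes sense, because after the groupby no row corresponds to a single unique datetime.
Is there a way to reintroduce a datetime index after the groupby by assigning an artificial year?
- EDIT: the dataframe: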
datetime val1 val2
1/1/2000 74.25769 5.813470958
1/2/2000 74.25769 5.813470958
1/3/2000 74.25769 5.813470958
1/4/2000 74.25769 5.813470958
1/5/2000 76.67728083 5.813470958
1/6/2000 76.67728083 5.813470958
1/7/2000 76.67728083 5.813470958
1/4/2001 76.67728083 5.813470958
1/5/2001 77.30620917 12.3357252
1/6/2001 77.30620917 12.3357252
1/7/2001 77.30620917 12.3357252
1/8/2001 77.30620917 12.3357252
1/9/2001 77.30620917 12.3357252
1/10/2001 77.30620917 12.3357252
Answer 0 (score: 2)
IIUC you lose the year information, but after the groupby you can use map with a custom year together with the months and days to set the index:
import datetime
# Average each calendar day across all years; the index becomes (month, day)
df = df.groupby([df.index.month, df.index.day]).mean()
print(df)
val1 val2
1 1 74.257690 5.813471
2 74.257690 5.813471
3 74.257690 5.813471
4 75.467485 5.813471
5 76.991745 9.074598
6 76.991745 9.074598
7 76.991745 9.074598
8 77.306209 12.335725
9 77.306209 12.335725
10 77.306209 12.335725
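# Each index entry x is a (month, day) tuple; 2000 is an arbitrary placeholder year
# (it is a leap year, so a February 29 group would still be a valid date)
df['Date'] = df.index.map(lambda x: datetime.date(2000, x[0], x[1]))
print(df.set_index('Date'))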
val1 val2
Date
2000-01-01 74.257690 5.813471
2000-01-02 74.257690 5.813471
2000-01-03 74.257690 5.813471
2000-01-04 75.467485 5.813471
2000-01-05 76.991745 9.074598
2000-01-06 76.991745 9.074598
2000-01-07 76.991745 9.074598
2000-01-08 77.306209 12.335725
2000-01-09 77.306209 12.335725
2000-01-10 77.306209 12.335725
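Note that the approach above stores plain datetime.date objects, so the result is not a DatetimeIndex. If you want a real DatetimeIndex (for resampling, slicing by date string, plotting), one possible variation is to assemble the dates with pd.to_datetime instead. This is only a sketch: the sample data, the column name val1 and the variable names clim and parts are made up for illustration, and 2000 is again just a placeholder year.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the question): two years of daily values.
idx = pd.date_range("2000-01-01", "2001-12-31", freq="D")
df = pd.DataFrame({"val1": np.arange(len(idx), dtype=float)}, index=idx)

# Calendar-day climatology; the result is indexed by (month, day).
clim = df.groupby([df.index.month, df.index.day]).mean()

# Assemble dates from year/month/day columns with a placeholder year.
# 2000 is a leap year, so the (2, 29) group maps to a valid date.
parts = pd.DataFrame({
    "year": 2000,
    "month": clim.index.get_level_values(0).to_numpy(),
    "day": clim.index.get_level_values(1).to_numpy(),
})
clim.index = pd.DatetimeIndex(pd.to_datetime(parts))

print(clim.head())
```

Picking a leap year as the placeholder matters if your data contains February 29; with a non-leap year the date assembly would fail for that row.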