I have a data frame like the one below:
Year Month.of.absence Absenteeism.time.in.hours
1 2007 7 91.59784
2 2007 8 105.61894
3 2007 9 62.93516
4 2007 10 75.52850
5 2007 11 81.42555
6 2007 12 65.12712
7 2008 1 60.00000
8 2008 2 87.00000
9 2008 3 193.65425
10 2008 4 83.64144
11 2008 5 62.93557
12 2008 6 99.92567
13 2008 7 113.00000
14 2008 8 66.54835
15 2008 9 82.20302
16 2008 10 122.12742
17 2008 11 96.66115
18 2008 12 69.00000
19 2009 1 43.64738
20 2009 2 86.34233
21 2009 3 130.63708
22 2009 4 57.88069
23 2009 5 75.00000
24 2009 6 63.00224
25 2009 7 96.55052
26 2009 8 76.00000
27 2009 9 41.00000
28 2009 10 99.00000
29 2009 11 87.82694
30 2009 12 59.69254
31 2010 1 71.00000
32 2010 2 101.00000
33 2010 3 135.14169
34 2010 4 95.43113
35 2010 5 131.27026
36 2010 6 78.98095
37 2010 7 75.11503
I want to convert it, in Python, into a monthly time series running from July 2007 to July 2010. This is the R code; what is the equivalent in Python?
ts_complete_data = ts(aggre.absent.hours.months$Absenteeism.time.in.hours, frequency = 12,start = c(2007, 7))
Expected output:
           Jan       Feb       Mar       Apr       May       Jun       Jul       Aug       Sep       Oct       Nov       Dec
2007                                                                91.59784 105.61894  62.93516  75.52850  81.42555  65.12712
2008  60.00000  87.00000 193.65425  83.64144  62.93557  99.92567 113.00000  66.54835  82.20302 122.12742  96.66115  69.00000
2009  43.64738  86.34233 130.63708  57.88069  75.00000  63.00224  96.55052  76.00000  41.00000  99.00000  87.82694  59.69254
2010  71.00000 101.00000 135.14169  95.43113 131.27026  78.98095  75.11503
Thanks!
Answer 0 (score: 1)
Set your Year and Month columns as the index (a MultiIndex), then unstack it so that the months become columns.
df.set_index(['Year', 'Month.of.absence']).unstack()
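For anyone who wants to try this end to end, here is a minimal sketch on a small hand-typed subset of the question's data (the four rows and the name `wide` are only for illustration). It assumes you select the value column before unstacking, which is a small variation on the one-liner above: that way the result has plain month-number columns rather than a two-level column index.

```python
import pandas as pd

# Small illustrative subset of the question's data (values copied from the table)
df = pd.DataFrame({
    "Year": [2007, 2007, 2008, 2008],
    "Month.of.absence": [7, 8, 1, 7],
    "Absenteeism.time.in.hours": [91.59784, 105.61894, 60.0, 113.0],
})

# Selecting the value column first gives a Series with a (Year, Month) MultiIndex;
# unstack() then moves the innermost level (the month) into the columns.
wide = (
    df.set_index(["Year", "Month.of.absence"])["Absenteeism.time.in.hours"]
      .unstack()
)
print(wide)
```

Months with no observation for a given year (for example January 2007) come out as NaN, which matches the blank cells in the expected output.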
Answer 1 (score: 1)
Do something like this:
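import calendar
month_names = list(calendar.month_name)  # ['', 'January', ..., 'December']; index 0 is an empty string
df['Month.of.absence'] = df['Month.of.absence'].apply(lambda m: month_names[m])  # convert month numbers to names
df = df.pivot(index='Year', columns='Month.of.absence', values='Absenteeism.time.in.hours')  # one row per year, one column per month
df = df.reindex(columns=month_names[1:])  # pivot sorts the names alphabetically, so restore calendar order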
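Both answers above produce the Year-by-Month table. If what you need is an actual time-indexed object, closer to what R's `ts(..., frequency = 12, start = c(2007, 7))` gives you, one possible approach is a pandas Series with a monthly `PeriodIndex`. This is only a sketch: it assumes the values are already in chronological order with no missing months, and `hours` here holds just the first seven values from the question.

```python
import pandas as pd

# First seven values from the question's table, in chronological order
hours = [91.59784, 105.61894, 62.93516, 75.52850, 81.42555, 65.12712, 60.0]

# One monthly period per observation, starting at July 2007
idx = pd.period_range(start="2007-07", periods=len(hours), freq="M")
ts = pd.Series(hours, index=idx, name="Absenteeism.time.in.hours")
print(ts)
```

With the full column you would pass `df['Absenteeism.time.in.hours'].to_numpy()` instead of `hours`, and the index would then run from 2007-07 to 2010-07.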