Question

I have a DataFrame like the one shown below:

Year            Month.of.absence          Absenteeism.time.in.hours
1  2007                7                  91.59784
2  2007                8                 105.61894
3  2007                9                  62.93516
4  2007               10                  75.52850
5  2007               11                  81.42555
6  2007               12                  65.12712
7  2008                1                  60.00000
8  2008                2                  87.00000
9  2008                3                 193.65425
10 2008                4                  83.64144
11 2008                5                  62.93557
12 2008                6                  99.92567
13 2008                7                 113.00000
14 2008                8                  66.54835
15 2008                9                  82.20302
16 2008               10                 122.12742
17 2008               11                  96.66115
18 2008               12                  69.00000
19 2009                1                  43.64738
20 2009                2                  86.34233
21 2009                3                 130.63708
22 2009                4                  57.88069
23 2009                5                  75.00000
24 2009                6                  63.00224
25 2009                7                  96.55052
26 2009                8                  76.00000
27 2009                9                  41.00000
28 2009               10                  99.00000
29 2009               11                  87.82694
30 2009               12                  59.69254
31 2010                1                  71.00000
32 2010                2                 101.00000
33 2010                3                 135.14169
34 2010                4                  95.43113
35 2010                5                 131.27026
36 2010                6                  78.98095
37 2010                7                  75.11503

I want to convert it, in Python, into a monthly time series running from July 2007 to July 2010. This is the R code; what is the equivalent in Python?

ts_complete_data = ts(aggre.absent.hours.months$Absenteeism.time.in.hours, frequency = 12,start = c(2007, 7))

Expected output:

           Jan       Feb       Mar       Apr       May       Jun       Jul       Aug       Sep       Oct       Nov       Dec
2007                                                              91.59784 105.61894  62.93516  75.52850  81.42555  65.12712
2008  60.00000  87.00000 193.65425  83.64144  62.93557  99.92567 113.00000  66.54835  82.20302 122.12742  96.66115  69.00000
2009  43.64738  86.34233 130.63708  57.88069  75.00000  63.00224  96.55052  76.00000  41.00000  99.00000  87.82694  59.69254
2010  71.00000 101.00000 135.14169  95.43113 131.27026  78.98095  75.11503

Thanks!

Answer 1

Set your Year and Month columns as the index (a MultiIndex), then unstack it so the months become columns.

df.set_index(['Year', 'Month.of.absence']).unstack()
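A minimal sketch of this approach on four rows taken from the question's data. Selecting the single value column before unstacking is an addition made here, not part of the answer above; it just keeps the column labels flat. The names `df` and `wide` are illustrative.

```python
import pandas as pd

# Four rows copied from the question's data (Nov 2007 to Feb 2008).
df = pd.DataFrame({
    "Year": [2007, 2007, 2008, 2008],
    "Month.of.absence": [11, 12, 1, 2],
    "Absenteeism.time.in.hours": [81.42555, 65.12712, 60.0, 87.0],
})

# Index by (Year, Month), pick the value column, then move the
# month level of the index into the columns.
wide = (
    df.set_index(["Year", "Month.of.absence"])["Absenteeism.time.in.hours"]
      .unstack()
)
print(wide)
```

Months with no observation for a given year (for example January 2007) show up as NaN, which corresponds to the blank cells in the expected output.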

Answer 2

Do something like this:

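import calendar

month_names = list(calendar.month_name)  # index 0 is '', 1..12 are 'January'..'December'
df['Month.of.absence'] = df['Month.of.absence'].apply(lambda m: month_names[m])  # convert month numbers to names

# pivot to one row per year and one column per month
df = df.pivot(index='Year', columns='Month.of.absence', values='Absenteeism.time.in.hours')
# pivot sorts the month names alphabetically, so restore calendar order
df = df.reindex(columns=month_names[1:])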
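Neither answer builds an actual time-indexed object, so here is a hedged sketch of one way to get closer to R's `ts(..., frequency = 12, start = c(2007, 7))`: attach a monthly `PeriodIndex` that starts in July 2007. This is an alternative suggested here rather than something from the answers; only the first eight values from the question are used, and the names `hours`, `ts` and `table` are illustrative.

```python
import pandas as pd

# First eight values from the question; in practice this would be
# the whole Absenteeism.time.in.hours column.
hours = [91.59784, 105.61894, 62.93516, 75.52850,
         81.42555, 65.12712, 60.0, 87.0]

# One monthly period per observation, starting at July 2007.
idx = pd.period_range(start="2007-07", periods=len(hours), freq="M")
ts = pd.Series(hours, index=idx, name="Absenteeism.time.in.hours")

# Year-by-month table similar to the expected output:
# group by (year, month), then move the month level into the columns.
table = ts.groupby([ts.index.year, ts.index.month]).first().unstack()
print(table)
```

This assumes the rows are already in chronological order with no missing months, since the index is generated from the start date and the number of observations rather than read from the Year and Month columns.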

How do I convert a DataFrame into a monthly time series that starts mid-year in Python?
