转置数据框列,每天创建不同的行

时间:2017-07-09 00:14:08

标签: python pandas dataframe

我有一个dataframe,其中有一列和一个时间戳索引,包括2到7天的任何时间:

                         kWh
Timestamp                   
2017-07-08 06:00:00     0.00
2017-07-08 07:00:00   752.75
2017-07-08 08:00:00  1390.20
2017-07-08 09:00:00  2027.65
2017-07-08 10:00:00  2447.27    
        ....           ....
2017-07-12 20:00:00   167.64
2017-07-12 21:00:00     0.00
2017-07-12 22:00:00     0.00
2017-07-12 23:00:00     0.00

我想转置kWh列,以便一天的价值(每小时粒度,因此24个值/天)填满一行。下一行是值的第二天,依此类推(因此,预测数据的五天有五行,每行有24个元素)。

因为我对数据的查询是以垂直格式进行的,并且我的回归和后续分析已经以垂直格式进行,所以我不想过多地改变过程,并希望有一种更简单的方法。我尝试使用df.index.hour提供多索引,然后使用unstack(),但我得到的dataframeNaN值很高。

有优雅的方法吗?

2 个答案:

答案 0 :(得分:2)

如果我们从像

这样的框架开始
In [25]: df = pd.DataFrame({"kWh": [1]}, index=pd.date_range("2017-07-08", 
             "2017-07-12", freq="1H").rename("Timestamp")).cumsum()

In [26]: df.head()
Out[26]: 
                     kWh
Timestamp               
2017-07-08 00:00:00    1
2017-07-08 01:00:00    2
2017-07-08 02:00:00    3
2017-07-08 03:00:00    4
2017-07-08 04:00:00    5

我们可以创建日期和小时列,然后转动:

In [27]: df["date"] = df.index.date

In [28]: df["hour"] = df.index.hour

In [29]: df.pivot(index="date", columns="hour", values="kWh")
Out[29]: 
hour          0     1     2     3     4     5     6     7     8     9   ...   \
date                                                                    ...    
2017-07-08   1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0  ...    
2017-07-09  25.0  26.0  27.0  28.0  29.0  30.0  31.0  32.0  33.0  34.0  ...    
2017-07-10  49.0  50.0  51.0  52.0  53.0  54.0  55.0  56.0  57.0  58.0  ...    
2017-07-11  73.0  74.0  75.0  76.0  77.0  78.0  79.0  80.0  81.0  82.0  ...    
2017-07-12  97.0   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...    

hour          14    15    16    17    18    19    20    21    22    23  
date                                                                    
2017-07-08  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0  24.0  
2017-07-09  39.0  40.0  41.0  42.0  43.0  44.0  45.0  46.0  47.0  48.0  
2017-07-10  63.0  64.0  65.0  66.0  67.0  68.0  69.0  70.0  71.0  72.0  
2017-07-11  87.0  88.0  89.0  90.0  91.0  92.0  93.0  94.0  95.0  96.0  
2017-07-12   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  

[5 rows x 24 columns]

答案 1 :(得分:0)

不确定为什么MultiIndex代码不起作用。 我假设你的MultiIndex代码是沿线的,它提供与pivot相同的输出:

In []
df = pd.DataFrame({"kWh": [1]}, index=pd.date_range("2017-07-08", 
                   "2017-07-12", freq="1H").rename("Timestamp")).cumsum()
df.index = pd.MultiIndex.from_arrays([df.index.date, df.index.hour], names=['Date','Hour'])
df.unstack()

Out[]:
             kWh                                                        ...   \
Hour          0     1     2     3     4     5     6     7     8     9   ...    
Date                                                                    ...    
2017-07-08   1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0  ...    
2017-07-09  25.0  26.0  27.0  28.0  29.0  30.0  31.0  32.0  33.0  34.0  ...    
2017-07-10  49.0  50.0  51.0  52.0  53.0  54.0  55.0  56.0  57.0  58.0  ...    
2017-07-11  73.0  74.0  75.0  76.0  77.0  78.0  79.0  80.0  81.0  82.0  ...    
2017-07-12  97.0   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...    


Hour          14    15    16    17    18    19    20    21    22    23  
Date                                                                    
2017-07-08  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0  24.0  
2017-07-09  39.0  40.0  41.0  42.0  43.0  44.0  45.0  46.0  47.0  48.0  
2017-07-10  63.0  64.0  65.0  66.0  67.0  68.0  69.0  70.0  71.0  72.0  
2017-07-11  87.0  88.0  89.0  90.0  91.0  92.0  93.0  94.0  95.0  96.0  
2017-07-12   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  

[5 rows x 24 columns]