Convert a list of values into a time series in Python

Time: 2016-06-18 14:54:38

Tags: python datetime numpy pandas time-series

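I want to convert the following data:

jan_1  jan_15  feb_1  feb_15  mar_1  mar_15  apr_1  apr_15  may_1  may_15  jun_1  jun_15  jul_1  jul_15  aug_1  aug_15  sep_1  sep_15  oct_1  oct_15  nov_1  nov_15  dec_1  dec_15
0      0       0      0       0      1       1      2       2      2       2      2       2      3       3      3       3      3       0      0       0      0       0      0

into an array of length 365, where each element is repeated until the next date, e.g. 0 is repeated from january 1 to january 15 ...

I could do something like numpy.repeat, but that is not date-aware, so it would not take into account that fewer than 15 days pass between feb_15 and the next date (mar_1).

Is there a Pythonic solution?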
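To make the date-awareness concern concrete, the gaps between consecutive labels can be computed with the standard library alone. This is only an illustrative sketch: the names `months`, `labels`, `boundaries`, and `gaps` are assumptions, and the year 1900 (not a leap year) is simply what `strptime` falls back to when the format contains no year.

```python
from datetime import datetime

months = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
# Rebuild the 24 column labels from the question: jan_1, jan_15, feb_1, ...
labels = [f"{m}_{d}" for m in months for d in (1, 15)]

# With no year in the format, strptime uses 1900, which has 365 days.
dates = [datetime.strptime(label, "%b_%d") for label in labels]

# Close the last interval at the start of the following year.
boundaries = dates + [datetime(dates[0].year + 1, 1, 1)]

# Number of days each value would have to be repeated.
gaps = [(end - start).days for start, end in zip(boundaries, boundaries[1:])]

print(gaps[:4])   # [14, 17, 14, 14]
print(sum(gaps))  # 365
```

The gaps vary between 14 and 17 days, so a single fixed repeat count cannot reproduce the calendar.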

2 answers:

Answer 0: (score: 2)

IIUC, you can do it this way:

In [194]: %paste
# transpose DF, rename columns
x = df.T.reset_index().rename(columns={'index':'date', 0:'val'})
# parse dates
x['date'] = pd.to_datetime(x['date'], format='%b_%d')
# group by month, then resample each group to daily frequency and forward-fill
result = (x.groupby(x['date'].dt.month)
           .apply(lambda x: x.set_index('date').resample('1D').ffill()))
# rename index names
result.index.names = ['month','date']
## -- End pasted text --

In [212]: result
Out[212]:
                  val
month date
1     1900-01-01    0
      1900-01-02    0
      1900-01-03    0
      1900-01-04    0
      1900-01-05    0
      1900-01-06    0
      1900-01-07    0
      1900-01-08    0
      1900-01-09    0
      1900-01-10    0
      1900-01-11    0
      1900-01-12    0
      1900-01-13    0
      1900-01-14    0
      1900-01-15    0
2     1900-02-01    0
      1900-02-02    0
      1900-02-03    0
      1900-02-04    0
      1900-02-05    0
      1900-02-06    0
      1900-02-07    0
      1900-02-08    0
      1900-02-09    0
      1900-02-10    0
      1900-02-11    0
      1900-02-12    0
      1900-02-13    0
      1900-02-14    0
      1900-02-15    0
...               ...
11    1900-11-01    0
      1900-11-02    0
      1900-11-03    0
      1900-11-04    0
      1900-11-05    0
      1900-11-06    0
      1900-11-07    0
      1900-11-08    0
      1900-11-09    0
      1900-11-10    0
      1900-11-11    0
      1900-11-12    0
      1900-11-13    0
      1900-11-14    0
      1900-11-15    0
12    1900-12-01    0
      1900-12-02    0
      1900-12-03    0
      1900-12-04    0
      1900-12-05    0
      1900-12-06    0
      1900-12-07    0
      1900-12-08    0
      1900-12-09    0
      1900-12-10    0
      1900-12-11    0
      1900-12-12    0
      1900-12-13    0
      1900-12-14    0
      1900-12-15    0

[180 rows x 1 columns]

Or, using reset_index():

In [213]: result.reset_index().head(20)
Out[213]:
    month       date  val
0       1 1900-01-01    0
1       1 1900-01-02    0
2       1 1900-01-03    0
3       1 1900-01-04    0
4       1 1900-01-05    0
5       1 1900-01-06    0
6       1 1900-01-07    0
7       1 1900-01-08    0
8       1 1900-01-09    0
9       1 1900-01-10    0
10      1 1900-01-11    0
11      1 1900-01-12    0
12      1 1900-01-13    0
13      1 1900-01-14    0
14      1 1900-01-15    0
15      2 1900-02-01    0
16      2 1900-02-02    0
17      2 1900-02-03    0
18      2 1900-02-04    0
19      2 1900-02-05    0
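The session above can be reproduced end to end with the sketch below. How `df` was built is not shown in the original post, so the construction from `labels` and `values` is an assumption, as are the names `months`, `labels`, `values`, and `days`. As the `[180 rows x 1 columns]` summary suggests, each monthly group only spans its own first and last date, so the result covers the 1st through the 15th of every month.

```python
import pandas as pd

months = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
labels = [f"{m}_{d}" for m in months for d in (1, 15)]
values = [0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2,
          2, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0]

# Assumed input shape: a single-row DataFrame with one column per label.
df = pd.DataFrame([values], columns=labels)

# Transpose, rename the columns, and parse the labels as dates (year 1900).
x = df.T.reset_index().rename(columns={"index": "date", 0: "val"})
x["date"] = pd.to_datetime(x["date"], format="%b_%d")

# Group by month, resample each group to daily frequency, forward-fill.
result = (x.groupby(x["date"].dt.month)
           .apply(lambda g: g.set_index("date").resample("1D").ffill()))
result.index.names = ["month", "date"]

# Each group runs only from the 1st to the 15th of its month.
days = result.index.get_level_values("date").day
print(result.shape)             # (180, 1)
print(days.min(), days.max())   # 1 15
```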

Answer 1: (score: 1)

You can use resample:

# add a final column, dec_31, copied from the last column of df,
# so that the forward fill extends to the end of the year
df['dec_31'] = df.iloc[:,-1]

# convert the column labels to datetimes - see http://strftime.org/
df.columns = pd.to_datetime(df.columns, format='%b_%d')

# transpose and resample to daily frequency, forward-filling the gaps
df1 = df.T.resample('D').ffill()
df1.columns = ['col']
print(df1)
          col  
1900-01-01  0
1900-01-02  0
1900-01-03  0
1900-01-04  0
1900-01-05  0
1900-01-06  0
1900-01-07  0
1900-01-08  0
1900-01-09  0
1900-01-10  0
1900-01-11  0
1900-01-12  0
1900-01-13  0
1900-01-14  0
1900-01-15  0
1900-01-16  0
1900-01-17  0
1900-01-18  0
1900-01-19  0
1900-01-20  0
1900-01-21  0
1900-01-22  0
1900-01-23  0
1900-01-24  0
1900-01-25  0
1900-01-26  0
1900-01-27  0
1900-01-28  0
1900-01-29  0
1900-01-30  0
       ..
1900-12-02  0
1900-12-03  0
1900-12-04  0
1900-12-05  0
1900-12-06  0
1900-12-07  0
1900-12-08  0
1900-12-09  0
1900-12-10  0
1900-12-11  0
1900-12-12  0
1900-12-13  0
1900-12-14  0
1900-12-15  0
1900-12-16  0
1900-12-17  0
1900-12-18  0
1900-12-19  0
1900-12-20  0
1900-12-21  0
1900-12-22  0
1900-12-23  0
1900-12-24  0
1900-12-25  0
1900-12-26  0
1900-12-27  0
1900-12-28  0
1900-12-29  0
1900-12-30  0
1900-12-31  0

[365 rows x 1 columns]
# if you need a Series
print(df1.col)
1900-01-01    0
1900-01-02    0
1900-01-03    0
1900-01-04    0
1900-01-05    0
1900-01-06    0
1900-01-07    0
1900-01-08    0
1900-01-09    0
1900-01-10    0
1900-01-11    0
1900-01-12    0
1900-01-13    0
1900-01-14    0
1900-01-15    0
1900-01-16    0
1900-01-17    0
1900-01-18    0
1900-01-19    0
1900-01-20    0
1900-01-21    0
1900-01-22    0
1900-01-23    0
1900-01-24    0
1900-01-25    0
1900-01-26    0
1900-01-27    0
1900-01-28    0
1900-01-29    0
1900-01-30    0
             ..
1900-12-02    0
1900-12-03    0
1900-12-04    0
1900-12-05    0
1900-12-06    0
1900-12-07    0
1900-12-08    0
1900-12-09    0
1900-12-10    0
1900-12-11    0
1900-12-12    0
1900-12-13    0
1900-12-14    0
1900-12-15    0
1900-12-16    0
1900-12-17    0
1900-12-18    0
1900-12-19    0
1900-12-20    0
1900-12-21    0
1900-12-22    0
1900-12-23    0
1900-12-24    0
1900-12-25    0
1900-12-26    0
1900-12-27    0
1900-12-28    0
1900-12-29    0
1900-12-30    0
1900-12-31    0
Freq: D, Name: col, dtype: int64
# transpose and convert to a NumPy array
print(df1.T.values)
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2
  2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
  2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
  2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
  3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
  3 3 3 3 3 3 3 3 3 3 3 3 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
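For anyone who wants to run this approach from scratch, here is a minimal self-contained sketch. The original post does not show how `df` was created, so building it from `labels` and `values` is an assumption, and the names `months`, `labels`, `values`, `daily`, and `arr` are illustrative. The parsed dates land in 1900, which is not a leap year, hence 365 rows.

```python
import pandas as pd

months = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
labels = [f"{m}_{d}" for m in months for d in (1, 15)]
values = [0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2,
          2, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0]

# Assumed input shape: a single-row DataFrame with one column per label.
df = pd.DataFrame([values], columns=labels)

# Add an end-of-year column so the forward fill reaches December 31.
df["dec_31"] = df.iloc[:, -1]

# Parse the labels as dates; with no year in the format they fall in 1900.
df.columns = pd.to_datetime(df.columns, format="%b_%d")

# Transpose so dates become the index, then resample daily and forward-fill.
daily = df.T.resample("D").ffill()
daily.columns = ["col"]

# Flat NumPy array with one value per day of the year.
arr = daily["col"].to_numpy()
print(arr.shape)  # (365,)
```

The array starts with 73 zeros (January 1 through March 14), which matches the printed output above.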