How to parse date values for sklearn's linear regression?

Posted: 2018-02-15 14:43:39

Tags: python pandas numpy scikit-learn linear-regression

I am working with the following pandas DataFrame index (index = groupedCrimes.index):

DatetimeIndex(['2014-06-30', '2014-07-31', '2014-08-31', '2014-09-30',
               '2014-10-31', '2014-11-30', '2014-12-31', '2015-01-31',
               '2015-02-28', '2015-03-31', '2015-04-30', '2015-05-31',
               '2015-06-30', '2015-07-31', '2015-08-31', '2015-09-30',
               '2015-10-31', '2015-11-30', '2015-12-31', '2016-01-31',
               '2016-02-29', '2016-03-31', '2016-04-30', '2016-05-31',
               '2016-06-30', '2016-07-31', '2016-08-31', '2016-09-30',
               '2016-10-31', '2016-11-30', '2016-12-31', '2017-01-31',
               '2017-02-28', '2017-03-31', '2017-04-30', '2017-05-31'],
              dtype='datetime64[ns]', name='Month', freq='M')

I am converting its type away from datetime64[ns] so that I can use sklearn's linear regression.

#I change the dates to be integers, I am not sure this is the best way    
groupedCrimes.index = pd.to_datetime(groupedCrimes.index)  
groupedCrimes.index = (groupedCrimes.index - groupedCrimes.index.min())  / np.timedelta64(1,'D')

This converts it into the following:

[[0.00000000e+00]
 [3.58796296e-13]
 [7.17592593e-13]
 [1.06481481e-12]
 [1.42361111e-12]
 [1.77083333e-12]
 [2.12962963e-12]
 [2.48842593e-12]
 [2.81250000e-12]
 [3.17129630e-12]
 [3.51851852e-12]
 [3.87731481e-12]
 [4.22453704e-12]
 [4.58333333e-12]
 [4.94212963e-12]
 [5.28935185e-12]
 [5.64814815e-12]
 [5.99537037e-12]
 [6.35416667e-12]
 [6.71296296e-12]
 [7.04861111e-12]
 [7.40740741e-12]
 [7.75462963e-12]
 [8.11342593e-12]
 [8.46064815e-12]
 [8.81944444e-12]
 [9.17824074e-12]
 [9.52546296e-12]
 [9.88425926e-12]
 [1.02314815e-11]
 [1.05902778e-11]
 [1.09490741e-11]
 [1.12731481e-11]
 [1.16319444e-11]
 [1.19791667e-11]
 [1.23379630e-11]]
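The values above are far smaller than day offsets should be. A quick check on the first four dates (a minimal sketch using only pandas and NumPy): applied once, the conversion gives whole day offsets, and 31 days divided once more by the number of nanoseconds in a day gives about 3.588e-13, the second value above. This is consistent with the conversion having been run a second time on an index that already held day counts, because pd.to_datetime reads plain numbers as nanoseconds by default.

```python
import numpy as np
import pandas as pd

# First four month-end dates of the groupedCrimes index (subset for illustration)
idx = pd.DatetimeIndex(["2014-06-30", "2014-07-31", "2014-08-31", "2014-09-30"])

# Applied once: float days elapsed since the first date
days = (idx - idx.min()) / np.timedelta64(1, "D")
print(list(days))  # [0.0, 31.0, 62.0, 92.0]

# Dividing a day count by the number of nanoseconds in a day
# reproduces the tiny values printed in the question
ns_per_day = 24 * 60 * 60 * 10**9
print(31 / ns_per_day)  # about 3.588e-13
```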

Then, for example, I can make a prediction for one of these values:

[in] model.predict([[3.58796296e-13]])
[out] array([5990.81354452])

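How can I:

  1. A) Convert these numbers back into dates, so that I know which date I am predicting?
  2. B) Convert a date into this format, so that I can predict future dates?
  3. Is there a better way to convert and handle the dates?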
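For A and B, if you keep the offsets relative to the first date, one possible approach is to keep that origin around and convert in both directions with it. A minimal sketch: origin, dates_to_days and days_to_dates are hypothetical names chosen for illustration, and the origin is assumed to be the first month in the index (2014-06-30).

```python
import numpy as np
import pandas as pd

# Hypothetical reference date: the first month in the training index
origin = pd.Timestamp("2014-06-30")

def dates_to_days(dates):
    """Hypothetical helper: dates -> float days elapsed since origin."""
    return (pd.DatetimeIndex(dates) - origin) / np.timedelta64(1, "D")

def days_to_dates(days):
    """Hypothetical helper: float day offsets -> timestamps."""
    return origin + pd.to_timedelta(np.asarray(days), unit="D")

x = dates_to_days(["2014-07-31", "2017-06-30"])
print(list(x))           # [31.0, 1096.0]
print(days_to_dates(x))  # back to 2014-07-31 and 2017-06-30
```

Note that model.predict expects a 2-D array of shape (n_samples, 1), so a single offset would be passed as [[x]] rather than as a bare number.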

1 answer:

Answer 0 (score: 2)

How about simply converting the datetimes to the number of days since 1970-01-01?

In [386]: df
Out[386]:
                 val
2014-06-30  0.156202
2014-07-31  0.416251
2014-08-31  0.649295
2014-09-30  0.402265
2014-10-31  0.983870
2014-11-30  0.773942
2014-12-31  0.327271
2015-01-31  0.813580
2015-02-28  0.292830
2015-03-31  0.848269
...              ...
2016-08-31  0.595301
2016-09-30  0.171903
2016-10-31  0.355610
2016-11-30  0.477474
2016-12-31  0.517182
2017-01-31  0.891583
2017-02-28  0.591066
2017-03-31  0.799293
2017-04-30  0.225473
2017-05-31  0.444644

[36 rows x 1 columns]

In [387]: df.index = (df.index - pd.to_datetime('1970-01-01')).days

In [388]: df
Out[388]:
            val
16251  0.156202
16282  0.416251
16313  0.649295
16343  0.402265
16374  0.983870
16404  0.773942
16435  0.327271
16466  0.813580
16494  0.292830
16525  0.848269
...         ...
17044  0.595301
17074  0.171903
17105  0.355610
17135  0.477474
17166  0.517182
17197  0.891583
17225  0.591066
17256  0.799293
17286  0.225473
17317  0.444644

[36 rows x 1 columns]
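A minimal sketch of fitting sklearn's LinearRegression on this days-since-epoch feature. The dates mirror the question's index, but the target y is made up (exactly linear in the day count) purely so the prediction can be checked; it is not the question's crime data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

epoch = pd.Timestamp("1970-01-01")

# 36 month-end dates matching the question's index
dates = pd.date_range("2014-06-30", periods=36, freq=pd.offsets.MonthEnd())

# Feature: whole days since 1970-01-01, shaped (n_samples, 1) as sklearn expects
X = np.asarray((dates - epoch).days).reshape(-1, 1)

# Made-up target, exactly linear in the day count, for illustration only
y = 0.5 * X.ravel() + 100.0

model = LinearRegression().fit(X, y)

# A future month-end goes through the same conversion before predicting
future = pd.DatetimeIndex(["2017-06-30"])
X_future = np.asarray((future - epoch).days).reshape(-1, 1)
print(X_future.ravel())         # [17347]
print(model.predict(X_future))  # approximately [8773.5]
```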

To convert it back:

In [392]: pd.to_datetime(df.index, unit='D')
Out[392]:
DatetimeIndex(['2014-06-30', '2014-07-31', '2014-08-31', '2014-09-30', '2014-10-31', '2014-11-30', '2014-12-31',
               '2015-01-31', '2015-02-28', '2015-03-31', '2015-04-30', '2015-05-31', '2015-06-30', '2015-07-31',
               '2015-08-31', '2015-09-30', '2015-10-31', '2015-11-30', '2015-12-31', '2016-01-31', '2016-02-29',
               '2016-03-31', '2016-04-30', '2016-05-31', '2016-06-30', '2016-07-31', '2016-08-31', '2016-09-30',
               '2016-10-31', '2016-11-30', '2016-12-31', '2017-01-31', '2017-02-28', '2017-03-31', '2017-04-30',
               '2017-05-31'],
              dtype='datetime64[ns]', freq=None)
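The same round trip works for dates outside the training data, which covers predicting future months (question B). A sketch assuming the next three month-ends after the data are wanted; pd.offsets.MonthEnd is passed as the frequency instead of a string alias.

```python
import pandas as pd

epoch = pd.Timestamp("1970-01-01")

# Next three month-ends after the last date in the data (2017-05-31)
future = pd.date_range("2017-06-30", periods=3, freq=pd.offsets.MonthEnd())

# Dates -> integer days since 1970-01-01 (the model's feature)
future_days = (future - epoch).days
print(list(future_days))  # [17347, 17378, 17409]

# Integer days -> dates again, e.g. to label the predictions
back = pd.to_datetime(future_days, unit="D")
print(back)  # 2017-06-30, 2017-07-31, 2017-08-31
```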