I'm practicing linear regression, using dates as the input x and expecting a float output y:
x = df['Date'].values
x = x.reshape(-1, 1)
y = df['MeanTemp'].values  # the MeanTemp column holds float values
y = y.reshape(-1, 1)
When I print x, the output is:
array([['1942-07-01T00:00:00.000000000'],
['1942-07-02T00:00:00.000000000'],
['1942-07-03T00:00:00.000000000'],
['1942-07-04T00:00:00.000000000'],
['1942-07-05T00:00:00.000000000'],
['1942-07-06T00:00:00.000000000'],
['1942-07-07T00:00:00.000000000'],
['1942-07-08T00:00:00.000000000'],
['1942-07-09T00:00:00.000000000'],
['1942-07-10T00:00:00.000000000']], dtype='datetime64[ns]')
Now, when I fit a linear regression:
linlin = LinearRegression()
linlin.fit(x, y)
it raises no error, but when I call
linlin.predict(x)
TypeError: The DTypes <class 'numpy.dtype[float64]'> and <class 'numpy.dtype[datetime64]'> do not have a common DType. For example they cannot be stored in a single array unless the dtype is `object`.
the TypeError above is raised. How can I convert this data type to float so that predict works?
Answer 0 (score: 1)
You can use a numpy timedelta to express each date as the number of days elapsed since the minimum date, like this:
>>> import numpy as np
>>> df['date_delta'] = (df['Date'] - df['Date'].min()) / np.timedelta64(1,'D')
>>> x = df['date_delta'].values.reshape(-1, 1)  # 2-D, as in the question
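To show the day-offset conversion working end to end, here is a minimal sketch that fits and predicts without the TypeError. The sample DataFrame and its temperature values are made up for illustration, and the variable names `model` and `predictions` are assumptions rather than anything from the question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical sample data standing in for the question's DataFrame.
df = pd.DataFrame({
    "Date": pd.date_range("1942-07-01", periods=10, freq="D"),
    "MeanTemp": [25.5, 26.1, 26.7, 26.7, 26.1, 25.6, 26.1, 26.7, 27.2, 25.6],
})

# Days elapsed since the earliest date, stored as float64.
df["date_delta"] = (df["Date"] - df["Date"].min()) / np.timedelta64(1, "D")

# scikit-learn expects a 2-D feature array of shape (n_samples, n_features).
x = df["date_delta"].to_numpy().reshape(-1, 1)
y = df["MeanTemp"].to_numpy()

model = LinearRegression()
model.fit(x, y)
predictions = model.predict(x)  # plain float array, one value per row
```

To predict for a date outside the training data, subtract the same reference date (`df["Date"].min()`) from it before calling `predict`, so the new value is on the same scale as the training feature.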
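Alternatively, you can use the following function to convert the dates to a floating-point representation (a fractional year):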
>>> import numpy as np
>>> import pandas as pd
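>>> def dt64_to_float(dt64):
...     # Truncate each timestamp to the start of its year.
...     year = dt64.astype('M8[Y]')
...     # Whole days elapsed since the start of that year.
...     days = (dt64 - year).astype('timedelta64[D]')
...     year_next = year + np.timedelta64(1, 'Y')
...     # Length of the year in days (365 or 366).
...     days_of_year = (year_next.astype('M8[D]') - year.astype('M8[D]')).astype('timedelta64[D]')
...     # datetime64[Y] counts years from 1970, hence the offset.
...     dt_float = 1970 + year.astype(float) + days / days_of_year
...     return dt_float
>>> df['date_float'] = dt64_to_float(df['Date'].to_numpy())
>>> x = df['date_float'].values.reshape(-1, 1)  # 2-D, as in the question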
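As a quick sanity check of the fractional-year idea, the sketch below applies the same arithmetic to three hand-picked dates. The function name `to_fractional_year` and the sample dates are assumptions made for this illustration; the function is a condensed restatement of the approach above, not a different method.

```python
import numpy as np

def to_fractional_year(dt64):
    """Convert a datetime64 array to floats such as 1942.5 (year plus fraction)."""
    year = dt64.astype("M8[Y]")                      # start of each date's year
    days = (dt64 - year).astype("timedelta64[D]")    # whole days into the year
    next_year = year + np.timedelta64(1, "Y")
    # Year length in days, so leap years divide by 366.
    year_len = next_year.astype("M8[D]") - year.astype("M8[D]")
    # datetime64[Y] stores years relative to 1970.
    return 1970 + year.astype(float) + days / year_len

# Hypothetical sample dates: mid-1942, and the first and last day of leap year 1944.
dates = np.array(["1942-07-01", "1944-01-01", "1944-12-31"], dtype="datetime64[ns]")
result = to_fractional_year(dates)
# 1942-07-01 is 181 days into a 365-day year, so result[0] is 1942 + 181/365.
```

Compared with the day-offset version, this representation does not depend on the earliest date in the training set, which can make it easier to apply the fitted model to new dates.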