Question

I have two problems, and I believe both come down to date formatting.

I have a CSV of dates and values:

2012-01-03 00:00:00     95812    
2012-01-04 00:00:00    101265 
... 
2016-10-21 00:00:00     93594

After loading it with read_csv, I tried to parse the dates with the following statement:

X.Dated = pd.to_datetime(X.Dated, format='%Y-%m-%d %H:%M:%S', errors='raise')

I have also tried:

dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
X = pd.read_csv('sales.csv', parse_dates=['Dated'], date_parser=dateparse)

and the infer_datetime_format argument.

All of these seem to work fine, because when I print the column the dates look like: 2012-01-03.

The problem appears when I try to plot the data on a chart. This line:

ax.scatter(X.Dated, X.Val, c='green', marker='.')

gives me an error:

TypeError: invalid type promotion

When I try to use it with the LinearRegression() algorithm, the fit command works fine, but score and predict give me this error:

TypeError: Cannot cast array data from dtype('<M8[ns]') to dtype('float64') according to the rule 'safe'

I have tried many things to fix it, but with no luck. Any help would be appreciated.

Answer 1

ax.scatter (at the moment) does not accept a Pandas Series, but it can accept a list of Pandas Timestamps (e.g. X['Dated'].tolist()) or a NumPy array of dtype datetime64[ns] (e.g. X['Dated'].values):

import pandas as pd
import matplotlib.pyplot as plt

X = pd.DataFrame({'Dated': [pd.Timestamp('2012-01-03 00:00:00'),
                            pd.Timestamp('2012-01-04 00:00:00'),
                            pd.Timestamp('2016-10-21 00:00:00')],
                  'Val': [95812, 101265, 93594]})

fig, ax = plt.subplots()
# Alternative: pass a plain list of Pandas Timestamps
# ax.scatter(X['Dated'].tolist(), X['Val'], c='green', marker='.', s=200)
# Pass the underlying NumPy datetime64[ns] array instead of the Series
ax.scatter(X['Dated'].values, X['Val'], c='green', marker='.', s=200)
plt.show()

Under the hood, the ax.scatter method calls

x = self.convert_xunits(x)
y = self.convert_yunits(y)

to handle date-like inputs. convert_xunits converts NumPy datetime64 arrays to Matplotlib datenums, but it converts a Pandas time series only as far as a NumPy datetime64 array.

So, when a Pandas time series is passed as input to ax.scatter, the code fails when this line is reached:

offsets = np.dstack((x, y))

np.dstack tries to promote the dtypes of its inputs to one common dtype. If x has dtype datetime64[ns] and y has dtype float64, then

TypeError: invalid type promotion

is raised, since there is no native NumPy dtype that is compatible with both.
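The promotion failure described above can be reproduced with NumPy alone. The following is a minimal sketch, not Matplotlib's actual code path: the sample arrays mirror the question's data, the helper try_dstack is a hypothetical name introduced here, and the "days since the Unix epoch" conversion is an assumed workaround rather than the datenum conversion Matplotlib performs internally. The exact error message varies between NumPy versions, so the sketch only relies on the exception being a TypeError.

```python
import numpy as np

# Sample data mirroring the question's columns (values taken from the post).
dates = np.array(['2012-01-03', '2012-01-04', '2016-10-21'],
                 dtype='datetime64[ns]')
vals = np.array([95812.0, 101265.0, 93594.0])  # dtype float64


def try_dstack(x, y):
    """Hypothetical helper: return np.dstack((x, y)), or None if the
    dtypes of x and y cannot be promoted to a common dtype."""
    try:
        return np.dstack((x, y))
    except TypeError:
        return None


# datetime64[ns] and float64 have no common dtype, so stacking fails.
failed = try_dstack(dates, vals)

# Assumed workaround: express each date as a float number of days
# since 1970-01-01, which stacks with float64 without trouble.
days = (dates - np.datetime64('1970-01-01')) / np.timedelta64(1, 'D')
stacked = try_dstack(days, vals)

print(failed is None)
print(days.dtype, stacked.shape)
```

The second error in the question (casting '<M8[ns]' to 'float64') stems from the same dtype mismatch, so a numeric column such as days, reshaped to two dimensions with days.reshape(-1, 1), is probably a reasonable starting point for estimators that expect float input. That is an assumption and has not been verified against the asker's data.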

Date problems with scatter and LinearRegression
