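I'm currently struggling to plot the output of a linear regression. I found a similar question that suggested making sure the data types are set to int, and I made sure to incorporate that into my code.
I've gone through the code several times and the structure seems sound to me. I'm open to any feedback! Thanks a lot for your help!
Note that the columns (Accident_Severity and Number_of_Casualties) contain only numbers (e.g., an accident with severity 3 involving 1 casualty).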
------------------- Step 1 -------------------
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
%pylab inline
import matplotlib.pyplot as plt
------------------- Step 2 -------------------
raw_data = pd.read_csv("/Users/Maddco12/Desktop/1-6m-accidents-traffic-flow-over-16-years/accidents_2005_to_2007.csv")
dtype={'Number_of_Casualties': int,'Accident_Severity': int}
raw_data.head(4)
------------------- Step 3 -------------------
filtered_data = raw_data[~np.isnan(raw_data["Accident_Severity"])] #removes rows with NaN in them
filtered_data.head(4)
filtered_data = raw_data[~np.isnan(raw_data["Number_of_Casualties"])] #removes rows with NaN in them
filtered_data.head(4)
------------------- Step 4 -------------------
npMatrix = np.matrix(filtered_data)
X, Y = npMatrix[:,0], npMatrix[:,1]
mdl = LinearRegression().fit(filtered_data[['Number_of_Casualties']],
filtered_data.Accident_Severity)
m = mdl.coef_[0]
b = mdl.intercept_
print "formula: y = {0}x + {1}".format(m, b)
------------------- Step 5 ------------------- (this is where I get the ValueError)
plt.scatter(X,Y, color='blue')
plt.plot([0,100],[b,m*100+b],'r')
plt.title('Linear Regression Example', fontsize = 20)
plt.xlabel('Number of Casualties', fontsize = 15)
plt.ylabel('Accident Severity', fontsize = 15)
plt.show()
The error is as follows ---->
ValueError Traceback (most recent call last)
<ipython-input-10-5bf84a35de3d> in <module>()
----> 1 plt.scatter(X,Y, color='blue')
2 plt.plot([0,100],[b,m*100+b],'r')
3 plt.title('Linear Regression Example', fontsize = 20)
4 plt.xlabel('Number of Casualties', fontsize = 15)
5 plt.ylabel('Accident Severity', fontsize = 15)
/Users/Maddco12/Documents/Python/anaconda/lib/python2.7/site-packages/matplotlib/pyplot.pyc in scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, hold, data, **kwargs)
3256 vmin=vmin, vmax=vmax, alpha=alpha,
3257 linewidths=linewidths, verts=verts,
-> 3258 edgecolors=edgecolors, data=data, **kwargs)
3259 finally:
3260 ax.hold(washold)
/Users/Maddco12/Documents/Python/anaconda/lib/python2.7/site-packages/matplotlib/__init__.pyc in inner(ax, *args, **kwargs)
1817 warnings.warn(msg % (label_namer, func.__name__),
1818 RuntimeWarning, stacklevel=2)
-> 1819 return func(ax, *args, **kwargs)
1820 pre_doc = inner.__doc__
1821 if pre_doc is None:
/Users/Maddco12/Documents/Python/anaconda/lib/python2.7/site-packages/matplotlib/axes/_axes.pyc in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, **kwargs)
3836
3837 # c will be unchanged unless it is the same length as x:
-> 3838 x, y, s, c = cbook.delete_masked_points(x, y, s, c)
3839
3840 scales = s # Renamed for readability below.
/Users/Maddco12/Documents/Python/anaconda/lib/python2.7/site-packages/matplotlib/cbook.pyc in delete_masked_points(*args)
1846 return ()
1847 if (is_string_like(args[0]) or not iterable(args[0])):
-> 1848 raise ValueError("First argument must be a sequence")
1849 nrecs = len(args[0])
1850 margs = []
ValueError: First argument must be a sequence.
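Answer 0 (score: 1)
I suggest inspecting the X and Y values before plotting them. The rest of the code looks straightforward, so the problem is most likely there.
scatter expects array-like values for X and Y:
https://matplotlib.org/api/_as_gen/matplotlib.pyplot.scatter.html
Try this and see whether it works: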
plt.scatter([X],[Y], color='blue')
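If wrapping the values in lists does not help, another thing to try is selecting the two columns by name instead of by position. In the question, X and Y come from npMatrix[:,0] and npMatrix[:,1], which are simply the first two columns of the CSV and not necessarily Number_of_Casualties and Accident_Severity. The following is only a minimal sketch under stated assumptions: the small DataFrame is made-up stand-in data (not the real accidents file), np.polyfit is swapped in for the LinearRegression fit so the example has fewer dependencies, and the Agg backend line is there only so it runs without a display.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical stand-in for the accidents CSV: a string ID column first,
# followed by the two numeric columns named in the question.
df = pd.DataFrame({
    "Accident_Index": ["a1", "a2", "a3", "a4"],
    "Number_of_Casualties": [1, 2, 1, 3],
    "Accident_Severity": [3, 2, 3, 1],
})

# Drop rows with NaN in either column of interest in a single pass.
clean = df.dropna(subset=["Number_of_Casualties", "Accident_Severity"])

# Select the columns by name and convert them to flat 1-D numeric arrays.
x = clean["Number_of_Casualties"].values.astype(float)
y = clean["Accident_Severity"].values.astype(float)

# Inspect before plotting: both should be 1-D, numeric and of equal length.
print(type(x), x.dtype, x.shape)
print(type(y), y.dtype, y.shape)

# Least-squares line (np.polyfit used here in place of LinearRegression).
m, b = np.polyfit(x, y, 1)

fig, ax = plt.subplots()
ax.scatter(x, y, color="blue")
ax.plot([x.min(), x.max()], [m * x.min() + b, m * x.max() + b], "r")
ax.set_xlabel("Number of Casualties")
ax.set_ylabel("Accident Severity")
```

With the real data, df would be the result of pd.read_csv, and m and b could equally come from mdl.coef_[0] and mdl.intercept_ as in Step 4.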
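Answer 1 (score: 1)
Perhaps you should check the CSV file. If it was generated with an old version of Excel, this kind of error can occur. I solved the problem by loading the CSV into Google Spreadsheets and then exporting it again as a (better) CSV file. There seem to be odd incompatibilities between certain kinds of CSV files and certain versions of Python. You can find a useful discussion of the issue here: Excel to CSV with UTF8 encoding. Hope this helps.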
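As a possible alternative to re-exporting the file, the encoding can be stated explicitly when reading it. This is a rough sketch, not something confirmed for the file in the question: it assumes the CSV is UTF-8 with a byte-order mark (BOM), which some spreadsheet exports add, and it writes a tiny made-up file to a temporary directory purely for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file: a small CSV that starts with a UTF-8 byte-order mark.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "wb") as f:
    f.write(b"\xef\xbb\xbfNumber_of_Casualties,Accident_Severity\n1,3\n2,2\n")

# The "utf-8-sig" codec strips a leading BOM if one is present,
# so the first column name is read without stray characters.
df = pd.read_csv(path, encoding="utf-8-sig")

# Check the column names and dtypes before doing anything else with the data.
print(list(df.columns))
print(df.dtypes)
```

If the column names or dtypes printed here look wrong for the real file, the encoding is a reasonable place to start looking.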