import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
wine = "~/Desktop/datasets/winequality-white.csv"
# Load the data
df = pd.read_csv(wine,sep=";")
df.head()
# Look at the information regarding its columns.
df.info()
# Non-null floats, also validated by a null mask:
null_release_mask = df['fixed acidity'].isnull()
I am trying to do a train/test split, using 3 predictor variables to predict quality:
from sklearn.model_selection import train_test_split
X = df[["alcohol", "pH","free sulfur dioxide"]]
y = df["quality"]
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.3, random_state=42)
print(len(X_train), len(X_test))
print(len(y_train), len(y_test))
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train,y_train)
import numpy as np
x_values_to_plot = np.linspace(0, df[["alcohol", "pH","free sulfur dioxide"]].max(), 15)
y_values_to_plot = (x_values_to_plot * model.coef_) + model.intercept_
fig, ax = plt.subplots(figsize=(6,6))
ax.scatter(df[["alcohol", "pH","free sulfur dioxide"]], df["quality"],
label="data", alpha=0.2)
ax.plot(x_values_to_plot, y_values_to_plot,
        label="regression_line of white wines", c="r")
ax.legend(loc="best")
plt.show()
But I get this error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-68-c52d735932ab> in <module>()
      1 import numpy as np
      2
----> 3 x_values_to_plot = np.linspace(0, df[["alcohol", "pH","free sulfur dioxide"]].max(), 15)
      4 y_values_to_plot = (x_values_to_plot * model.coef_) + model.intercept_
      5

~/anaconda3/lib/python3.7/site-packages/numpy/core/function_base.py in linspace(start, stop, num, endpoint, retstep, dtype)
    122     if num > 1:
    123         step = delta / div
--> 124         if step == 0:
    125             # Special handling for denormal numbers, gh-5437
    126             y /= div

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
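Any help would be greatly appreciated. I am new to StackOverflow, so please bear with the formatting of the question and let me know what I can improve. Thanks.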
Answer 0 (score: 0)
This particular error comes from this snippet:
x_values_to_plot = np.linspace(0, df[["alcohol", "pH","free sulfur dioxide"]].max(), 15)
since
df[["alcohol", "pH","free sulfur dioxide"]].max()
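returns three values: the maxima of alcohol, pH, and free SO2. np.linspace expects a single number as its stop value there, so the comparison `step == 0` is evaluated on an array and raises the ValueError. You can fix this by adding another .max(), which picks the largest of the three maxima, assuming that is what you are trying to do.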
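A minimal sketch of that fix, using a small made-up DataFrame in place of the wine CSV (the values below are invented for illustration; only the column names come from the question). It shows that one .max() gives a per-column Series, while a second .max() reduces it to a scalar that np.linspace can use as a stop value:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the wine data (values are made up for illustration).
df = pd.DataFrame({
    "alcohol": [8.8, 9.5, 10.1, 12.4],
    "pH": [3.00, 3.30, 3.26, 3.19],
    "free sulfur dioxide": [45.0, 14.0, 30.0, 47.0],
})
cols = ["alcohol", "pH", "free sulfur dioxide"]

per_column_max = df[cols].max()      # Series with one maximum per column
overall_max = df[cols].max().max()   # single scalar: the largest of the three

# With a scalar stop value, linspace returns a plain 1-D array of 15 points.
x_values_to_plot = np.linspace(0, overall_max, 15)
print(per_column_max.shape)    # (3,)
print(x_values_to_plot.shape)  # (15,)
```

Note that a shared x range across three predictors with very different scales may not be what you want for a plot; this only removes the ValueError.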
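There are also some other problems in the part below the regression model: ax.scatter is given three predictor columns against a single quality column, and a model with three coefficients cannot be drawn as one line on a single 2-D axis. What exactly are you trying to show? You could always try seaborn, which is useful for these kinds of visualizations.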
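If the goal is one fitted line per predictor, one possible approach (an assumption about what you want, not the only option) is to vary a single predictor while holding the others at their column means, and let the model predict along that grid. The helper name `partial_line` and the synthetic data below are hypothetical and only for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def partial_line(model, X, feature, num=15):
    """Return (xs, ys): model predictions as `feature` varies while the
    other predictors are held at their column means."""
    xs = np.linspace(X[feature].min(), X[feature].max(), num)
    # Start from a grid where every column equals its mean ...
    grid = pd.DataFrame({c: np.full(num, X[c].mean()) for c in X.columns})
    # ... then sweep only the feature of interest.
    grid[feature] = xs
    return xs, model.predict(grid)

# Synthetic stand-in data (made up), with the column names from the question.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "alcohol": rng.uniform(8, 14, 50),
    "pH": rng.uniform(2.8, 3.8, 50),
    "free sulfur dioxide": rng.uniform(5, 80, 50),
})
y = 0.3 * X["alcohol"] + 0.1 * X["pH"] + rng.normal(0, 0.1, 50)

model = LinearRegression().fit(X, y)
xs, ys = partial_line(model, X, "alcohol")
```

The resulting arrays can then be drawn on one axis per predictor, for example `ax.scatter(X["alcohol"], y, alpha=0.2)` followed by `ax.plot(xs, ys, c="r")`.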