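Why are my model's y_pred values all close to zero?

Time: 2018-04-09 17:13:21

Tags: python-3.x pandas scikit-learn sklearn-pandas

I am new to Python and to machine learning. I was given a Titanic dataset and am trying to predict who survived and who did not. Something seems to be wrong with the y_pred my code produces, because none of the values are close to 1 or above 1. See the attached image of y_test and y_pred.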

    # Importing the libraries
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd

    # Importing the dataset
    dataset = pd.read_csv('train.csv')
    X = dataset.iloc[:, :-1].values
    y = dataset.iloc[:, 3].values

    # Taking care of missing data
    from sklearn.preprocessing import Imputer
    imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
    imputer = imputer.fit(X[:, 2:3])
    X[:, 2:3] = imputer.transform(X[:, 2:3])

    #Encoding Categorical variable
    from sklearn.preprocessing import LabelEncoder, OneHotEncoder
    labelencoder_X = LabelEncoder()
    X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
    onehotencoder = OneHotEncoder(categorical_features = [0])
    X = onehotencoder.fit_transform(X).toarray()

    # Dummy variable trap
    X = X[:, 1:]

    # Splitting the Dataset into Training Set and Test Set
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

    # Split the dataset into training and test set
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_tratin, y_test = train_test_split(X, y, test_size = 0.2,)

    # Fitting the Multiple Linear Regression to the training set
    """ regressor is an object of LinearRegression() class in line 36 """
    from sklearn.linear_model import LinearRegression
    regressor = LinearRegression()
    regressor.fit(X_train, y_train)

[Attached image: y_test and y_pred]

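1 answer:

Answer 0: (score: 0)

Thanks, everyone, for the help; I have been able to solve it. The problem was that y, as imported from the dataset, was being treated as a vector rather than a matrix. The fix is to select it with iloc[:, 3:] instead of iloc[:, 3].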
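To illustrate the vector-versus-matrix distinction described above, here is a minimal sketch. The four-column DataFrame and its column names are hypothetical stand-ins, since the layout of the poster's train.csv is not shown; only the two iloc expressions come from the post.

```python
import pandas as pd

# Hypothetical frame standing in for train.csv (column names are assumptions)
dataset = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
    "c": [7, 8, 9],
    "target": [0, 1, 0],
})

# An integer column index returns a 1-D array (a "vector")
y_vector = dataset.iloc[:, 3].values

# A slice returns a 2-D array (a "matrix") with one column per selected column
y_matrix = dataset.iloc[:, 3:].values

print(y_vector.shape)  # (3,)
print(y_matrix.shape)  # (3, 1)
```

Note that the slice 3: selects column 3 and every column after it, so on a wider file y would end up with more than one column.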

    # Importing the libraries
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd

    # Importing the dataset
    dataset = pd.read_csv('train.csv')
    X = dataset.iloc[:, :-1].values
    y = dataset.iloc[:, 3:].values  # slice, so y is a 2-D array

    # Taking care of missing data
    from sklearn.preprocessing import Imputer
    imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
    imputer = imputer.fit(X[:, 2:3])
    X[:, 2:3] = imputer.transform(X[:, 2:3])

    # Encoding the categorical variable
    from sklearn.preprocessing import LabelEncoder, OneHotEncoder
    labelencoder_X = LabelEncoder()
    X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
    onehotencoder = OneHotEncoder(categorical_features = [0])
    X = onehotencoder.fit_transform(X).toarray()

    # Avoiding the dummy variable trap
    X = X[:, 1:]

    # Splitting the dataset into training set and test set
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)

    # Fitting the multiple linear regression to the training set
    # (regressor is an instance of the LinearRegression class)
    from sklearn.linear_model import LinearRegression
    regressor = LinearRegression()
    regressor.fit(X_train, y_train)

    # Predicting the test set result
    y_pred = regressor.predict(X_test)
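
Comparing the two listings shows one more difference besides the y slice. The question's code calls train_test_split twice, and the second call assigns to y_tratin, so X_train comes from the second shuffle while y_train is left over from the first. The sketch below uses synthetic data, not the Titanic file, and every name and value in it is an illustrative assumption. It suggests how such a mismatch alone can flatten a linear regression's predictions toward the mean of y. Whether this was the actual cause in the original post cannot be confirmed from the post itself.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): a binary target
# that depends strongly on the first feature
rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 3))
y = (X[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(float)

# First split: y_train is created here
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Second split with a different shuffle, reproducing the typo from the question:
# X_train, X_test and y_test are overwritten, but y_train is not
X_train, X_test, y_tratin, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# Rows of X_train and y_train no longer correspond to each other
mismatched = LinearRegression().fit(X_train, y_train)
# Rows of X_train and y_tratin come from the same split
aligned = LinearRegression().fit(X_train, y_tratin)

pred_mismatched = mismatched.predict(X_test)
pred_aligned = aligned.predict(X_test)

# R^2 on the test set for each fit
r2_mismatched = mismatched.score(X_test, y_test)
r2_aligned = aligned.score(X_test, y_test)

print("prediction range, mismatched:", pred_mismatched.min(), pred_mismatched.max())
print("prediction range, aligned:   ", pred_aligned.min(), pred_aligned.max())
print("R^2 mismatched:", r2_mismatched, " R^2 aligned:", r2_aligned)
```

Both splits have the same sizes, so the mismatched fit raises no error. Note also that LinearRegression returns continuous values, so even a well-fitted model will not output exact 0/1 labels; a classifier such as LogisticRegression is the more usual choice for a survived / did-not-survive target.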