ValueError in linear regression in Python: array length does not match index length

Time: 2019-12-30 11:54:37

Tags: python

I have a pandas DataFrame (called df) that looks like this when printed to the console:

    date                      2019-09-03 00:00:00  ...  OverallAtt
    students                                       ...
    5c48943cbe8e95292564e163                  0.0  ...   78.321678
    5c48943dbe8e95292564e165                100.0  ...   87.500000
    5c48943dbe8e95292564e166                100.0  ...   86.713287
    5c48943dbe8e95292564e167                100.0  ...   95.804196
    5c48943dbe8e95292564e169                100.0  ...  100.000000
    5c48943dbe8e95292564e16b                100.0  ...   98.601399
    5c48943dbe8e95292564e16d                100.0  ...   85.314685
    5c48943dbe8e95292564e173                100.0  ...   96.503497
    5c48943dbe8e95292564e175                100.0  ...   83.216783

I am running it through a machine learning model, but I get an error when I try to print the predicted values. Here is my code:

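import pandas as pd

# df is the DataFrame shown above, indexed by 'students'
dataset = df
# Features: every column except the target, with missing values set to 0
X = dataset.drop(['OverallAtt'], axis=1).fillna(0)
y = dataset['OverallAtt'] #Total Attendance ThisYear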

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

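import pickle
filename = 'Regressor_model.sav'
# Save the fitted model, closing the file handle when done
with open(filename, 'wb') as f:
    pickle.dump(regressor, f)

# Load the model back from disk
with open(filename, 'rb') as f:
    load_lr_model = pickle.load(f)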

#PREDICT FROM NEW DATA
dataset = df
X = dataset
X = X.drop(['OverallAtt'], axis=1)
X = pd.DataFrame(X).fillna(0)
ActualAttendance = dataset['OverallAtt']
Names = df.reset_index(drop=False)['students']

NewX_test = (X)
y_load_predit=load_lr_model.predict(NewX_test)
Newdf = pd.DataFrame({'Full Name': Names, 'Actual Attendance': ActualAttendance, 'Predicted Attendance': y_load_predit})
print(Newdf)

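The error is the ValueError from the title (array length does not match index length). ActualAttendance and Names both have length 382, and y_load_predit is also an array of length 382, so I am not sure why I am getting this error.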
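
One possible explanation, offered as a minimal sketch rather than a confirmed diagnosis: ActualAttendance keeps the student IDs as its index, while Names comes from reset_index and so has a fresh 0, 1, 2, ... index. When a DataFrame is built from a dict of Series, pandas aligns them on the union of their indexes, so the resulting index can be twice as long as the prediction array. The three-row df and the predictions array below are made-up stand-ins for the real data, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: three students, indexed by a 'students' ID.
df = pd.DataFrame(
    {"OverallAtt": [78.3, 87.5, 86.7]},
    index=pd.Index(["a1", "b2", "c3"], name="students"),
)
predictions = np.array([80.0, 85.0, 90.0])  # stand-in for y_load_predit

actual = df["OverallAtt"]                       # indexed by student ID
names = df.reset_index(drop=False)["students"]  # indexed 0, 1, 2

# The two Series share no index labels, so pandas aligns them on the union
# of both indexes (6 labels), and the 3-element array no longer fits.
try:
    pd.DataFrame({"Full Name": names,
                  "Actual Attendance": actual,
                  "Predicted Attendance": predictions})
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

# One possible fix: build every column as a plain array in the same row order,
# so there is nothing left for pandas to align.
fixed = pd.DataFrame({
    "Full Name": df.index.to_numpy(),
    "Actual Attendance": actual.to_numpy(),
    "Predicted Attendance": predictions,
})
print(mismatch_error)
print(fixed)
```

If this is indeed the cause, an alternative would be to pass index=df.index and wrap the predictions in a Series with that same index, which keeps the student IDs as row labels instead of a column.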

0 answers:

No answers