For loop and linear regression

Time: 2018-03-23 15:25:40

Tags: python-3.x pandas for-loop linear-regression

Good evening,

I want to repeat the subsetting and the linear regression, once per article code, on the same DataFrame.

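import numpy as np
from sklearn import linear_model

# Get the unique article codes
codes = np.unique(data["cod_id"])

# Features and target (X as a one-column DataFrame, since scikit-learn expects 2-D input)
X = data[["price"]]
y = data["quantity"]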

accuracy = []
for i in codes:
    data = data.loc[data["cod_id"] == i]

    # Use an if statement to avoid 0-element arrays when splitting (80% train, 20% test)

    if len(data) <= 2:

        X_train = X 
        y_train = y  

        # Test dataset 
        X_test = X 
        y_test = y 
    else:
        # Split at the 80% mark
        t = int(0.8 * len(data))
        # Train dataset 
        X_train = X[:t] 
        y_train = y[:t]  

        # Test dataset 
        X_test = X[t:] 
        y_test = y[t:]

    #Run the Algorithm
    lr = linear_model.LinearRegression()
    lr.fit(X_train, y_train)

    predicted_test_tr = lr.predict(X_test)

    pred_cost = (X_test["price"] * predicted_test_tr).sum()
    real_cost = (X_test["price"] * y_test).sum()

    delta = (pred_cost - real_cost) / real_cost  # relative error of the predicted cost
    accuracy.append(delta)
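Written out, the delta appended to "accuracy" on each iteration is the relative error of the predicted cost against the real cost over the test rows, with $p_i$ the price, $q_i$ the observed quantity and $\hat{q}_i$ the predicted quantity:

$$
\delta = \frac{\sum_{i \in \text{test}} p_i \hat{q}_i - \sum_{i \in \text{test}} p_i q_i}{\sum_{i \in \text{test}} p_i q_i}
$$

A negative $\delta$ means the model underestimates the cost (assuming the real cost is positive).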

But it returns a list "accuracy" that is as long as "codes", with the same value in every position:

print(accuracy)

5.43234
5.43234
5.43234
...
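The repetition can be traced on a small hypothetical DataFrame. Because `data` is rebound to the subset inside the loop, only the first iteration filters the full frame; every later filter runs on the previous subset and comes back empty. An empty subset falls into the `len(data) <= 2` branch, which trains and tests on the full `X` and `y` built once before the loop, so every such iteration fits the same model on the same rows and appends the same delta. A minimal reproduction of the rebinding (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical data: three article codes, two rows each
data = pd.DataFrame({
    "cod_id": ["A", "A", "B", "B", "C", "C"],
    "price": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

sizes = []
for i in data["cod_id"].unique():          # evaluated once, on the full frame
    data = data.loc[data["cod_id"] == i]   # rebinds `data` to the subset, as in the question
    sizes.append(len(data))

print(sizes)  # [2, 0, 0]
```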

How can I fix this? Thanks.
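A minimal sketch of one way to restructure the loop, assuming the rows of each code are already in time order (so the positional 80/20 split is meaningful) and using a small hypothetical DataFrame in place of the real `data`. The two changes that matter: each subset comes from `groupby` instead of rebinding `data`, and `X` and `y` are built from that subset inside the loop rather than once from the full frame. The function name `relative_cost_error` is hypothetical; results are keyed by code so repeated values are easy to spot.

```python
import pandas as pd
from sklearn import linear_model


def relative_cost_error(data: pd.DataFrame, train_frac: float = 0.8) -> dict:
    """Fit one LinearRegression per cod_id; return {code: relative cost error}."""
    accuracy = {}
    # groupby yields (code, sub-DataFrame) pairs; `data` itself is never rebound
    for code, subset in data.groupby("cod_id"):
        # Build X and y from this code's rows only, inside the loop
        X = subset[["price"]]  # 2-D, as scikit-learn expects
        y = subset["quantity"]

        if len(subset) <= 2:
            # Too few rows to split: train and test on the whole subset
            X_train, y_train, X_test, y_test = X, y, X, y
        else:
            # Positional 80/20 split (assumes rows are in time order)
            t = int(train_frac * len(subset))
            X_train, y_train = X.iloc[:t], y.iloc[:t]
            X_test, y_test = X.iloc[t:], y.iloc[t:]

        lr = linear_model.LinearRegression()
        lr.fit(X_train, y_train)
        predicted = lr.predict(X_test)

        pred_cost = (X_test["price"] * predicted).sum()
        real_cost = (X_test["price"] * y_test).sum()
        accuracy[code] = (pred_cost - real_cost) / real_cost
    return accuracy


# Hypothetical toy data: two article codes with different price/quantity patterns
data = pd.DataFrame({
    "cod_id": ["A"] * 5 + ["B"] * 5,
    "price": [1, 2, 3, 4, 5] * 2,
    "quantity": [9, 8, 7, 6, 5, 2, 4, 6, 8, 11],
})

accuracy = relative_cost_error(data)
print(accuracy)
```

With this toy data, code A follows an exact line, so its error is essentially zero, while code B departs from its training line on the last (test) row, giving an error of about -0.09; the two entries differ as expected.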

0 answers:

No answers