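I split the dataset into separate time windows with similar weather conditions. This code runs on the 10 AM to 4 PM window of the dataset across several days. Although I used multiple linear regression, the MSE is still very high.
Here is the link to the original dataset: https://www.kaggle.com/dronio/SolarEnergy#SolarPrediction.csv
Link to the dataset I built for 10 AM to 4 PM: https://drive.google.com/file/d/1-QHBH70Fj6XhjzcrbKZ6XLrABacbdR3u/view?usp=sharing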
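For readers who want to reproduce a 10 AM to 4 PM subset, here is a minimal sketch of one way to filter rows by clock time. It is not necessarily how the linked file was built. The `Time` column name, the `HH:MM:SS` format, and the tiny in-memory frame are assumptions used only for illustration.

```python
import pandas as pd

# Tiny synthetic stand-in for the data. The "Time" column name and the
# HH:MM:SS format are assumptions, not details taken from the post.
df = pd.DataFrame({
    "Time": ["09:55:00", "10:00:00", "12:30:00", "15:59:59", "16:05:00"],
    "Radiation": [310.0, 450.0, 820.0, 500.0, 430.0],
})

# Parse the clock time and extract the hour of each reading.
hour = pd.to_datetime(df["Time"], format="%H:%M:%S").dt.hour

# Keep readings taken from 10:00:00 up to, but not including, 16:00:00.
window = df[(hour >= 10) & (hour < 16)].reset_index(drop=True)
print(window)
```

Filtering on the hour keeps the boundary rule explicit: a reading at exactly 16:00:00 is excluded here, which may or may not match the intended window.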
#Solar Generation Prediction
#Importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#Importing data sets
dataset = pd.read_csv('Dataset Days 10AM-4PM.csv')
#dataset.dropna()
#Features: columns 4 to 8
X = dataset.iloc[:, 4:9].values
#X = pd.DataFrame(Xdf)
#Target: column 3, kept 2-D so it can be scaled
Y = dataset.iloc[:, 3:4].values
#Y = pd.DataFrame(Ydf)
#Splitting dataset into training and test sets
#shuffle=False keeps the rows in their original order; random_state has no effect when shuffle is False
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, shuffle=False, test_size=0.2, random_state=0)
#Feature Scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
#Use a separate scaler for the target so sc_X stays fitted on the features
sc_Y = StandardScaler()
Y_train = sc_Y.fit_transform(Y_train)
Y_test = sc_Y.transform(Y_test)
#Fitting Multiple Linear Regression to the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, Y_train)
#Predicting the test set results
Y_predML = regressor.predict(X_test)
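
The script above stops at prediction and never computes the MSE it is asking about. Here is a minimal sketch of how the error could be measured in the target's original units. It runs on synthetic data rather than the real file. The random features, the coefficients, and the names `sc_Y`, `Y_pred`, `mse` and `rmse` are assumptions for illustration. It keeps one scaler for the features and one for the target, then inverts the target scaling before scoring.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 5 features and one roughly linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
coef = np.array([[3.0], [-2.0], [0.5], [1.0], [4.0]])
Y = X @ coef + 100.0 + rng.normal(scale=0.1, size=(200, 1))

# Ordered 80/20 split, mirroring shuffle=False.
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
Y_train, Y_test = Y[:split], Y[split:]

# One scaler per array, each fitted on the training portion only.
sc_X = StandardScaler()
sc_Y = StandardScaler()
X_train_s = sc_X.fit_transform(X_train)
X_test_s = sc_X.transform(X_test)
Y_train_s = sc_Y.fit_transform(Y_train)

regressor = LinearRegression().fit(X_train_s, Y_train_s)

# Map predictions back to original units before computing the error.
Y_pred = sc_Y.inverse_transform(regressor.predict(X_test_s))
mse = mean_squared_error(Y_test, Y_pred)
rmse = np.sqrt(mse)
print(f"MSE: {mse:.4f}, RMSE: {rmse:.4f}")
```

An MSE in original units grows with the square of the target's scale, so the RMSE is often easier to compare against the typical size of the target values.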