I'm trying out this NCAA basketball prediction program, but I keep getting this error:
Traceback (most recent call last):
  File "/mnt/chromeos/removable/JACKS JUNK/Chatbot_2/sports_predict.py", line 17, in <module>
    X_train, X_test, y_train, y_test = train_test_split(X, y)
  File "/home/jackmdavis06/.local/lib/python3.5/site-packages/sklearn/model_selection/_split.py", line 2116, in train_test_split
    arrays = indexable(*arrays)
  File "/home/jackmdavis06/.local/lib/python3.5/site-packages/sklearn/utils/validation.py", line 237, in indexable
    check_consistent_length(*result)
  File "/home/jackmdavis06/.local/lib/python3.5/site-packages/sklearn/utils/validation.py", line 212, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [2258, 4148]
Here is my code:
import pandas as pd
from sportsreference.ncaab.teams import Teams
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
FIELDS_TO_DROP = ['away_points', 'home_points', 'date', 'location',
                  'losing_abbr', 'losing_name', 'winner', 'winning_abbr',
                  'winning_name', 'home_ranking', 'away_ranking']
teams = Teams()
dataset = pd.read_csv('data.csv')
X = dataset.drop(FIELDS_TO_DROP, 1).dropna().drop_duplicates()
y = dataset[['home_points', 'away_points']].values
X_train, X_test, y_train, y_test = train_test_split(X, y)
parameters = {'bootstrap': False,
              'min_samples_leaf': 3,
              'n_estimators': 50,
              'min_samples_split': 10,
              'max_features': 'sqrt',
              'max_depth': 6}
model = RandomForestRegressor(**parameters)
model.fit(X_train, y_train)
print(model.predict(X_test).astype(int), y_test)
I followed the guide on this site:
https://towardsdatascience.com/predict-college-basketball-scores-in-30-lines-of-python-148f6bd71894
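I had tweaked the code slightly to make it run faster, so I also tried running just the original code, but I got exactly the same error. Please help! Thanks!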
Answer 0: (score: 1)
You dropped the nulls and duplicates from X, but not from y.
If you print the row counts, you will see that they have different values (2258 vs. 4148 in your traceback):
print(X.shape[0], len(y))
You should filter y down to the same rows as X before calling train_test_split.
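One way to keep the two aligned is to select y using the index that survives the cleaning of X. The snippet below is only a sketch: the small DataFrame is made-up data standing in for data.csv, and the columns `pace` and `efg_pct` are hypothetical feature names, not fields from your dataset.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up stand-in for data.csv: row 1 has a missing feature value,
# and row 3 repeats the feature values of row 2.
dataset = pd.DataFrame({
    "home_points":  [70, 65, 80, 80, 55, 90],
    "away_points":  [60, 72, 75, 75, 58, 85],
    "home_ranking": [None, 12, None, None, None, 3],
    "pace":         [68.0, None, 71.5, 71.5, 66.0, 73.0],  # hypothetical feature
    "efg_pct":      [0.51, 0.48, 0.55, 0.55, 0.47, 0.53],  # hypothetical feature
})
FIELDS_TO_DROP = ["home_points", "away_points", "home_ranking"]

# Clean X exactly as in the question.
X = dataset.drop(columns=FIELDS_TO_DROP).dropna().drop_duplicates()

# dropna() and drop_duplicates() keep the original index labels,
# so X.index identifies the rows that survived; pick the same rows for y.
y = dataset.loc[X.index, ["home_points", "away_points"]].values

print(X.shape[0], len(y))  # both counts now match

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
```

Selecting by `X.index` rather than calling `dropna()` on the whole dataset avoids discarding rows only because a column you drop anyway (such as `home_ranking`) is empty.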