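I have written code that makes predictions with several classification algorithms (e.g. KNN and Naive Bayes). The main idea behind using several algorithms is to compare their accuracy scores. Logistic regression reaches about 88% accuracy, while KNN reaches about 76%. When I showed the code to senior colleagues, they told me the scores were too low.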
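A comparison like this is only fair if every model is scored on the same data splits. The following is a minimal sketch, not the code from the repository. It assumes a 0/1 product matrix `X` and integer labels `y`, and uses `BernoulliNB` as the Naive Bayes variant, which is an assumed choice that suits binary features. It runs on randomly generated stand-in data, so its printed numbers say nothing about the real dataset. `compare_classifiers` is a hypothetical helper name.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.neighbors import KNeighborsClassifier


def compare_classifiers(X, y, n_splits=5, seed=0):
    """Return the mean cross-validated accuracy of each model,
    with all models evaluated on the same stratified folds."""
    models = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "BernoulliNB": BernoulliNB(),  # assumed Naive Bayes variant for 0/1 features
    }
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


# Hypothetical stand-in data: 200 customers x 74 binary product columns,
# with a label derived from the first two columns.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, size=(200, 74))
y_demo = (X_demo[:, 0] | X_demo[:, 1]).astype(int)

scores = compare_classifiers(X_demo, y_demo)
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

A single train/test split can move accuracy by several points depending on which rows land where. Averaging over folds makes the gap between models easier to interpret.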
GitHub repository with the code: https://github.com/halilzcler/product-prediction
This is how I transformed the given data; for the predictions I used the scikit-learn library. I haven't included the full code here; it is in the GitHub repo.
import pandas as pd
from sklearn import preprocessing
df1 = pd.read_csv('data.train.csv')
df2 = pd.read_csv('data.test.csv')
# Note: the test-file headers are renamed from the plural "customers" form
df1.rename(columns={'Unnamed: 0': 'Customer', 'Products bought by the customer': 'ItemId', 'New products bought by the customer': 'Product'}, inplace=True)
df2.rename(columns={'Unnamed: 0': 'Customer', 'Products bought by the customers': 'ItemId', 'New products bought by the customers': 'Product'}, inplace=True)
def products(item):
    # Split a ';'-separated string of product IDs into a list
    productList = item.split(";")
    return productList
df1["ItemId"] = df1["ItemId"].apply(products)
df2["ItemId"] = df2["ItemId"].apply(products)
# One-hot encode the product lists: one 0/1 column per distinct product
df1_dummies = df1["ItemId"].str.join(sep='*').str.get_dummies(sep='*')
df2_dummies = df2["ItemId"].str.join(sep='*').str.get_dummies(sep='*')
dfTrain = df1.merge(df1_dummies, left_index=True, right_index=True) \
    .drop("ItemId", axis=1) \
    .drop("Customer", axis=1)
dfTest = df2.merge(df2_dummies, left_index=True, right_index=True) \
    .drop("ItemId", axis=1) \
    .drop("Customer", axis=1)
# Encode the target product labels as integers (fit once, then transform)
le = preprocessing.LabelEncoder().fit(dfTrain['Product'])
dfTrain['Product'] = le.transform(dfTrain['Product'])
# Assumes the last 74 columns are the product dummies, identical in both sets
X_train = dfTrain[dfTrain.columns[-74:]].values
X_test = dfTest[dfTest.columns[-74:]].values
y_train = dfTrain.iloc[:, 0].values  # expected to be 'Product' once Customer/ItemId are dropped
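Because `get_dummies` is called separately on the training and test data, the two dummy matrices line up only if both files contain exactly the same set of products. The hard-coded `[-74:]` slice silently assumes this. The sketch below shows one way to enforce identical columns. It assumes the raw `ItemId` strings are `;`-separated, as in the `products` function above. The training set is encoded first, and the test dummies are then reindexed to the training columns. Products never seen in training are dropped, and training products missing from the test set are filled with 0. `encode_items` is a hypothetical helper name, and the tiny Series are made-up examples.

```python
import pandas as pd


def encode_items(train_items, test_items, sep=";"):
    """One-hot encode separator-delimited product lists and force the
    test matrix to use exactly the training columns, in the same order."""
    train_dummies = train_items.str.get_dummies(sep=sep)
    test_dummies = (
        test_items.str.get_dummies(sep=sep)
        # Keep only training columns; products absent from a test row become 0
        .reindex(columns=train_dummies.columns, fill_value=0)
    )
    return train_dummies, test_dummies


# Hypothetical example: the test set has an unseen product "D" and never contains "B"
train = pd.Series(["A;B", "B;C", "A"])
test = pd.Series(["C;D", "A"])
tr, te = encode_items(train, test)
print(list(tr.columns))    # ['A', 'B', 'C']
print(te.values.tolist())  # [[0, 0, 1], [1, 0, 0]]
```

Calling `str.get_dummies(sep=';')` directly on the raw strings yields the same columns as splitting, re-joining with `*`, and splitting again, provided no product ID contains `*`. The returned frames can be used as the feature matrices directly, so no column-count slice is needed. Test labels should go through `le.transform(...)` with the encoder fitted on the training data. It raises a `ValueError` for labels that never appeared in training.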
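According to the feedback, any of these classification algorithms should reach an accuracy of at least around 90%. My results are below that, and I can't figure out where my mistake is.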
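Whether roughly 90% is a meaningful target depends on how the labels are distributed. With an imbalanced target, always predicting the most common product can already score high. The sketch below runs a baseline check with scikit-learn's `DummyClassifier`. It uses a hypothetical label vector that is 90% class 0. Applied to the real `X_train`/`y_train`, the same check would show how much the classifiers actually improve over the trivial rule. `baseline_accuracy` is a hypothetical helper name.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score


def baseline_accuracy(X, y, cv=5):
    """Mean cross-validated accuracy of always predicting the most frequent class."""
    dummy = DummyClassifier(strategy="most_frequent")
    return cross_val_score(dummy, X, y, cv=cv, scoring="accuracy").mean()


# Hypothetical imbalanced labels: 90 samples of class 0, 10 of class 1.
# The features are ignored by DummyClassifier, so zeros are enough here.
X_demo = np.zeros((100, 3))
y_demo = np.array([0] * 90 + [1] * 10)
base = baseline_accuracy(X_demo, y_demo)
print(base)  # close to 0.9: each stratified fold is 90% class 0
```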
Thanks for your help.