Improving the accuracy of a recommendation model

Time: 2019-07-18 13:56:15

Tags: pandas machine-learning scikit-learn statistics prediction

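I have built a model for location recommendation. I now have similar data on which I want to recommend shifts instead. The data is almost the same: a few new columns were added and a few were removed.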

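After I cleaned the data and set up the model, accuracy dropped sharply, to well under 1% (the cross-validation mean in the output below is about 0.07%).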
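Because the target here is `shift_id`, every distinct shift is its own class, so a raw accuracy figure only means something next to a trivial baseline. A minimal sketch, assuming the target is a 1-D array of labels like `Y` in the code below: it computes the number of classes, the accuracy of uniform random guessing, and the accuracy of always predicting the most frequent class. The helper name `baseline_accuracies` and the toy labels are hypothetical.

```python
import numpy as np


def baseline_accuracies(y):
    """Return (n_classes, chance accuracy, majority-class accuracy) for labels y.

    Hypothetical helper: chance accuracy assumes uniform random guessing over
    the observed classes; majority accuracy is what always predicting the most
    frequent label would score on y itself.
    """
    y = np.asarray(y)
    labels, counts = np.unique(y, return_counts=True)
    n_classes = len(labels)
    chance = 1.0 / n_classes
    majority = counts.max() / len(y)
    return n_classes, chance, majority


# Made-up labels standing in for shift_id values.
y_toy = [101, 101, 102, 103, 101, 104, 102, 105]
n_classes, chance, majority = baseline_accuracies(y_toy)
print(n_classes, chance, majority)  # prints 5 0.2 0.375
```

Comparing the model's cross-validation score against these two numbers shows whether it is learning anything beyond the class distribution.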

Data cleaning

import numpy as np
import pandas as pd

df = pd.read_csv('cleaned_data.csv')

# convert every column to numeric; values that cannot be parsed become NaN
# (DataFrame.convert_objects was deprecated and later removed from pandas)
df = df.apply(pd.to_numeric, errors='coerce')
# checking for missing values, if any
print(df.isnull().sum())
# replacing missing fields with linear interpolation
for col in ['shift_accepted_role', 'shift_accepted_specialities', 'distance', 'years_of_experience']:
    df[col] = df[col].interpolate(method='linear')

# interpolation cannot fill NaNs at the start of a column, so drop any rows still incomplete
df.dropna(inplace=True)

# separating feature vector and target class for easier encoding of city and status columns
# (.to_numpy() replaces .get_values(), which was removed from pandas)
Y = df['shift_id'].to_numpy()
X = df.drop(columns=['shift_id']).to_numpy()
print(X.shape, Y.shape)

# now that we've finished encoding we can write the final data in a new array,
# sized from X instead of the hard-coded (90180, 8)
final_data = np.zeros((X.shape[0], X.shape[1] + 1))
final_data[:, :-1] = X
final_data[:, -1] = Y

columns = ['user_id', 'location_id', 'is_shift_accepted', 'shift_accepted_role',
           'shift_accepted_specialities', 'distance', 'years_of_experience', 'shift_id']
df_final = pd.DataFrame(final_data, columns=columns)
df_final.head(20)

# the index is written as the first column; the model code drops it again
df_final.to_csv('cleaned_data.csv')
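The cleaning step above fills `shift_accepted_role` and `shift_accepted_specialities` with linear interpolation. If those columns hold integer category codes (an assumption; the post does not say how they are encoded), interpolation can produce values between two codes that match no real category. A minimal sketch with made-up codes, comparing linear interpolation with filling gaps from the column's most frequent value (mode), which is one common alternative for categorical data:

```python
import numpy as np
import pandas as pd

# Made-up category codes with gaps, standing in for a column such as shift_accepted_role.
role = pd.Series([1, np.nan, 2, 3, np.nan, 3], dtype="float64")

# Linear interpolation averages neighbouring codes: the first gap becomes 1.5,
# which is not a valid category.
interpolated = role.interpolate(method="linear")

# Alternative for categorical data: fill gaps with the most frequent code.
mode_filled = role.fillna(role.mode().iloc[0])

print(interpolated.tolist())  # [1.0, 1.5, 2.0, 3.0, 3.0, 3.0]
print(mode_filled.tolist())   # [1.0, 3.0, 2.0, 3.0, 3.0, 3.0]
```

Numeric columns such as `distance` and `years_of_experience` are less affected, although interpolating across rows still assumes that neighbouring rows in the file are related.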

Model code:

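import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

seed = 1
np.random.seed(seed)  # setting seed for random number generation

# reading data from csv file
df = pd.read_csv('cleaned_data.csv')
# drop the first column, which is the index written by to_csv
df = df.drop(df.columns[0], axis=1)
df.head()

# transferring data to numpy arrays (last column is the target, shift_id)
X = df.to_numpy()
Y = X[:, -1]
X = X[:, :-1]
print(X.shape, Y.shape)

# splitting into train and test sets: 8.5% of the rows (roughly 7,700 of 90,180) go to the test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.085, random_state=seed)
print(len(X_train))
print(len(X_test))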
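The output that follows comes from a random forest evaluated with 10-fold cross-validation, but that part of the code is not included in the post. A minimal sketch of such a step with scikit-learn, assuming `RandomForestClassifier` and `cross_val_score` on the training split; since the shared CSV is not loaded here, synthetic data from `make_classification` stands in for `X_train` and `Y_train`, and `n_estimators=100` is an arbitrary choice:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

seed = 1

# Synthetic stand-in for X_train / Y_train (7 features, as in the cleaned data).
X_train, Y_train = make_classification(
    n_samples=1000, n_features=7, n_informative=5, n_classes=4, random_state=seed
)

model = RandomForestClassifier(n_estimators=100, random_state=seed)

# 10-fold cross-validation; for a classifier the default scoring is accuracy
# and an integer cv uses stratified folds.
scores = cross_val_score(model, X_train, Y_train, cv=10)
print("Random Forest 10 Fold Cross Validation scores:", scores, "mean = ", scores.mean())
```

With a target like `shift_id` that has many rare classes, scikit-learn's stratified splitting warns when the least populated class has fewer members than the number of folds, which is worth checking on the real data.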

Output

Random Forest 10 Fold Cross Validation scores: [0.0007271  0.00084828 0.00096946 0.00109064 0.00024239 0.00036359
 0.00096958 0.00060599 0.00072718 0.00072718] mean =  0.0007271405225753567
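To put these scores in perspective: their mean is about 0.073%, and if each fold holds roughly one tenth of the training split (assuming 90,180 rows minus the 7,666 that `train_test_split` sends to the test set, so about 8,251 rows per fold), each fold's score corresponds to only a handful of correct predictions. The arithmetic, with the scores copied from the output above:

```python
# Scores copied from the 10-fold cross-validation output above.
scores = [0.0007271, 0.00084828, 0.00096946, 0.00109064, 0.00024239,
          0.00036359, 0.00096958, 0.00060599, 0.00072718, 0.00072718]

mean = sum(scores) / len(scores)

# Assumption: the training split has 90,180 - 7,666 = 82,514 rows, so each
# of the 10 folds is scored on roughly 8,251 rows.
fold_size = (90180 - 7666) / 10
correct_per_fold = [round(s * fold_size) for s in scores]

print(f"mean accuracy = {mean:.4%}")                              # mean accuracy = 0.0727%
print("approx. correct predictions per fold:", correct_per_fold)  # [6, 7, 8, 9, 2, 3, 8, 5, 6, 6]
```

Under that assumption the folds together contain about 60 correct predictions out of roughly 82,500 rows.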

Data: http://www.sharecsv.com/s/7e19695408334074756ebb6e54458afe/shift-reco.csv

Am I taking any wrong steps? Any tips are highly appreciated.
