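Is it possible to limit the maximum predicted value per sample when using scikit-learn? In my input data there is a column ("Announcement") that holds the maximum possible value for that particular sample, and "Result" is the true value. How can I restrict each prediction to the range 0 to Announcement?
Here is a very small code snippet/example: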
#!/usr/bin/env python3
from sklearn.linear_model import LinearRegression
import pandas as pd
from sklearn.model_selection import train_test_split

def main():
    mylist = [
        {'Id':101,'Username':"john",'Date':1475359200,'Announcement':111,'Result':50},
        {'Id':104,'Username':"john",'Date':1475359905,'Announcement':40,'Result':23},
        {'Id':222,'Username':"dave",'Date':1475399212,'Announcement':600,'Result':420},
        {'Id':301,'Username':"john",'Date':1475559256,'Announcement':300,'Result':150},
        {'Id':407,'Username':"dave",'Date':1475659277,'Announcement':10,'Result':8}
    ]
    df = pd.DataFrame(mylist)
    df['Username'] = pd.Series(pd.factorize(df['Username'])[0] + 1).astype('category')
    y = df['Result'].values
    df = df.drop('Result', axis=1)
    X_train, X_test, y_train, y_test = train_test_split(df, y, random_state=2)
    clf = LinearRegression()
    clf.fit(X_train, y_train)
    predictions = clf.predict(X_test)
    print("predictions")
    print(predictions)
    print("true values")
    print(y_test)

if __name__ == '__main__':
    main()
Output:
predictions
[ 255.81049569 52.35007969]
true values
[420 8]
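In this case, the problem is the second value: the prediction (52.35) exceeds that sample's Announcement value.
Thanks in advance.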
Answer 0 (score: 0)
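I'm not sure how to do this natively in Scikit-Learn, but with a list comprehension you can replace any prediction that is larger than the value in the Announcement column with that value: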
predictions = [p if p < a else a for p, a in zip(predictions, X_test['Announcement'])]
Result:
predictions
[255.81049569325114, 10]
true values
[420 8]
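Note that the list comprehension only enforces the upper bound; a negative prediction would pass through unchanged. As a minimal sketch of one way to enforce both ends of the 0 to Announcement range, the same post-processing can be done with NumPy's `np.clip`, which accepts an array as the upper limit. The helper name `clip_predictions` is hypothetical, and the sample numbers are copied from the output in the question; this still clips after prediction and does not make the model itself aware of the constraint.

```python
import numpy as np

def clip_predictions(predictions, upper_bounds, lower=0.0):
    """Clip each prediction to [lower, upper_bounds[i]] (hypothetical helper)."""
    predictions = np.asarray(predictions, dtype=float)
    upper_bounds = np.asarray(upper_bounds, dtype=float)
    # np.clip broadcasts the scalar lower bound against the per-sample upper bounds
    return np.clip(predictions, lower, upper_bounds)

# Raw predictions from the question, with the Announcement values
# of the two test rows (600 and 10)
raw = [255.81049569, 52.35007969]
announcement = [600, 10]

clipped = clip_predictions(raw, announcement)
print(clipped)
```

In the original script this would be called as `clip_predictions(predictions, X_test['Announcement'])` right after `clf.predict(X_test)`.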