Scikit: how to limit the maximum predicted value for each sample

Time: 2018-11-29 15:21:52

Tags: python scikit-learn


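Is it possible to cap the predicted value for each sample when using scikit-learn? In my input data, one column ("Announcement") holds the maximum possible value for that particular sample, and "Result" is the true value. How can I constrain each prediction to the range from 0 to that sample's Announcement?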

Here is a very small code snippet/example:

#!/usr/bin/env python3

from sklearn.linear_model import LinearRegression
import pandas as pd
from sklearn.model_selection import train_test_split

def main():
    mylist = [
    {'Id':101,'Username':"john",'Date':1475359200,'Announcement':111,'Result':50},
    {'Id':104,'Username':"john",'Date':1475359905,'Announcement':40,'Result':23},
    {'Id':222,'Username':"dave",'Date':1475399212,'Announcement':600,'Result':420},
    {'Id':301,'Username':"john",'Date':1475559256,'Announcement':300,'Result':150},
    {'Id':407,'Username':"dave",'Date':1475659277,'Announcement':10,'Result':8}
    ]

    df = pd.DataFrame(mylist)
    df['Username'] =  pd.Series(pd.factorize(df['Username'])[0] + 1).astype('category')
    y = df['Result'].values
    df = df.drop('Result', axis=1)
    X_train, X_test, y_train, y_test = train_test_split(df, y, random_state=2)
    clf = LinearRegression()
    clf.fit(X_train, y_train)
    predictions = clf.predict(X_test)
    print("predictions")
    print(predictions)
    print("true values")
    print(y_test)

if __name__ == '__main__':
    main()

Output:

predictions
[ 255.81049569   52.35007969]
true values
[420   8]

In this case the problem is the second value: the prediction of 52.35 exceeds that sample's Announcement of 10.

Thanks in advance.

1 Answer:

Answer 0: (score: 0)

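I'm not sure how to do this natively in scikit-learn, but you can use a list comprehension to replace any prediction that is greater than the value in the Announcement column with that value: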

predictions = [p if p < a else a for p, a in zip(predictions, X_test['Announcement'])]

Result:

predictions
[255.81049569325114, 10]
true values
[420   8]
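
Note that the list comprehension only enforces the upper bound, so a negative prediction would pass through unchanged. As a minimal sketch, assuming the predictions and the Announcement values are available as equal-length arrays, numpy.clip can enforce both ends of the 0 to Announcement range in one vectorized call. The helper name clip_predictions is hypothetical and not part of scikit-learn; the sample numbers are the two predictions from the question and the Announcement values (600 and 10) of the corresponding test rows.

```python
import numpy as np


def clip_predictions(predictions, upper):
    """Clip each prediction to the range [0, upper[i]].

    Hypothetical helper for illustration; it post-processes the model
    output and does not change how the model is trained.
    """
    predictions = np.asarray(predictions, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # np.clip accepts an array as the upper bound, so the cap is applied
    # element by element.
    return np.clip(predictions, 0.0, upper)


# Predictions copied from the question's output, paired with the
# Announcement values of the same two test rows.
predictions = np.array([255.81049569, 52.35007969])
announcement = np.array([600, 10])

print(clip_predictions(predictions, announcement))
```

With the question's own variables this would be called as clip_predictions(clf.predict(X_test), X_test['Announcement']). The first prediction stays as it is because it is already below 600, and the second is capped at 10. This only corrects the output after the fact; the linear model itself still knows nothing about the bound.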