sklearn Naive Bayes in Python

Date: 2018-07-07 08:18:00

Tags: python scikit-learn naivebayes

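I have trained a classifier on the "Rocks and Mines" dataset (https://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data), and when I compute the accuracy score it always seems to be perfectly accurate (the output is 1.0), which I find hard to believe. Am I making a mistake somewhere, or is naive Bayes really that powerful?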

import urllib.request
import pandas as pd

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data'
data = urllib.request.urlopen(url)
df = pd.read_csv(data, header=None)  # the file has no header row

# encode the labels: 'M' (mine) -> 1, 'R' (rock) -> 0
y_val = (df.iloc[:, -1] == 'M').astype(int).values
df = df.drop(df.columns[-1], axis=1)  # dropping column containing 'R', 'M'

X = df.values

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# initializing the classifier
clf = GaussianNB()
# splitting the data
train_x, test_x, train_y, test_y = train_test_split(X, y_val, test_size=0.33, random_state=42)
# training the classifier
clf.fit(train_x, train_y)
pred = clf.predict(test_x)  # making a prediction
# accuracy_score expects (y_true, y_pred)
score = accuracy_score(test_y, pred)
# printing the accuracy score
print(score)

X is the input and y_val is the output (I have converted 'R' and 'M' to 0 and 1).

1 answer:

Answer 0 (score: 1)

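This is because of the random_state parameter in the train_test_split() function.
When random_state is set to an integer, sklearn guarantees that the data is split the same way every time.
This means that each run with the same random_state gives the same result, which is the expected behavior.
For more details, refer to the docs.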
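To illustrate the point about random_state, here is a minimal sketch. It assumes a synthetic stand-in for the sonar data (random features with the same 208 x 60 shape) so that it runs without a download, and the helper name split_and_score is hypothetical. It only shows that the same seed reproduces the same split and score while a different seed gives a different split; it does not reproduce the numbers from the question.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the sonar data (assumption: 208 samples, 60 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(208, 60))
y = rng.integers(0, 2, size=208)
X[y == 1] += 0.3  # shift one class slightly so the labels are learnable


def split_and_score(seed):
    """Split with the given random_state, fit GaussianNB, return (test indices, accuracy)."""
    idx = np.arange(len(y))
    train_idx, test_idx = train_test_split(idx, test_size=0.33, random_state=seed)
    clf = GaussianNB().fit(X[train_idx], y[train_idx])
    score = accuracy_score(y[test_idx], clf.predict(X[test_idx]))
    return test_idx, score


test_a, score_a = split_and_score(42)
test_b, score_b = split_and_score(42)  # same seed as above
test_c, score_c = split_and_score(7)   # different seed

# Same seed: identical test rows and identical score.
print(np.array_equal(test_a, test_b), score_a == score_b)
# Different seed: the test rows differ, so the score may differ too.
print(np.array_equal(test_a, test_c), score_a, score_c)
```

If you want to see how much the score depends on one particular split, you could vary the seed as above, or average over several folds with cross_val_score from sklearn.model_selection.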