ValueError: could not convert string to float: 'malware'

Time: 2019-05-09 06:29:04

Tags: python random-forest

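I have Python code that reads a training dataset made up of malware samples and regular files. When the dataset has 100 records, the program reads the input without problems. When it has more than 100 records, the program fails to read the data. Can anyone help?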

This is a classification program that uses a random forest, based on the library documented at https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html.

import pandas as pd
from sklearn.tree import export_graphviz
import pydotplus
import subprocess


# Raw string so that backslashes in the Windows path are not read as escapes.
dataset = pd.read_csv(r"D:\Kuliah\Sweet\Kodingan\fix\kodingan\fix\data normalisasi\hasilakhir_2.csv")
print(dataset.head())
feature_list = list(dataset.columns)
col = len(dataset.columns)



# Features: every column except the last. Target: the last column.
X = dataset.iloc[:, 0:col-1].values
y = dataset.iloc[:, col-1].values

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= 0.5) 
print(len(X_test))
print(len(X_train))

from sklearn.preprocessing import StandardScaler 

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

from sklearn.ensemble import RandomForestClassifier

rd = RandomForestClassifier(n_estimators=100)
rd.fit(X_train, y_train)
y_pred = rd.predict(X_test)



# Visualise the first tree; any index below n_estimators (100) is valid.
estimator = rd.estimators_[0]

export_graphviz(estimator,
            out_file='tree.dot',
            feature_names = feature_list[:col-1],
            class_names = [str(c) for c in rd.classes_],
            rounded = True, proportion = False,
            precision = 2, filled = True)

# Render the .dot file to PNG (requires the Graphviz "dot" executable on PATH).
subprocess.call(['dot', '-Tpng', 'tree.dot', '-o', 'tree.png', '-Gdpi=600'])

from IPython.display import Image
Image(filename = 'tree.png')

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test, y_pred))

Output of print(dataset.head()):

    opcode0   opcode1   opcode2   opcode3  ...  Unnamed: 397  Unnamed: 398
0  0.012559  0.433281  0.012559  0.012559  ...           NaN           NaN
1  0.012559  0.433281  0.012559  0.012559  ...           NaN           NaN
2  0.734694  0.006279  0.734694  0.734694  ...           NaN           NaN
3  0.012559  0.439560  0.012559  0.012559  ...           NaN           NaN
4  0.012559  0.012559  0.427002  0.602826  ...           NaN           NaN
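
The `Unnamed: 397` and `Unnamed: 398` columns in the head() output are the names pandas gives to header fields that are empty, which usually points to rows padded with trailing commas. One possible explanation for the error, not confirmed against the actual file, is that rows have different lengths, so the label string lands inside the columns that are sliced off as features. The sketch below illustrates this with a made-up three-line CSV; the column name `label` and all values are assumptions, not the asker's data.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical miniature CSV: the header ends with two empty fields, and the
# second data row is one value shorter, so its label slides into "opcode2".
raw = (
    "opcode0,opcode1,opcode2,label,,\n"
    "0.1,0.4,0.3,malware,,\n"
    "0.7,0.2,benign,,,\n"
)
df = pd.read_csv(io.StringIO(raw))

# Columns that contain nothing but NaN (the "Unnamed: N" padding columns).
empty_cols = [c for c in df.columns if df[c].isna().all()]

# Columns that pandas could not parse as numbers, i.e. those holding text.
text_cols = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]

# Slicing "all but the last column" keeps the text, so a float conversion
# (roughly what a scaler does with its input) fails.
feature_frame = df.iloc[:, : len(df.columns) - 1]
try:
    np.asarray(feature_frame.to_numpy(), dtype=float)
    error = None
except ValueError as exc:
    error = str(exc)

print(empty_cols)
print(text_cols)
print(error)
```

Running the same two list comprehensions on the real `dataset` would show which columns are empty and which ones contain strings.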

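The expected result is that the normalised columns opcode0 through opcode-n all contain numbers. Instead, the columns from opcode100 through opcode-n are written as NaN. This is the file used for data normalisation: https://drive.google.com/open?id=1ZaCw9Hkh6QQAfFvcvgpuurdcY_1hkdC9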
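
If the NaN columns turn out to be padding, one way to proceed is to drop them, pick the label column by name rather than by position, and force the remaining columns to be numeric before training. The following is a minimal sketch under those assumptions: the data is synthetic, the column name `label` is hypothetical, and filling missing values with 0 is only one possible choice.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the normalised CSV: three opcode features, a label
# column assumed to be called "label", and two empty padding columns.
n = 40
df = pd.DataFrame({
    "opcode0": np.linspace(0.0, 1.0, n),
    "opcode1": np.linspace(1.0, 0.0, n),
    "opcode2": np.full(n, 0.5),
})
df["label"] = np.where(df["opcode0"] > 0.5, "malware", "benign")
df["Unnamed: 4"] = np.nan
df["Unnamed: 5"] = np.nan

# Drop columns that contain nothing but NaN.
df = df.dropna(axis=1, how="all")

# Select the target by name instead of assuming it is the last column.
y = df["label"].to_numpy()
features = df.drop(columns="label")

# Coerce every feature to a number; values that cannot be parsed become NaN.
features = features.apply(pd.to_numeric, errors="coerce")

# Assumption: a missing opcode value is treated as 0.
X = features.fillna(0.0).to_numpy()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(features.columns.tolist(), X.shape, accuracy)
```

The sketch leaves out StandardScaler because tree-based models do not depend on feature scale; it can be added back before fitting if other estimators are going to be compared.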

0 answers:

No answers yet.