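I have Python code that reads a training dataset made up of malware test files and regular files. When the amount of data is 100, the program reads the dataset input correctly. But when the amount of data is greater than 100, the program cannot read the data. Can anyone help?
This is a classification program using random forest, based on the library documented at https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html.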
import pandas as pd
from sklearn.tree import export_graphviz
import pydotplus
import subprocess
dataset = pd.read_csv("D:\\Kuliah\\Sweet\\Kodingan\\fix\\kodingan\\fix\\data normalisasi\\hasilakhir_2.csv")
dataset.head()
print(dataset.head())
feature_list = list(dataset.columns)
col = len(dataset.columns.values)
X = dataset.iloc[:, 0:col-1].values
y = dataset.iloc[:, col-1].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= 0.5)
print(len(X_test))
print(len(X_train))
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
from sklearn.ensemble import RandomForestClassifier
rd = RandomForestClassifier(n_estimators=100)
rd.fit(X_train, y_train)
y_pred = rd.predict(X_test)
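# rd.estimators_ holds n_estimators (100) trees, so the index must stay below 100;
# col-2 goes out of range once the dataset has more than 101 columns.
estimator = rd.estimators_[0]
export_graphviz(estimator,
out_file='tree.dot',
feature_names = feature_list[:col-1],  # every column except the last (the target)
class_names = [str(c) for c in rd.classes_],  # class labels as strings
rounded = True, proportion = False,
precision = 2, filled = True)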
# subprocess is already imported above; convert the .dot file to a PNG with Graphviz
subprocess.call(['dot', '-Tpng', 'tree.dot', '-o', 'tree.png', '-Gdpi=600'])
from IPython.display import Image
Image(filename = 'tree.png')
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test, y_pred))
    opcode0   opcode1   opcode2   opcode3  ...  Unnamed: 397  Unnamed: 398
0  0.012559  0.433281  0.012559  0.012559  ...           NaN           NaN
1  0.012559  0.433281  0.012559  0.012559  ...           NaN           NaN
2  0.734694  0.006279  0.734694  0.734694  ...           NaN           NaN
3  0.012559  0.439560  0.012559  0.012559  ...           NaN           NaN
4  0.012559  0.012559  0.427002  0.602826  ...           NaN           NaN
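The expected result is that opcode0 through opcode-n are all normalized to numeric values. Instead, the output is NaN from opcode100 through opcode-n. This is the file used for data normalization: https://drive.google.com/open?id=1ZaCw9Hkh6QQAfFvcvgpuurdcY_1hkdC9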
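
A small sketch that might help narrow this down. It assumes the `Unnamed: N` columns come from empty header fields in the CSV (pandas gives such fields that name) and that the matching cells are empty, which pandas reads as NaN. The miniature CSV and the `label` column name below are made up for illustration and are not taken from hasilakhir_2.csv.

```python
import io
import pandas as pd

# Hypothetical miniature CSV: the header ends with two empty fields,
# and the data rows carry no values for them.
raw = io.StringIO(
    "opcode0,opcode1,label,,\n"
    "0.01,0.43,1,,\n"
    "0.73,0.00,0,,\n"
)
df = pd.read_csv(raw)

# Count missing values per column to see which columns are entirely NaN.
nan_counts = df.isna().sum()
empty_cols = nan_counts[nan_counts == len(df)].index.tolist()
print(empty_cols)

# Drop columns that contain no data at all, then split features and target.
clean = df.dropna(axis=1, how="all")
X = clean.iloc[:, :-1].values   # every remaining column except the last
y = clean.iloc[:, -1].values    # last remaining column, assumed to be the target
print(X.shape, y.shape)
```

Running the same `isna().sum()` check on the real file should show whether the NaN values are already in the CSV or only appear at read time. In the output above the last columns are `Unnamed: 397` and `Unnamed: 398`, so `dataset.iloc[:, col-1]` may be selecting an all-NaN column as the target if the real label sits further to the left.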