Converting categorical network traffic features to numeric values - ISCX VPN2016 dataset

Time: 2019-11-28 18:39:30

Tags: python numpy tensorflow scikit-learn neural-network

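I am using the ISCX VPN2016 dataset to classify encrypted network traffic, and I want to implement a deep neural network for the classification. The dataset contains 14 pcap files representing 14 classes of traffic. I exported the pcap files to CSV, added a column for the class, and merged them into one file. The problem is the data type of the features: I cannot convert them to numeric features. I tried the commonly suggested approaches in NumPy, pandas and sklearn, such as OneHotEncoder, LabelEncoder, get_dummies, astype, ..., but none of them worked.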

My question is: what should I do to convert these features completely, if they need to be converted at all? This is my code, which raises an error:

from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_transformer


seed = 9
np.random.seed(seed)
netTraffic = np.loadtxt('netTraffic_100each.csv', delimiter=',', skiprows=1)

# OneHotEncoder
make_column_transformer(
    (OneHotEncoder(), ['Source'], ['Destination'], ['Protocol'], ['Info']))

# LabelEncoder
le = preprocessing.LabelEncoder()
le.fit(['Class'])
list(le.classes_)
le.transform(['Class'])
print(netTraffic.Class.dtypes)

X = netTraffic[:, 0:6]
Y = netTraffic[:, 6]

(X_train, X_test, Y_train, Y_test) = train_test_split(X, Y, test_size=0.3, random_state=seed)

model = Sequential()
model.add(Dense(7, input_dim=6, init='uniform', activation='relu'))
model.add(Dense(6, init='uniform', activation='relu'))
model.add(Dense(14, init='uniform', activation='relu'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(X_train, Y_train, validation_data=(X_test, Y_test), nb_epoch=20, batch_size=5)

scores = model.evaluate(X_test, Y_test)
print("Accuracy: %.2f%%" % (scores[1] * 100))

The first few rows of the data:

[image]

I have also uploaded the CSV file used for this code here: https://gofile.io/?c=L8UNYb

1 answer:

Answer 0 (score: 1)

Take a look at pd.get_dummies:

import pandas as pd

df = pd.read_csv('netTraffic_100each.csv')
df_encoded = pd.get_dummies(df, drop_first=True)
..
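
To make the idea concrete, here is a minimal sketch of the encoding step. It assumes the CSV has columns named Source, Destination, Protocol and Class, as the question's code suggests. The small DataFrame and its values are invented stand-ins for the real file, and the Time and Length columns are assumptions as well. Note that np.loadtxt parses every field as a float by default, so it cannot read string columns such as Source, which is why the data is handled as a pandas DataFrame here.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for netTraffic_100each.csv: column names are assumed
# from the question's code, and all values are invented for illustration.
df = pd.DataFrame({
    "Time": [0.00, 0.01, 0.05, 0.09],
    "Source": ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3"],
    "Destination": ["10.0.0.9", "10.0.0.9", "10.0.0.8", "10.0.0.9"],
    "Protocol": ["TCP", "UDP", "TCP", "TLSv1.2"],
    "Length": [60, 120, 1500, 90],
    "Class": ["chat", "email", "chat", "streaming"],
})

# Separate the features from the target before encoding anything.
features = df.drop(columns=["Class"])
labels = df["Class"]

# One-hot encode only the string columns; numeric columns pass through as-is.
X = pd.get_dummies(features, columns=["Source", "Destination", "Protocol"])
X = X.astype("float32").to_numpy()

# Fit the encoder on the actual label column (not on the literal string
# "Class") to map class names to integers 0..n_classes-1.
le = LabelEncoder()
y = le.fit_transform(labels)

print(X.shape, X.dtype)
print(list(le.classes_), y)
```

Two caveats apply. One-hot encoding a free-text column such as Info, or IP address columns with many distinct values, produces one new column per unique value, so the feature matrix can become very wide. Also, integer labels like y pair with Keras's sparse_categorical_crossentropy loss, whereas categorical_crossentropy expects one-hot targets; in either case the output layer of a multi-class classifier normally uses a softmax activation rather than relu.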
