I have log data from which I want to extract parsed templates, for example from OpenStack, where one line looks like this:
nova-compute.log.2017-05-14_21:27:09 2017-05-14 19:39:09.660 2931 WARNING nova.compute.manager [req-addc1839-2ed5-4778-b57e-5854eb7b8b09 - - - - -] While synchronizing instance power states, found 1 instances in the database and 0 instances on the hypervisor.
"While synchronizing instance power states, found <*> instances in the database and <*> instances on the hypervisor."
I am only interested in the message payload, not in the concrete values.
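For context, the value-masking step can be approximated with a simple regex; a real log parser such as Drain is more robust, so this is only an illustrative sketch that masks plain integers:

```python
import re

def extract_template(message: str) -> str:
    """Replace variable values (here: bare integers) with the <*> wildcard.
    This is a crude approximation of what a template-mining log parser does."""
    return re.sub(r'\b\d+\b', '<*>', message)

msg = ("While synchronizing instance power states, found 1 instances "
       "in the database and 0 instances on the hypervisor.")
print(extract_template(msg))
# While synchronizing instance power states, found <*> instances in the database and <*> instances on the hypervisor.
```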
Naturally, these logs run to well over 100k lines, and I apply the parsing step above to every single line.
What I want to do now is take the parsed strings, tokenize them, and turn the tokens into word-embedding vectors (for example with Flair). That gives me a vector of size [1, 2148] per line.
I want to feed these vectors into an LSTM network and train it to predict the next word in the log output. The log output used for training will not contain any anomalies; later, the model should be used to detect anomalies in log output that does contain them.
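The sliding-window setup described above can be sketched in NumPy; random vectors stand in for the actual per-line embeddings, and `seq_length = 5` matches the training script:

```python
import numpy as np

# Hypothetical stand-in: each log line is already an embedding vector of size 2148.
embedding_dim = 2148
vectors = np.random.rand(100, embedding_dim).astype(np.float32)

seq_length = 5
# Input: seq_length consecutive line vectors; target: the vector of the next line.
x = np.stack([vectors[i:i + seq_length] for i in range(len(vectors) - seq_length)])
y = np.stack([vectors[i + seq_length] for i in range(len(vectors) - seq_length)])

print(x.shape)  # (95, 5, 2148)
print(y.shape)  # (95, 2148)
```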
import argparse
import numpy as np
import torch
from keras.callbacks import ModelCheckpoint
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.models import Sequential
parser = argparse.ArgumentParser()
parser.add_argument('-num_classes', type=int, default=2148)
parser.add_argument('-num_layers', default=2, type=int)
parser.add_argument('-hidden_size', default=128, type=int)
parser.add_argument('-window_size', default=10, type=int)
args = parser.parse_args()
num_classes = args.num_classes
num_layers = args.num_layers
hidden_size = args.hidden_size
window_size = args.window_size
seq_length = 5
dataset = torch.load('words.pt')
dataset_numpy = []
for tensor in dataset:
    dataset_numpy.append(tensor.numpy())
n_chars = len(dataset_numpy)
data_x = []
data_y = []
for i in range(0, n_chars - seq_length):
    data_x.append(dataset_numpy[i: i + seq_length])
    data_y.append(dataset_numpy[i + seq_length])
n_patterns = len(data_x)
input_x = np.reshape(data_x, (n_patterns * len(dataset[0]), seq_length, 1))
model = Sequential()
model.add(LSTM(256, input_shape=(input_x.shape[1], input_x.shape[2])))
model.add(Dropout(0.1))
model.add(Dense(len(dataset[0])))
model.compile(loss='categorical_crossentropy', optimizer='adam')
filepath = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
model.fit(input_x, data_y, epochs=20, batch_size=12, callbacks=callbacks_list)
I get this error:
Traceback (most recent call last):
File "/Users/haraldott/Development/thesis/anomaly_detection_main/venv/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3325, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-2-06b45e501bb8>", line 1, in <module>
runfile('/Users/haraldott/Development/thesis/anomaly_detection_main/loganaliser/model_train.py', wdir='/Users/haraldott/Development/thesis/anomaly_detection_main/loganaliser')
File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/haraldott/Development/thesis/anomaly_detection_main/loganaliser/model_train.py", line 113, in <module>
model.fit(input_x, data_y, epochs=20, batch_size=12, callbacks=callbacks_list)
File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/keras/engine/training.py", line 952, in fit
batch_size=batch_size)
File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/keras/engine/training.py", line 789, in _standardize_user_data
exception_prefix='target')
File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/keras/engine/training_utils.py", line 102, in standardize_input_data
str(len(data)) + ' arrays: ' + str(data)[:200] + '...')
ValueError: Error when checking model target: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 array(s), but instead got the following list of 7644 arrays: [array([[0. ],
[0. ],
[0. ],
...,
[0.00018512],
[0.02950284],
[0.00609318]], dtype=float32), array([[ 0. ],
[ 0. ],
So how can I fix this error? I understand that data_y needs a second dimension containing the classes, but I am not sure what a class would be in this context, since all I have are vectors of size [1, 2148].
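For reference, `model.fit` expects the target to be a single NumPy array of shape `(n_samples, n_features)` rather than a Python list of per-sample arrays, which is what the error message is complaining about; a minimal sketch of that shape, with random stand-in data:

```python
import numpy as np

# Hypothetical stand-in for data_y: a Python list of per-line embedding vectors,
# each of shape (2148, 1), as in the traceback above.
data_y = [np.random.rand(2148, 1).astype(np.float32) for _ in range(7644)]

# Stack the list into one array and drop the trailing singleton dimension,
# giving the (n_samples, n_features) layout Keras expects for a Dense output.
target = np.stack(data_y).squeeze(-1)
print(target.shape)  # (7644, 2148)
```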