Using LSTM

Time: 2019-10-21 12:23:34

Tags: python machine-learning keras

I want to take parsed log data, e.g. from OpenStack, where one line looks like this:

nova-compute.log.2017-05-14_21:27:09 2017-05-14 19:39:09.660 2931 WARNING nova.compute.manager [req-addc1839-2ed5-4778-b57e-5854eb7b8b09 - - - - -] While synchronizing instance power states, found 1 instances in the database and 0 instances on the hypervisor.

This is parsed (e.g. with Drain or Spell) into:

"While synchronizing instance power states, found <*> instances in the database and <*> instances on the hypervisor."

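I am only interested in the payload, not the values. These log outputs have well over 100k lines, and I run the procedure above on every line. What I want to do now is take the parsed strings, tokenize them, and convert the tokens into word embedding vectors (e.g. with flair). This gives me a vector of size [1, 2148]. I want to feed these vectors to an LSTM neural network and train the network to predict the next word in the log output. For training, the log output will not contain any anomalies. Later, this model should be used to detect anomalies in log output that does contain them.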
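To make the pipeline above concrete, here is a minimal sketch of the first steps using only the standard library. The regex masking is a toy stand-in for a real log parser such as Drain or Spell, not their algorithm, and the helper names mask_values, tokenize and sliding_windows are hypothetical. The windows are built over plain tokens here; the same slicing would apply to a list of embedding vectors.

```python
import re


def mask_values(payload):
    """Replace standalone integers with <*> (toy stand-in for a log parser)."""
    return re.sub(r"\b\d+\b", "<*>", payload)


def tokenize(template):
    """Split a parsed template on whitespace."""
    return template.split()


def sliding_windows(sequence, seq_length):
    """Pair every run of seq_length items with the item that follows it."""
    inputs, targets = [], []
    for i in range(len(sequence) - seq_length):
        inputs.append(sequence[i:i + seq_length])
        targets.append(sequence[i + seq_length])
    return inputs, targets


payload = ("While synchronizing instance power states, found 1 instances "
           "in the database and 0 instances on the hypervisor.")

template = mask_values(payload)
tokens = tokenize(template)
inputs, targets = sliding_windows(tokens, seq_length=5)

print(template)
print(inputs[0], "->", targets[0])
```

Each entry of inputs is a window of five consecutive tokens, and the matching entry of targets is the token the network would be asked to predict.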

import argparse

import numpy as np
import torch
from keras.callbacks import ModelCheckpoint
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.models import Sequential

parser = argparse.ArgumentParser()
parser.add_argument('-num_classes', type=int, default=2148)
parser.add_argument('-num_layers', default=2, type=int)
parser.add_argument('-hidden_size', default=128, type=int)
parser.add_argument('-window_size', default=10, type=int)
args = parser.parse_args()
num_classes = args.num_classes
num_layers = args.num_layers
hidden_size = args.hidden_size
window_size = args.window_size
seq_length = 5
dataset = torch.load('words.pt')
dataset_numpy = []

for tensor in dataset:
    dataset_numpy.append(tensor.numpy())

n_chars = len(dataset_numpy)
data_x = []
data_y = []
for i in range(0, n_chars - seq_length):
    data_x.append(dataset_numpy[i: i + seq_length])
    data_y.append(dataset_numpy[i + seq_length])
n_patterns = len(data_x)

input_x = np.reshape(data_x, (n_patterns * len(dataset[0]), seq_length, 1))

model = Sequential()
model.add(LSTM(256, input_shape=(input_x.shape[1], input_x.shape[2])))
model.add(Dropout(0.1))
model.add(Dense(len(dataset[0])))
model.compile(loss='categorical_crossentropy', optimizer='adam')
filepath = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

model.fit(input_x, data_y, epochs=20, batch_size=12, callbacks=callbacks_list)

I get this error:

  Traceback (most recent call last):
  File "/Users/haraldott/Development/thesis/anomaly_detection_main/venv/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3325, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-06b45e501bb8>", line 1, in <module>
    runfile('/Users/haraldott/Development/thesis/anomaly_detection_main/loganaliser/model_train.py', wdir='/Users/haraldott/Development/thesis/anomaly_detection_main/loganaliser')
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/haraldott/Development/thesis/anomaly_detection_main/loganaliser/model_train.py", line 113, in <module>
    model.fit(input_x, data_y, epochs=20, batch_size=12, callbacks=callbacks_list)
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/keras/engine/training.py", line 952, in fit
    batch_size=batch_size)
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/keras/engine/training.py", line 789, in _standardize_user_data
    exception_prefix='target')
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/keras/engine/training_utils.py", line 102, in standardize_input_data
    str(len(data)) + ' arrays: ' + str(data)[:200] + '...')
ValueError: Error when checking model target: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 array(s), but instead got the following list of 7644 arrays: [array([[0.        ],
       [0.        ],
       [0.        ],
       ...,
       [0.00018512],
       [0.02950284],
       [0.00609318]], dtype=float32), array([[ 0.        ],
       [ 0.        ],

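So how do I fix this error? I know that data_y has to have a second dimension containing the classes. But I am not sure what the classes are in this context, since I only have vectors of size [1, 2148].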
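One possible reading of the error message is that the target was passed as a Python list of 7644 separate arrays where a single array was expected. The sketch below shows, under that assumption, how the windows and targets could be stacked into one array each. The random data is a stand-in for the contents of words.pt, n_vectors is a hypothetical size, and target_y is a name introduced here for illustration.

```python
import numpy as np

seq_length = 5
embedding_dim = 2148   # vector size taken from the question
n_vectors = 50         # hypothetical; the real data set is much larger

# Random stand-in for the embedding vectors, each of shape (1, embedding_dim)
rng = np.random.default_rng(0)
dataset_numpy = [rng.random((1, embedding_dim), dtype=np.float32)
                 for _ in range(n_vectors)]

# Flatten each vector to (embedding_dim,) and stack them into one 2-D array
vectors = np.stack([v.reshape(-1) for v in dataset_numpy])

# One window of seq_length vectors per pattern, plus the vector that follows it
n_patterns = n_vectors - seq_length
input_x = np.stack([vectors[i:i + seq_length] for i in range(n_patterns)])
target_y = np.stack([vectors[i + seq_length] for i in range(n_patterns)])

print(input_x.shape)   # (45, 5, 2148)
print(target_y.shape)  # (45, 2148)
```

With arrays of this shape, the LSTM input shape would be (seq_length, embedding_dim) and the final Dense layer would have embedding_dim units. Since the targets here are continuous embedding vectors rather than one-hot class labels, a regression loss such as mean squared error may fit better than categorical_crossentropy. If a classification setup is wanted instead, the classes would presumably be the entries of the token vocabulary, but that is an assumption and not something the question states.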

0 answers