Error tokenizing data. C error: Expected 14 fields in line 3, saw 57

Posted: 2017-11-27 05:33:25

Tags: python deep-learning keras rnn

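What does "tokenizing" mean in this context? I keep getting this error and can't train my model. Could someone please check my sample dataset and code and tell me where I'm going wrong? For reference, here is the traceback: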

(keras_tf) G:\Python\integer_sequencing>python MyTest.py
Using TensorFlow backend.
Traceback (most recent call last):
  File "MyTest.py", line 20, in <module>
    trainset= pd.read_csv('G:/Python/integer_sequencing/sample.csv')
  File "C:\Users\sarah\Anaconda3\envs\keras_tf\lib\site-packages\pandas\io\parsers.py", line 655, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\sarah\Anaconda3\envs\keras_tf\lib\site-packages\pandas\io\parsers.py", line 411, in _read
    data = parser.read(nrows)
  File "C:\Users\sarah\Anaconda3\envs\keras_tf\lib\site-packages\pandas\io\parsers.py", line 1005, in read
    ret = self._engine.read(nrows)
  File "C:\Users\sarah\Anaconda3\envs\keras_tf\lib\site-packages\pandas\io\parsers.py", line 1748, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 890, in pandas._libs.parsers.TextReader.read (pandas\_libs\parsers.c:10862)
  File "pandas\_libs\parsers.pyx", line 912, in pandas._libs.parsers.TextReader._read_low_memory (pandas\_libs\parsers.c:11138)
  File "pandas\_libs\parsers.pyx", line 966, in pandas._libs.parsers.TextReader._read_rows (pandas\_libs\parsers.c:11884)
  File "pandas\_libs\parsers.pyx", line 953, in pandas._libs.parsers.TextReader._tokenize_rows (pandas\_libs\parsers.c:11755)
  File "pandas\_libs\parsers.pyx", line 2184, in pandas._libs.parsers.raise_parser_error (pandas\_libs\parsers.c:28765)
pandas.errors.ParserError: Error tokenizing data. C error: Expected 14 fields in line 3, saw 57


(keras_tf) G:\Python\integer_sequencing>
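In this context, "tokenizing" most likely refers to pandas' C parser splitting each line of the file into fields at the delimiter (a comma by default). The message says the header row set the expected field count (14 here) and line 3 produced more fields than that (57). A minimal reproduction, using a made-up in-memory CSV and assuming the cause is an integer sequence whose commas are not protected by quotes, might look like this:

```python
import io

import pandas as pd

# Made-up CSV (not the asker's file): the header declares 3 fields, but the
# sequence on line 3 is unquoted, so each comma starts a new field (5 in total).
bad_csv = "Id,Sequence,Label\n1,5,x\n2,1,2,3,4\n"

message = None
try:
    pd.read_csv(io.StringIO(bad_csv))
except pd.errors.ParserError as exc:
    message = str(exc)

print(message)  # Error tokenizing data. C error: Expected 3 fields in line 3, saw 5
```

Wrapping the sequence in quotes (`2,"1,2,3",4`) would keep it as a single field, which is how a column such as `Sequence` can legitimately contain commas.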

My code is as follows:

import numpy as np
from numpy import array
import matplotlib.pyplot as plt
import pandas as pd
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import SimpleRNN
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from keras.preprocessing.sequence import pad_sequences
import csv


trainset= pd.read_csv('G:/Python/integer_sequencing/sample.csv')
trainset.head()

testset= pd.read_csv('G:/Python/integer_sequencing/sample.csv')
testset.head()



Y_train = trainset.Sequence.apply(lambda x: str.split(x, ',')[-1] )
X_train = trainset.Sequence.apply(lambda x: str.split(x, ',')[:-1] )
trainset.head()

Y_test = testset.Sequence.apply(lambda x: str.split(x, ',')[-1] )
X_test = testset.Sequence.apply(lambda x: str.split(x, ',')[:-1] )
testset.head()

maxlen1 = int(round(trainset.Sequence.apply(len).max()))
maxlen2 = int(round(testset.Sequence.apply(len).max()))

X_train = pad_sequences(X_train, dtype='float', maxlen=maxlen1)
X_test = pad_sequences(X_test, dtype='float', maxlen=maxlen2)
#Y_train = pad_sequences(Y_train, dtype='float', maxlen=maxlen)


trainX, trainY = np.array(X_train), np.array(Y_train)
testX, testY = np.array(X_test), np.array(Y_test)

trainX = trainX.reshape(trainX.shape + (1,))
testX = testX.reshape(testX.shape + (1,))

model = Sequential()
model.add(LSTM(10, input_shape=(maxlen1, 1)))
model.add(Dense(1, activation='linear'))

# try using different optimizers and different optimizer configs
model.compile(loss='mse', optimizer='rmsprop')
model.summary()
model.fit(trainX, trainY, batch_size=32, epochs=5, verbose=0)

yhat = model.predict(trainX, verbose=0)
print(yhat)

#print(model.evaluate(X_test_rshp, y_test))
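Separately from the parsing error, a few lines above look likely to cause trouble once the CSV loads. `trainset.Sequence.apply(len)` measures each raw string in characters (digits plus commas), not in terms, so `maxlen1` will be much larger than needed. `Y_train` and `Y_test` are still strings, so converting them to floats explicitly avoids relying on implicit conversion. `X_test` is padded to `maxlen2`, while the model is built with `input_shape=(maxlen1, 1)`, so the shapes would not match if the test set were evaluated. The following is a NumPy-only sketch of the same preprocessing under those assumptions; `make_xy` is a hypothetical helper, and its left-padding and left-truncation are meant to mirror `pad_sequences` with its defaults `padding='pre'` and `truncating='pre'`:

```python
import numpy as np


def make_xy(sequences, maxlen=None):
    """Hypothetical helper: turn comma-separated integer sequences into (X, y, maxlen).

    X holds every term except the last, left-padded with zeros;
    y holds the last term of each sequence as a float (the value to predict).
    """
    parsed = [[float(v) for v in s.split(",")] for s in sequences]
    inputs = [p[:-1] for p in parsed]
    y = np.array([p[-1] for p in parsed], dtype="float32")

    # Length measured in terms, not characters of the raw string.
    if maxlen is None:
        maxlen = max(len(p) for p in inputs)

    X = np.zeros((len(inputs), maxlen), dtype="float32")
    for i, p in enumerate(inputs):
        if len(p) > maxlen:
            p = p[len(p) - maxlen:]  # keep only the most recent terms
        if p:
            X[i, maxlen - len(p):] = p  # left-pad with zeros

    # LSTM layers expect (samples, timesteps, features).
    return X[..., np.newaxis], y, maxlen


# Usage with made-up sequences (not the asker's data):
X_train, y_train, maxlen = make_xy(["1,2,3,4", "2,4,8", "10,20"])
X_test, y_test, _ = make_xy(["5,6,7,8,9,10"], maxlen=maxlen)  # reuse the training maxlen
print(X_train.shape)  # (3, 3, 1)
```

With the real data, `trainset.Sequence` could be passed in place of the list, and the returned `maxlen` would go into `input_shape=(maxlen, 1)` so that the training and test arrays share one width.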

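The training set I'm using can be downloaded from sample file. I realize there are similar questions, and I have tried all of the suggested approaches, but none of them seem to work in this case.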
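Because the file itself isn't shown here, one way to narrow the problem down is to count the fields on each line and see which lines disagree with the header. The sketch below does that with the standard `csv` module. It also includes a hypothetical reader, `read_id_sequence`, which assumes (based only on the code's use of `trainset.Sequence`) that each row is meant to be an `Id` followed by a single integer sequence, quoted or not. Both helper names and that layout are assumptions, not something the question confirms:

```python
import csv
import io

import pandas as pd


def field_counts(lines):
    """Return (line_number, field_count) for each non-empty CSV row."""
    reader = csv.reader(lines)
    return [(reader.line_num, len(row)) for row in reader if row]


def read_id_sequence(lines):
    """Hypothetical reader: the first field is the Id and every remaining
    field is a term of the sequence, whether or not it was quoted."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    rows = []
    for fields in reader:
        if not fields:
            continue
        # Assumption: blank fields (e.g. trailing commas from a spreadsheet
        # export) are padding, not terms, so they are dropped.
        terms = [v for v in fields[1:] if v.strip()]
        rows.append({"Id": fields[0], "Sequence": ",".join(terms)})
    return pd.DataFrame(rows, columns=["Id", "Sequence"])


# Made-up sample: row 1 is quoted correctly, row 2 is not.
raw = 'Id,Sequence\n1,"1,2,3"\n2,4,5,6,7\n'
print(field_counts(io.StringIO(raw)))  # [(1, 2), (2, 2), (3, 5)]
print(read_id_sequence(io.StringIO(raw)).Sequence.tolist())  # ['1,2,3', '4,5,6,7']
```

To run it on the real file, the same functions can be given a file object from `open('G:/Python/integer_sequencing/sample.csv', newline='')`. Any line whose count differs from line 1 is one that `pd.read_csv` would reject with the default settings.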

0 answers:

No answers yet.