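I get an IndexError when trying to slice my data. The data consists of word tokens stored in a text file, loaded as an object array of shape (10848135,). Before reshaping the array to 2D, I need advice on the following:
How can I use numpy to inspect the data (or the file) to make sure the array is not an array of lists or an array with variable-length rows? Inspecting the data file by hand is not practical at this size.
How do I convert the object-dtype array to integers, given that to_categorical requires an int dtype?
What should I watch out for when slicing the array?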
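For the first question, one possible check is to count how many elements each row holds instead of reading the file by eye. This is only a sketch: it assumes the data looks like the output of `texts_to_sequences` (a list of integer lists), and the helper name `row_length_summary` and the sample `sequences` are made up for illustration.

```python
import numpy as np

def row_length_summary(rows):
    """Map each distinct row length to the number of rows having that length."""
    # np.size gives 1 for a scalar and len(...) for a flat list
    lengths = np.array([np.size(r) for r in rows])
    unique, counts = np.unique(lengths, return_counts=True)
    return dict(zip(unique.tolist(), counts.tolist()))

# Hypothetical stand-in for the real token sequences.
# dtype=object is required here because the rows differ in length.
sequences = np.array([[1, 2, 3], [4, 5], [6, 7, 8]], dtype=object)

summary = row_length_summary(sequences)
is_ragged = len(summary) > 1

print(sequences.dtype, sequences.shape)  # object (3,)
print(summary)
print("ragged:", is_ragged)
```

If the summary has more than one key, the rows differ in length, and NumPy can only hold them as a 1-D object array of lists rather than a true 2-D integer array.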
Here is the function that raises the index error:
def encode_words(self, dataset):
    data = dataset.split('\n')
    newShape = 2, -1
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(data)
    sequences = tokenizer.texts_to_sequences(data)
    vocab_size = len(tokenizer.word_index) + 1
    sequences = array(sequences)
    #sequences = np.array2string(sequences)
    sequences = np.reshape(sequences, newShape)
    #sequences = np.array2string(sequences)
    print(sequences.dtype)
    print(sequences.shape)
    X, y = sequences[:-1], sequences[-1]
    print(y.dtype)
    #y = np.array2string(y)
    y = to_categorical(y, num_classes=vocab_size)
    seq_length = X.shape[1]
    return X, y, vocab_size, seq_length, tokenizer
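On the slicing question, here is a small sketch of the difference between slicing rows and slicing columns. It assumes the usual next-word setup, where each row is one fixed-length sequence and the last column is the target; the arrays are invented sample data, not the real tokens.

```python
import numpy as np

# Hypothetical 2-D integer array: 3 sequences of 4 token ids each.
sequences = np.array([[1, 2, 3, 4],
                      [2, 3, 4, 5],
                      [3, 4, 5, 6]])

# Row slicing, as in `sequences[:-1], sequences[-1]`: the last ROW becomes y.
X_rows, y_rows = sequences[:-1], sequences[-1]

# Column slicing: every row is kept and the last COLUMN becomes y.
X_cols, y_cols = sequences[:, :-1], sequences[:, -1]

print(X_rows.shape, y_rows.shape)  # (2, 4) (4,)
print(X_cols.shape, y_cols.shape)  # (3, 3) (3,)

# A ragged object array is 1-D, so a two-index slice is rejected outright.
ragged = np.array([[1, 2, 3], [4, 5]], dtype=object)
raised_index_error = False
try:
    ragged[:, :-1]
except IndexError as exc:
    raised_index_error = True
    print("IndexError:", exc)
```

The last part may be relevant to the IndexError mentioned at the top: a two-index slice only works once the array is genuinely 2-D.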
Here is the error message:
Reloaded modules: WordEmbedding
object
(2, 104309)
object
Traceback (most recent call last):
  File "<ipython-input-18-9db02c6b1f06>", line 1, in <module>
    runfile('/home/asifa/anaconda3/deep_learning_project/processor.py', wdir='/home/asifa/anaconda3/deep_learning_project')
  File "/home/asifa/anaconda3/envs/researchProject/lib/python3.6/site-packages/spyder_kernels/customize/spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)
  File "/home/asifa/anaconda3/envs/researchProject/lib/python3.6/site-packages/spyder_kernels/customize/spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "/home/asifa/anaconda3/deep_learning_project/processor.py", line 15, in <module>
    X,y,vocab_size,seq_length,tokenizer = emb.encode_words(seq_data)
  File "/home/asifa/anaconda3/deep_learning_project/WordEmbedding.py", line 77, in encode_words
    y = to_categorical(y, num_classes=vocab_size)
  File "/home/asifa/anaconda3/envs/researchProject/lib/python3.6/site-packages/keras/utils/np_utils.py", line 25, in to_categorical
    y = np.array(y, dtype='int')
ValueError: setting an array element with a sequence.
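The ValueError above is consistent with `np.array(y, dtype='int')` receiving elements that are themselves lists rather than single integers. Below is a minimal sketch of one way around that, assuming the rows really are variable-length integer lists and that left-padding with 0 is acceptable. `pad_to_int_array` is a hypothetical helper written in plain NumPy (Keras provides `pad_sequences` for the same job), and the `np.eye` indexing is only a rough stand-in for `to_categorical`.

```python
import numpy as np

def pad_to_int_array(rows, pad_value=0):
    """Left-pad variable-length rows and return a rectangular 2-D int array."""
    max_len = max(len(r) for r in rows)
    out = np.full((len(rows), max_len), pad_value, dtype=int)
    for i, r in enumerate(rows):
        if len(r):
            # Right-align the row so the padding sits on the left
            out[i, max_len - len(r):] = r
    return out

# Hypothetical ragged data standing in for the tokenized text.
ragged = np.array([[1, 2, 3], [4, 5], [6]], dtype=object)

padded = pad_to_int_array(ragged)
print(padded.dtype.kind, padded.shape)  # i (3, 3)

# Column slicing now works because the array is truly 2-D.
X, y = padded[:, :-1], padded[:, -1]

# One-hot encode the integer targets; vocab_size is an assumed value here.
vocab_size = 7
y_onehot = np.eye(vocab_size, dtype=int)[y]
print(y_onehot.shape)  # (3, 7)
```

If the length check shows that every row already has the same length, padding is unnecessary and something like `np.array(obj_array.tolist(), dtype=int)` should be enough to get an integer array.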