NumPy array: reshaping and slicing

Date: 2019-12-01 01:23:09

Tags: python arrays numpy

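I get an error when trying to slice my data; the traceback below reports a ValueError. The data are word tokens stored in a text file and loaded as an object array of shape (10848135,). Before reshaping the array to 2D, I need advice on the following:

  1. How can I use numpy to inspect the data (or the file) and make sure the array is not an array of lists or an array of variable-length items? Checking the data file by hand is not practical.

  2. How do I convert an array of dtype object to integers, given that the function to_categorical needs an int dtype?

  3. What should I watch out for when slicing the array?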
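
For items 1 and 2, a minimal sketch of one possible check and conversion, using NumPy only. The helper names `describe_sequences` and `to_int_matrix`, the toy token ids, and the choice of left-padding with 0 are assumptions made for illustration; they are not part of the code in this question.

```python
import numpy as np

def describe_sequences(sequences):
    """Summarize the lengths of the items in a list (or object array) of sequences."""
    lengths = np.array([len(seq) for seq in sequences])
    return {
        "count": int(lengths.size),
        "min_len": int(lengths.min()),
        "max_len": int(lengths.max()),
        # Ragged means the items do not all have the same length.
        "is_ragged": bool(lengths.min() != lengths.max()),
    }

def to_int_matrix(sequences, pad_value=0):
    """Left-pad every sequence to the longest length and return a 2D int array."""
    max_len = max(len(seq) for seq in sequences)
    out = np.full((len(sequences), max_len), pad_value, dtype=np.int64)
    for row, seq in enumerate(sequences):
        if len(seq):
            out[row, max_len - len(seq):] = seq
    return out

# Made-up token ids standing in for the tokenizer output.
toy = [[4, 7, 1], [2, 9], [5]]
summary = describe_sequences(toy)
matrix = to_int_matrix(toy)
print(summary)
print(matrix.dtype, matrix.shape)
```

If `is_ragged` is true, the items have different lengths, so the data cannot be reshaped into a rectangular integer array until they are padded or truncated to a common length.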

Below is the function that raises the error:

def encode_words(self, dataset):
    data = dataset.split('\n')
    newShape = 2, -1
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(data)
    sequences = tokenizer.texts_to_sequences(data)
    vocab_size = len(tokenizer.word_index) + 1
    sequences = array(sequences)
    #sequences = np.array2string(sequences)
    sequences  = np.reshape(sequences, newShape)
    #sequences = np.array2string(sequences)
    print(sequences.dtype)
    print(sequences.shape)
    X, y = sequences[:-1], sequences[-1]
    print(y.dtype)
    #y = np.array2string(y)
    y = to_categorical(y, num_classes=vocab_size)
    seq_length = X.shape[1]
    return X, y, vocab_size, seq_length, tokenizer
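
On item 3, it may help to see what the slices in the function above select. On a 2D array, `sequences[:-1]` and `sequences[-1]` index rows, so with shape (2, N) they return the first row and the last row. The sketch below uses a small made-up integer array. The column-slicing variant assumes the intent was "every token but the last as input, the last token as target", which is a guess about the goal.

```python
import numpy as np

# Made-up stand-in for the reshaped array: 2 rows, 5 columns.
sequences = np.arange(10).reshape(2, -1)

# Row slicing, as written in the function: first row vs. last row.
X_rows, y_rows = sequences[:-1], sequences[-1]
print(X_rows.shape, y_rows.shape)

# Column slicing: all columns but the last vs. the last column.
X_cols, y_cols = sequences[:, :-1], sequences[:, -1]
print(X_cols.shape, y_cols.shape)
```

With row slicing, `X.shape[1]` is the full row length rather than the row length minus one, so `seq_length` would differ between the two variants.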

Below is the error output:

Reloaded modules: WordEmbedding

object
(2, 104309)
object
Traceback (most recent call last):

File "<ipython-input-18-9db02c6b1f06>", line 1, in <module>
  runfile('/home/asifa/anaconda3/deep_learning_project/processor.py', wdir='/home/asifa/anaconda3/deep_learning_project')

File "/home/asifa/anaconda3/envs/researchProject/lib/python3.6/site-packages/spyder_kernels/customize/spydercustomize.py", line 827, in runfile
  execfile(filename, namespace)

File "/home/asifa/anaconda3/envs/researchProject/lib/python3.6/site-packages/spyder_kernels/customize/spydercustomize.py", line 110, in execfile
  exec(compile(f.read(), filename, 'exec'), namespace)

File "/home/asifa/anaconda3/deep_learning_project/processor.py", line 15, in <module>
  X,y,vocab_size,seq_length,tokenizer = emb.encode_words(seq_data)

File "/home/asifa/anaconda3/deep_learning_project/WordEmbedding.py", line 77, in encode_words
  y = to_categorical(y, num_classes=vocab_size)

File "/home/asifa/anaconda3/envs/researchProject/lib/python3.6/site-packages/keras/utils/np_utils.py", line 25, in to_categorical
  y = np.array(y, dtype='int')

ValueError: setting an array element with a sequence.
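
The traceback ends inside to_categorical at `np.array(y, dtype='int')`, and the printed dtype of `y` is object. A likely explanation, not confirmed against the real data, is that the elements of `y` are lists of token ids, so they cannot be cast to single integers. The sketch below reproduces that situation with a hypothetical three-element object array.

```python
import numpy as np

# Hypothetical stand-in for y: an object array whose elements are lists.
y = np.empty(3, dtype=object)
y[0], y[1], y[2] = [4, 7, 1], [2, 9], [5]

error_type = None
try:
    # The same call that appears in the last frame of the traceback.
    np.array(y, dtype='int')
except (ValueError, TypeError) as exc:
    # The traceback in this question shows ValueError; TypeError is caught
    # as well in case the exception type differs in another NumPy version.
    error_type = type(exc).__name__
    print(error_type, exc)
```

If this is the cause, the cast only works once every element is a single integer, or once the lists have been brought to a common length and stacked into a 2D integer array.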

0 answers:

No answers