TensorFlow data format requirements

Date: 2017-11-02 23:19:37

Tags: python python-3.x tensorflow nlp

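I'm a bit confused about how to read data into TensorFlow. I'm trying to build an LSTM and use tf.nn.embedding_lookup to look up the vector representations, but I can't seem to get it to run.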

My data currently looks like this:

Out[494]: 
   sentiment                                      glove_indexes
0          0  [574305, 1294, 939107, 657375, 571132, 1013429...
1          0                                           [500519]
2          4                                    [560941, 93286]
3          0  [972036, 569274, 478483, 1051901, 684125, 6482...
4          0  [156951, 572457, 465860, 132739, 284963, 11483...
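The glove_indexes column holds lists of different lengths, while an LSTM batch needs the same number of time steps for every example. The code further down uses sequence.pad_sequences for this; to illustrate what its padding and truncation defaults do (padding='pre', truncating='pre', fill value 0), here is a small NumPy re-implementation. pad_pre is a hypothetical helper, not a library function, and treating index 0 as padding only is an assumption (the question's my_lookup makes the same one).

```python
import numpy as np


def pad_pre(seqs, maxlen, value=0):
    """Left-pad (and left-truncate) each list of indexes to exactly maxlen.

    Illustrative re-implementation of the pad_sequences defaults
    padding='pre' and truncating='pre'; assumes maxlen > 0.
    """
    out = np.full((len(seqs), maxlen), value, dtype=np.int64)
    for row, seq in enumerate(seqs):
        trimmed = list(seq)[-maxlen:]  # keep only the last maxlen tokens
        if trimmed:
            out[row, maxlen - len(trimmed):] = trimmed  # right-align the tokens
    return out


# Toy rows shaped like the glove_indexes column (values from the sample above).
rows = [[500519], [560941, 93286]]
print(pad_pre(rows, maxlen=4).tolist())
# [[0, 0, 0, 500519], [0, 0, 560941, 93286]]
```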

I also have a dictionary, glove_ids, that I can index with these indexes to get the vector representation of each word.

I thought I could simply call

embed = tf.nn.embedding_lookup(glove_ids, inputs_data)

to get the vector representations, but this doesn't work. Can someone help me set this up correctly?
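For reference: tf.nn.embedding_lookup expects params to be a tensor (or a list of tensors), and with a single tensor it gathers rows by index, like tf.gather(params, ids). A Python dict such as glove_ids cannot be passed in that slot. A minimal sketch, assuming glove_ids maps integer indexes to equal-length vectors: build a dense matrix whose row i is the vector for index i, then index it. NumPy fancy indexing (matrix[ids]) shows the same row-gather semantics. build_embedding_matrix and toy_glove are hypothetical names, and the 3-dimensional vectors stand in for the real 25-dimensional ones.

```python
import numpy as np


def build_embedding_matrix(glove_ids, embed_size):
    """Turn an {index: vector} dict into a dense (max_index + 1, embed_size) matrix.

    Row i holds glove_ids[i]; rows without an entry (including row 0, the
    padding index, unless the dict defines it) stay all zeros.
    """
    n_rows = max(glove_ids) + 1  # max over the dict's integer keys
    matrix = np.zeros((n_rows, embed_size), dtype=np.float32)
    for idx, vec in glove_ids.items():
        matrix[idx] = vec
    return matrix


# Hypothetical 3-dimensional vectors standing in for the GloVe data.
toy_glove = {1: [0.1, 0.2, 0.3], 2: [0.4, 0.5, 0.6], 5: [0.7, 0.8, 0.9]}
emb = build_embedding_matrix(toy_glove, embed_size=3)

batch_ids = np.array([[0, 1, 5],
                      [2, 2, 1]])   # (batch=2, seq_len=3), 0 = padding
vectors = emb[batch_ids]            # row gather, the same idea as embedding_lookup
print(vectors.shape)                # (2, 3, 3): (batch, seq_len, embed_size)
```

In TensorFlow 1.x the matrix (for example wrapped in a tf.Variable) is what goes in the params slot, with an int32 placeholder of shape [None, maxSeqLength] as the ids; the lookup result then has shape [None, maxSeqLength, embed_size]. Since the sample indexes exceed 1,000,000, a 25-column float32 matrix built this way needs roughly 100 MB.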

Edit: I tried a workaround that also doesn't work. I'm just hoping for some general guidance on how to approach this...

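I now build epoch_x_train so that each example is a sequence of length 18, which I set as the maximum sentence length in words, and each entry in that sequence has length 25, the length of the embedding. I believe this is correct and that each word gets the right embedding. getTrainBatch randomly samples new data to fit the model on. I get the error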

ValueError: setting an array element with a sequence.


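import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.contrib, tf.Session)
from random import randint
from keras.preprocessing import sequence  # assumed source of pad_sequences
# train_dat (DataFrame), glove_ids (dict) and GloVeEncodingsSize are defined elsewhere

def getTrainBatch():
    # Draw batch_size random rows (with replacement) from the padded training data
    labels = []
    arr = np.zeros([batch_size, maxSeqLength])
    for i in range(batch_size):
        # random.randint includes both endpoints, so the upper bound is len - 1
        num = randint(0, len(train_dat) - 1)
        labels.append(y_train[num])
        arr[i] = x_train[num]
    return arr, labels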

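def my_lookup(dat):
    # Replace every GloVe index in the batch with its vector from glove_ids;
    # index 0 (the pad_sequences padding value) becomes np.zeros(maxSeqLength)
    new = []
    for i in range(len(dat)):
        temp = []
        for j in range(len(dat[i])):
            if dat[i][j] == 0:
                temp.append(list(np.zeros(maxSeqLength)))
            else:
                temp.append(glove_ids[dat[i][j]])
        new.append(temp)
    return new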


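maxSeqLength = 18
# Pad/truncate every list of GloVe indexes to exactly maxSeqLength entries (padding value 0)
x_train = train_dat['glove_indexes']
x_train = np.array(x_train)
x_train = sequence.pad_sequences(x_train, maxlen=maxSeqLength)

# Map sentiment label 4 to class 1 and every other label to class 0
y_train = train_dat['sentiment']
y_train = np.where(y_train == 4, 1, 0)
y_train = np.array(y_train)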

lstm_size = 256
batch_size = 500
learning_rate = 0.001
embed_size = GloVeEncodingsSize  # 25, the GloVe vector length mentioned above
n_outputs = 2



X = tf.placeholder(tf.float32, [None, embed_size, maxSeqLength])
Y = tf.placeholder(tf.int32, [None])

basic_cell = tf.contrib.rnn.BasicRNNCell(num_units = lstm_size)
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype = tf.float32)

logits = tf.layers.dense(states, n_outputs)
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=Y,logits=logits)

loss = tf.reduce_mean(xentropy)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)
correct = tf.nn.in_top_k(logits, Y, 1)
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

init = tf.global_variables_initializer()

n_epochs = 100



with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        epoch_x_train, epoch_y_train = getTrainBatch()
        epoch_x_train = my_lookup(epoch_x_train)

        sess.run(training_op, feed_dict={X: epoch_x_train, Y: epoch_y_train})
        acc_train = accuracy.eval(feed_dict={X: epoch_x_train, Y: epoch_y_train})
        print(epoch, "Train accuracy:", acc_train)
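One likely cause, assuming the GloVe vectors are 25-dimensional as described above: my_lookup replaces padding index 0 with np.zeros(maxSeqLength), a vector of length 18, while real words get 25-element vectors. Any example that contains padding then mixes lengths 18 and 25, and NumPy cannot turn such a ragged nested list into a float array, which matches the ValueError in the question. The sketch below reproduces this with toy sizes (4 and 3 standing in for 18 and 25; toy_glove is a hypothetical stand-in for glove_ids) and shows that padding with embed_size zeros gives a regular (batch, time, depth) array.

```python
import numpy as np

max_seq_length = 4   # stands in for maxSeqLength = 18
embed_size = 3       # stands in for the 25-dimensional GloVe vectors
toy_glove = {7: [0.1, 0.2, 0.3], 9: [0.4, 0.5, 0.6]}  # hypothetical data


def lookup(batch_ids, pad_width):
    """Same logic as my_lookup: index 0 becomes a zero vector of length pad_width."""
    return [[list(np.zeros(pad_width)) if idx == 0 else toy_glove[idx]
             for idx in row]
            for row in batch_ids]


batch_ids = [[0, 0, 7, 9],
             [0, 9, 7, 7]]

# Padding with max_seq_length zeros mixes length-4 and length-3 vectors: ragged.
try:
    np.asarray(lookup(batch_ids, pad_width=max_seq_length), dtype=np.float32)
except ValueError as err:
    print("ValueError:", err)

# Padding with embed_size zeros gives a regular (batch, time, depth) array.
ok = np.asarray(lookup(batch_ids, pad_width=embed_size), dtype=np.float32)
print(ok.shape)  # (2, 4, 3)
```

Separately, tf.nn.dynamic_rnn with its default time_major=False reads the input as [batch, max_time, depth], so a placeholder for batches like this would be shaped [None, maxSeqLength, embed_size] rather than [None, embed_size, maxSeqLength].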

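Edit again: After more googling, it looks like the error comes from feed_dict. I can't figure out why it is wrong. I have tried the responses (labels) both in [1.0] format and as a plain 1 or 0 for each x_train row.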
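On the label side, tf.nn.sparse_softmax_cross_entropy_with_logits expects one integer class index per example (shape [batch_size]), not one-hot or [1.0]-style rows, so a 1-D int32 array of 0s and 1s matches Y = tf.placeholder(tf.int32, [None]). Judging from the shapes described above, the X feed is the more likely source of the ValueError. The sketch below is a hypothetical vectorized alternative to getTrainBatch that returns labels in that form. It uses NumPy's Generator.integers, whose upper bound is exclusive, unlike Python's random.randint, which includes both endpoints.

```python
import numpy as np


def get_train_batch(x_ids, sentiment, batch_size, rng):
    """Sample a batch with replacement (hypothetical stand-in for getTrainBatch).

    x_ids:     (n_examples, maxSeqLength) int array, e.g. from pad_sequences
    sentiment: (n_examples,) raw labels, 0 or 4 as in the data above
    Returns ids of shape (batch_size, maxSeqLength) and int32 labels of
    shape (batch_size,).
    """
    rows = rng.integers(0, len(x_ids), size=batch_size)  # high is exclusive
    labels = np.where(sentiment[rows] == 4, 1, 0).astype(np.int32)
    return x_ids[rows], labels


rng = np.random.default_rng(0)
x_ids = np.array([[0, 0, 500519], [0, 560941, 93286], [574305, 1294, 939107]])
sentiment = np.array([0, 4, 0])
ids, labels = get_train_batch(x_ids, sentiment, batch_size=5, rng=rng)
print(ids.shape, labels.shape, labels.dtype)  # (5, 3) (5,) int32
```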

Full error message

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\py35\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-77-7960c1e2188b>", line 12, in <module>
    sess.run(training_op, feed_dict={X: np.array(epoch_x_train), Y: np.array(epoch_y_train)})
  File "C:\ProgramData\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\client\session.py", line 889, in run
    run_metadata_ptr)
  File "C:\ProgramData\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\client\session.py", line 1089, in _run
    np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
  File "C:\ProgramData\Anaconda3\envs\py35\lib\site-packages\numpy\core\numeric.py", line 531, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
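The traceback shows the failure happens inside Session.run, where each feed value is converted with np.asarray(subfeed_val, dtype=subfeed_dtype) before any TensorFlow op executes, so the problem can be located in plain Python by inspecting the nested list before feeding it. find_ragged is a hypothetical debugging helper, assuming the feed is a list of examples, each a list of word vectors.

```python
def find_ragged(batch, n_steps, embed_size):
    """List every place where a nested batch deviates from (n_steps, embed_size)."""
    problems = []
    for i, example in enumerate(batch):
        if len(example) != n_steps:
            problems.append("example {}: {} steps, expected {}".format(
                i, len(example), n_steps))
        for j, vec in enumerate(example):
            if len(vec) != embed_size:
                problems.append("example {}, word {}: length {}, expected {}".format(
                    i, j, len(vec), embed_size))
    return problems


# Toy batch: the first word is a length-18 zero vector among 25-dim vectors.
batch = [[[0.0] * 18, [0.1] * 25],
         [[0.2] * 25, [0.3] * 25]]
print(find_ragged(batch, n_steps=2, embed_size=25))
# ['example 0, word 0: length 18, expected 25']
```

Calling it on the output of my_lookup with n_steps=maxSeqLength and embed_size=25 just before sess.run would list any offending positions; an empty list means the batch is rectangular.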

0 answers