ValueError: setting an array element with a sequence. in session.run

Time: 2019-07-24 12:55:32

Tags: python numpy tensorflow keras deep-learning

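I am trying to build a model that uses word embedding vectors. When I load the vector data, I get an error when running the session.

I have seen many posts about the same error, but none of them helped me. My code is as follows: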

# Build vocabulary
max_document_length = max([len(x.split(" ")) for x in x_text])
if (not use_glove):
    print ("Not using GloVe")
    vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)
    x = np.array(list(vocab_processor.fit_transform(x_text)))
else:
    print ("Using GloVe")
    embedding_dim = 50
    filename = 'glove.twitter.27B.50d.txt'
    def loadGloVe(filename):
        vocab = []
        embd = []
        file = open(filename,'r')
        for line in file.readlines():
            row = line.strip().split(' ')
            vocab.append(row[0])
            embd.append(row[1:])
        print('Loaded GloVe!')
        file.close()
        return vocab,embd
    vocab,embd = loadGloVe(filename)
    vocab_size = len(vocab)
    embedding_dim = len(embd[0])
    embedding = np.asarray(embd)

    W = tf.Variable(tf.constant(0.0, shape=[vocab_size, embedding_dim]),
                    trainable=False, name="W")
    embedding_placeholder = tf.placeholder(tf.float32, [vocab_size, embedding_dim])
    embedding_init = W.assign(embedding_placeholder)
    # embedding_init = np.vstack([np.expand_dims(x, 0) for x in embedding_init])

    session_conf = tf.ConfigProto(allow_soft_placement=True, log_device_placement=False)
    sess = tf.Session(config=session_conf)
    sess.run(embedding_init, feed_dict={embedding_placeholder: embedding})

The error I get is as follows:

>> python train.py
Loading data...
Using GloVe
Loaded GloVe!
Traceback (most recent call last):
  File "train.py", line 88, in <module>
    embedding = np.asarray(embd, dtype=float)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/numpy/core/numeric.py", line 538, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.

When I print embedding, I get the following:

  

[list(['0.78704','0.72151','0.29148','-0.056527','0.31683',
   '0.47172','0.023461','0.69568','0.20782','0.60985','-0.22386',
   '0.7481','-2.6208','0.20117','-0.48104','0.12897','0.035239',
   '-0.24486','-0.36088','0.026686','0.28978','-0.10698','-0.34621',
   '0.021053','0.54514','-1.0958','-0.274','0.2233','1.0827',
   '-0.029018','-0.84029','0.58619','-0.36511','0.34016','0.89615',
   '0.32757','0.24267','0.68404','-0.34374','0.13583','-2.2162',
   '-0.42537','0.46157','0.88626','-0.22014','0.025599','-0.38615',
   '0.080107','-0.075323','-0.61461'])
 list(['0.68661','-1.0772',
   '0.011114','-0.24075','-0.3422','0.64456','0.54957','0.30411',
   '-0.54682','1.4695','0.43648','-0.34223','-2.7189','0.46021',
   '0.016881','0.13953','0.020913','0.050963','-0.48108','-1.0764',
   '-0.16807','-0.014315','-0.55055','0.67823','0.24359','-1.3179',
   '-0.036348','-0.228','1.0337','-0.53221','-0.52934','0.35537',
   '-0.44911','0.79506','0.56947','0.071642','-0.27455','-0.056911',
   '-0.42961','-0.64412','-1.3495','0.23258','0.25383','-0.10226',
   '0.65824','0.16015','0.20959','-0.067516','-0.51952','-0.34922'])
 list(['0.98483','0.19784','0.28403','0.35406','0.2438','0.42519',
   '-0.050784','0.48965','0.18231','0.45225','0.60871','0.1023',
   '-2.246','0.47362','-0.20073','-0.21838','-0.58847','0.23933',
   '0.47089','-0.96444','-0.06588','-0.26914','-0.58221','-0.26283',
   '0.67984','-0.87678','-0.091667','0.18128','1.0218','0.23728',
   '-1.0547','0.19766','-0.86072','0.6021','0.69374','0.32242',
   '-0.074545','0.38367','0.28661','-0.41465','-2.882','-0.30393',
   '0.047981','1.0937','0.4184','-0.68958','-0.45923','0.23368',
   '-0.30628','-0.093607'])
 ...
 list(['0.84287','0.36278','-1.7695',
   '1.0011','-0.035064','0.51417','-1.5918','0.85464','1.0441',
   '-0.19218','0.91523','1.2206','0.6551','-0.48092','0.89536',
   '-0.51738','-0.113','-0.14132','0.69741','-0.094937','-0.046912',
   '-0.2098','-0.029853','0.49541','0.66782','0.23435','1.6776',
   '0.13993','1.2205','0.11827','0.4398','-0.37945','0.26414',
   '0.63263','-0.48117','-0.95508','-0.39435','-2.8466','-0.64169',
   '0.61715','3.0288','1.2714','-2.1379','-0.11995','-1.5553',
   '-0.17096','-0.30855','-0.24573','0.63324','-0.80304'])
 list(['0.82853','-1.4966','-0.33163','-1.7248','0.75364',
   '-0.66916','0.21631','0.54184','-0.18342','0.4248','0.21309',
   '0.21076','0.60751','-0.31577','0.5663','0.10905','0.12388',
   '-1.0154','0.32227','-0.92746','-0.59573','-0.8008','1.146',
   '1.1625','0.32181','0.30272','0.99954','-1.4012','0.076173',
   '-0.081811','1.7618','1.0314','1.2658','1.3319','0.52592',
   '-0.30999','-1.4563','-1.4165','0.21875','0.36172','2.7735',
   '0.20257','0.074379','-0.020002','-1.0133','0.56882','-0.17648',
   '0.3729','0.76953','1.4394'])
 list(['-2.3613','-0.94632',
   '-1.8524','1.545','0.29188','0.21677','0.090334','-1.4557',
   '0.80716','-0.88994','-1.1031','0.002139','1.211','-0.069074',
   '1.1984','0.93501','1.0359','-0.17041','0.44013','-1.7879',
   '0.61577','0.52878','0.32978','-0.82872','0.48385','0.76497',
   '-0.64303','0.18897','0.3698','0.62647','1.7118','-0.2942',
   '-0.26316','-0.35169','-0.72771','-0.71678','0.91815','-0.56122',
   '0.51562','-0.030861','-0.017585','-0.58224','-0.98393',
   '0.85906','-0.67031','0.34382','-0.41876','-0.40575','-0.53006',
   '-0.20514'])]

How can I fix this? It seems like it should be simple, but I have spent a lot of time trying to fix it and I cannot figure it out.

1 Answer:

Answer 0: (score: 0)

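This happens when you try to create a NumPy array from a list whose inner lists have different lengths. I ran into this problem when reading the GloVe files the same way you do, by manually splitting each line on the space character.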
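To check whether this is what is happening, something like the sketch below might help. It uses a made-up three-row `embd` (not real GloVe data) in which one row is a value short, reproduces the ValueError, and then lists the indices of the rows whose length differs from the most common one. The names `expected_len` and `bad_rows` are only illustrative.

```python
from collections import Counter

import numpy as np

# Toy stand-in for `embd`: the second row is one value short.
embd = [["0.1", "0.2", "0.3"], ["0.4", "0.5"], ["0.6", "0.7", "0.8"]]

try:
    np.asarray(embd, dtype=float)
    conversion_failed = False
except ValueError as exc:
    # Ragged rows cannot be packed into a rectangular float array.
    conversion_failed = True
    print("ValueError:", exc)

# Count row lengths and collect the rows that deviate from the most common length.
lengths = Counter(len(row) for row in embd)
expected_len = lengths.most_common(1)[0][0]
bad_rows = [i for i, row in enumerate(embd) if len(row) != expected_len]
print("row lengths:", dict(lengths))
print("suspicious rows:", bad_rows)
```

Running the same length check on the real `embd` should point at the offending line(s) in the file.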

If our glove.twitter.27B.50d.txt is the same file, line 38523 contains

0.065581 0.39605 -0.96669 0.23706 -0.41379 -0.97006 0.16601 -1.292 -0.58989 0.11632 -1.365 -0.27939 -0.57222 -0.97108 -0.56319 -0.015263 -0.70465 -0.13867 1.0702 -0.25557 0.25122 -0.87755 0.70999 0.9118 -0.30077

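The vocabulary token on that line looks like a non-printable character to me. This causes the code to read the first embedding value as the vocabulary token, so the embedding vector for that particular line ends up one value short (49 dimensions in your case).

Exactly the same line can also be found in glove.twitter.27B.25d.txt, glove.twitter.27B.100d.txt and glove.twitter.27B.200d.txt.

A quick and dirty solution that works is: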
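One possible mechanism, sketched below, is that `strip()` removes the token itself when Python treats it as whitespace. I have not identified the actual character in the file, so U+0085 is only a stand-in here, and the three-value line is made up.

```python
# Hypothetical line: U+0085 stands in for the unprintable token (an assumption,
# not necessarily the character that is really in the GloVe file).
line = "\u0085 0.1 0.2 0.3\n"

# strip() removes all leading/trailing Unicode whitespace, including U+0085,
# so the token disappears and the first number takes its place.
stripped_row = line.strip().split(" ")

# Removing only the trailing newline keeps the token in position 0.
kept_row = line.rstrip("\n").split(" ")

print(len(stripped_row), stripped_row)
print(len(kept_row), [repr(field) for field in kept_row])
```

If this is indeed the cause, replacing `line.strip()` with `line.rstrip("\n")` would keep the token and the full vector.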

for line in file.readlines():
    row = line.strip().split(' ')
    if len(row)-1 < embedding_dim:
        row.insert(0, '')
    vocab.append(row[0])
    embd.append(row[1:])
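
A slightly more defensive variant, as a sketch only: it takes the last `embedding_dim` fields of each line as the vector and whatever comes before them as the token, so a missing or stripped token becomes an empty string instead of shortening the vector. It also converts to float32 up front. The function name `load_glove` and the sample lines are made up for illustration.

```python
import io

import numpy as np


def load_glove(lines, embedding_dim):
    """Parse GloVe-style lines; the last `embedding_dim` fields are the vector."""
    vocab, embd = [], []
    for line_no, line in enumerate(lines, start=1):
        # Only drop the newline, so whitespace-like tokens are not stripped away.
        row = line.rstrip("\n").split(" ")
        if len(row) < embedding_dim:
            raise ValueError(
                f"line {line_no}: expected at least {embedding_dim} values, got {len(row)}"
            )
        # Everything before the vector is the token ('' if nothing is left).
        vocab.append(" ".join(row[:-embedding_dim]))
        embd.append(row[-embedding_dim:])
    return vocab, np.asarray(embd, dtype=np.float32)


# Made-up sample data; U+0085 is a stand-in for the unprintable token.
sample = io.StringIO("hello 0.1 0.2 0.3\n\u0085 0.4 0.5 0.6\n")
vocab, embedding = load_glove(sample, embedding_dim=3)
print(vocab, embedding.shape, embedding.dtype)
```

With the real file you would pass an open file object instead of the `StringIO`, for example `open(filename, encoding="utf-8")`, assuming the file is UTF-8 encoded.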