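I'm trying to build a model that uses word embeddings. When I load the vector data, I get an error when running the session.
I've seen many posts about the same error, but none of them helped me. My code is as follows: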
# Build vocabulary
max_document_length = max([len(x.split(" ")) for x in x_text])
if (not use_glove):
    print ("Not using GloVe")
    vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)
    x = np.array(list(vocab_processor.fit_transform(x_text)))
else:
    print ("Using GloVe")
    embedding_dim = 50
    filename = 'glove.twitter.27B.50d.txt'
    def loadGloVe(filename):
        vocab = []
        embd = []
        file = open(filename,'r')
        for line in file.readlines():
            row = line.strip().split(' ')
            vocab.append(row[0])
            embd.append(row[1:])
        print('Loaded GloVe!')
        file.close()
        return vocab,embd
    vocab,embd = loadGloVe(filename)
    vocab_size = len(vocab)
    embedding_dim = len(embd[0])
    embedding = np.asarray(embd)
    W = tf.Variable(tf.constant(0.0, shape=[vocab_size, embedding_dim]),
                    trainable=False, name="W")
    embedding_placeholder = tf.placeholder(tf.float32, [vocab_size, embedding_dim])
    embedding_init = W.assign(embedding_placeholder)
    # embedding_init = np.vstack([np.expand_dims(x, 0) for x in embedding_init])
    session_conf = tf.ConfigProto(allow_soft_placement=True, log_device_placement=False)
    sess = tf.Session(config=session_conf)
    sess.run(embedding_init, feed_dict={embedding_placeholder: embedding})
The error I get is:
>> python train.py
Loading data...
Using GloVe
Loaded GloVe!
Traceback (most recent call last):
  File "train.py", line 88, in <module>
    embedding = np.asarray(embd, dtype=float)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/numpy/core/numeric.py", line 538, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
When I print embedding, I get the following:
[list(['0.78704', '0.72151', '0.29148', '-0.056527', '0.31683', '0.47172', '0.023461', '0.69568', '0.20782', '0.60985', '-0.22386', '0.7481', '-2.6208', '0.20117', '-0.48104', '0.12897', '0.035239', '-0.24486', '-0.36088', '0.026686', '0.28978', '-0.10698', '-0.34621', '0.021053', '0.54514', '-1.0958', '-0.274', '0.2233', '1.0827', '-0.029018', '-0.84029', '0.58619', '-0.36511', '0.34016', '0.89615', '0.32757', '0.24267', '0.68404', '-0.34374', '0.13583', '-2.2162', '-0.42537', '0.46157', '0.88626', '-0.22014', '0.025599', '-0.38615', '0.080107', '-0.075323', '-0.61461'])
 list(['0.68661', '-1.0772', '0.011114', '-0.24075', '-0.3422', '0.64456', '0.54957', '0.30411', '-0.54682', '1.4695', '0.43648', '-0.34223', '-2.7189', '0.46021', '0.016881', '0.13953', '0.020913', '0.050963', '-0.48108', '-1.0764', '-0.16807', '-0.014315', '-0.55055', '0.67823', '0.24359', '-1.3179', '-0.036348', '-0.228', '1.0337', '-0.53221', '-0.52934', '0.35537', '-0.44911', '0.79506', '0.56947', '0.071642', '-0.27455', '-0.056911', '-0.42961', '-0.64412', '-1.3495', '0.23258', '0.25383', '-0.10226', '0.65824', '0.16015', '0.20959', '-0.067516', '-0.51952', '-0.34922'])
 list(['0.98483', '0.19784', '0.28403', '0.35406', '0.2438', '0.42519', '-0.050784', '0.48965', '0.18231', '0.45225', '0.60871', '0.1023', '-2.246', '0.47362', '-0.20073', '-0.21838', '-0.58847', '0.23933', '0.47089', '-0.96444', '-0.06588', '-0.26914', '-0.58221', '-0.26283', '0.67984', '-0.87678', '-0.091667', '0.18128', '1.0218', '0.23728', '-1.0547', '0.19766', '-0.86072', '0.6021', '0.69374', '0.32242', '-0.074545', '0.38367', '0.28661', '-0.41465', '-2.882', '-0.30393', '0.047981', '1.0937', '0.4184', '-0.68958', '-0.45923', '0.23368', '-0.30628', '-0.093607'])
 ...
 list(['0.84287', '0.36278', '-1.7695', '1.0011', '-0.035064', '0.51417', '-1.5918', '0.85464', '1.0441', '-0.19218', '0.91523', '1.2206', '0.6551', '-0.48092', '0.89536', '-0.51738', '-0.113', '-0.14132', '0.69741', '-0.094937', '-0.046912', '-0.2098', '-0.029853', '0.49541', '0.66782', '0.23435', '1.6776', '0.13993', '1.2205', '0.11827', '0.4398', '-0.37945', '0.26414', '0.63263', '-0.48117', '-0.95508', '-0.39435', '-2.8466', '-0.64169', '0.61715', '3.0288', '1.2714', '-2.1379', '-0.11995', '-1.5553', '-0.17096', '-0.30855', '-0.24573', '0.63324', '-0.80304'])
 list(['0.82853', '-1.4966', '-0.33163', '-1.7248', '0.75364', '-0.66916', '0.21631', '0.54184', '-0.18342', '0.4248', '0.21309', '0.21076', '0.60751', '-0.31577', '0.5663', '0.10905', '0.12388', '-1.0154', '0.32227', '-0.92746', '-0.59573', '-0.8008', '1.146', '1.1625', '0.32181', '0.30272', '0.99954', '-1.4012', '0.076173', '-0.081811', '1.7618', '1.0314', '1.2658', '1.3319', '0.52592', '-0.30999', '-1.4563', '-1.4165', '0.21875', '0.36172', '2.7735', '0.20257', '0.074379', '-0.020002', '-1.0133', '0.56882', '-0.17648', '0.3729', '0.76953', '1.4394'])
 list(['-2.3613', '-0.94632', '-1.8524', '1.545', '0.29188', '0.21677', '0.090334', '-1.4557', '0.80716', '-0.88994', '-1.1031', '0.002139', '1.211', '-0.069074', '1.1984', '0.93501', '1.0359', '-0.17041', '0.44013', '-1.7879', '0.61577', '0.52878', '0.32978', '-0.82872', '0.48385', '0.76497', '-0.64303', '0.18897', '0.3698', '0.62647', '1.7118', '-0.2942', '-0.26316', '-0.35169', '-0.72771', '-0.71678', '0.91815', '-0.56122', '0.51562', '-0.030861', '-0.017585', '-0.58224', '-0.98393', '0.85906', '-0.67031', '0.34382', '-0.41876', '-0.40575', '-0.53006', '-0.20514'])]
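How can I fix this? It seems like it should be simple, but I've spent a lot of time trying to fix it and can't figure it out.
Answer 0 (score: 0)
This happens when you try to create a NumPy array from a list whose inner lists have different lengths. I ran into this problem when reading the GloVe file the same way you do, by manually splitting each line on the space character.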
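To make the cause concrete, here is a minimal sketch that reproduces the failure with made-up rows (the values are illustrative and do not come from the GloVe file). The exact error message may differ between NumPy versions, but a ragged list with dtype=float should raise a ValueError either way.

```python
import numpy as np

# Two rows of string values; the second row is one value short (made-up data).
ragged = [["0.1", "0.2", "0.3"], ["0.4", "0.5"]]

try:
    np.asarray(ragged, dtype=float)
    ragged_failed = False
except ValueError as err:
    ragged_failed = True
    print("ragged rows:", err)

# With equal-length rows the same call succeeds and converts the strings to floats.
fixed = np.asarray([["0.1", "0.2", "0.3"], ["0.4", "0.5", "0.6"]], dtype=float)
print(fixed.shape)
```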
If your glove.twitter.27B.50d.txt is the same file as mine, line 38523 contains
0.065581 0.39605 -0.96669 0.23706 -0.41379 -0.97006 0.16601 -1.292 -0.58989 0.11632 -1.365 -0.27939 -0.57222 -0.97108 -0.56319 -0.015263 -0.70465 -0.13867 1.0702 -0.25557 0.25122 -0.87755 0.70999 0.9118 -0.30077
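The vocabulary token on that line looks like a non-printable character to me (presumably one that line.strip() treats as whitespace and removes along with the separator). As a result, the code reads the first embedding value as the vocabulary word, and the embedding vector for that particular line ends up one value short (49 dimensions in your case).
Exactly the same line can also be found in glove.twitter.27B.25d.txt, glove.twitter.27B.100d.txt and glove.twitter.27B.200d.txt.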
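If you want to confirm which lines are affected in your copy of the file, something like the sketch below may help. The helper name find_short_rows is hypothetical, and the "\x85" token in the sample is only a stand-in for a whitespace-like character; I have not verified which character the real file contains.

```python
def find_short_rows(lines, embedding_dim):
    """Return (line_number, field_count) for rows that are not 1 token + embedding_dim values."""
    bad = []
    for number, line in enumerate(lines, start=1):
        # Same parsing as in the question: strip() also removes whitespace-like tokens.
        row = line.strip().split(' ')
        if len(row) - 1 != embedding_dim:
            bad.append((number, len(row)))
    return bad

# Tiny in-memory sample with 3-dimensional vectors (made-up values).
sample = [
    "hello 0.1 0.2 0.3\n",
    "\x85 0.4 0.5 0.6\n",  # assumed stand-in for the non-printable token
]
bad_rows = find_short_rows(sample, embedding_dim=3)
print(bad_rows)
```

To run it on the real file, pass the open file object in place of sample and use embedding_dim=50.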
A quick and dirty fix that works is:
for line in file.readlines():
    row = line.strip().split(' ')
    # If the token was lost, put an empty placeholder back in its position
    if len(row)-1 < embedding_dim:
        row.insert(0, '')
    vocab.append(row[0])
    embd.append(row[1:])
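A slightly more defensive variant, as a sketch only: strip just the line ending so the token survives, take the last embedding_dim fields as the vector, and convert to floats up front. The function name load_glove and the sample data are my own and not part of the question's code; it assumes every line ends with exactly embedding_dim numeric fields.

```python
import numpy as np

def load_glove(lines, embedding_dim):
    """Parse GloVe-style lines into (vocab, float32 array); hypothetical helper."""
    vocab, embd = [], []
    for number, line in enumerate(lines, start=1):
        # Remove only the line ending, so a whitespace-like token is kept.
        row = line.rstrip('\r\n').split(' ')
        if len(row) < embedding_dim:
            raise ValueError("line %d has only %d fields" % (number, len(row)))
        # Everything before the last embedding_dim fields is the token (may be empty).
        vocab.append(' '.join(row[:-embedding_dim]))
        embd.append([float(v) for v in row[-embedding_dim:]])
    return vocab, np.asarray(embd, dtype=np.float32)

# Made-up 3-dimensional sample, including a whitespace-like token and a missing token.
sample = [
    "hello 0.1 0.2 0.3\n",
    "\x85 0.4 0.5 0.6\n",
    "0.7 0.8 0.9\n",
]
vocab, embedding = load_glove(sample, embedding_dim=3)
print(len(vocab), embedding.shape, embedding.dtype)
```

With the real file you would pass the file object opened with encoding='utf-8' and embedding_dim=50. The resulting array is already float32, which matches the tf.float32 placeholder in the question.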