Question

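I want to compute the element-wise difference between two large vectors. Both are columns of a DataFrame. For a single row, I can subtract a list of arrays from an array and square the result: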

(train["quest_emb"][0] - train["sent_emb"][0])**2

But I cannot generalize this from a single row to the whole DataFrame columns:

train["quest_emb"] - train["sent_emb"]

It freezes my computer.

Array analysis

Here is a sample of their contents.

>>> print((train["quest_emb"][2]))
[[0.03949683 0.04509903 0.01808935 ... 0.04610749 0.0416535  0.02240689]]

>>> print((train["sent_emb"][2]))
[array([0.03037658, 0.04433101, 0.08135635, ..., 0.06764812, 0.04971079,
       0.02240689], dtype=float32), array([0.05260669, 0.04548098, 0.0382337 , ..., 0.04823414, 0.07656007,
       0.03501297], dtype=float32), array([0.0502927 , 0.04480611, 0.02038252, ..., 0.03942193, 0.03132772,
       0.04595207], dtype=float32), array([0.06769167, 0.03393815, 0.0625218 , ..., 0.05555448, 0.03059104,
       0.03422254], dtype=float32)]

Their sizes appear to differ:

>>> print(len(train["quest_emb"][0]))
1
>>> print(len(train["sent_emb"][0]))
4

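Here are their first elements. The containers differ in type, but subtracting a single row from the other seems to work without problems: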

>>> print((train["quest_emb"][2][0]))
[0.03949683 0.04509903 0.01808935 ... 0.04610749 0.0416535  0.02240689]

>>> print((train["sent_emb"][2][0]))
[0.03037658 0.04433101 0.08135635 ... 0.06764812 0.04971079 0.02240689]

train["quest_emb"] has the same length as train["sent_emb"]: 130318

Here are the types of the arrays:

>>> print(type(train["quest_emb"][2]))
<class 'numpy.ndarray'>

>>> print(type(train["sent_emb"][2]))
<class 'list'>
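
For illustration, here is a minimal sketch with synthetic data of why the single-row subtraction works despite the differing types. It assumes each quest_emb entry has shape (1, D) and each sent_emb entry is a list of 4 arrays of shape (D,), as the output above suggests. The width D = 8 is an arbitrary stand-in, because the real embedding width is not shown.

```python
import numpy as np

D = 8  # assumed width for illustration only; the real width is elided above
rng = np.random.default_rng(0)

# Stand-in for train["quest_emb"][i]: one ndarray of shape (1, D)
quest = rng.random((1, D), dtype=np.float32)
# Stand-in for train["sent_emb"][i]: a plain list of 4 arrays of shape (D,)
sents = [rng.random(D, dtype=np.float32) for _ in range(4)]

# NumPy converts the list to a (4, D) array and broadcasts (1, D) against it
diff_sq = (quest - np.asarray(sents)) ** 2
print(diff_sq.shape)
```

The result has one row per sentence, so every row of the full column would produce an array about as large as its sent_emb entry.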

Is there any way to compute this difference on a computer with 8 GB of RAM? Or, failing that, an approximate way?

Trying Theano
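
To get a feel for the memory involved, a rough back-of-the-envelope estimate can be sketched as follows. The row count 130318 comes from above. The embedding width of 4096 and the figure of 4 sentences per row are assumptions made only for this estimate, and estimate_bytes is a hypothetical helper, not a library function.

```python
def estimate_bytes(n_rows, sentences_per_row, width, itemsize=4):
    """Bytes needed to hold one float32 result array per row (itemsize=4)."""
    return n_rows * sentences_per_row * width * itemsize

n_rows = 130318  # length of train["quest_emb"] reported above
# sentences_per_row=4 and width=4096 are assumptions, not measured values
est = estimate_bytes(n_rows, sentences_per_row=4, width=4096)
print(f"{est / 2**30:.2f} GiB")  # 7.95 GiB under these assumptions
```

If the real figures are anywhere near these, the result alone would be about as large as the available RAM, on top of the two input columns, which could explain the freeze.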

import theano.tensor as T
from theano import function
x = T.dscalar('x')
y = T.dscalar('y')
z = x - y
f = function([x, y], z)   
f(train["quest_emb"],train["sent_emb"])

But this gives me

ValueError: Bad input argument with name "quest_emb" to theano function with name "<ipython-input-41-c53eb459cbc4>:6" at index 0 (0-based).

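Since I can compute the subtraction for a single row, I am also considering doing the vector subtraction iteratively, but I don't know how to add a new row to the DataFrame after each subtraction.

How can I subtract one large vector from another?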
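
One way the iterative idea could look, as a sketch on a tiny synthetic frame rather than the real train: collect each row's result in a list and attach the whole list as a new column at the end, instead of adding rows one at a time. The helpers to_object_series and rowwise_sq_diff and the column name sq_diff are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd

def to_object_series(items, index=None):
    """Pack one arbitrary object per row into an object-dtype Series."""
    arr = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        arr[i] = item  # store the object itself, no broadcasting
    return pd.Series(arr, index=index)

def rowwise_sq_diff(df, q_col="quest_emb", s_col="sent_emb"):
    """Squared difference per row, computed one row at a time."""
    results = []
    for q, s in zip(df[q_col], df[s_col]):
        results.append((np.asarray(q) - np.asarray(s)) ** 2)
    return to_object_series(results, index=df.index)

# Tiny synthetic stand-in for `train` (3 rows, width 5, 4 sentences per row)
rng = np.random.default_rng(0)
n_rows, width = 3, 5
demo = pd.DataFrame({
    "quest_emb": to_object_series(
        [rng.random((1, width), dtype=np.float32) for _ in range(n_rows)]),
    "sent_emb": to_object_series(
        [[rng.random(width, dtype=np.float32) for _ in range(4)]
         for _ in range(n_rows)]),
})

demo["sq_diff"] = rowwise_sq_diff(demo)
print(demo["sq_diff"][0].shape)
```

This keeps peak memory to the inputs plus the results, but the results themselves still have to fit in RAM. If only a summary per sentence is needed (for example the sum of squared differences), reducing inside the loop would shrink the stored result considerably.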

0 answers: