Question

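I would like to know how to implement the LCS (longest common subsequence) problem efficiently. I found a way to find contiguous matches (i.e. n-gram matches) with tensor operations, by comparing and shifting. Given two sequences x (len: n) and y (len: m), consider the matrix: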

e = x.unsqueeze(1).eq(y)  # [n x m]

We have e[i, j] == 1 <=> x[i] == y[j], so an n-gram match shows up as a diagonal of ones. We can therefore do the following:

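# match_1 = [n x m] matrix of {0, 1}; match_1[i, j] is set when x[i] == y[j]
match_1 = x.unsqueeze(1).eq(y)

# match_2 = [(n-1) x (m-1)] matrix of {0, 1}; set where two consecutive
# diagonal entries of match_1 are both set, i.e. a bigram match
match_2 = match_1[:-1, :-1] * match_1[1:, 1:]

# and so on for longer n-grams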
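
To make the "and so on" step concrete, here is a minimal sketch that generalizes the shifting idea to any n-gram length k. The function name ngram_matches is hypothetical (not a PyTorch API), and it assumes 1-D tensors with 1 <= k <= min(n, m):

```python
import torch

def ngram_matches(x, y, k):
    """Return a [(n-k+1) x (m-k+1)] bool matrix whose entry [i, j] is True
    when x[i:i+k] equals y[j:j+k]. Hypothetical helper for illustration."""
    n, m = x.size(0), y.size(0)
    assert 1 <= k <= min(n, m), "k must not exceed either sequence length"
    match = x.unsqueeze(1).eq(y)  # [n x m], match[i, j] <=> x[i] == y[j]
    out = match[: n - k + 1, : m - k + 1].clone()
    for s in range(1, k):
        # AND with the matrix shifted s steps along the diagonal.
        out &= match[s : n - k + 1 + s, s : m - k + 1 + s]
    return out

x = torch.tensor([1, 2, 3, 4])
y = torch.tensor([9, 2, 3, 4, 1])
# (i, j) start positions of shared bigrams
print(ngram_matches(x, y, 2).nonzero().tolist())  # [[1, 1], [2, 2]]
```

This still costs k - 1 shifted ANDs over an [n x m] matrix, so it only helps with contiguous matches, not with the gapped case.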

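The LCS problem is more complex because gaps are allowed. It can be solved with dynamic programming in O(n x m), but is that really not feasible in PyTorch? I tried it, and it is far too slow.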

import torch

# Consider two "sentences" (LongTensors of word indices),
# _ts and _tr, of respective lengths n and m.
_ts, _tr = ts, tr
n, m = _ts.size(0), _tr.size(0)
_table = torch.zeros(n+1, m+1)
for i in range(1, n+1):
    for j in range(1, m+1):
        if _ts[i-1] == _tr[j-1]:
            _table[i, j] = _table[i-1, j-1] + 1
        else:
            _table[i, j] = max(_table[i-1, j], _table[i, j-1])
lcs = _table[n, m]
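
One direction that might reduce the Python-level overhead, given here as an unbenchmarked sketch: the recurrence can be rewritten as table[i, j] = max(table[i-1, j], table[i, j-1], table[i-1, j-1] + match[i, j]), so each row is a running maximum over a vector computed from the previous row. The function name lcs_length_rowwise is hypothetical, and the code assumes a PyTorch version that provides torch.maximum and torch.cummax:

```python
import torch

def lcs_length_rowwise(ts, tr):
    """LCS length of two 1-D LongTensors, one vectorized step per row.
    Hypothetical helper for illustration."""
    m = tr.size(0)
    match = ts.unsqueeze(1).eq(tr).long()  # [n x m], 1 where ts[i] == tr[j]
    prev = torch.zeros(m + 1, dtype=torch.long)  # row i-1 of the DP table
    for row in match:  # n Python-level iterations instead of n * m
        # Best value coming from the cell above, or the diagonal plus a match.
        cand = torch.maximum(prev[1:], prev[:-1] + row)
        # Accounting for the cell to the left amounts to a running maximum.
        cur = torch.cummax(cand, dim=0).values
        prev = torch.cat([prev[:1], cur])
    return int(prev[-1])

ts = torch.tensor([1, 2, 3, 4, 5])
tr = torch.tensor([2, 4, 5, 1])
print(lcs_length_rowwise(ts, tr))  # 3, the subsequence [2, 4, 5]
```

This keeps the O(n x m) work but moves the inner loop into tensor operations. It returns only the length, not the subsequence itself.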

Any ideas on how to make this more efficient?

Dynamic programming with PyTorch: longest common subsequence

0 answers: