Question

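How does TensorFlow's tf.edit_distance function work? How does it compare strings stored in two different sparse tensors that are the equivalent of 2D or 3D dense matrices?

The example given on the TensorFlow web page https://www.tensorflow.org/api_docs/python/tf/edit_distance is not very clear. Please provide an explanation using other examples.

This example in particular is unclear: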

#'hypothesis' is a tensor of shape [2, 1] with variable-length values:
#(0,0) = ["a"] and (1,0) = ["b"]

hypothesis = tf.SparseTensor([[0, 0, 0],[1, 0, 0]],["a", "b"],(2, 1, 1))

#'truth' is a tensor of shape `[2, 2]` with variable-length values:
#(0,0) = [], (0,1) = ["a"], (1,0) = ["b", "c"],(1,1) = ["a"]

truth = tf.SparseTensor([[0, 1, 0],[1, 0, 0],[1, 0, 1],[1, 1, 0]],["a", "b", 
"c", "a"],(2, 2, 2))

normalize = True

#'output' is a tensor of shape [2, 2] with edit distances normalized by 
#'truth' lengths.

output ==> [[inf, 1.0],[0.5, 1.0]],

(0,0): no truth, (0,1): no hypothesis, (1,0): addition, (1,1): no hypothesis

Why does the output have shape [2, 2]?

What does normalization do here?

Answer 1

Written out densely, hypothesis looks like this

[[['a']],
 [['b']]] # (2, 1, 1)

and truth is this

[[[],['a']],
 [['b', 'c'], ['a']]] # (2, 2, 2)
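To make the dense view above concrete, here is a minimal pure-Python sketch that groups the (indices, values, shape) triple of a rank-3 sparse tensor into variable-length sequences. The helper name `sparse_to_sequences` is hypothetical and not part of TensorFlow; it only illustrates how the first two index columns select a sequence and the last one gives the position inside it.

```python
def sparse_to_sequences(indices, values, shape):
    """Group a rank-3 sparse tensor's values into variable-length sequences.

    Hypothetical helper for illustration only (not a TensorFlow API).
    The first two index columns pick a sequence; the third is the
    position of the value inside that sequence.
    """
    rows, cols, _ = shape
    grid = [[[] for _ in range(cols)] for _ in range(rows)]
    # Sort by index so values are appended in positional order.
    for (i, j, _k), value in sorted(zip(indices, values)):
        grid[i][j].append(value)
    return grid


hypothesis = sparse_to_sequences(
    [[0, 0, 0], [1, 0, 0]], ["a", "b"], (2, 1, 1))
truth = sparse_to_sequences(
    [[0, 1, 0], [1, 0, 0], [1, 0, 1], [1, 1, 0]],
    ["a", "b", "c", "a"], (2, 2, 2))

print(hypothesis)  # [[['a']], [['b']]]
print(truth)       # [[[], ['a']], [['b', 'c'], ['a']]]
```

Note that truth at (0,0) comes out empty simply because no index in the sparse tensor starts with [0, 0].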

We are trying to find the Levenshtein distance between hypothesis and truth. So, here is what is happening:

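at (0,0) - how far is ['a'] in hypothesis from []? There is no truth at that position, so a normalized distance cannot be computed and the output is inf

at (0,1) - since hypothesis has nothing at that position, we return 1. Unlike the case above, the distance is 1 because in theory the hypothesis can be made the same as the truth by inserting one character (see Levenshtein distance calculations)

at (1,0) - how far is ['b'] in hyp from ['b', 'c'] in truth? This is again 1, since we can insert one character to make hyp the same as truth. But we chose to normalize the output distance, so we divide by the length of the truth segment, which is 2, and get 0.5

at (1,1) - how far is [] in hyp from ['a']? Since hyp has nothing at that position, we return 1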
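The four cases can be checked by hand with a small pure-Python sketch. This is not TensorFlow's implementation; it is a textbook dynamic-programming Levenshtein distance, with the normalization rule (divide by the truth length, inf when the truth is empty) taken from the documentation example in the question. The function names are made up for illustration, and returning 0.0 when both sequences are empty is an assumption.

```python
import math


def levenshtein(hyp, truth):
    """Textbook dynamic-programming edit distance between two token lists."""
    prev = list(range(len(truth) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, t in enumerate(truth, start=1):
            cost = 0 if h == t else 1
            curr.append(min(prev[j] + 1,          # delete h from hyp
                            curr[j - 1] + 1,      # insert t into hyp
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def normalized_distance(hyp, truth):
    """Edit distance divided by the truth length; inf if truth is empty."""
    d = levenshtein(hyp, truth)
    if len(truth) == 0:
        # Assumption: empty hyp vs. empty truth is reported as 0.0.
        return math.inf if d > 0 else 0.0
    return d / len(truth)


# Sequences keyed by their (row, column) position; missing keys are empty.
hypothesis = {(0, 0): ["a"], (1, 0): ["b"]}
truth = {(0, 1): ["a"], (1, 0): ["b", "c"], (1, 1): ["a"]}

output = [[normalized_distance(hypothesis.get((i, j), []),
                               truth.get((i, j), []))
           for j in range(2)]
          for i in range(2)]
print(output)  # [[inf, 1.0], [0.5, 1.0]]
```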

The output is (2,2) because hyp and truth both have rank 3, and the function returns a tensor of rank (rank - 1)
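The shape rule can be sketched as follows. The function name is hypothetical, and taking the larger size in each leading dimension is an assumption that happens to agree with both examples in this thread ((2,1,1) vs (2,2,2), and (4,1,1) vs (4,2,2)); only the "drop the last dimension" part is stated in the answer itself.

```python
def edit_distance_output_shape(hyp_shape, truth_shape):
    """Sketch of the output shape: same rank inputs, last dimension dropped.

    Hypothetical helper, not a TensorFlow API. Using max() per leading
    dimension is an assumption consistent with the examples here.
    """
    if len(hyp_shape) != len(truth_shape):
        raise ValueError("hypothesis and truth must have the same rank")
    # The last dimension indexes positions inside a sequence, so it
    # disappears: each sequence pair collapses to one distance.
    return tuple(max(h, t) for h, t in zip(hyp_shape[:-1], truth_shape[:-1]))


print(edit_distance_output_shape((2, 1, 1), (2, 2, 2)))  # (2, 2)
print(edit_distance_output_shape((4, 1, 1), (4, 2, 2)))  # (4, 2)
```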

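It helps to picture what we are trying to do here. You have 2 sequences in hypothesis and 2 sequences in truth, so the output gives you a score for each position in each sequence.

Here is an example where we try to match 4 hypotheses against one truth value. I think you have to do this for each truth sequence in the use case you describe in your comment. Let me know if you find something more efficient :-)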

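# Written for the TensorFlow 1.x API (tf.sparse_concat, tf.Session).
# In TensorFlow 2.x, use tf.sparse.concat and print(d.numpy()) instead.
import tensorflow as tf

# Four hypotheses, each a single-token sequence: ["a"], ["b"], ["c"], ["d"].
hypothesis = tf.SparseTensor(
            [[0, 0, 0],
             [1, 0, 0],
             [2, 0, 0],
             [3, 0, 0]],
             ["a", "b", "c", "d"],
            (4, 1, 1))

# One truth entry holding two sequences: (0,0) = ["b", "c"], (0,1) = ["a"].
truth = tf.SparseTensor([[0, 0, 0], [0, 0, 1], [0, 1, 0]], ["b", "c", "a"], (1,2,2))
num_hyp = 4
# Repeat the truth along axis 0 so that every hypothesis has a copy to compare with.
truth = tf.sparse_concat(0, [truth] * num_hyp)

# normalize defaults to True, so distances are divided by the truth length.
d = tf.edit_distance(hypothesis, truth)

with tf.Session() as sess:
    print(sess.run(d))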

Output:

[[1.  1. ]
 [0.5 1. ]
 [0.5 1. ]
 [1.  1. ]]
