Question

I am trying to compute the difference between two sentences as follows:

import difflib

text1_lines = "I understand how customers do their choice. Difference"
text2_lines = "I understand how customers do their choice."
diff = difflib.ndiff(text1_lines, text2_lines)

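I would like to get Difference as the result,

but I can't figure it out. What exactly am I doing wrong? Thanks for letting me know.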

Answer 1

From the docs:

import difflib
import sys

text1_lines = "I understand how customers do their choice. Difference"
text2_lines = "I understand how customers do their choice."
diff = difflib.context_diff(text1_lines, text2_lines)
for line in diff:
    sys.stdout.write(line)

Output:

*** 
--- 
***************
*** 41,54 ****
c  e  .-  - D- i- f- f- e- r- e- n- c- e--- 41,43 ----
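
Note that when plain strings are passed in, difflib compares them character by character, which is why the output above is spread over single letters. Here is a minimal sketch of a word-level comparison instead; splitting on whitespace and the variable names are my own assumptions, not part of the docs example:

```python
import difflib

text1 = "I understand how customers do their choice. Difference"
text2 = "I understand how customers do their choice."

# Compare lists of words rather than raw strings (assumption: whitespace tokenization is good enough)
words1 = text1.split()
words2 = text2.split()

# ndiff prefixes items that exist only in the first sequence with "- "
removed = [line[2:] for line in difflib.ndiff(words1, words2) if line.startswith('- ')]
print(removed)  # ['Difference']
```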

Answer 2

Split the larger string on the smaller one and you get the difference.

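def string_diff(a, b):
    # If one string is empty, the other one is the whole difference
    if len(a) == 0:
        return b
    if len(b) == 0:
        return a
    # Split the longer string on the shorter one; this only finds a
    # difference when the shorter string occurs inside the longer one
    if len(a) > len(b):
        res = ''.join(a.split(b))  # get diff
    else:
        res = ''.join(b.split(a))  # get diff
    return res.strip()

print(string_diff(text1_lines, text2_lines))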

Answer 3

Using a simple list comprehension:

diff = [x for x in difflib.ndiff(text1_lines, text2_lines) if x[0] != ' ']

It will show the deletions and additions.

Output:

['-  ', '- D', '- i', '- f', '- f', '- e', '- r', '- e', '- n', '- c', '- e']

(Everything following a minus sign was removed.)

Conversely, swapping text1_lines and text2_lines gives this result:

['+  ', '+ D', '+ i', '+ f', '+ f', '+ e', '+ r', '+ e', '+ n', '+ c', '+ e']

To strip the signs, you can transform the list above:

diff_nl = [x[2] for x in diff]

To turn it into a single string, just use .join():

diff_nl = ''.join([x[2] for x in diff])
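
To collect deletions and additions in one pass, the same idea can be wrapped in a small helper. This is only a sketch: the function name char_changes and its return shape are hypothetical, chosen for illustration.

```python
import difflib

def char_changes(old, new):
    """Return (removed, added) characters between two strings (hypothetical helper)."""
    removed, added = [], []
    for item in difflib.ndiff(old, new):
        if item.startswith('- '):    # character only in `old`
            removed.append(item[2:])
        elif item.startswith('+ '):  # character only in `new`
            added.append(item[2:])
        # unchanged items ("  ") and hint lines ("? ") are ignored
    return ''.join(removed), ''.join(added)

text1 = "I understand how customers do their choice. Difference"
text2 = "I understand how customers do their choice."
print(char_changes(text1, text2))  # (' Difference', '')
```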

Answer 4

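This is how you would do it using difflib itself. The problem is that you are getting back a generator, which is a bit like a packed-up for loop, and the only way to unpack it is to iterate over it.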

import difflib
text1_lines = "I understand how customers do their choice. Difference"
text2_lines = "I understand how customers do their choice."
diff = difflib.unified_diff(text1_lines, text2_lines)

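unified_diff differs from ndiff in that it shows only the differences (plus a few lines of surrounding context), whereas ndiff shows everything, similarities and differences alike. diff is now a generator object, and all that is left to do is unpack it: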

n = 0
result = ''
for difference in diff:
    n += 1
    if n < 7:  # skip the first 6 lines: 3 header lines plus 3 context lines, not needed here
        continue
    result += difference[1]  # each remaining line is " x", "-x" or "+x"; keep the character x

Finally:

>>> result
' Difference'
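
The hard-coded line count depends on the default of three context lines. A possible variation, sketched here under the assumption that only the changed characters are wanted, is to pass n=0 (no context) and lineterm='' to unified_diff and then filter by prefix; the names removed and added are mine:

```python
import difflib

text1 = "I understand how customers do their choice. Difference"
text2 = "I understand how customers do their choice."

# n=0 drops the context lines; lineterm='' keeps newlines out of the header lines
diff = list(difflib.unified_diff(text1, text2, n=0, lineterm=''))

# diff[0] and diff[1] are the '---' / '+++' file headers, so start at index 2;
# hunk headers start with '@@' and are skipped by the prefix checks below
removed = ''.join(line[1:] for line in diff[2:] if line.startswith('-'))
added = ''.join(line[1:] for line in diff[2:] if line.startswith('+'))

print(repr(removed))  # ' Difference'
print(repr(added))    # ''
```

Wrapping the generator in list() also means the result can be inspected more than once, since a generator is exhausted after a single pass.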

Difference between two strings (sentences)
