Difference between two strings (sentences)

Time: 2019-05-16 05:42:48

Tags: python string

I am trying to compute the difference between two sentences like this:

import difflib

text1_lines = "I understand how customers do their choice. Difference"
text2_lines = "I understand how customers do their choice."
diff = difflib.ndiff(text1_lines, text2_lines)

I would like to get "Difference" as the result,

but I don't get it. What exactly am I doing wrong? Thanks for letting me know.

4 answers:

Answer 0 (score: 2)

From the docs:

import difflib
import sys

text1_lines = "I understand how customers do their choice. Difference"
text2_lines = "I understand how customers do their choice."
diff = difflib.context_diff(text1_lines, text2_lines)
for line in diff:
    sys.stdout.write(line)

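Output (each character is compared as a separate "line", and since the characters carry no newline, everything after the hunk header runs together on one line):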

*** 
--- 
***************
*** 41,54 ****
c  e  .-  - D- i- f- f- e- r- e- n- c- e--- 41,43 ----
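The output above is character by character because a Python string is a sequence of characters, so difflib treats each character as one "line". If a word-level result is what you want, one option (a sketch, not taken from the answers; splitting on whitespace is an assumption, so punctuation stays attached to its word) is to pass lists of words instead:

```python
import difflib

text1 = "I understand how customers do their choice. Difference"
text2 = "I understand how customers do their choice."

# str.split() turns each sentence into a list of words, so difflib
# compares whole words instead of single characters.
word_diff = difflib.ndiff(text1.split(), text2.split())

# Keep only words removed from text1 ('- ') or added in text2 ('+ ');
# unchanged words ('  ') and ndiff's '? ' hint lines are skipped.
changed = [entry[2:] for entry in word_diff if entry.startswith(("- ", "+ "))]
print(changed)  # ['Difference']
```

The trade-off is that the original spacing is lost, since split() discards whitespace.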

Answer 1 (score: 1)

Split the longer string on the shorter one and join the remaining pieces; what is left is the difference.

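def string_diff(a, b):
    # If either string is empty, the other one is the entire difference
    if len(a) == 0:
        return b
    if len(b) == 0:
        return a
    # Split the longer string on the shorter one and join the leftover pieces
    if len(a) > len(b):
        res = ''.join(a.split(b))
    else:
        res = ''.join(b.split(a))
    return res.strip()

text1_lines = "I understand how customers do their choice. Difference"
text2_lines = "I understand how customers do their choice."
print(string_diff(text1_lines, text2_lines))  # Difference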
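The split trick only works when the shorter sentence occurs verbatim inside the longer one; for "the cat sat" vs. "the dog sat" it just returns one of the sentences unchanged. As an alternative that is not part of the original answer, here is a sketch using difflib.SequenceMatcher.get_opcodes(), which reports changed spans at any position. The helper name changed_spans is hypothetical.

```python
import difflib


def changed_spans(a, b):
    """Return the non-matching spans between a and b as (tag, old, new) tuples.

    tag is one of SequenceMatcher's opcode tags: 'replace', 'delete' or 'insert'.
    """
    matcher = difflib.SequenceMatcher(None, a, b)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # skip the parts both strings share
    ]


text1 = "I understand how customers do their choice. Difference"
text2 = "I understand how customers do their choice."
print(changed_spans(text1, text2))                  # [('delete', ' Difference', '')]
print(changed_spans("the cat sat", "the dog sat"))  # [('replace', 'cat', 'dog')]
```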

Answer 2 (score: 1)

Use a simple list comprehension:

diff = [x for x in difflib.ndiff(text1_lines, text2_lines) if x[0] != ' ']

It shows both deletions and additions.

Output:

['-  ', '- D', '- i', '- f', '- f', '- e', '- r', '- e', '- n', '- c', '- e']

(every entry marked with a minus sign was removed)

Conversely, swapping text1_lines and text2_lines gives:

['+  ', '+ D', '+ i', '+ f', '+ f', '+ e', '+ r', '+ e', '+ n', '+ c', '+ e']

To drop the +/- markers, transform the list above:

diff_nl = [x[2] for x in diff]

To turn the result into a single string, just use ''.join():

diff_nl = ''.join([x[2] for x in diff])
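The join above merges deleted and added characters into one string, so when both occur you cannot tell which is which. A sketch that keeps them apart, filtering on ndiff's "- " and "+ " tags (the function name ndiff_changes is hypothetical, not from the answer):

```python
import difflib


def ndiff_changes(a, b):
    """Split difflib.ndiff output into (removed, added) strings.

    Only entries tagged '- ' or '+ ' are kept; unchanged entries ('  ')
    and any '? ' hint lines are ignored.
    """
    removed, added = [], []
    for entry in difflib.ndiff(a, b):
        tag, char = entry[:2], entry[2:]
        if tag == "- ":
            removed.append(char)
        elif tag == "+ ":
            added.append(char)
    return "".join(removed), "".join(added)


text1 = "I understand how customers do their choice. Difference"
text2 = "I understand how customers do their choice."
print(ndiff_changes(text1, text2))  # (' Difference', '')
print(ndiff_changes(text2, text1))  # ('', ' Difference')
```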

Answer 3 (score: 0)

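Using difflib itself, this is how you would do it. The problem is that you are getting a generator, which is a bit like a packaged-up for loop: it produces nothing until you iterate over it (with a for loop, list(), and so on), and iterating is the only way to get its contents out.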
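A minimal sketch of that point, using the question's ndiff call: the return value is a generator, list() drains it, and a second pass finds nothing left.

```python
import difflib

text1 = "I understand how customers do their choice. Difference"
text2 = "I understand how customers do their choice."

diff = difflib.ndiff(text1, text2)
print(type(diff))  # <class 'generator'>: nothing has been computed yet

items = list(diff)  # iterating (here via list()) produces every entry
print(len(items))   # 54: 43 unchanged characters plus 11 deleted ones
print(items[-3:])   # ['- n', '- c', '- e']: the tail of the deleted " Difference"

leftover = list(diff)
print(leftover)     # []: a generator can only be consumed once
```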

import difflib
text1_lines = "I understand how customers do their choice. Difference"
text2_lines = "I understand how customers do their choice."
diff = difflib.unified_diff(text1_lines, text2_lines)

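unified_diff differs from ndiff in that it shows only the changed regions plus a little surrounding context (3 items by default), whereas ndiff shows every item, changed or not. diff is now a generator object, so all that's left is to unpack it: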

n = 0
result = ''
for difference in diff:
    n += 1
    # Skip the first 6 items: the '---', '+++' and '@@' header lines plus the
    # 3 unchanged context characters 'c', 'e', '.' (3 is unified_diff's default context)
    if n < 7:
        continue
    result += difference[1]  # each remaining item is "-x" (or "+x"); keep the character x

Finally:

>>> result
' Difference'
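Counting off a fixed 6 items works for this particular pair of sentences, but it breaks if the change is near the start of the string (fewer context characters) or if there are several changes (more '@@' hunk headers and context). A sketch that avoids hand-counting, assuming the same character-level comparison: pass n=0 so unified_diff emits no context, skip the two file-header lines with itertools.islice, and drop any '@@' hunk headers. The function name unified_char_diff is hypothetical. Like the original, it puts deleted and added characters into one string.

```python
import difflib
from itertools import islice


def unified_char_diff(a, b):
    """Collect the changed characters from difflib.unified_diff.

    n=0 suppresses unchanged context characters; islice skips the
    '---' and '+++' file headers; '@@' hunk headers are filtered out.
    """
    diff = difflib.unified_diff(a, b, n=0)
    result = ""
    for item in islice(diff, 2, None):
        if item.startswith("@@"):
            continue
        result += item[1:]  # item is "-x" or "+x"; keep the character x
    return result


text1 = "I understand how customers do their choice. Difference"
text2 = "I understand how customers do their choice."
print(repr(unified_char_diff(text1, text2)))  # ' Difference'
```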