I am trying to compute the difference between two sentences as follows:
import difflib
text1_lines = "I understand how customers do their choice. Difference"
text2_lines = "I understand how customers do their choice."
diff = difflib.ndiff(text1_lines, text2_lines)
I would like to get "Difference" as the result,
but I don't get it. What exactly am I doing wrong? Thanks for letting me know.
Answer 0 (score: 2)
From the docs:
import difflib
import sys
text1_lines = "I understand how customers do their choice. Difference"
text2_lines = "I understand how customers do their choice."
diff = difflib.context_diff(text1_lines, text2_lines)
for line in diff:
    sys.stdout.write(line)
Output:
***
---
***************
*** 41,54 ****
c e .- - D- i- f- f- e- r- e- n- c- e--- 41,43 ----
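context_diff (like ndiff) compares sequences of lines, and iterating over a str yields single characters, which is why the output above has one entry per character. A minimal sketch, assuming a word-level comparison is what you want: split both sentences with str.split() so each word becomes one "line", and pass lineterm='' so the header lines carry no trailing newline.

```python
import difflib

text1 = "I understand how customers do their choice. Difference"
text2 = "I understand how customers do their choice."

# Each word is treated as one "line", so the diff is word-level, not char-level.
diff_lines = list(
    difflib.context_diff(text1.split(), text2.split(), lineterm="")
)
for line in diff_lines:
    print(line)
```

With this input the deleted word appears as a single "- Difference" line inside the hunk, preceded by three unchanged context words.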
Answer 1 (score: 1)
Split the longer string on the shorter one and join the pieces; what remains is the difference:
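def string_diff(a, b):  # hypothetical wrapper name; the original snippet had no def
    # str.split('') raises ValueError, so handle empty strings first
    if len(a) == 0:
        print(b)
        return
    if len(b) == 0:
        print(a)
        return
    if len(a) > len(b):
        res = ''.join(a.split(b))  # remove every occurrence of b from a
    else:
        res = ''.join(b.split(a))  # remove every occurrence of a from b
    print(res.strip())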
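Splitting on the shorter string and joining the pieces with '' removes every occurrence of it, which is exactly what str.replace(shorter, '') does. A minimal sketch of that equivalent, using a hypothetical helper name remove_shorter, also makes a limitation visible: the trick only works when the shorter string appears verbatim inside the longer one, so it is not a general diff.

```python
def remove_shorter(a, b):
    """Hypothetical helper: drop every occurrence of the shorter string
    from the longer one (same result as ''.join(longer.split(shorter)))."""
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    if not shorter:
        # Mirror the original: if one string is empty, the other is the diff
        return longer
    return longer.replace(shorter, "").strip()


text1 = "I understand how customers do their choice. Difference"
text2 = "I understand how customers do their choice."
print(remove_shorter(text1, text2))  # Difference

# Not a real diff: a change in the middle leaves the longer string untouched
print(remove_shorter("abcXdef", "abcdef"))  # abcXdef
```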
Answer 2 (score: 1)
Use a simple list comprehension:
diff = [x for x in difflib.ndiff(text1_lines, text2_lines) if x[0] != ' ']
It will show both deletions and additions.
Output:
['-  ', '- D', '- i', '- f', '- f', '- e', '- r', '- e', '- n', '- c', '- e']
(every item prefixed with a minus sign was deleted)
Conversely, swapping text1_lines and text2_lines produces the following:
['+  ', '+ D', '+ i', '+ f', '+ f', '+ e', '+ r', '+ e', '+ n', '+ c', '+ e']
To remove the signs, you can transform the list above:
diff_nl = [x[2] for x in diff]
To turn it into a single string, just use .join():
diff_nl = ''.join([x[2] for x in diff])
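Putting the steps of this answer together into one self-contained script (same variable names as the question): ndiff prefixes every item with a two-character code, "  " for unchanged, "- " for deleted and "+ " for added, so index 2 is the character itself.

```python
import difflib

text1_lines = "I understand how customers do their choice. Difference"
text2_lines = "I understand how customers do their choice."

# Drop unchanged items, whose first character is a space.
diff = [x for x in difflib.ndiff(text1_lines, text2_lines) if x[0] != " "]

# Each remaining item looks like "- c", so x[2] is the changed character.
diff_nl = "".join(x[2] for x in diff)
print(repr(diff_nl))  # ' Difference'
```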
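Answer 3 (score: 0)
Using difflib itself, this is how you would do it. The problem is that you are getting a generator, which is a bit like a packed-up for loop, and the only way to unpack it is to iterate over it.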
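A small sketch of the generator behaviour described above, using ndiff from the question: the call returns a generator, list() iterates it to produce the actual diff lines, and once consumed it yields nothing more. It also shows that passing two str objects produces one diff line per character.

```python
import difflib
import inspect

text1_lines = "I understand how customers do their choice. Difference"
text2_lines = "I understand how customers do their choice."

diff = difflib.ndiff(text1_lines, text2_lines)
print(inspect.isgenerator(diff))  # True: no lines are produced until iterated

# A str is a sequence of characters, so every character becomes one "line".
lines = list(diff)  # iterating materializes all the diff lines
print(len(lines))   # 54: 43 unchanged + 11 deleted characters
print(lines[-1])    # - e

# A generator can only be consumed once; a second pass yields nothing.
print(list(diff))   # []
```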
import difflib
text1_lines = "I understand how customers do their choice. Difference"
text2_lines = "I understand how customers do their choice."
diff = difflib.unified_diff(text1_lines, text2_lines)
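unified_diff differs from ndiff in that it shows only the differences (plus a few lines of surrounding context), whereas ndiff shows both the similarities and the differences. diff is now a generator object; all that remains is to unpack it: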
n = 0
result = ''
for difference in diff:
    n += 1
    if n < 7:  # skip the first 6 lines (3 header lines + 3 context lines), which you don't need here
        continue
    result += difference[1]  # each line here looks like " x", "-x" or "+x"; index 1 is the character
Finally:
>>> result
' Difference'
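If you only need the removed and added text, a different approach from this answer (not part of it) is to read difflib.SequenceMatcher's get_opcodes() directly instead of parsing diff output and counting header lines, so nothing breaks if the number of header or context lines changes. A minimal sketch, with char_changes as a hypothetical helper name:

```python
import difflib


def char_changes(old, new):
    """Hypothetical helper: return (removed, added) text between old and new,
    read from SequenceMatcher opcodes rather than from formatted diff lines."""
    removed, added = [], []
    matcher = difflib.SequenceMatcher(None, old, new)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.append(old[i1:i2])  # characters only in old
        if tag in ("insert", "replace"):
            added.append(new[j1:j2])  # characters only in new
    return "".join(removed), "".join(added)


text1 = "I understand how customers do their choice. Difference"
text2 = "I understand how customers do their choice."
removed, added = char_changes(text1, text2)
print(repr(removed))  # ' Difference'
print(repr(added))    # ''
```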