I want to compare two paragraphs at the character level to see which words have been changed.
The paragraphs to compare:
t1 = '''1 Then was Jesus led up of the Spirit into the wilderness to be tempted of the devil.
2 And when he had fasted forty days and forty nights, he was afterward an hungred.
3 And when the tempter came to him, he said, If thou be the Son of God, command that these stones be made bread.
'''.splitlines(keepends=True)
t2 = '''1 Then Jesus was led up of the Spirit, into the wilderness, to be with God.
2 And when he had fasted forty days and forty nights, and had communed with God, he was afterwards an hungered, and was left to be tempted of the devil,
3 And when the tempter came to him, he said, If thou be the Son of God, command that these stones be made bread.
'''.splitlines(keepends=True)
When I tried difflib, it worked well for the first line, but it did not detect the differences in the second line.
>>> from difflib import *
>>> d = Differ()
>>> result = list(d.compare(t1,t2))
>>> for i in result:
...     print(i, end='')
- 1 Then was Jesus led up of the Spirit into the wilderness to be tempted of the devil.
? ---- ^^^^^^^^^^^ - ----
+ 1 Then Jesus was led up of the Spirit, into the wilderness, to be with God.
? ++++ + + ^^ ++
- 2 And when he had fasted forty days and forty nights, he was afterward an hungred.
+ 2 And when he had fasted forty days and forty nights, and had communed with God, he was afterwards an hungered, and was left to be tempted of the devil,
3 And when the tempter came to him, he said, If thou be the Son of God, command that these stones be made bread.
Only the first line produces the desired output.
Even if I extract just the second line and compare it on its own:
t1 = '''2 And when he had fasted forty days and forty nights, he was afterward an hungred.
'''.splitlines(keepends=True)
t2 = '''2 And when he had fasted forty days and forty nights, and had communed with God, he was afterwards an hungered, and was left to be tempted of the devil,
'''.splitlines(keepends=True)
d = Differ()
result = list(d.compare(t1,t2))
for i in result:
    print(i, end='')
it does not show which characters were modified; it only shows that the whole line was changed:
- 2 And when he had fasted forty days and forty nights, he was afterward an hungred.
+ 2 And when he had fasted forty days and forty nights, and had communed with God, he was afterwards an hungered, and was left to be tempted of the devil,
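A likely reason is the way Differ decides whether to print the ? hint lines. In CPython's implementation (an internal detail, not a documented parameter), Differ only runs the character-level comparison for a pair of replaced lines whose SequenceMatcher.ratio() is at least 0.75; below that it falls back to a plain -/+ replacement. The sketch below checks both changed lines against that assumed cutoff: line 1 scores about 0.81, while line 2, with all the text added after "nights,", scores about 0.69.

```python
from difflib import SequenceMatcher

# Lines 1 and 2 of each paragraph, as in the question (without the trailing newline).
old1 = "1 Then was Jesus led up of the Spirit into the wilderness to be tempted of the devil."
new1 = "1 Then Jesus was led up of the Spirit, into the wilderness, to be with God."
old2 = "2 And when he had fasted forty days and forty nights, he was afterward an hungred."
new2 = ("2 And when he had fasted forty days and forty nights, and had communed with God, "
        "he was afterwards an hungered, and was left to be tempted of the devil,")

# Assumption: Differ's internal similarity cutoff is 0.75 (a CPython implementation detail).
CUTOFF = 0.75

for old, new in [(old1, new1), (old2, new2)]:
    ratio = SequenceMatcher(None, old, new).ratio()
    # At or above the cutoff Differ shows "?" hint lines; below it, a plain -/+ pair.
    print(f"{ratio:.3f}", "hints" if ratio >= CUTOFF else "plain replace")
```

Differ's public constructor only accepts linejunk and charjunk, so this threshold cannot be lowered through it; to always get character-level detail, call SequenceMatcher directly on each pair of lines.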
But if I use SequenceMatcher to compare the second line, it does seem to identify the modified characters:
p2_1 = '''2 And when he had fasted forty days and forty nights, he was afterward an hungred.'''
p2_2 = '''2 And when he had fasted forty days and forty nights, and had communed with God, he was afterwards an hungered, and was left to be tempted of the devil,'''
se = SequenceMatcher(None,p2_1, p2_2)
se.get_opcodes()
[('equal', 0, 54, 0, 54),
('insert', 54, 54, 54, 81),
('equal', 54, 70, 81, 97),
('insert', 70, 70, 97, 98),
('equal', 70, 78, 98, 106),
('insert', 78, 78, 106, 107),
('equal', 78, 81, 107, 110),
('replace', 81, 82, 110, 152)]
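The opcodes already contain everything needed for a character-level view; they only have to be rendered. A minimal sketch that turns them into a single annotated line. The helper name inline_diff and the [-deleted-] / {+inserted+} markers are my own choices (similar in spirit to wdiff's output), not part of difflib:

```python
from difflib import SequenceMatcher

def inline_diff(old: str, new: str) -> str:
    """Render a character-level diff of two strings as one line.

    Deleted text is wrapped in [-...-] and inserted text in {+...+};
    a 'replace' opcode is shown as a deletion followed by an insertion.
    """
    sm = SequenceMatcher(None, old, new)
    parts = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            parts.append(old[i1:i2])
        if tag in ("delete", "replace"):
            parts.append("[-" + old[i1:i2] + "-]")
        if tag in ("insert", "replace"):
            parts.append("{+" + new[j1:j2] + "+}")
    return "".join(parts)

p2_1 = '''2 And when he had fasted forty days and forty nights, he was afterward an hungred.'''
p2_2 = '''2 And when he had fasted forty days and forty nights, and had communed with God, he was afterwards an hungered, and was left to be tempted of the devil,'''

print(inline_diff(p2_1, p2_2))
```

For the second line this marks the inserted "and had communed with God, ", the added "s" and "e", and the final "." being replaced by the new trailing clause, which matches the opcodes listed above.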
How can I compare these two paragraphs so that I can tell which characters were modified? Or is there an existing package I can use?
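One possible approach without an extra package: pair the lines yourself instead of letting Differ pair them (so its similarity cutoff never applies), and run SequenceMatcher on word tokens rather than characters, which usually reads better for prose. This sketch assumes both paragraphs have the same number of lines and that line k of one corresponds to line k of the other, which holds for the verse-numbered example. The helper names (tokenize, word_diff, paragraph_diff) and the markers are hypothetical:

```python
import re
from difflib import SequenceMatcher

def tokenize(text: str) -> list[str]:
    # Words, punctuation runs and whitespace runs; joining the tokens
    # reproduces the original text exactly.
    return re.findall(r"\w+|[^\w\s]+|\s+", text)

def word_diff(old: str, new: str) -> str:
    """Word-level diff of two strings, marked with [-deleted-] and {+inserted+}."""
    a, b = tokenize(old), tokenize(new)
    sm = SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            out.append("".join(a[i1:i2]))
        if tag in ("delete", "replace"):
            out.append("[-" + "".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + "".join(b[j1:j2]) + "+}")
    return "".join(out)

def paragraph_diff(t1: list[str], t2: list[str]) -> list[str]:
    # Assumption: line k of t1 corresponds to line k of t2.
    return [word_diff(x.rstrip("\n"), y.rstrip("\n")) for x, y in zip(t1, t2)]

t1 = '''1 Then was Jesus led up of the Spirit into the wilderness to be tempted of the devil.
2 And when he had fasted forty days and forty nights, he was afterward an hungred.
3 And when the tempter came to him, he said, If thou be the Son of God, command that these stones be made bread.
'''.splitlines(keepends=True)
t2 = '''1 Then Jesus was led up of the Spirit, into the wilderness, to be with God.
2 And when he had fasted forty days and forty nights, and had communed with God, he was afterwards an hungered, and was left to be tempted of the devil,
3 And when the tempter came to him, he said, If thou be the Son of God, command that these stones be made bread.
'''.splitlines(keepends=True)

for line in paragraph_diff(t1, t2):
    print(line)
```

If the lines do not correspond one-to-one, the pairing step would have to be replaced by a line-level SequenceMatcher pass first; zip simply stops at the shorter paragraph.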