Suppose I generate two files, old.txt and new.txt, for example from the command line:
echo "This is not what I want." > old.txt
echo "This is what I want!" > new.txt
I can then run wdiff to produce a word-level diff file:
wdiff old.txt new.txt > diff.txt
Reading it with cat diff.txt gives me:
This is [-not-] what I [-want.-] {+want!+}
Starting from diff.txt alone and parsing it, how can I recover the contents of the "original" old.txt and new.txt?
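(In principle this should always be possible, because wdiff appears to preserve all the text of both the "old" and the "edited" files; see e.g. this gist for another example.)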
One option is to build a simple parser (e.g. in Python) with regular expressions:
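import re

def get_edited(diff):
    # Drop deleted spans [-...-], then unwrap inserted spans {+...+}.
    diff = re.sub(r'\[-(.*?)-\]', '', diff)
    edited = re.sub(r'\{\+(.*?)\+\}', r'\1', diff)
    return edited

def get_original(diff):
    # Unwrap deleted spans [-...-], then drop inserted spans {+...+}.
    diff = re.sub(r'\[-(.*?)-\]', r'\1', diff)
    original = re.sub(r'\{\+(.*?)\+\}', '', diff)
    return original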
But it would be nice if there were a built-in way to do this. Any suggestions?
Answer 0 (score: 0)
Regular expressions seem to be the way to go. For an example on GitHub, see https://github.com/snukky/wikiedits/blob/master/bin/wdiff_to_parallel.py.
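For illustration, here is a minimal single-pass sketch of the regex approach. It is not taken from the linked script; the names split_wdiff, tidy, and MARKER are made up for this example. It assumes the default wdiff markers ([-...-] and {+...+}), assumes the text itself never contains those marker sequences, and assumes that normalizing whitespace is acceptable: removing a marked word leaves two adjacent spaces behind, so runs of spaces and tabs are collapsed and each line is stripped, which also means indentation is not preserved.

```python
import re

# Matches either a deletion [-...-] or an insertion {+...+}.
# DOTALL lets a marked span cross line breaks.
MARKER = re.compile(
    r"\[-(?P<deleted>.*?)-\]|\{\+(?P<inserted>.*?)\+\}", re.DOTALL
)


def tidy(text):
    # Assumption: whitespace may be normalized. Collapse runs of spaces and
    # tabs left behind by removed markers, then strip each line.
    text = re.sub(r"[ \t]+", " ", text)
    return "\n".join(line.strip() for line in text.split("\n"))


def split_wdiff(diff):
    """Return (original, edited) reconstructed from wdiff output."""
    original, edited = [], []
    pos = 0
    for match in MARKER.finditer(diff):
        # Text between markers is common to both versions.
        common = diff[pos:match.start()]
        original.append(common)
        edited.append(common)
        if match.group("deleted") is not None:
            original.append(match.group("deleted"))  # only in the old file
        else:
            edited.append(match.group("inserted"))   # only in the new file
        pos = match.end()
    # Whatever follows the last marker is also common text.
    tail = diff[pos:]
    original.append(tail)
    edited.append(tail)
    return tidy("".join(original)), tidy("".join(edited))


if __name__ == "__main__":
    old, new = split_wdiff("This is [-not-] what I [-want.-] {+want!+}\n")
    print(repr(old))  # 'This is not what I want.\n'
    print(repr(new))  # 'This is what I want!\n'
```

Walking the diff once keeps the two reconstructions in step and avoids running four separate substitutions. The exact original spacing cannot be recovered this way in general, which is why the sketch normalizes it instead.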