Strange characters when using the diff-match-patch library

Date: 2010-06-23 14:44:30

Tags: java diff

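I'm using diff-match-patch (https://code.google.com/archive/p/google-diff-match-patch/) to diff two texts. The resulting delta contains strange characters: for example, à becomes %C3%A0, ù becomes %C3%B9, " becomes %22, and so on.

Here is my code sample: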

String startDocument = "hello world";
String endDocument = "àèìòù\"";
diff_match_patch dmp = new diff_match_patch();
dmp.Diff_Timeout = 16;
LinkedList<Diff> diffs = dmp.diff_main( startDocument, endDocument );
String diff = dmp.diff_toDelta(diffs);
System.out.println(diff);      // prints: -11 +%C3%A0%C3%A8%C3%AC%C3%B2%C3%B9%22

How can I retrieve the original characters?

2 Answers:

Answer 0: (score: 0)

Try

javac -encoding utf8 DaClass.java

java -Dfile.encoding=utf8 DaClass

Answer 1: (score: 0)

This is the expected behavior.

DiffMatchPatch encodes special characters in a JavaScript-like way (from the project's wiki):

  

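2. Encode characters

Special characters are encoded using %xx notation. The set of characters which are encoded matches JavaScript's encodeURI() function, with the exception of spaces which are not encoded.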
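To make the %xx notation concrete, here is a minimal sketch that does not use the library itself. The class and the helper `encodeLikeDelta` are hypothetical names for illustration. It only approximates the delta encoding: `URLEncoder` escapes more characters than encodeURI() does (for example `/` and `:`), so treat it as a way to see where `%C3%A0` comes from rather than a copy of what diff_toDelta does.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class DeltaEncodingSketch {
    // Hypothetical helper (not part of diff-match-patch): percent-encodes
    // the UTF-8 bytes of the text, but leaves spaces unencoded.
    static String encodeLikeDelta(String text) {
        try {
            // URLEncoder writes spaces as "+"; a literal "+" becomes %2B,
            // so mapping "+" back to " " is unambiguous.
            return URLEncoder.encode(text, "UTF-8").replace('+', ' ');
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is not supported", e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes for the question's text, so the result does not
        // depend on the source file encoding.
        String text = "\u00E0\u00E8\u00EC\u00F2\u00F9\"";
        System.out.println(encodeLikeDelta(text));
        // prints: %C3%A0%C3%A8%C3%AC%C3%B2%C3%B9%22
    }
}
```

Each accented letter takes two bytes in UTF-8, which is why à shows up as the pair %C3%A0.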

To decode it, just look at the code:

// decode would change all "+" to " "
param = param.replace("+", "%2B");
try {
    param = URLDecoder.decode(param, "UTF-8");
} catch (UnsupportedEncodingException e) {
    // Not likely on modern system.
    throw new Error("This system does not support UTF-8.", e);
} catch (IllegalArgumentException e) {
    // Malformed URI sequence.
    throw new IllegalArgumentException("Illegal escape in diff_fromDelta: " + param, e);
}
diffs.add(new Diff(Operation.INSERT, param));
In fact, you don't need to decode equalities or deletions, because the delta format does not include their text.
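If you only have the delta string and want the inserted text back, the same decoding step can be applied by hand. The following is a rough sketch, assuming that delta tokens are separated by tab characters and that inserts start with `+`. The class `DeltaInsertDecoder` and its methods are hypothetical names, not part of the library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

public class DeltaInsertDecoder {
    // Decodes one %xx-encoded insert payload back to the original text.
    static String decodeToken(String encoded) {
        try {
            // URLDecoder would turn "+" into " ", so protect it first.
            return URLDecoder.decode(encoded.replace("+", "%2B"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is not supported", e);
        }
    }

    // Collects the decoded text of every insert ("+...") token.
    // Assumption: tokens in the delta are tab-separated.
    static List<String> insertedTexts(String delta) {
        List<String> result = new ArrayList<>();
        for (String token : delta.split("\t")) {
            if (token.startsWith("+")) {
                result.add(decodeToken(token.substring(1)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String delta = "-11\t+%C3%A0%C3%A8%C3%AC%C3%B2%C3%B9%22";
        for (String text : insertedTexts(delta)) {
            System.out.println(text);
        }
    }
}
```

If the original text is available, calling the library's own diff_fromDelta is the safer route, since it also validates the delta against that text.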

If you only want to display the diff, take a look at the diff_prettyHtml method.