Java: Getting valid sentences in Google-diff-match-patch

Time: 2015-03-04 15:22:09

Tags: java regex string string-comparison google-diff-match-patch

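  • I am working on a Java application in which I want to compare two paragraphs and find the sentences that differ between the two strings. Right now I can get what was inserted and what was deleted. My problem is that I want to know which sentence was affected, not just the changed words.

Example:

  1. Old string: The quick brown fox jumps over the lazy rabbit. Curiosity killed the cat.
  2. New string: The quick brown lion jumps over the lazy rabbit. Curiosity killed the cat.
  3. Expected output: The quick brown lion jumps over the lazy rabbit.

    What I currently get: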

    Diff(DELETE,"fox")
    Diff(INSERT,"lion")
    

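    So I have no context about where "fox" was deleted and where "lion" was added. Even 15 characters to the left and right of each operation would be enough. My current code: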

    diff_match_patch diffMatchPatch = new diff_match_patch();
    LinkedList<diff_match_patch.Diff> deltas = diffMatchPatch.diff_main(oldText, newText);
    for (diff_match_patch.Diff d : deltas) {
        if ((d.operation == diff_match_patch.Operation.DELETE) || (d.operation == diff_match_patch.Operation.INSERT)) {
            System.out.println(d);
        }
    }
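
One way to recover that context, sketched here under assumptions: `diff_main` returns the pieces in order, so a running offset into the old text (advanced by EQUAL and DELETE pieces) and into the new text (advanced by EQUAL and INSERT pieces) tells you where each change sits. `Op`, `Edit`, `context` and `changesWithContext` below are hypothetical stand-ins so the sketch compiles without the library; with the real API you would read `d.operation` and `d.text` instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DiffContext {
    // Stand-ins for diff_match_patch.Operation and diff_match_patch.Diff
    // (hypothetical names, used only so this sketch runs without the library).
    enum Op { EQUAL, DELETE, INSERT }

    static final class Edit {
        final Op op;
        final String text;
        Edit(Op op, String text) { this.op = op; this.text = text; }
    }

    /** Returns text[start, end) plus up to `radius` characters on each side. */
    static String context(String text, int start, int end, int radius) {
        int from = Math.max(0, start - radius);
        int to = Math.min(text.length(), end + radius);
        return text.substring(from, to);
    }

    /** Walks the edits in order and reports each change with its surroundings. */
    static List<String> changesWithContext(List<Edit> edits, String oldText, String newText, int radius) {
        List<String> report = new ArrayList<>();
        int oldPos = 0; // current offset into oldText
        int newPos = 0; // current offset into newText
        for (Edit e : edits) {
            int len = e.text.length();
            if (e.op == Op.EQUAL) {
                oldPos += len;
                newPos += len;
            } else if (e.op == Op.DELETE) {
                // Deleted text only exists in the old string.
                report.add("DELETE \"" + e.text + "\" near: " + context(oldText, oldPos, oldPos + len, radius));
                oldPos += len;
            } else {
                // Inserted text only exists in the new string.
                report.add("INSERT \"" + e.text + "\" near: " + context(newText, newPos, newPos + len, radius));
                newPos += len;
            }
        }
        return report;
    }

    public static void main(String[] args) {
        String oldText = "The quick brown fox jumps over the lazy rabbit. Curiosity killed the cat.";
        String newText = "The quick brown lion jumps over the lazy rabbit. Curiosity killed the cat.";
        // Hand-written edit list mirroring the DELETE/INSERT pair from the question.
        List<Edit> edits = Arrays.asList(
            new Edit(Op.EQUAL, "The quick brown "),
            new Edit(Op.DELETE, "fox"),
            new Edit(Op.INSERT, "lion"),
            new Edit(Op.EQUAL, " jumps over the lazy rabbit. Curiosity killed the cat."));
        for (String line : changesWithContext(edits, oldText, newText, 15)) {
            System.out.println(line);
        }
    }
}
```

With a radius of 15 the sample reports the deletion near "he quick brown fox jumps over the" and the insertion near "he quick brown lion jumps over the". The same offsets could also be used to look up the enclosing sentence instead of a fixed window.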
    

    Any help would be great. Thanks a lot. :-) If anything about my explanation is unclear, please let me know.

    Edit: added new code based on the answer:

    diff_match_patch diffMatchPatch = new diff_match_patch();
    LinkedList<diff_match_patch.Diff> deltas = diffMatchPatch.diff_main(notes1.getNotetext(), notes.getNotetext());
    for (diff_match_patch.Diff d : deltas) {
        if ((d.operation == diff_match_patch.Operation.DELETE) || (d.operation == diff_match_patch.Operation.INSERT)) {
            Pattern myPattern = Pattern.compile("(\\. |^)(.*" + d.text + ".*)(\\. )");
            Matcher m = myPattern.matcher(notes1.getNotetext());
            while (m.find()) {
                System.out.println("Found " + d.operation + " of: " + d.text + " in sentence: " + m.group());
            }
        }
    }
    
    The output I am getting is wrong. It looks something like this:
    Found DELETE of: I  in sentence: I yoyo am also working on a webapp in which the user can make changes to a text area. In this, he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
    Found DELETE of: oyo am in sentence: I yoyo am also working on a webapp in which the user can make changes to a text area. In this, he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
    Found DELETE of: a in sentence: akshay also working on a webapp in which the user can make changes to a text area. In this, he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
    Found INSERT of: a in sentence: kshay also working on a webapp in which the user can make changes to a text area. In this, he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
    Found INSERT of: r in sentence: akshay also working on a webapp in which the user can make changes to a text area. In this, he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
    Found DELETE of: ks in sentence: akshay also working on a webapp in which the user can make changes to a text area. In this, he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
    Found DELETE of: ay in sentence: akshay also working on a webapp in which the user can make changes to a text area. In this, he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
    Found INSERT of: ul in sentence: akshay also working on a webapp in which the user can make changes to a text area. In this, he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
    Found DELETE of: In this, in sentence: rahul also working on a webapp in which the user can make changes to a text area. In this, he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
    Found DELETE of: ang in sentence: rahul also working on a webapp in which the user can make changes to a text area.  he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
    Found DELETE of: s in sentence: rahul also working on a webapp in which the user can make changes to a text area.  he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
    Found INSERT of: ck in sentence: rahul also working on a webapp in which the user can make changes to a text area.  he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
    Found DELETE of: rahul  in sentence: rahul also working on a webapp in which the user can make check to a text area.  he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
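
A likely reason for this output, offered as a reading of the pattern itself: `.*` is greedy and also matches dots, so one match can run from the start of the text to the last ". ", and `d.text` is concatenated into the pattern unescaped. Below is a minimal sketch that keeps each match inside one dot-delimited sentence and quotes the fragment with `Pattern.quote`. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceRegex {
    /** Returns every dot-delimited sentence of `text` that contains `fragment`. */
    static List<String> sentencesContaining(String text, String fragment) {
        // [^.]* cannot cross a dot, so a match stays within one sentence;
        // Pattern.quote makes regex metacharacters in the fragment literal.
        Pattern p = Pattern.compile("[^.]*" + Pattern.quote(fragment) + "[^.]*\\.?");
        Matcher m = p.matcher(text);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group().trim());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy rabbit. Curiosity killed the cat.";
        for (String s : sentencesContaining(text, "fox")) {
            System.out.println(s);
        }
    }
}
```

This is still a text search: a short fragment such as "a" matches almost every sentence, and a dot inside a sentence (as in "Math.minimum") still cuts it in two, so it cannot tell you which occurrence actually changed.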
    

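    I would like to know when a whole word or sentence is deleted, so that I can save it correctly in the database. I would be glad if you could help. Thanks a lot. :-)

    Edit: the answer below works very well for getting the two separate strings, which can then be persisted in the database.

1 Answer:

Answer 0 (score: 3)

After extensive reconsideration, I think this is not a case for regular expressions. The same change can appear in several lines, so you have to check your input line by line: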

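import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
// diff_match_patch must also be importable from wherever your build places it.

// The class name is a placeholder; the original snippet omitted the declaration.
public class SentenceDiff {

  //-------------------------Example Strings---------------------------------------------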
  private static String oldText = "I yoyo am also working on a \n webapp in which the user can make changes to a text area. " +
      "In this, he can either write one paragraph, one sentence." +
      " So what I am currently trying to do is to split the whole paragraph by a dot separator. " +
      "Once that is done, I would like to check which sentences have changed." +
      " I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays." +
      " But it is not working, I am getting zero String modified from it."+
      " Kindly let me know what I am doing wrong.";

  private static String newText = "akshay is also working on a \n webapp in which the user can make changes to a text area. " +
      "He can either write one paragraph, one sentence." +
      " So what I am currently trying to do is to split the whole paragraph by a dot separator. " +
      "Once that is done, I would like to check which sentences have changed." +
      " I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays." +
      " But it is not working, I am getting zero String modified from it.";
  //-------------------------Example Strings end --------------------------------------

  private static diff_match_patch diffMatchPatch;

  public static void main(String[] args) {

    diffMatchPatch = new diff_match_patch();
    //Split text into List of strings
    List<String> oldTextList = Arrays.asList(oldText.split("(\\.|\\n)"));
    List<String> newTextList = Arrays.asList(newText.split("(\\.|\\n)"));

    //If we have different length
    int counter = Math.max(oldTextList.size(), newTextList.size()); 
    StringBuilder sb = new StringBuilder();

    for(int current = 0; current < counter; current++){
      String oldString = null;
      String newString = null;

      if(oldTextList.size() <= current){
        oldString = "";
        newString = newTextList.get(current);

      } else if (newTextList.size() <= current){
        oldString = oldTextList.get(current);
        newString = "";
      } else {
        if (isLineDifferent(oldTextList.get(current), newTextList.get(current))){
          oldString = oldTextList.get(current);
          newString = newTextList.get(current);
        }
      }
      if(oldString != null && newString != null) {
        //---- Insert into database here -----
        sb.append("Changes for Line: " + current + "\n");
        sb.append("Old: " + oldString + "; New: " + newString +";\n");
      }
    }

    System.out.println(sb.toString());
  }

  private static boolean isLineDifferent(String oldString, String newString) {
    LinkedList<diff_match_patch.Diff> deltas = diffMatchPatch.diff_main(oldString, newString);
    for (diff_match_patch.Diff d : deltas) {
      // Any non-EQUAL operation means the two strings differ.
      if (d.operation != diff_match_patch.Operation.EQUAL) {
        return true;
      }
    }
    return false;
  }
}

This should give you the following result:

Changes for Line: 0
Old: I yoyo am also working on a ; New: akshay is also working on a ;
Changes for Line: 2
Old:  In this, he can either write one paragraph, one sentence; New:  He can either write one paragraph, one sentence;
Changes for Line: 8
Old:  Kindly let me know what I am doing wrong; New: ;

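Note that I only added ";" as a separator in the StringBuilder output so that you can tell where each string ends. Of course, there are still a few things to consider:

  • This code splits the text at every dot ('.'). If one of your dots is not the end of a sentence, your result will be off.
  • As with every other diff tool, a swap in the order of lines is registered as a series of deletions and insertions.
  • If you read the text line by line, you should feed the lines in as they arrive instead of collecting them all and then splitting them again. (See this example.)
  • As you can see in the example linked above, there is a newer library that I think suits your use case better.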
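
For the first caveat, one option (my assumption, not something the answer's code uses) is `java.text.BreakIterator` from the standard library, which looks for sentence boundaries instead of raw dots. The sketch below is minimal, and the class and method names are made up for illustration. As far as I know, the default rules do not break at a dot that is directly followed by a lowercase letter, but they know nothing about abbreviations such as "Mr.", so treat it as a starting point only.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSplitter {
    /** Splits text into trimmed sentences using the JDK's sentence BreakIterator. */
    static List<String> splitSentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        // Each boundary marks the end of one sentence and the start of the next.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "I have to length of array to Math.minimum of both String arrays. "
            + "But it is not working.";
        for (String s : splitSentences(text)) {
            System.out.println("[" + s + "]");
        }
    }
}
```

The resulting lists could replace the two `split("(\\.|\\n)")` calls, so that a dot inside "Math.minimum" no longer produces an extra line.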