Question

I thought this was the answer for finding duplicate words in a string. But when I use it, it treats This and is as the same word and removes is.

The regex:

"\\b(\\w+)\\b\\s+\\1"

Any idea why this happens?
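One thing worth checking, shown on a made-up input because the actual text is not given: the regex above has no \b after \1, so the backreference can match just the beginning of the following word. A minimal sketch (the class and method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingBoundary {
    // The regex from the question: nothing forces \1 to end at a word boundary.
    static final Pattern WITHOUT_END_BOUNDARY =
            Pattern.compile("\\b(\\w+)\\b\\s+\\1", Pattern.CASE_INSENSITIVE);
    // The same regex with a closing \b, so \1 must be a whole word.
    static final Pattern WITH_END_BOUNDARY =
            Pattern.compile("\\b(\\w+)\\b\\s+\\1\\b", Pattern.CASE_INSENSITIVE);

    // Returns the first match, or null when there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "This is island";
        // "is is": the second "is" is only the start of "island"
        System.out.println(firstMatch(WITHOUT_END_BOUNDARY, input));
        // null: "island" is not a whole-word repeat of "is"
        System.out.println(firstMatch(WITH_END_BOUNDARY, input));
    }
}
```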

Here is the code I use to remove the duplicates:

public static String RemoveDuplicateWords(String input)
{
    String originalText = input;
    String output = "";
    Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);
    //Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\1", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(input);
    if (!m.find())
        output = "No duplicates found, no changes made to data";
    else
    {
        while (m.find())
        {
            if (output == "")
                output = input.replaceFirst(m.group(), m.group(1));
            else
                output = output.replaceAll(m.group(), m.group(1));
        }
        input = output;
        m = p.matcher(input);
        while (m.find())
        {
            output = "";
            if (output == "")
                output = input.replaceAll(m.group(), m.group(1));
            else
                output = output.replaceAll(m.group(), m.group(1));
        }
    }
    return output;
}
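Separately from the regex itself, the code above passes the matched text back into String.replaceAll, which compiles it as a new regex without any word boundaries, so it can also hit the end of one word plus the start of the next. A minimal sketch on a made-up input (class and method names are hypothetical) contrasting that with letting the matcher do the replacement through $1:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceAllPitfall {
    static final Pattern DUP =
            Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.CASE_INSENSITIVE);

    // Mimics the question's approach: find a match, then reuse its text as a regex.
    static String reuseMatchedText(String input) {
        Matcher m = DUP.matcher(input);
        if (!m.find()) {
            return input;
        }
        // For the sample input, m.group() is "is is"; as a bare regex it also
        // matches the "is is" that spans the end of "This" and the next "is".
        return input.replaceAll(m.group(), m.group(1));
    }

    // Replaces only the spans the boundary-aware regex actually matched.
    static String replaceInMatcher(String input) {
        return DUP.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        String input = "This is fine and it is is wrong";
        System.out.println(reuseMatchedText(input)); // This fine and it is wrong
        System.out.println(replaceInMatcher(input)); // This is fine and it is wrong
    }
}
```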

Answer 1

Try this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String pattern = "(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+";
Pattern r = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);

String input = "your string";
Matcher m = r.matcher(input);
// Replace each run of repeated words with its first occurrence
while (m.find()) {
    input = input.replaceAll(m.group(), m.group(1));
}
System.out.println(input);

Java regular expressions are explained well in the API documentation of the Pattern class. Here is the regex with some spaces added to mark its different parts:

"(?i) \\b ([a-z]+) \\b (?: \\s+ \\1 \\b )+"

(?i)     turn on case-insensitive matching
\b       match a word boundary
[a-z]+   match a word of one or more letters;
         the parentheses capture the word as a group
\b       match a word boundary
(?:      indicates a non-capturing group (which starts here)
\s+      match one or more white space characters
\1       is a back reference to the first (captured) group;
         so the word is repeated here
\b       match a word boundary
)+       indicates the end of the non-capturing group and
         allows it to occur one or more times
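Because the whole run of repeats is one match, the same pattern can also be applied in a single call, with $1 in the replacement standing for the captured word. A minimal sketch (class and method names are hypothetical):

```java
import java.util.regex.Pattern;

class CollapseRepeats {
    // Answer 1's pattern; (?i) already makes both the word and \1 case-insensitive.
    static final Pattern REPEATED =
            Pattern.compile("(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+");

    // Replaces every run of a repeated word with its first occurrence ($1).
    static String collapse(String input) {
        return REPEATED.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("This This is text text another another")); // This is text another
        System.out.println(collapse("Hello hello HELLO world")); // Hello world
    }
}
```

Doing the replacement inside the matcher touches only the text the boundary-aware regex matched, instead of reusing m.group() as a fresh regex over the whole string.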

Answer 2

You should use \b(\w+)\b\s+\b\1\b; click here to see the result...

Hope this is what you want...

Update 1

Well, the output you get is the final string after the duplicates are removed:

import java.util.regex.*;

public class MyDup {
    public static void main(String[] args) {
        String input = "This This is text text another another";
        String originalText = input;
        String output = "";
        Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.MULTILINE + Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        System.out.println(m);
        if (!m.find())
            output = "No duplicates found, no changes made to data";
        else
        {
            // First pass over the remaining matches
            while (m.find())
            {
                if (output.isEmpty()) {
                    output = input.replaceFirst(m.group(), m.group(1));
                } else {
                    output = output.replaceAll(m.group(), m.group(1));
                }
            }
            // Second pass over the partially de-duplicated string
            input = output;
            m = p.matcher(input);
            while (m.find())
            {
                output = "";
                if (output.isEmpty()) {
                    output = input.replaceAll(m.group(), m.group(1));
                } else {
                    output = output.replaceAll(m.group(), m.group(1));
                }
            }
        }
        System.out.println("After removing duplicate the final string is " + output);
    }
}

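Run this code and look at the output you get... that should answer your question...

Note

In output, you are replacing the duplicates with a single word... aren't you?

When I put System.out.println(m.group() + " : " + m.group(1)); inside the first condition, the output is text text : text, i.e. the duplicate is replaced by a single word.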

else
    {
        while (m.find())
        {
            if (output.isEmpty()) {
                System.out.println(m.group() + " : " + m.group(1));
                output = input.replaceFirst(m.group(), m.group(1));
            } else {
                output = output.replaceAll(m.group(), m.group(1));
            }
        }
        // ... rest of the else block unchanged

Hope you get it now... :)

Good luck! Cheers!!!
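The same code also shows why text text is the first pair printed: the if (!m.find()) check already consumes "This This", so the while loop starts at the second match (the second pass then picks "This This" up). A small sketch with hypothetical names that records which matches each loop structure sees:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstMatchConsumed {
    static final Pattern DUP =
            Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.CASE_INSENSITIVE);

    // Structure from the code above: the if (!m.find()) check uses up the first match.
    static List<String> seenByLoop(String input) {
        Matcher m = DUP.matcher(input);
        List<String> seen = new ArrayList<>();
        if (!m.find()) {
            return seen;
        }
        while (m.find()) {
            seen.add(m.group());
        }
        return seen;
    }

    // A single loop both checks for and visits every match.
    static List<String> allMatches(String input) {
        Matcher m = DUP.matcher(input);
        List<String> seen = new ArrayList<>();
        while (m.find()) {
            seen.add(m.group());
        }
        return seen;
    }

    public static void main(String[] args) {
        String input = "This This is text text another another";
        System.out.println(seenByLoop(input)); // [text text, another another]
        System.out.println(allMatches(input)); // [This This, text text, another another]
    }
}
```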

Answer 3

The following pattern matches duplicate words no matter how many times they are repeated.

Pattern.compile("\\b(\\w+)(\\b\\W+\\b\\1\\b)*", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);

For example, "This is my friend friend friend" will output "This is my friend".

Also, with this pattern a single while (m.find()) loop is enough.
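A sketch of the single-pass idea with this pattern (class and method names are hypothetical). Because the group is quantified with *, the pattern matches every word, and $1 keeps only the first word of each match. MULTILINE is left out here because it only changes the meaning of ^ and $, which this regex does not use:

```java
import java.util.regex.Pattern;

class AnyCountDuplicates {
    // Answer 3's pattern, with CASE_INSENSITIVE only.
    static final Pattern P =
            Pattern.compile("\\b(\\w+)(\\b\\W+\\b\\1\\b)*", Pattern.CASE_INSENSITIVE);

    // Each match is a word plus any immediate repeats; keep just the word.
    static String dedupe(String input) {
        return P.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(dedupe("This is my friend friend friend")); // This is my friend
        System.out.println(dedupe("friend, friend.")); // friend.
    }
}
```

Note that \W+ also accepts punctuation, so the comma between the two repeats in the second example disappears along with the duplicate.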

Answer 4

Explanation

\b(\w+)(\b\W+\1\b)*

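Once every word is selected, the repeated words can be selected as well.

\b           : any word boundary
(\w+)        : one or more word characters (letter, digit, underscore), captured as group 1
(\b\W+\1\b)* : zero or more repetitions of non-word characters followed by the same word

Reference: Example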
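The idea of selecting every word first and then picking out the repeated ones can be sketched by checking whether the repeating group took part in each match (names are hypothetical; the pattern is used case-sensitively, as written above):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedWordFinder {
    // Answer 4's pattern, with no flags.
    static final Pattern P = Pattern.compile("\\b(\\w+)(\\b\\W+\\1\\b)*");

    // Every word produces a match; group 2 is non-null only when the word
    // was followed by at least one repetition of itself.
    static List<String> repeatedWords(String input) {
        Matcher m = P.matcher(input);
        List<String> repeated = new ArrayList<>();
        while (m.find()) {
            if (m.group(2) != null) {
                repeated.add(m.group(1));
            }
        }
        return repeated;
    }

    public static void main(String[] args) {
        System.out.println(repeatedWords("This is is a test test test")); // [is, test]
    }
}
```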

Answer 5

If Unicode matters, you should use this code:

Pattern.compile("\\b(\\w+)(\\b\\W+\\b\\1\\b)*",
        Pattern.MULTILINE + Pattern.CASE_INSENSITIVE + Pattern.UNICODE_CHARACTER_CLASS);
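A minimal check of what the extra flag changes, using a Cyrillic word written with \u escapes so the source stays ASCII (the class and method names are hypothetical). Without UNICODE_CHARACTER_CLASS, \w only covers [a-zA-Z_0-9], so the pattern finds nothing to match in non-Latin text:

```java
import java.util.regex.Pattern;

class UnicodeDuplicates {
    static final String REGEX = "\\b(\\w+)(\\b\\W+\\b\\1\\b)*";

    // Collapses repeated words, optionally with Unicode-aware \w and \b.
    static String dedupe(String input, boolean unicode) {
        int flags = Pattern.CASE_INSENSITIVE;
        if (unicode) {
            flags |= Pattern.UNICODE_CHARACTER_CLASS;
        }
        return Pattern.compile(REGEX, flags).matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        String word = "\u043c\u0438\u0440"; // a Cyrillic word
        String input = word + " " + word;
        System.out.println(dedupe(input, false).equals(input)); // true: nothing matched
        System.out.println(dedupe(input, true).equals(word));   // true: repeat collapsed
    }
}
```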

Answer 6

I believe this is the regex you should use to detect two consecutive identical words separated by any number of non-word characters:

Pattern p = Pattern.compile("\\b(\\w+)\\b\\W+\\b\\1\\b", Pattern.CASE_INSENSITIVE);
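Used only for detection, the pattern reduces to a single find() call. A sketch with hypothetical names; since \W+ accepts punctuation, "Hello, hello" counts as a repeat, which may or may not be what is wanted:

```java
import java.util.regex.Pattern;

class ConsecutivePairCheck {
    // Answer 6's pattern: a word, any non-word characters, then the same word.
    static final Pattern PAIR =
            Pattern.compile("\\b(\\w+)\\b\\W+\\b\\1\\b", Pattern.CASE_INSENSITIVE);

    // True if the input contains at least one repeated consecutive word.
    static boolean hasRepeatedPair(String input) {
        return PAIR.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(hasRepeatedPair("This is text"));   // false
        System.out.println(hasRepeatedPair("Hello, hello"));   // true
        System.out.println(hasRepeatedPair("This is island")); // false
    }
}
```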
