Regex to exclude words from a match

Time: 2017-07-09 13:12:00

Tags: java regex

Does anyone know what I'm doing wrong? I have this sentence:

hi [user=1234]John Jack[/user] take me home

I need a regex that selects only John Jack.

My regex:

(\[user=\d\d\d\d](.+?)\[\/user\])(?!(\[user=\d\d\d\d\])|(\[\/user\]))

I want to exclude [user=1234] and [/user].

(\[user=\d\d\d\d](.+?)\[\/user\]) selects [user=1234]John Jack[/user], but I only want John Jack.

Full example:

hi [user=1234]John Jack[/user] take me home. [user=12] Jonno Ha [\user] where you are [differentTag] hm? [/differentTag]. Peter Im here with [user=1]Danny Di [\user]

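5 Answers:

Answer 0: (score: 4)

As an alternative to the "lookahead"/"lookbehind" syntax in @matoni's answer, you can use grouping (already defined in your pattern) and extract the appropriate group: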

    String s = "hi [user=1234]John Jack[/user] take me home ...";
    // Lazy (.+?) so that each match stops at the nearest [/user]
    Pattern p = Pattern.compile("\\[user=\\d+\\](.+?)\\[/user\\]");
    Matcher m = p.matcher(s);
    while (m.find()) {
        System.out.println(m.group(1));
    }
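
One detail worth checking when group 1 is extracted in a loop is the quantifier inside the group. The sketch below compares a greedy (.+) with a lazy (.+?) on an input that contains two user tags. The class name, the helper method and the two-tag input are made up for illustration and are not part of the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazyDemo {
    // Collects group 1 of every match of the given regex in the input.
    static List<String> extract(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical input with two user tags.
        String s = "hi [user=1234]John Jack[/user] and [user=12]Jonno Ha[/user] bye";

        // Greedy: one match that runs from the first opening tag to the last closing tag.
        System.out.println(extract("\\[user=\\d+\\](.+)\\[/user\\]", s));

        // Lazy: one match per tag pair.
        System.out.println(extract("\\[user=\\d+\\](.+?)\\[/user\\]", s));
    }
}
```

With a single tag in the input both variants return the same name, so the difference only shows up once there are several tags.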

Answer 1: (score: 3)

The (.+?) group is indexed as 2 and should hold John Jack, so you should be able to get it with matcher.group(2).

Demo:

String text = "hi [user=1234]John Jack[/user] take me home. [user=12] Jonno Ha [\\user] where you are [differentTag] hm? [/differentTag]. Peter Im here with [user=1]Danny Di [\\user]";
Pattern p = Pattern.compile("(\\[user=\\d\\d\\d\\d](.+?)\\[\\/user\\])(?!(\\[user=\\d\\d\\d\\d\\])|(\\[\\/user\\]))");
Matcher m = p.matcher(text);
if(m.find()){
    System.out.println(m.group(2));
}

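Output: John Jack

If you want to find more users, you need to change the if to a while and fix the regex, because

  • you are currently searching for users with a 4-digit ID, so it cannot match [user=12] or [user=1]. Use \d+ instead of \d\d\d\d.
  • you only handle [user=ID]..[/user], but the input also contains [user=ID]..[\user] (\ instead of /).

BTW, since Java doesn't use the /regex/flags syntax, / is not treated as a special character, so you don't need to escape it.

Also, I'm not sure why you need (?!(\\[user=\\d\\d\\d\\d\\])|(\\[\\/user\\])) at the end of the regex. It doesn't really do anything in the example you showed, so it looks like it can be removed. We also don't need parentheses around the preceding part: the lookahead adds nothing to the whole match, which is already stored in group 0, so a separate group that duplicates it is unnecessary. Once those extra parentheses are removed, (.+?) is indexed as group 1.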
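
To see the group numbering in isolation, here is a minimal sketch that reads individual groups from the original pattern and from a version without the outer parentheses and the lookahead. Groups are numbered by the position of their opening parenthesis. The class and helper names are invented for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    // Returns the requested group of the first match, or null if nothing matches.
    static String group(String regex, String input, int index) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(index) : null;
    }

    public static void main(String[] args) {
        String s = "hi [user=1234]John Jack[/user] take me home";

        // Pattern from the question: group 1 wraps the whole tag, group 2 is (.+?).
        String original =
            "(\\[user=\\d\\d\\d\\d](.+?)\\[\\/user\\])(?!(\\[user=\\d\\d\\d\\d\\])|(\\[\\/user\\]))";
        // Same idea without the outer parentheses and the lookahead: (.+?) becomes group 1.
        String simplified = "\\[user=\\d+](.+?)\\[(/|\\\\)user]";

        System.out.println(group(original, s, 0));   // whole match
        System.out.println(group(original, s, 1));   // duplicates the whole match
        System.out.println(group(original, s, 2));   // the name
        System.out.println(group(simplified, s, 1)); // the name
    }
}
```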

The revised, simplified solution looks like this:

String text = "hi [user=1234]John Jack[/user] take me home. [user=12] Jonno Ha [\\user] where you are [differentTag] hm? [/differentTag]. Peter Im here with [user=1]Danny Di [\\user]";
Pattern p = Pattern.compile("\\[user=\\d+](.+?)\\[(/|\\\\)user]");
Matcher m = p.matcher(text);
while(m.find()){
    System.out.println(m.group(1).trim()); 
}

Output:

John Jack
Jonno Ha
Danny Di

Answer 2: (score: 2)

Try this:

String s = "hi [user=1234]John Jack[/user] take me home";
// assuming user id has always 4 decimals
Pattern p = Pattern.compile("(?<=\\[user=\\d{4}\\]).+(?=\\[/user\\])");
Matcher m = p.matcher(s);
// only read the match if find() succeeded; the whole match is just the name
if (m.find()) {
    System.out.println(m.group());
}

Note that you cannot use lookbehind patterns of unbounded length, such as (?<=.+). So if you know that the user ID has at most, say, 11 digits, you can use:

Pattern.compile("(?<=\\[user=\\d{4,11}\\]).+(?=\\[/user\\])");

For more on regular expressions, see: Pattern javadoc
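
The lookaround approach can also be applied to the full example from the question. The following is only a sketch under a few assumptions: the ID is taken to have 1 to 11 digits (so that [user=12] and [user=1] match too), the closing tag may be written as [/user] or [\user], the name is matched lazily, and surrounding whitespace is trimmed. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundedLookbehindDemo {
    // Lookbehind with a bounded repetition (assumed limit: 1 to 11 digits);
    // the lookahead accepts both [/user] and [\user].
    static final Pattern NAME = Pattern.compile(
        "(?<=\\[user=\\d{1,11}\\]).+?(?=\\[[/\\\\]user\\])");

    // Returns every name found between a user tag and its closing tag.
    static List<String> names(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = NAME.matcher(text);
        while (m.find()) {
            result.add(m.group().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "hi [user=1234]John Jack[/user] take me home."
            + " [user=12] Jonno Ha [\\user] where you are [differentTag] hm? [/differentTag]."
            + " Peter Im here with [user=1]Danny Di [\\user]";
        for (String name : names(text)) {
            System.out.println(name);
        }
    }
}
```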

Answer 3: (score: 2)

Full decoding:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExpPattern_002 {

   public static void main( String[] args ) {
      final String text =
         "hi [user=1234]John Jack[/user] take me home."
         + " [user=12] Jonno Ha [/user]"
         + " where you are [differentTag] hm? [/differentTag]."
         + " Peter Im here with [user=1]Danny Di [/user]";
      final Pattern p = Pattern.compile(
         "([^\\[]*)\\[(\\w+)(=([^\\]]+))?\\]([^\\[]*)\\[/(\\w+)\\]" );
      final Matcher m = p.matcher( text );
      while( m.find()) {
         final String preText   = m.group( 1 );
         final String attrOpen  = m.group( 2 );
         final String value     = m.group( 4 );
         final String content   = m.group( 5 );
         final String attrClose = m.group( 6 );
         assert attrClose.equals( attrOpen );
         System.err.printf(
            "pre = '%s', attr = '%s', value = '%s', content = '%s'\n",
            preText, attrOpen, value, content );
         System.err.println("-----------------------------");
      }
   }
}

Execution log:

pre = 'hi ', attr = 'user', value = '1234', content = 'John Jack'
-----------------------------
pre = ' take me home. ', attr = 'user', value = '12', content = ' Jonno Ha '
-----------------------------
pre = ' where you are ', attr = 'differentTag', value = 'null', content = ' hm? '
-----------------------------
pre = '. Peter Im here with ', attr = 'user', value = '1', content = 'Danny Di '
-----------------------------
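
Since this pattern decodes every tag, getting only the user names means filtering on the tag name. A possible sketch, reusing the same regular expression and keeping only matches whose opening and closing tag names equal the requested one; the class name, method name and trimming are additions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UserTagFilter {
    // Same pattern as above: group 2 = opening tag name, group 5 = content,
    // group 6 = closing tag name.
    static final Pattern TAG = Pattern.compile(
        "([^\\[]*)\\[(\\w+)(=([^\\]]+))?\\]([^\\[]*)\\[/(\\w+)\\]");

    // Returns the trimmed content of every tag pair with the given name.
    static List<String> contentsOf(String tagName, String text) {
        List<String> contents = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            if (tagName.equals(m.group(2)) && tagName.equals(m.group(6))) {
                contents.add(m.group(5).trim());
            }
        }
        return contents;
    }

    public static void main(String[] args) {
        String text = "hi [user=1234]John Jack[/user] take me home."
            + " [user=12] Jonno Ha [/user]"
            + " where you are [differentTag] hm? [/differentTag]."
            + " Peter Im here with [user=1]Danny Di [/user]";
        System.out.println(contentsOf("user", text));
        System.out.println(contentsOf("differentTag", text));
    }
}
```

Like the code above, this only recognizes closing tags written with a forward slash.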

Answer 4: (score: 0)

I assume you don't need any code; otherwise leave a comment and I'll delete this answer.

To exclude [user=1234] and [/user] you can use

[^\]\[a-zA-Z=\d\/]

and to match the remaining parts:

[a-zA-Z ]*[^\]\[a-zA-Z=\d\/][a-zA-Z]*

And for the input:

hi [user=1234]John Jack[/user] take me home. [user=12] Jonno Ha [\user] where you are [differentTag] hm? [/differentTag]. Peter Im here with [user=1]Danny Di [\user]

you can use:

[a-zA-Z ]*[^\]\[a-zA-Z=\d\/\\]+[a-zA-Z ]* 

It matches everything except what is inside [].

Excluded:

[user=1234]
[/user]
[user=12]
[\user]
[differentTag]
[/differentTag]
[user=1]
[\user]

If you want to match only the user names before [/user] or [\user], try:

[a-zA-Z ]+(?=\[(?:\\|\/)user\])

Matches:

John Jack  
Jonno Ha 
Danny Di 

More efficient than the above:

(?<=])[a-zA-Z ]+(?=\[(?:\\|\/))  

Still matches:

John Jack  
Jonno Ha 
Danny Di
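
For anyone who does want code, the two lookaround patterns above could be tried from Java roughly as follows. This is a sketch with invented class and method names; the only changes to the patterns are the escaping needed for Java string literals, and matches are trimmed because the character class also matches the surrounding spaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundOnlyDemo {
    // Letters and spaces directly before [/user] or [\user].
    static final Pattern BEFORE_CLOSING_USER =
        Pattern.compile("[a-zA-Z ]+(?=\\[(?:\\\\|/)user\\])");

    // Letters and spaces between a ']' and a following "[/" or "[\".
    static final Pattern BETWEEN_BRACKETS =
        Pattern.compile("(?<=\\])[a-zA-Z ]+(?=\\[(?:\\\\|/))");

    // Returns all trimmed matches of the pattern in the text.
    static List<String> matches(Pattern p, String text) {
        List<String> result = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            result.add(m.group().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "hi [user=1234]John Jack[/user] take me home."
            + " [user=12] Jonno Ha [\\user] where you are [differentTag] hm? [/differentTag]."
            + " Peter Im here with [user=1]Danny Di [\\user]";
        System.out.println(matches(BEFORE_CLOSING_USER, text));
        System.out.println(matches(BETWEEN_BRACKETS, text));
    }
}
```

The second pattern does not check the tag name, so it would also pick up letters-only content of other tags, for example the text inside a hypothetical [b]bold[/b].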