Returning a word's position using a regular expression

Time: 2015-12-11 04:17:28

Tags: java regex

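I'm having trouble returning a word's position in Java using a regular expression and the Matcher methods.

Suppose I have the sentence "The quick brown fox jumps over the laziest dog in the world", and I want my regex to return the position of a specific word.

If the input is "brown", it should return 3, because that is the third word in the sentence. If the input is "quick", it should return 2, the second word. If the input is "world", it should return 12. I hope that's enough examples.

My attempt: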

Pattern p = Pattern.compile("(?i)(?<=^|[^A-Z0-9a-z])enemy(?=$|[^A-Z0-9a-z])");
Matcher m = p.matcher("The quickman is an enemy from megaman.");
if (m.find()) {
    System.out.println(m.start());
    System.out.println(m.end());
    System.out.println(m.group());
}

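But matcher.start() returns only the character index of the match in the string (19 for this input), not the word's position. Any hints or help would be greatly appreciated.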

1 Answer:

Answer 0: (score: 2)

Here is an example for the word brown:

\b(?:(brown)|(\S+))\b


// \b(?:(brown)|(\S+))\b
// 
// Options: Case sensitive; Exact spacing; Dot doesn’t match line breaks; ^$ don’t match at line breaks; Default line breaks
// 
// Assert position at a word boundary (position preceded or followed—but not both—by a Unicode letter, digit, or underscore) «\b»
// Match the regular expression below «(?:(brown)|(\S+))»
//    Match this alternative (attempting the next alternative only if this one fails) «(brown)»
//       Match the regex below and capture its match into backreference number 1 «(brown)»
//          Match the character string “brown” literally (case sensitive) «brown»
//    Or match this alternative (the entire group fails if this one fails to match) «(\S+)»
//       Match the regex below and capture its match into backreference number 2 «(\S+)»
//          Match a single character that is NOT a “whitespace character” (ASCII space, tab, line feed, carriage return, vertical tab, form feed) «\S+»
//             Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
// Assert position at a word boundary (position preceded or followed—but not both—by a Unicode letter, digit, or underscore) «\b»

Sample program that finds brown:

import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.regex.PatternSyntaxException;


public class HelloWorld
{
  public static void main(String[] args)
  {
    int counter = 0;
    String subjectString = "The quick brown fox jumps over the laziest dog in the world";
    String testWordString = "brown";
    try {
      Pattern regex = Pattern.compile("\\b(?:(brown)|(\\S+))\\b");
      Matcher regexMatcher = regex.matcher(subjectString);
      while (regexMatcher.find()) {
        // here increment a count for each word we pass.
        counter++;

        // matched text: regexMatcher.group()
        // match start: regexMatcher.start()
        // match end: regexMatcher.end()

        System.out.println(regexMatcher.group());

        // if the word text `regexMatcher.group()` matches our subject word `brown` exit the loop.
        if (testWordString.equals(regexMatcher.group())) {
          System.out.println("found the word: " + counter);
          break;
        }

      } 
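    } catch (PatternSyntaxException ex) {
      // Syntax error in the regular expression: report it instead of failing silently
      System.err.println("Invalid regular expression: " + ex.getMessage());
    }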
  }
}

Output:

The
quick
brown
found the word: 3

Note that the example could be simplified by removing the explicit test for brown, reducing \b(?:(brown)|(\S+))\b to:

\b(\S+)\b

But my thinking was to let you use the different regex capture groups to tell whether you found a match, rather than doing a string comparison each time.

I'll leave that to you as an exercise.
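
One possible way to approach that exercise, as a minimal sketch rather than a definitive solution: since capture group 1 is only set when the target word itself matched, checking it for null can replace the string comparison. The class name WordPosition and the helper positionOf are hypothetical names introduced for illustration, and wrapping the word in Pattern.quote is an added assumption so that arbitrary input is treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordPosition {
  // Hypothetical helper (not part of the answer above): returns the 1-based
  // position of the first occurrence of `word` in `sentence`, or -1 if absent.
  static int positionOf(String sentence, String word) {
    // Group 1 captures the target word; group 2 captures any other word.
    // Pattern.quote is an assumption added here so regex metacharacters
    // in `word` are matched literally.
    Pattern regex = Pattern.compile("\\b(?:(" + Pattern.quote(word) + ")|(\\S+))\\b");
    Matcher m = regex.matcher(sentence);
    int counter = 0;
    while (m.find()) {
      counter++; // one more word passed
      if (m.group(1) != null) {
        // Group 1 participated in the match, so this word is the target.
        return counter;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    String sentence = "The quick brown fox jumps over the laziest dog in the world";
    System.out.println(positionOf(sentence, "brown")); // prints 3
    System.out.println(positionOf(sentence, "quick")); // prints 2
    System.out.println(positionOf(sentence, "world")); // prints 12
    System.out.println(positionOf(sentence, "cat"));   // prints -1
  }
}
```

The match is case sensitive, as in the pattern above; passing Pattern.CASE_INSENSITIVE as a second argument to Pattern.compile would be one way to relax that.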