Return a specified number of words before and after a given position in a text

Asked: 2014-09-30 20:20:54

Tags: java regex words

I have a big problem with the following code. I want it to return n words before and after a found keyword (the needle), but it never does.

Suppose I have a text, say

"There is a lot of interesting stuff going on, when someone tries to find the needle in the haystack. Especially if there is anything to see blah blah blah". 

and I have this regular expression:

"((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}\b)needle(\b(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5})"

Shouldn't this match the needle in the given string and return the following text?

someone tries to find the needle in the haystack. Especially if

It never does :-( When executed, my method always returns an empty string, although I know for certain that the keyword is in the given text.

private String trimStringAtWordBoundary(String haystack, int wordsBefore, int wordsAfter, String needle) {
    if(haystack == null || haystack.trim().isEmpty()){
        return haystack ;
    }

    String textsegments = "";

    String patternString = "((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,"+wordsBefore+"}\b)" + needle + "(\b(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,"+wordsAfter+"})";


    Pattern pattern = Pattern.compile(patternString);
    Matcher matcher = pattern.matcher(haystack);

    logger.trace(">>> using regular expression: " + matcher.toString());

    while(matcher.find()){
        logger.trace(">>> found you between " + matcher.regionStart() + " and " + matcher.regionEnd());
        String segText = matcher.group(0); // as well tried it with group(1)
        textsegments += segText + "...";
    }

    return textsegments;
}

Obviously the problem lies in my regular expression, but I can't figure out what is wrong with it.

1 Answer:

Answer 0: (score: 3)

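Your regex is basically fine, but in Java you need to escape the \b. Inside a Java string literal, "\b" is the backspace character, so the regex engine never sees a word-boundary anchor and the pattern cannot match. Write "\\b" instead: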

"((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}\\b)needle(\\b(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5})"
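
For illustration, here is a minimal sketch of the method from the question with that fix applied. The class name NeedleContext and the main method are hypothetical and only there to make the example runnable. Wrapping the needle in Pattern.quote is an extra precaution that is not part of the original answer; it keeps regex metacharacters in the keyword from being interpreted. The logger calls are left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class, used only to make the sketch self-contained.
class NeedleContext {

    // The question's definition of a "word" character and its complement.
    private static final String WORD = "[a-zA-Z'-]";
    private static final String NON_WORD = "[^a-zA-Z'-]";

    static String trimStringAtWordBoundary(String haystack, int wordsBefore,
                                           int wordsAfter, String needle) {
        if (haystack == null || haystack.trim().isEmpty()) {
            return haystack;
        }

        // "\\b" in Java source reaches the regex engine as \b (word boundary).
        // A plain "\b" would be compiled into a single backspace character.
        String patternString =
                "((?:" + WORD + "+" + NON_WORD + "+){0," + wordsBefore + "}\\b)"
                + Pattern.quote(needle) // assumption: treat the needle literally
                + "(\\b(?:" + NON_WORD + "+" + WORD + "+){0," + wordsAfter + "})";

        Matcher matcher = Pattern.compile(patternString).matcher(haystack);

        // Collect every match, each followed by "..." as in the question.
        StringBuilder textSegments = new StringBuilder();
        while (matcher.find()) {
            textSegments.append(matcher.group(0)).append("...");
        }
        return textSegments.toString();
    }

    public static void main(String[] args) {
        String text = "There is a lot of interesting stuff going on, when someone "
                + "tries to find the needle in the haystack. Especially if there "
                + "is anything to see blah blah blah";
        // prints: someone tries to find the needle in the haystack. Especially if...
        System.out.println(trimStringAtWordBoundary(text, 5, 5, "needle"));
    }
}
```

A side note on the trace output in the question: matcher.regionStart() and matcher.regionEnd() report the bounds of the region being searched, which by default is the whole input. The position of the current match comes from matcher.start() and matcher.end().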