Java regex not working across multiple lines

Date: 2015-11-17 15:46:59

Tags: java regex

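Last night I got some help working out a regex that captures the smallest group. I need to take a string of song lyrics and find a search phrase in it. The problem I'm running into is that I can't get it to match across multiple lines.

I have a text file with lyrics that I read in; this is just part of the song. (The brackets are not in the text file; I'm only using them to show the group I want to capture.)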

 The first [time we fall in love. 
 Love can be exciting, it can be a bloody bore. 
 Love can be a pleasure or nothing but a chore.
 Love can be like a dull routine, 
 It can run you around until you're out of steam. 
 It can treat you well, it can treat you mean, 
 Love can mess you around, 
 Love can pick you up, it can bring you down]. 
 But they'll never know The feelings we show 

The phrase I'm using for the regex is

 time can bring you down

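I build the lyrics string with a StringBuilder, so the lyrics contain \n characters. I tried stripping them with replaceAll, but it still doesn't work. If I go into the text file and write "time can bring you down" on a single line, it works, but if I split it across two lines, it doesn't.

I tried using \n in my regex, but it ended up capturing most of the song, because "time" is the second word. This is the regex I'm currently trying to use: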

(?is)(\bTime\b)(?:(?!\n\b(?:time|can|bring|you|down)\b\n).)*(\bcan\b)(?:(?!\b(?:time|can|bring|you|down)\b).)*(\bbring\b)(?:(?!\b(?:time|can|bring|you|down)\b).)*(\byou\b)(?:(?!\b(?:time|can|bring|you|down)\b).)*(\bdown\b)

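I'm trying to capture what is inside the brackets in the lyrics above. Here is the method where I use it; it takes the lyrics and the search phrase and returns the length of the string it finds.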

    static int rankPhrase(String lyrics, String lyricsPhrase){
    //This takes in song lyrics and the phrase we are searching for

    //Split the phrase up into separate words
    String[] phrase = lyricsPhrase.split("[^a-zA-Z]+");

    //Helper string for regex so we can get smallest grouping
    String regexHelper = lyricsPhrase.replaceAll(" ","|").toLowerCase();

    //Start to build the regex
    StringBuilder regex = new StringBuilder("(?im)"+"(\\" + "b" + phrase[0] + "\\b)");

    //loop through each word in the phrase
    for(int i = 1; i < phrase.length; i++){ 

            //add this to the regex we will search for
            regex.append("(?:(?!\\b(?:" + regexHelper + ")\\b).)*(\\b" + phrase[i] + "\\b)");   

    }

    //Create the pattern
    Pattern p = Pattern.compile(regex.toString(), Pattern.DOTALL);
    Matcher m = p.matcher(lyrics);

    //string for regex match found
    String regexMatch = "";
        while(m.find()){

            regexMatch = m.group();
            System.out.println(regexMatch);
    }

    return regexMatch.length();

}

I'll keep trying to figure this out. I think the problem is in the regex, but I'm not 100% sure. Thanks!

1 answer:

Answer 0 (score: 0)

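You are trying to search for a combination of words in a string. This is easy to do with a regex like word1.*?word2, which allows any number of characters between the first and second word. The ? makes the quantifier lazy: it matches as few characters as possible.
The problem is that you are trying to search for a pattern across multiple lines. By default the . metacharacter only works within a single line, because . matches any character except a line terminator.
You can easily get around this by using (.|\n)*? instead of .*?. Alternatively, compiling the pattern with Pattern.DOTALL (or the inline flag (?s)) makes . match line terminators as well.

I have updated your code below.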
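To make the difference concrete, here is a minimal sketch comparing the three variants on a two-line string. The class name `DotNewlineDemo` and the helper `firstMatch` are made up for this illustration; only `Pattern`, `Matcher`, and `Pattern.DOTALL` come from the standard library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotNewlineDemo {
    // Hypothetical helper: returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "time\ncan";

        // Default flags: '.' stops at the line terminator, so nothing is found.
        System.out.println(firstMatch("time.*?can", 0, text));

        // Explicit alternation with \n lets the gap cross the line break.
        System.out.println(firstMatch("time(.|\\n)*?can", 0, text));

        // DOTALL makes '.' itself match line terminators.
        System.out.println(firstMatch("time.*?can", Pattern.DOTALL, text));
    }
}
```

The first call finds no match, while the other two match the whole two-line string.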

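import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regexa2 {
 static int rankPhrase(String lyrics, String lyricsPhrase){
    //This takes in song lyrics and the phrase we are searching for

    //Build the regex: join the words with a lazy gap that may span newlines.
    //String.replace is used instead of replaceAll because replaceAll treats
    //the backslash in the replacement as an escape and would turn \n into n.
    String regex = lyricsPhrase.toLowerCase().replace(" ","(.|\\n)*?");

    System.out.println(regex);
    //Create the pattern (with DOTALL, '.' also matches line terminators)
    Pattern p = Pattern.compile(regex, Pattern.DOTALL);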
    Matcher m = p.matcher(lyrics);

    //string for regex match found
    String regexMatch = "";
        while(m.find()){

            regexMatch = m.group();
            System.out.println(regexMatch);
    }

    return regexMatch.length();

}

public static void main(String[] args) {
    String lyrics = "The first time we fall in love. \n" + 
            "Love can be exciting, it can be a bloody bore. \n" + 
            "Love can be a pleasure or nothing but a chore.\n" + 
            "Love can be like a dull routine, \n" + 
            "It can run you around until you're out of steam. \n" + 
            "It can treat you well, it can treat you mean, \n" + 
            "Love can mess you around, \n" + 
            "Love can pick you up, it can bring you down. \n" + 
            "But they'll never know The feelings we show ";
    String phrase = "time can bring you down";
    Regexa2.rankPhrase(lyrics, phrase);
 }
}
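As a possible variation on the code above, the sketch below quotes each word with `Pattern.quote` so that punctuation in the phrase is not read as regex syntax, adds word boundaries, and ignores case. The class `PhraseSpan` and the method `spanLength` are hypothetical names, not part of the original code. Note the assumption: a lazy gap returns the leftmost match, which is not guaranteed to be the shortest possible span in the whole text.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class PhraseSpan {
    // Hypothetical helper: length of the first span that contains the
    // phrase's words in order, or 0 when there is no such span.
    static int spanLength(String lyrics, String phrase) {
        // Quote every word and join the words with a lazy gap.
        String regex = Arrays.stream(phrase.trim().split("\\s+"))
                .map(w -> "\\b" + Pattern.quote(w) + "\\b")
                .collect(Collectors.joining(".*?"));

        // DOTALL lets the gap cross line breaks; CASE_INSENSITIVE matches "Time" and "time" alike.
        Matcher m = Pattern.compile(regex, Pattern.DOTALL | Pattern.CASE_INSENSITIVE)
                .matcher(lyrics);
        return m.find() ? m.group().length() : 0;
    }

    public static void main(String[] args) {
        String lyrics = "The first time we fall in love.\n"
                + "Love can pick you up, it can bring you down.\n"
                + "But they'll never know";
        System.out.println(spanLength(lyrics, "time can bring you down"));
    }
}
```

Unlike the original, this version returns the length of the first match rather than the last one found, which seemed the more natural reading of "the match"; keep the `while (m.find())` loop if the last match is what you want.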