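I have to split a line of text into words, and I'm confused about which regex to use. I have looked everywhere for a regex that matches words and found ones like those in this post, but I want it in Java (Java does not accept a bare \ in an ordinary string literal).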
Regex to match words and those with an apostrophe
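I have tried the regex from every answer there, and I'm not sure how to build the equivalent regex for Java (I assumed all regex flavors were the same). If I replace each \ with \\ in those regexes, the regex does not work.
I also tried looking it up myself and came to this page: http://www.regular-expressions.info/reference.html
But I could not make sense of the advanced regex techniques described there.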
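For reference, here is a minimal sketch of how backslash escaping works in a Java string literal. The class and method names (`EscapeDemo`, `splitOnWhitespace`) are made up for illustration and do not come from the linked post; the point is only that the regex `\s+` is typed as `"\\s+"` in Java source.

```java
import java.util.Arrays;

class EscapeDemo {
    // The regex \s+ (one or more whitespace characters) has to be written
    // as "\\s+" in Java source: the compiler turns "\\" into a single
    // backslash before the regex engine ever sees the string.
    static String[] splitOnWhitespace(String line) {
        return line.split("\\s+");
    }

    public static void main(String[] args) {
        // prints [I, like, to, eat]
        System.out.println(Arrays.toString(splitOnWhitespace("I like to eat")));
    }
}
```

If a doubled-backslash pattern still fails, the cause is more likely the pattern itself (or using it with `split` when it was written for matching) than the escaping.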
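I am using String.split(regex string here) to separate my string. For example, if I am given the following: "I like to eat but I don't like to eat everyone's food, or they'll starve." I want to match: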
I
like
to
eat
but
I
don't
like
to
eat
everyone's
food
or
they'll
starve
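I also don't want to match '' or '''' or ' ' or '.'' or other permutations. My delimiter condition should be something like: [match any word characters][also match an apostrophe if it is preceded by a word character, and then match the word characters after it, if there are any]
All I have is a simple regex that matches words, [\w], but I'm not sure how to use lookahead or lookbehind to match the apostrophe and the rest of the word.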
Answer 0 (score: 3)
Using WhirlWind's answer from the page mentioned in the comments, you can do the following:
String candidate = "I \n"+
"like \n"+
"to "+
"eat "+
"but "+
"I "+
"don't "+
"like "+
"to "+
"eat "+
"everyone's "+
"food "+
"'' '''' '.' ' "+
"or "+
"they'll "+
"starv'e'";
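// Alternatives are tried left to right: 'word | word'word | word' | word
String regex = "('\\w+)|(\\w+'\\w+)|(\\w+')|(\\w+)";
// Needs java.util.regex.Matcher and java.util.regex.Pattern
Matcher matcher = Pattern.compile(regex).matcher(candidate);
while (matcher.find()) {
    System.out.println("> matched: `" + matcher.group() + "`");
}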
It will print:
> matched: `I`
> matched: `like`
> matched: `to`
> matched: `eat`
> matched: `but`
> matched: `I`
> matched: `don't`
> matched: `like`
> matched: `to`
> matched: `eat`
> matched: `everyone's`
> matched: `food`
> matched: `or`
> matched: `they'll`
> matched: `starv'e`
You can find a running example here: http://ideone.com/pVOmSK
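If only apostrophes that sit between word characters should be kept, as the question's condition seems to describe, a more compact pattern might be enough. The sketch below is a variant, not WhirlWind's regex: the names `WordMatcher` and `words` are invented here, and unlike the four-way alternation above it drops a leading or trailing apostrophe (`'tis` becomes `tis`, `dogs'` becomes `dogs`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordMatcher {
    // One or more word characters, optionally followed by repeated groups of
    // "apostrophe + word characters", so an apostrophe only counts inside a word.
    private static final Pattern WORD = Pattern.compile("\\w+(?:'\\w+)*");

    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [I, don't, like, everyone's, food, or, they'll, starve]
        System.out.println(words("I don't like everyone's food, or they'll starve."));
    }
}
```

Collecting matches with `Matcher.find()` rather than calling `String.split` avoids having to describe every possible delimiter; only the shape of a word needs to be specified.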
Answer 1 (score: 0)
The following regex seems to cover your example string correctly, but it does not handle the apostrophe cases.
[\s,.?!"]+
Java code:
String input = "I like to eat but I don't like to eat everyone's food, or they'll starve.";
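String[] inputWords = input.split("[\\s,.?!\"]+");
If I understand correctly, an apostrophe should be left alone as long as it comes right after a word character. The next regex should cover everything above plus that special case for apostrophes.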
(?<!\w)'|[\s,.?"!][\s,.?"'!]*
Java code:
String input = "I like to eat but I don't like to eat everyone's food, or they'll starve.";
String[] inputWords = input.split("(?<!\\w)'|[\\s,.?\"!][\\s,.?\"'!]*");
If I run the second regex on the string: Hey there! Don't eat 'the mystery meat'.
I get the following words in the string array:
Hey
there
Don't
eat
the
mystery
meat'
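Note that the last word keeps its closing quote (meat'), because an apostrophe that follows a word character is never treated as a delimiter. If that is not wanted, one possible variation is to also treat an apostrophe that is not followed by a word character as a delimiter. The sketch below is an assumption layered on top of the regex above, not part of it, and the names `ApostropheSplit` and `words` are made up for illustration.

```java
import java.util.Arrays;

class ApostropheSplit {
    // A run of one or more delimiters, where a delimiter is one of:
    //   (?<!\w)'     an apostrophe not preceded by a word character
    //   '(?!\w)      an apostrophe not followed by a word character
    //   [\s,.?"!]    whitespace or one of the listed punctuation marks
    private static final String DELIMITERS = "(?:(?<!\\w)'|'(?!\\w)|[\\s,.?\"!])+";

    static String[] words(String input) {
        return input.split(DELIMITERS);
    }

    public static void main(String[] args) {
        // prints [Hey, there, Don't, eat, the, mystery, meat]
        System.out.println(Arrays.toString(words("Hey there! Don't eat 'the mystery meat'.")));
    }
}
```

As with any `String.split` approach, an input that begins with a delimiter (for example an opening quote) yields an empty first element, so the caller may need to skip it.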