Regular expression to capture hyphenated and non-hyphenated words, including words split across lines

Time: 2012-06-27 21:53:56

Tags: java regex match capture

I am trying to write a regular expression in Java that matches both plain words and hyphenated words. So far I have:

Pattern p1 = Pattern.compile("\\w+(?:-\\w+)",Pattern.CASE_INSENSITIVE);
Pattern p2 = Pattern.compile("[a-zA-Z0-9]+",Pattern.CASE_INSENSITIVE);
Pattern p3 = Pattern.compile("(?<=\\s)[\\w]+-$",Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

Here is my test case:

    Programs
    Dsfasdf. Programs Programs Dsfasdf. Dsfasdf. as is wow woah! woah. woah? okay. 
    he said, "hi." aasdfa. wsdfalsdjf. go-to go-
to
asdfasdf.. , : ; " ' ( ) ? ! - / \ @ # $ % & ^ ~ `  * [ ] { } + _ 123

Any help would be great.

My expected result is to match all of the words, i.e.

Programs Dsfasdf Programs Programs Dsfasdf Dsfasdf
as is wow woah woah woah okay he said hi aasdfa
wsdfalsdjf go-to go-to asdfasdf 

The part I am struggling with is matching a word that is split across two lines as a single word:

go-
to

2 answers:

Answer 0: (score: 3)

\p{L}+(?:-\n?\p{L}+)*
\   /^\ /^\ /\   /^^^
 \ / | | | |  \ / |||
  |  | | | |   |  ||`- Previous can repeat 0 or more times (group of literal '-', optional new-line and one or more of any letter (upper/lower case))
  |  | | | |   |  |`-- End first non-capture group
  |  | | | |   |  `--- Match one or more of previous (any letter, upper/lower case)
  |  | | | |   `------ Match any letter (upper/lower case)
  |  | | | `---------- Match a single new-line (optional because of `?`)
  |  | | `------------ Literal '-'
  |  | `-------------- Start first non-capture group
  |  `---------------- Match one or more of previous (any letter, upper/lower case)
  `------------------- Match any letter (upper/lower case)

Is this OK?
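
A minimal sketch of how this pattern could be used from Java, assuming the whole input is available as one string. The class name `HyphenWords` and the method `extract` are hypothetical names chosen for illustration. Because the pattern lets a match span a line break, the sketch strips the newline from each match so that a split word comes out as a single token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class HyphenWords {
    // The answer's pattern: letters, then zero or more groups of
    // hyphen + optional newline + letters.
    private static final Pattern WORD =
            Pattern.compile("\\p{L}+(?:-\\n?\\p{L}+)*");

    static List<String> extract(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            // A word split across lines matches as "go-\nto";
            // removing the newline yields "go-to".
            words.add(m.group().replace("\n", ""));
        }
        return words;
    }

    public static void main(String[] args) {
        String input = "woah! he said, \"hi.\" go-to go-\nto 123";
        System.out.println(extract(input));
    }
}
```

For this input the program prints `[woah, he, said, hi, go-to, go-to]`; `123` is skipped because `\p{L}` does not match digits. Note that `\n?` only handles Unix line endings. Input with Windows line endings would need something like `(?:\r?\n)?` in its place, and the cleanup step would have to remove `\r` as well.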

Answer 1: (score: 1)

I would go with the regex:

\p{L}+(?:\-\p{L}+)*

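Such a regex should also match words like "fiancé" and "À-la-carte", as well as other words containing characters from the Unicode "letter" category. \p{L} matches a single code point in the "letter" category.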
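
As a rough illustration of the difference, the sketch below compares `\w+` (ASCII-only in Java unless `Pattern.UNICODE_CHARACTER_CLASS` is set) with the `\p{L}`-based pattern. The class name `LetterClassDemo` and the helper `firstMatch` are hypothetical names introduced for this example. The accented letters are written as `\u` escapes so that the result does not depend on the source file encoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
final class LetterClassDemo {
    // Returns the first match of regex in input, or "" if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String letters = "\\p{L}+(?:\\-\\p{L}+)*";
        String fiance = "fianc\u00E9";        // "fiancé", precomposed é
        String aLaCarte = "\u00C0-la-carte";  // "À-la-carte"

        // \w stops before the accented letter.
        System.out.println(firstMatch("\\w+", fiance));
        // \p{L} includes accented letters.
        System.out.println(firstMatch(letters, fiance));
        System.out.println(firstMatch(letters, aLaCarte));
    }
}
```

This prints `fianc`, `fiancé` and `À-la-carte`. Since `\p{L}` works on single code points, a decomposed "é" (the letter `e` followed by the combining mark U+0301) would not be matched in full, because the combining mark is not in the letter category; normalizing the input with `java.text.Normalizer` first is one way around that.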