I am trying to write a regular expression in Java that matches both plain words and hyphenated words. So far I have:
Pattern p1 = Pattern.compile("\\w+(?:-\\w+)",Pattern.CASE_INSENSITIVE);
Pattern p2 = Pattern.compile("[a-zA-Z0-9]+",Pattern.CASE_INSENSITIVE);
Pattern p3 = Pattern.compile("(?<=\\s)[\\w]+-$",Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Here is my test case:
Programs Dsfasdf. Programs Programs Dsfasdf. Dsfasdf. as is wow woah! woah. woah? okay. he said, "hi." aasdfa. wsdfalsdjf. go-to go- to asdfasdf.. , : ; " ' ( ) ? ! - / \ @ # $ % & ^ ~ ` * [ ] { } + _ 123
Any help would be great.
My expected result is to match all of the words, i.e.
Programs Dsfasdf Programs Programs Dsfasdf Dsfasdf as is wow woah woah woah okay he said hi aasdfa wsdfalsdjf go-to go-to asdfasdf
The part I am struggling with is matching a word that is split across two lines as a single word,
i.e.
go- to
Answer 0: (score: 3)
\p{L}+(?:-\n?\p{L}+)*

Reading the pattern from left to right:

\p{L}   Match any letter (upper/lower case)
+       Match one or more of previous (any letter, upper/lower case)
(?:     Start first non-capture group
-       Literal '-'
\n?     Match a single new-line (optional because of `?`)
\p{L}   Match any letter (upper/lower case)
+       Match one or more of previous (any letter, upper/lower case)
)       End first non-capture group
*       Previous can repeat 0 or more times (group of literal '-', optional new-line and one or more of any letter (upper/lower case))
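As a rough illustration, the pattern above could be used from Java as follows. This is only a sketch: the class name `HyphenatedWords` and the helper `findWords` are made up for this example, and it assumes a word split across lines looks like `go-` followed directly by a `\n` character. Removing the line break from the matched text is an extra step that is not part of the regex itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HyphenatedWords {
    // The pattern from the answer, with backslashes doubled for a Java string literal.
    private static final Pattern WORD = Pattern.compile("\\p{L}+(?:-\\n?\\p{L}+)*");

    // Hypothetical helper: collects every match and drops a line break
    // that directly follows a hyphen, so "go-\nto" is returned as "go-to".
    static List<String> findWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group().replace("-\n", "-"));
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "he said, \"hi.\" go-to go-\nto asdfasdf.. - _ 123";
        System.out.println(findWords(text));
        // prints [he, said, hi, go-to, go-to, asdfasdf]
    }
}
```

Because `\p{L}` covers letters only, the lone `-`, the `_` and `123` from the test case are skipped, which agrees with the expected result in the question. If the input uses Windows line endings, the optional `\n` would probably need to become something like `\r?\n`.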
Answer 1: (score: 1)
I would go with this regex:
\p{L}+(?:\-\p{L}+)*
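This regex should also match words such as "fiancé" and "À-la-carte", as well as other words containing characters from the special "letter" category. \p{L} matches a single code point in the "letter" category.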
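To make the difference concrete, here is a small sketch comparing this regex with a `\w`-based one in Java. The class name `UnicodeLetters` and both helper methods are invented for the example. The accented characters are written as Unicode escapes so the result does not depend on the source file encoding, and the sketch assumes they are precomposed code points (a decomposed "e" plus combining accent would not be covered by `\p{L}` alone).

```java
import java.util.regex.Pattern;

public class UnicodeLetters {
    // The regex from this answer, escaped for a Java string literal.
    private static final Pattern UNICODE_WORD = Pattern.compile("\\p{L}+(?:\\-\\p{L}+)*");
    // A \w-based variant for comparison; without UNICODE_CHARACTER_CLASS,
    // \w in Java means [a-zA-Z_0-9] only.
    private static final Pattern ASCII_WORD = Pattern.compile("\\w+(?:-\\w+)*");

    // Hypothetical helpers: true if the whole string is one (hyphenated) word.
    static boolean isUnicodeWord(String s) {
        return UNICODE_WORD.matcher(s).matches();
    }

    static boolean isAsciiWord(String s) {
        return ASCII_WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String fiance = "fianc\u00e9";      // "fiancé"
        String carte = "\u00c0-la-carte";   // "À-la-carte"
        System.out.println(isUnicodeWord(fiance)); // true
        System.out.println(isUnicodeWord(carte));  // true
        System.out.println(isAsciiWord(fiance));   // false
        System.out.println(isAsciiWord(carte));    // false
    }
}
```

The `\w` variant also accepts digits and underscores (for example `abc_123`), which the expected output in the question excludes, so `\p{L}` looks like the closer fit here.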