Question

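I'm trying to find a way to determine whether a line contains a specific string, while not matching it when it appears inside certain words. I have it partially working, but it fails if one of the excluded words begins with the keyword.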

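My current pattern successfully excludes all of the listed words except tomcat and tomorrow. I assume this is because I'm matching on the keyword, so the lookahead fails, but I don't know how to fix it.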

Answer 1

Update: Unfortunately, I could not solve this without placing a negative lookahead on both sides of the . in the non-capturing group:

^(?:(?!custom|onetomany|manytomany|atom|tomcat|tomorrow|automatic).(?!custom|onetomany|manytomany|atom|tomcat|tomorrow|automatic))*?(tom).*


It works if you move the . before the negative lookahead: .(?!...)

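I would also make the * repetition lazy, so it doesn't have to backtrack as much (not always, but in this example). Also, if you want to match the whole line and capture only the instance of tom, make the group containing .(?!...) non-capturing and finish the expression with a greedy .*: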

^(?:.(?!custom|onetomany|manytomany|atom|tomcat|tomorrow|automatic))*?(tom).*

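
To make the difference between the two patterns above concrete, here is a minimal Java sketch that runs both against a few sample lines. The class name, the `tomIndex` helper, and the sample lines are assumptions made for illustration; they are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadComparison {
    // Excluded words, copied from the patterns above.
    static final String EXCLUDED =
        "custom|onetomany|manytomany|atom|tomcat|tomorrow|automatic";

    // Negative lookahead only after the dot.
    static final Pattern AFTER_ONLY = Pattern.compile(
        "^(?:.(?!" + EXCLUDED + "))*?(tom).*");

    // Negative lookahead on both sides of the dot (the updated pattern).
    static final Pattern BOTH_SIDES = Pattern.compile(
        "^(?:(?!" + EXCLUDED + ").(?!" + EXCLUDED + "))*?(tom).*");

    // Hypothetical helper: start index of the captured "tom", or -1 if the line does not match.
    static int tomIndex(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.find() ? m.start(1) : -1;
    }

    public static void main(String[] args) {
        // Sample lines chosen for illustration only.
        String[] lines = {
            "I met tom today",
            "restart tomcat now",
            "custom build",
            "tomcat is down"
        };
        for (String line : lines) {
            System.out.println(line
                + " -> after-only: " + tomIndex(AFTER_ONLY, line)
                + ", both-sides: " + tomIndex(BOTH_SIDES, line));
        }
    }
}
```

With these sample lines, both patterns capture tom in the first line and reject the second, while only the both-sides pattern rejects custom build. A line that begins with tomcat is still accepted by both patterns, because zero repetitions of the group leave position 0 without any lookahead check.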

Answer 2

This situation comes straight from Match (or replace) a pattern except in situations s1, s2, s3 etc.

Compared with other potential solutions, the regex could not be simpler:

custom|onetomany|manytomany|atom|tomcat|tomorrow|automatic|(tom)

If you want to show not just tom but the whole word it occurs in, such as tomahawk, change it to:

custom|onetomany|manytomany|atom|tomcat|tomorrow|automatic|(\w*tom\w*)

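The left side of the alternation matches the words you don't want. We ignore those matches. The right side matches and captures tom to Group 1, and we know these are the right tom because they were not matched by the expressions on the left.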

This program shows how to use the regex (see the results at the bottom of the online demo). It finds tom and tomahawk.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Program {
    public static void main(String[] args) {
        String subject = "custom onetomany manytomany atom tomcat tomorrow automatic tom tomahawk";
        Pattern regex = Pattern.compile("custom|onetomany|manytomany|atom|tomcat|tomorrow|automatic|(\\w*tom\\w*)");
        Matcher regexMatcher = regex.matcher(subject);
        List<String> group1Caps = new ArrayList<>();

        // Put Group 1 captures in a list; Group 1 is null when an excluded word matched
        while (regexMatcher.find()) {
            if (regexMatcher.group(1) != null) {
                group1Caps.add(regexMatcher.group(1));
            }
        } // end of building the list

        System.out.println("\n" + "*** Matches ***");
        for (String match : group1Caps) {
            System.out.println(match);
        }
    } // end main
} // end Program
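
The same alternation can also drive a replacement, as the title of the referenced question suggests. The following is a rough sketch rather than the answer's own code: the class name, the `markTom` helper, and the choice to wrap each captured tom in square brackets are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceTomSketch {
    // The simpler pattern from above: excluded words on the left, (tom) on the right.
    static final Pattern REGEX = Pattern.compile(
        "custom|onetomany|manytomany|atom|tomcat|tomorrow|automatic|(tom)");

    // Hypothetical helper: wraps each Group 1 capture in brackets
    // and copies excluded words back unchanged.
    static String markTom(String subject) {
        Matcher m = REGEX.matcher(subject);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Group 1 is null whenever the left side of the alternation matched.
            String replacement = (m.group(1) != null)
                ? "[" + m.group(1) + "]"
                : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(markTom("custom atom tomcat tomorrow tom tomahawk"));
    }
}
```

Because the excluded words are consumed by the left side of the alternation, the tom inside them is never seen by the right side, so only the standalone tom and the tom in tomahawk are rewritten.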

Reference

How to match (or replace) a pattern except in situations s1, s2, s3...

Answer 3

I think this is what you're after:

\b(?!(?:custom|onetomany|manytomany|atom|tomcat|tomorrow|automatic)\b)[a-z]*tom[a-z]*\b

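I used word boundaries (\b) instead of the anchor (^), so it finds the word anywhere, not just at the beginning. Adding another \b at the end ensures that it matches only whole words.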

The \b at the end of the lookahead subexpression does the same for the filtered words. For example, it won't match automatic, but it will match automatically.

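Once the lookahead passes, [a-z]*tom[a-z]*\b matches a word (or, more precisely, a contiguous run of letters) that contains tom. I've made a lot of simplifying assumptions so I could focus on the technique. Most importantly, if your "words" can contain non-word characters such as hyphens (-) or apostrophes ('), [a-z]* and \b may not be good enough.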
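
As a rough illustration of the behavior described above, the following Java sketch collects every match of the word-boundary pattern in a string. The class name, the `findWords` helper, and the sample inputs are assumptions for illustration. Note that [a-z] matches lowercase letters only, so the inputs are kept in lowercase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // The word-boundary pattern from above, with backslashes doubled for a Java string literal.
    static final Pattern WORD_WITH_TOM = Pattern.compile(
        "\\b(?!(?:custom|onetomany|manytomany|atom|tomcat|tomorrow|automatic)\\b)[a-z]*tom[a-z]*\\b");

    // Hypothetical helper: returns every match found in the text, in order.
    static List<String> findWords(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = WORD_WITH_TOM.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Excluded words are skipped; "automatically" is kept because the
        // \b after "automatic" in the lookahead does not hold there.
        System.out.println(findWords(
            "custom atom tomcat tomorrow automatic tom tomahawk automatically"));

        // Hyphen caveat: \b sees a boundary at the hyphen, so only "tom" is matched.
        System.out.println(findWords("a tom-cat"));
    }
}
```

The second call shows the limitation with hyphens: the pattern treats tom-cat as two separate words and reports only tom.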

Negative lookahead to match a string unless it appears in specific words.
