Question

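Sorry folks, I've searched Google and still can't get my code to work. I'm not exactly a Java expert (but give me time :-)). I have an XML document that I read with a DOM parser to extract class attributes, and now I need to exclude some of those attributes using a regular expression. For example, my output so far is: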

[[#text: ns1:Spare3]]

[[#text: ns1:Spare4]]

[[#text: ns1:Spare5]]

[[#text: ns1:Street]]

[[#text: ns1:Anything]]

[[#text: ns1:TearLineDateUpdated]]

[[#text: ns1:SourceReportTearline]]

[[#text: ns1:AnyFilter]]

[[#text: ns1:UpdatedByTelecom]]

[[#text: ns1:UpdatedByName]]

I need to exclude the lines that contain the word Spare or that start with TearLine (case-insensitive), plus a few others.

My code snippet (written for testing) is:

Pattern p = Pattern.compile(".*?\\Spare\\(.*?\\)",    
Pattern.CASE_INSENSITIVE|Pattern.DOTALL | Pattern.MULTILINE);
Matcher m = p.matcher((nl.item(i)).toString());
if (m.matches())
{
System.out.println("["+nl.item(i)+"]" + "matched"); 
}
else
{
System.out.println("["+nl.item(i)+"]" + "not matched");     
}

How do I exclude any line that contains the word Spare, as well as all lines that start with TearLine? (TearLine appearing elsewhere in the word is fine.)

Answer 1

Never mind all those lines of code; just use a simple one-liner based on String.matches():

if (nl.item(i).toString().matches("(?i)(?s).*ns1:(spare|tearline).*")) {
    // the item hit the exclusion pattern: skip it
} else {
    // the item did not hit the exclusion pattern: keep it
}

FYI, (?i) makes the regex case-insensitive, and (?s) is the inline-flag equivalent of Pattern.DOTALL.
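To illustrate the point about inline flags, here is a minimal sketch comparing the inline form with the explicit Pattern constants. The class name InlineFlagsDemo and the helper method names are made up for this illustration, and the sample strings are taken from the question.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
final class InlineFlagsDemo {
    // The pattern from the answer, with the flags written inline.
    static final String INLINE = "(?i)(?s).*ns1:(spare|tearline).*";

    // The same pattern with the flags passed as constants instead.
    static final Pattern EXPLICIT = Pattern.compile(
            ".*ns1:(spare|tearline).*",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // String.matches() must match the whole string, hence the leading and trailing ".*".
    static boolean excludedInline(String s) {
        return s.matches(INLINE);
    }

    static boolean excludedExplicit(String s) {
        return EXPLICIT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "[[#text: ns1:Spare3]]",
            "[[#text: ns1:TearLineDateUpdated]]",
            "[[#text: ns1:Street]]",
            "[[#text: ns1:SourceReportTearline]]"
        };
        for (String s : samples) {
            // Both columns should agree for every sample.
            System.out.println(excludedInline(s) + " " + excludedExplicit(s) + " " + s);
        }
    }
}
```

Note that String.matches() recompiles the pattern on every call, so the precompiled Pattern is probably the better choice inside a loop over many nodes.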

Answer 2

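Are those the actual strings you're matching against? That is, the DOM parser produced those strings, and you're now applying the regex to them? If so, you want something like this: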

Pattern p = Pattern.compile(
    "ns1:(tearline|.*spare)", Pattern.CASE_INSENSITIVE
);
Matcher m = p.matcher("");

String[] inputs = {
    "[[#text: ns1:Spare3]]",
    "[[#text: ns1:Spare4]]",
    "[[#text: ns1:Spare5]]",
    "[[#text: ns1:Street]]",
    "[[#text: ns1:Anything]]",
    "[[#text: ns1:TearLineDateUpdated]]",
    "[[#text: ns1:SourceReportTearline]]",
    "[[#text: ns1:AnyFilter]]",
    "[[#text: ns1:UpdatedByTelecom]]",
    "[[#text: ns1:UpdatedByName]]"
};

for (String s : inputs)
{
  System.out.printf( "%n%5b => %s%n", !m.reset(s).find(), s );
}

Output:

false => [[#text: ns1:Spare3]]

false => [[#text: ns1:Spare4]]

false => [[#text: ns1:Spare5]]

 true => [[#text: ns1:Street]]

 true => [[#text: ns1:Anything]]

false => [[#text: ns1:TearLineDateUpdated]]

 true => [[#text: ns1:SourceReportTearline]]

 true => [[#text: ns1:AnyFilter]]

 true => [[#text: ns1:UpdatedByTelecom]]

 true => [[#text: ns1:UpdatedByName]]

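Notes:

I used find() instead of matches(), so my regex only has to match the part I'm interested in, not the whole string.
Some of the other responders used ^TearLine because you said the word has to appear at the beginning of the line, but if my guess is correct, you really want to match it right after the ns1: prefix. On the other hand, .*spare allows spare to appear anywhere, not just at the beginning (.*?spare works, too).
Similarly, Ωmega used "\\bSpare\\b" on the assumption that you're only interested in the complete word Spare. I left out the word boundaries (\b) because you seem to want to match things like Spare3 or (I'm guessing) FooSpare.
I don't know why you added \\(.*?\\) to your regex, since there are no parentheses in your sample strings.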
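As a rough illustration of the find() versus matches() distinction, the sketch below runs the same pattern both ways. The class and method names are invented for the example, and the FooSpare input is a hypothetical string that does not appear in the question's data.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
final class FindVsMatchesDemo {
    // Same pattern as in the answer: tearline right after "ns1:", or spare anywhere after it.
    static final Pattern P = Pattern.compile(
            "ns1:(tearline|.*spare)", Pattern.CASE_INSENSITIVE);

    // find() succeeds if any substring of the input matches.
    static boolean foundAnywhere(String s) {
        return P.matcher(s).find();
    }

    // matches() succeeds only if the entire input matches.
    static boolean matchesWhole(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        String spare3 = "[[#text: ns1:Spare3]]";
        System.out.println(foundAnywhere(spare3));  // the "ns1:Spare" part is enough for find()
        System.out.println(matchesWhole(spare3));   // the "[[#text: " wrapper defeats matches()

        // Hypothetical input: spare in the middle of the name is still found.
        System.out.println(foundAnywhere("[[#text: ns1:FooSpare]]"));

        // Tearline that is not directly after the prefix is not found.
        System.out.println(foundAnywhere("[[#text: ns1:SourceReportTearline]]"));
    }
}
```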

Answer 3

Use the regex

^(?:TearLine.*|.*\\bSpare\\b)
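A small sketch of how this regex might behave, assuming it is applied with find() to bare names from which the "[[#text: ns1:" wrapper has already been stripped (that stripping step is an assumption, not something this answer states). The class name WordBoundaryDemo and the "My Spare part" input are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; assumes inputs are bare names without the ns1: prefix.
final class WordBoundaryDemo {
    // The regex from the answer, written as a Java string literal (so \b becomes \\b).
    static final Pattern P = Pattern.compile("^(?:TearLine.*|.*\\bSpare\\b)");

    static boolean excluded(String name) {
        return P.matcher(name).find();
    }

    public static void main(String[] args) {
        System.out.println(excluded("TearLineDateUpdated"));   // starts with TearLine
        System.out.println(excluded("Spare"));                 // Spare as a complete word
        System.out.println(excluded("My Spare part"));         // Spare surrounded by spaces
        System.out.println(excluded("Spare3"));                // no boundary between 'e' and '3'
        System.out.println(excluded("SourceReportTearline"));  // not at the start, different case
    }
}
```

Because \b requires a transition between a word character and a non-word character, Spare3 is not caught by this pattern; drop the trailing \\b if such names should be excluded as well.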

Answer 4

You probably want to get rid of the first backslash:

".*?Spare\\(.*?\\)"

because \S matches any character that is not whitespace.
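The effect of that stray backslash can be checked with a short sketch. The class name and the test inputs such as "Xpare(1)" are invented for the example; the flags from the question are left out to keep it small.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; inputs are made up to expose the \S behaviour.
final class StrayBackslashDemo {
    // The question's pattern: "\\S" in a Java literal is the regex \S (any non-whitespace char),
    // so this reads as: anything, one non-whitespace char, "pare", then "(...)".
    static final Pattern ORIGINAL = Pattern.compile(".*?\\Spare\\(.*?\\)");

    // With the first backslash removed, the literal word "Spare" is required.
    static final Pattern FIXED = Pattern.compile(".*?Spare\\(.*?\\)");

    static boolean originalMatches(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    static boolean fixedMatches(String s) {
        return FIXED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(originalMatches("Xpare(1)")); // \S happily accepts 'X'
        System.out.println(fixedMatches("Xpare(1)"));    // the fixed pattern insists on "Spare"
        System.out.println(fixedMatches("Spare(1)"));

        // Neither version matches the question's data: it contains no parentheses.
        System.out.println(fixedMatches("[[#text: ns1:Spare3]]"));
    }
}
```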

For the other requirement, your regex needs to look like:

"ns1:tearline.*"

Answer 5

To match lines that start with TearLine:

^TearLine

To match lines that contain Spare:

Spare

Combine these into a single expression:

(?:^TearLine)|(?:Spare)
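One possible way to use the combined expression as an exclusion filter is sketched below. It assumes the names have already been reduced to their bare form (no "[[#text: ns1:" wrapper), since ^ anchors at the very start of the input; the class name ExcludeFilterDemo and the keep() helper are hypothetical, and the CASE_INSENSITIVE flag is added to honor the question's case-insensitivity requirement.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class; assumes bare names without the ns1: prefix.
final class ExcludeFilterDemo {
    // The combined expression from the answer, made case-insensitive.
    static final Pattern EXCLUDE = Pattern.compile(
            "(?:^TearLine)|(?:Spare)", Pattern.CASE_INSENSITIVE);

    // Returns only the names that do NOT hit the exclusion pattern.
    static List<String> keep(List<String> names) {
        return names.stream()
                .filter(n -> !EXCLUDE.matcher(n).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "Spare3", "Street", "TearLineDateUpdated",
                "SourceReportTearline", "AnyFilter");
        // Spare3 and TearLineDateUpdated are dropped; the rest are kept.
        System.out.println(keep(names));
    }
}
```

Further exclusions can be added as extra alternatives in the same pattern, which keeps the filter to a single find() call per name.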

Java regex for exclusion with multiple excluded fields
