Question

I am using regular expressions to group data. The lines look somewhat like this:

    testword test 
    test testword
    tes.w. tes.
    tes tes.w.
    tes.w othertexttobefound
    sometexttobefound testword somemoretextwhichdoesnotmatter

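The word test should be found, as well as othertexttobefound and sometexttobefound.

Now I am trying to tell my parser that it should explicitly ignore testword and its derivatives while searching, and concentrate on the rest of the data entries. The "good words" and "bad words" can be anywhere in each line.

I have tried [^w], which works fine at the beginning of the string but, in my version, not in the other cases. (?:w) did not do it either. I cannot use lookarounds, because they prevent the whole line from being detected.

After a long search on the internet, I hope I can ask for help here!

Thanks in advance!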

Gerit

After the much-appreciated help from Naxos84, I am adding some real-life German examples:

    sozialabgabe sozialarbeiter
    soz.abg. sozialarbeiter
    sozarbeiter soz.abg.
    sozialarbeiter otherirrelevantstuff
    otherirrelevantstuff soz abg
    otherirrelevantstuff sozabg
    otherirrelevantstuff sozialabgabe

If I search with

    sozial["^\ab"]|soz["^\ab"]|sometexttobefound|othertexttobefound

lines 6 and 7 are also marked, but I do not want those.

What am I doing wrong? Thanks for any further pointers.

Link: regexr

Answer 1

To find all the matches you want, that is, any occurrence of "test", "sometexttobefound" and "othertexttobefound", you can try the following regex:

    test[^\w]|sometexttobefound|othertexttobefound

This regex means:
find every "test" that is followed by a character that is not a word character, OR sometexttobefound, OR othertexttobefound.

I tried this regex with the following text (I added a few more 'test's):

    testword test 
    test testword
    tes.w. testtes.
    tes tes.w. test
    tes.w othertexttobefound
    sometexttobefound testword somemoretextwhichdoesnotmatter

on regexr (with the global flag enabled).
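For anyone who wants to reproduce this outside regexr, here is a minimal Python sketch. Python itself and the names PATTERN, LINES, text and matches are my own assumptions for illustration and not part of the original setup. The lines are joined into one string so that a "test" at the end of a line is followed by a newline, much as it would be in a multi-line input box.

```python
import re

# Pattern copied from the answer above; [^\w] is equivalent to \W.
PATTERN = re.compile(r"test[^\w]|sometexttobefound|othertexttobefound")

# Sample lines from the answer (the first line ends with a space).
LINES = [
    "testword test ",
    "test testword",
    "tes.w. testtes.",
    "tes tes.w. test",
    "tes.w othertexttobefound",
    "sometexttobefound testword somemoretextwhichdoesnotmatter",
]

# One multi-line string, so a line-final "test" is followed by "\n".
text = "\n".join(LINES) + "\n"

# findall scans the whole string, comparable to the global flag.
matches = PATTERN.findall(text)
print(matches)
# ['test ', 'test ', 'test\n', 'othertexttobefound', 'sometexttobefound']
```

Note that [^\w] consumes the character after "test", so each of those matches includes the trailing space or newline, and a "test" at the very end of the input with nothing after it would not match.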

If you also want to find things like "tes", I think you should add it as well (I am not a regex expert), like this:

    test[^\w]|tes[^\w]|sometexttobefound|othertexttobefound
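As a rough check of this extended pattern, here is a small Python sketch that lists the matches line by line. Again, Python and the names EXTENDED, SAMPLE and per_line are only assumptions for illustration.

```python
import re

# Extended pattern from the answer, with the extra "tes" alternative.
EXTENDED = re.compile(
    r"test[^\w]|tes[^\w]|sometexttobefound|othertexttobefound"
)

# Same sample text as before; the first line ends with a space.
SAMPLE = (
    "testword test \n"
    "test testword\n"
    "tes.w. testtes.\n"
    "tes tes.w. test\n"
    "tes.w othertexttobefound\n"
    "sometexttobefound testword somemoretextwhichdoesnotmatter\n"
)

# Keep the line endings so a line-final word is still followed by "\n".
per_line = [
    EXTENDED.findall(line) for line in SAMPLE.splitlines(keepends=True)
]

for number, found in enumerate(per_line, start=1):
    print(number, found)
```

With this input the third line yields two "tes." matches, one from "tes.w." and one from the end of "testtes.". So the extra alternative also picks up the abbreviated form of testword, which the question wants to ignore; it may need further narrowing depending on which abbreviations count as "bad words".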

Regex - skip an expression and parse the rest
