Question

Currently we're using the JavaScript new RegExp('#[^,#=!\s][^,#=!\s]*') (see [1]), and it mostly works, except that it also matches URLs with anchors such as http://this.is/no#hashtag, and we'd rather avoid matching foo#bar.

A few attempts have been made to fix this, but they don't seem to work, or I'm just not getting it.

Using the following source text:

#public #writable #kommentarer-till-beta -- all these should be matched
Verkligen #bra jobbat! T ex #kommentarer till #artiklar och #blogginlägg, kool. -- mixed within text
http://this.is/no#hashtag -- problem
xxy#bar      -- We'd prefer not matching this one, and...
#foo=bar   =foo#bar  -- we probably shouldn't match any of those either.
#foo,bar #foo;bar #foo-bar #foo:bar   -- We're flexible on whether these get matched in part or in full

we would like to get the following output:

(For readability, $ is shown in place of <a class=tag href=.....>...</a>)

$ $ $ -- all these should be matched
Verkligen $ jobbat! T ex $ till $ och $, kool. -- mixed within text
http://this.is/no$ -- problem
xxy$      -- We'd prefer not matching this one, and...
$=bar   =foo$  -- we probably shouldn't match any of those either.
$,bar $ $ $   -- We're flexible on whether these get matched in part or in full

[1] http://github.com/ether/pad/blob/master/etherpad/src/plugins/twitterStyleTags/hooks.js

Answer 1

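I believe looking for a word boundary will do the trick here (or rather, apparently, the lack of one, which seems a bit counterintuitive to me).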

\B#[^,#=!\s]+ doesn't match anything on the third or fourth line. It does, however, match #foo in #foo=bar, as well as everything covered by a $ sign in your example.
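To make that concrete, here is a minimal sketch that runs the pattern over the first five sample lines from the question. The names sample, tagPattern and findTags are made up for illustration. It uses a regex literal rather than new RegExp('...'), so \B and \s don't need double escaping.

```javascript
// Sample lines copied from the question.
const sample = [
  "#public #writable #kommentarer-till-beta -- all these should be matched",
  "Verkligen #bra jobbat! T ex #kommentarer till #artiklar och #blogginlägg, kool. -- mixed within text",
  "http://this.is/no#hashtag -- problem",
  "xxy#bar      -- We'd prefer not matching this one, and...",
  "#foo=bar   =foo#bar  -- we probably shouldn't match any of those either.",
];

// \B matches only where there is NO word boundary. "#" is a non-word
// character, so \B in front of it holds only when the preceding character
// is also a non-word character (a space, say) or when "#" starts the string.
const tagPattern = /\B#[^,#=!\s]+/g;

// Hypothetical helper: returns every match in a line, or [] if there are none.
function findTags(line) {
  return line.match(tagPattern) || [];
}

for (const line of sample) {
  console.log(findTags(line));
}
// [ '#public', '#writable', '#kommentarer-till-beta' ]
// [ '#bra', '#kommentarer', '#artiklar', '#blogginlägg' ]
// []
// []
// [ '#foo' ]
```

In the URL and in xxy#bar, the character before # is a letter, so there is a word boundary at that position and \B refuses to match.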

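Edit: After fiddling a bit more, \B#[^,#=!\s]+[\s,] matches everything on the first and second lines. There are no matches on lines 3-5, and on line 6 everything except #foo,bar is matched in full (for #foo,bar, only the part before the comma is matched).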

You'll probably want a capture group that leaves out the trailing whitespace or comma, like so: \B(#[^,#=!\s]+)[\s,].

(If you really want all of the tags on line 6 matched in full, remove the comma from the first character class.)
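As a rough sketch of how the capture-group version could be used in a replacement: the second capture group around [\s,] is my own addition, there only so the delimiter can be put back after the tag is replaced, and the names strictPattern, loosePattern and replaceTags are hypothetical. The tag is replaced with a literal $, as in the question's output.

```javascript
// Comma excluded from the tag: "#foo,bar" is matched only up to the comma.
const strictPattern = /\B(#[^,#=!\s]+)([\s,])/g;
// Comma allowed inside the tag: "#foo,bar" is matched in full.
const loosePattern = /\B(#[^#=!\s]+)([\s,])/g;

// Hypothetical helper: swap each tag for "$" and keep the delimiter that
// followed it. A real version would build the <a ...> markup from `tag`.
function replaceTags(text, pattern) {
  return text.replace(pattern, (match, tag, delimiter) => "$" + delimiter);
}

const line1 = "#public #writable #kommentarer-till-beta -- all these should be matched";
const line6 = "#foo,bar #foo;bar #foo-bar #foo:bar   -- We're flexible";

console.log(replaceTags(line1, strictPattern));
// $ $ $ -- all these should be matched
console.log(replaceTags(line6, strictPattern));
// $,bar $ $ $   -- We're flexible
console.log(replaceTags(line6, loosePattern));
// $ $ $ $   -- We're flexible
```

One side effect of the looser class is worth knowing about: with the comma allowed, a tag followed by punctuation, such as "#blogginlägg, kool", is captured as "#blogginlägg," with the trailing comma included.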

Note that you may need more than this to get perfect coverage, but it should at least handle your current test cases.
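One gap I'm aware of, as an example: because the pattern requires a trailing whitespace character or comma, a tag at the very end of the input is not matched. A possible workaround, which is my own assumption and not something tested against Etherpad, is to turn the trailing class into a lookahead that also accepts the end of the string. The names withDelimiter and withLookahead are made up for this sketch.

```javascript
// The pattern from the edit above: needs a real character after the tag.
const withDelimiter = /\B#[^,#=!\s]+[\s,]/g;
// Assumed variant: the lookahead checks for whitespace, a comma, or the end
// of the input without consuming anything, so no capture group is needed.
const withLookahead = /\B#[^,#=!\s]+(?=[\s,]|$)/g;

const text = "Verkligen #bra, ends with #tag";

console.log(text.match(withDelimiter));
// [ '#bra,' ]
console.log(text.match(withLookahead));
// [ '#bra', '#tag' ]

// The cases that should stay unmatched still do.
console.log("http://this.is/no#hashtag".match(withLookahead)); // null
console.log("#foo=bar".match(withLookahead));                  // null
```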
