I have this regex; when there is no space after the link, it also captures the content that follows on the same line.
The regex is:
(?:http\:\/\/)?(?:www\.)?ama?zo?n\.(?:com|ca|co\.uk|co\.jp|de|fr)/(?:exec/obidos/ASIN/|o/|gp/product/|(?:(?:[^"\'/]*)/)?dp/|)(B[A-Z0-9]{9})(?:(?:/|\?|\#)(?:[^"\'\s]*))?
My expected input is
[link](http://www.amazon.com/dp/B00CTUER1M)
Here is[a cool toy](http://www.amazon.com/dp/B00CTUER1M/ref=gb1h_img_e-4_8722_fb086345?smid=ATVPDKIKX0DER)!dddd fdsfsdfds
I want the output to be
[link](http://www.amazon.com/dp/B00CTUER1M?tag=affcode-20)
Here is[a cool toy](http://www.amazon.com/dp/B00CTUER1M?tag=affcode-20)!dddd fdsfsdfds
However, for the second one I get
Here is[a cool toy](http://amazon.com/dp/B00CTUER1M/?tag=affcode-20 fdsfsdfds
Answer 0 (score: 2)
From the looks of it, you left a closing parenthesis ) out of the final negated character class.
# (?:http://)?(?:www\.)?ama?zo?n\.(?:com|ca|co\.uk|co\.jp|de|fr)/(?:exec/obidos/ASIN/|o/|gp/product/|(?:(?:[^"'/]*)/)?dp/|)(B[A-Z0-9]{9})(?:(?:/|\?|\#)(?:[^"'\s)]*))?
(?: http:// )?
(?: www \. )?
ama? zo?n \.
(?:
     com
  |  ca
  |  co \. uk
  |  co \. jp
  |  de
  |  fr
)
/
(?:
     exec/obidos/ASIN/
  |  o/
  |  gp/product/
  |  (?:
          (?: [^"'/]* )
          /
     )?
     dp/
  |
)
( B [A-Z0-9]{9} )                 # (1)
(?:
     (?: / | \? | \# )
     (?: [^"'\s)]* )              # <- Add ')' to negative class
)?
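
To see the difference, here is a minimal Python sketch that runs both versions of the pattern against the sample input from the question. The question does not say which language or replacement string is in use, so the helper name `add_affiliate_tag` and the replacement format (rebuilt from the desired output above) are assumptions for illustration only.

```python
import re

# Part of the pattern shared by both versions, up to the ASIN capture group.
PREFIX = (
    r"""(?:http://)?(?:www\.)?ama?zo?n\.(?:com|ca|co\.uk|co\.jp|de|fr)/"""
    r"""(?:exec/obidos/ASIN/|o/|gp/product/|(?:(?:[^"'/]*)/)?dp/|)"""
    r"""(B[A-Z0-9]{9})"""
)

# Original tail: the negated class stops only at quotes and whitespace,
# so it runs past the ')' that closes the Markdown link.
ORIGINAL = re.compile(PREFIX + r"""(?:(?:/|\?|\#)(?:[^"'\s]*))?""")

# Fixed tail: ')' is added to the negated class, so the match ends at the link.
FIXED = re.compile(PREFIX + r"""(?:(?:/|\?|\#)(?:[^"'\s)]*))?""")


def add_affiliate_tag(text, tag="affcode-20"):
    """Hypothetical helper: rewrite each matched URL as a tagged /dp/ link."""
    return FIXED.sub(
        lambda m: f"http://www.amazon.com/dp/{m.group(1)}?tag={tag}", text
    )


sample = (
    "Here is[a cool toy](http://www.amazon.com/dp/B00CTUER1M/"
    "ref=gb1h_img_e-4_8722_fb086345?smid=ATVPDKIKX0DER)!dddd fdsfsdfds"
)

print(ORIGINAL.search(sample).group(0))  # match runs on past the ')'
print(FIXED.search(sample).group(0))     # match stops before the ')'
print(add_affiliate_tag(sample))
```

Excluding `)` works here because the URL sits inside Markdown link parentheses. A URL that legitimately contains `)` would be cut short, which is a trade-off to keep in mind.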