Excluding a word from a regex capture group

Asked: 2019-06-14 07:34:13

Tags: regex regex-negation

I have URLs of this type:

https://example.com/en/app/893245
https://example.com/ru/app/wq23245
https://example.com/app/8984245

I only want to extract the word between com and app:

https://example.com/en/app/893245 -> en
https://example.com/ru/app/wq23245 -> ru
https://example.com/app/8984245 ->

I tried to exclude app from the capture group, but this is the only way I could come up with:

.*com\/((?!app).*)\/app

Is something like the following possible, so that the word app is never captured? example\.com\/(\w+|?!app)\/

Public link: https://rubular.com/r/NnojSgQK7EuelE

2 answers:

Answer 0 (score: 2)

If you need a pure regex solution, you can use lookarounds:

/(?<=example\.com\/)\w+(?=\/app)/

Or, which is probably better in a URL context:

/(?<=example\.com\/)[^\/]+(?=\/app)/

See the Rubular demo.
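A minimal sketch of how the second lookaround pattern could be applied in Ruby to the three sample URLs. The constant name LANG_SEGMENT and the variable names are illustrative assumptions and are not part of the original answer.

```ruby
# Sample URLs taken from the question.
urls = [
  'https://example.com/en/app/893245',
  'https://example.com/ru/app/wq23245',
  'https://example.com/app/8984245'
]

# LANG_SEGMENT is a hypothetical name. The lookbehind and lookahead are
# zero-width, so only the segment between them becomes the match.
LANG_SEGMENT = %r{(?<=example\.com/)[^/]+(?=/app)}

# String#[] with a Regexp returns the matched text, or nil when nothing matches.
segments = urls.map { |url| url[LANG_SEGMENT] }
p segments
# => ["en", "ru", nil]
```

Because the lookarounds consume no characters, the whole match is already the wanted segment and no capture group index is needed.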

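In Ruby, you can use a capturing group and read group 1:

strs = ['https://example.com/en/app/893245','https://example.com/ru/app/wq23245','https://example.com/app/8984245']
# String#[] with a regex and a group index returns that group, or nil when there is no match
p strs.map { |s| s[/example\.com\/(\w+)\/app/, 1] }
# => ["en", "ru", nil]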
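The question's second attempt tries to put the exclusion inside the group. A negative lookahead is usually written as (?!...) in front of the group instead. The following is a sketch under that assumption, not part of the original answer; the constant name and the fourth URL are hypothetical and added only to show that longer words starting with app are still captured.

```ruby
urls = [
  'https://example.com/en/app/893245',
  'https://example.com/ru/app/wq23245',
  'https://example.com/app/8984245',
  'https://example.com/application/1' # hypothetical extra URL, not from the question
]

# FIRST_SEGMENT_NOT_APP is a hypothetical name. (?!app/) rejects the match
# only when the first path segment is exactly "app".
FIRST_SEGMENT_NOT_APP = %r{example\.com/(?!app/)(\w+)/}

# Group 1 holds the first path segment, or nil when the lookahead rejects it.
result = urls.map { |url| url[FIRST_SEGMENT_NOT_APP, 1] }
p result
# => ["en", "ru", nil, "application"]
```

Including the trailing slash in the lookahead is a design choice: without it, (?!app) would also reject segments such as application.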

Answer 1 (score: 0)

You can use sed:

sed -n -f script.sed yourinput.txt

with the following inside script.sed:

s/.*com\/\(.*\)\/app.*/\1/p

Sample input:

https://example.com/en/app/893245
https://example.com/ru/app/wq23245
https://example.com/app/8984245

Sample output:

$ sed -n -f comapp.sed comapp.txt
en
ru
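For comparison, the same print-only-on-match behavior can be sketched in Ruby. This is an illustrative equivalent under the assumption that the input is available as a string; the variable names are hypothetical and the heredoc stands in for the input file.

```ruby
# The heredoc stands in for the sample input file.
input = <<~URLS
  https://example.com/en/app/893245
  https://example.com/ru/app/wq23245
  https://example.com/app/8984245
URLS

# Same pattern as the sed script: capture whatever sits between "com/" and "/app".
# Lines that do not match yield nil and are dropped, like sed -n with the p flag.
matches = input.each_line.map { |line| line[%r{.*com/(.*)/app.*}, 1] }.compact
puts matches
# prints:
# en
# ru
```

As in the sed version, the greedy (.*) would take everything up to the last /app on a line, which is fine for the sample URLs but worth keeping in mind for longer paths.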