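Regex to extract a field

Time: 2017-08-23 21:23:10

Tags: regex extraction

I want to extract part of a URL and save it in a capture group named L_URL: everything from http:// up to and including m3u8 in the sample events below. The problem is that the two events have different formats.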

SAMPLE event1:

23 Aug 2017 19:04:38 [WARN ] http_srv: DONE 1259531 0.006744 404[Not Found] UNKNOWN-ID 69.132.62.224:22836 GET http://mmdai-linear-west-03.abc.com/linear-scope010.le.com/LIVE/1008/hls/ae/Nat_HD/.swn93bf20ea-8fb5-4493-9f5e-005056b23b1dapple2apple/.rate_3764512/index_v_3764512_7.m3u8?nw=376521&prof=376521:twc_hls_live&mode=live&vdur=600&caid=NGC_LIVE&csid=stva_android_ph_live&vcid=c4424608-d8cc-3f28-a171-c496b32e99a3&z5=28213&ads=VAST_LIVE&tagset_name=VAST&_fw_lpu=http://linear-scope010.le.com/LIVE/1008/hl... (id 5782790)

SAMPLE2:

23 Aug 2017 19:04:38 [WARN ] http_srv: Total latency exceeded threshold: 0.053182 seconds (internal 0.053000 s) origin 0.000000 seconds MCHit 0 Status: 404 IP: 66.87.73.51:12866 URL: http://mmdai-linear-west-03.timewarnercable.com/speclive.video.cdn.charter.com/LIVE/1028/hls/ae/FX2_HD/.swne0c89ee6-59b0-4e1b-bf9c-005056b23b1dapple2apple/.rate_433152/index_v_433152_1.m3u8?nw=376521&prof=376521:twc_hls_live&mode=live&vdur=600&caid=FX_LIVE&csid=stva_android_ph_live&vcid=3f1749c5-7a19-40b2-8d3c-d7264e89a366&z5=62269&ads=VAST_LIVE_LCHTR&tagset_nam... (id 65087410)

Thanks

1 answer:

Answer 0 (score: 1)

From 'http://' all the way to 'm3u8', saved in the capture group L_URL

A pattern that captures it:

(?<L_URL>https?:(?:(?!m3u8).*)m3u8)

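(?<L_URL>  // start of the capture group named 'L_URL'
https?:    // 'http', optionally followed by 's', then ':'
(?:        // start of a non-capturing group
(?!m3u8)   // negative lookahead, checked once at this position: the next characters must not be 'm3u8'
.*         // any characters, 0 or more times (greedy, so it backtracks to the last 'm3u8' on the line)
)          // end of the non-capturing group
m3u8       // literal 'm3u8'
)          // end of the capture group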

Demo (PCRE)
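
If a line could ever contain more than one 'm3u8', the greedy `.*` above would run to the last one. As a rough sketch only, here is a lazy variant tried in Python. It is not the pattern from the answer: it assumes the URL contains no whitespace and that stopping at the first '.m3u8' is what you want. Python's `re` module spells named groups `(?P<name>...)`, and the two events below are shortened, made-up stand-ins for the samples in the question (the hostnames and the helper name `extract_l_url` are hypothetical).

```python
import re

# Assumption: the URL has no whitespace and should end at the FIRST ".m3u8".
# \S*? is lazy, so it stops as soon as "\.m3u8" can match.
L_URL_RE = re.compile(r"(?P<L_URL>https?://\S*?\.m3u8)")


def extract_l_url(event):
    """Return the URL up to and including the first '.m3u8', or None if absent."""
    match = L_URL_RE.search(event)
    return match.group("L_URL") if match else None


# Hypothetical, shortened events that mimic the two layouts in the question.
event1 = (
    "23 Aug 2017 19:04:38 [WARN ] http_srv: DONE 1 0.006 404[Not Found] "
    "UNKNOWN-ID 10.0.0.1:22836 GET http://host.example.com/LIVE/index_v_1.m3u8"
    "?nw=1&_fw_lpu=http://other.example.com/LIVE/x.m3u8 (id 1)"
)
event2 = (
    "23 Aug 2017 19:04:38 [WARN ] http_srv: Total latency exceeded threshold "
    "Status: 404 IP: 10.0.0.2:12866 URL: http://host.example.com/LIVE/index_v_2.m3u8"
    "?nw=1 (id 2)"
)

print(extract_l_url(event1))
print(extract_l_url(event2))
```

Because the match is anchored on `https?://` rather than on whatever precedes the URL ("GET" in one event, "URL:" in the other), the same pattern covers both formats.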