Question

I'm using Ruby to read a file, and I need to parse some data out of each line and store it in an array. Two example "lines" from the file are:

64.34.145.197 - - [03/Sep/2006:05:31:37 -0400] "GET /robots.txt HTTP/1.0" 200 56
64.34.145.197 - - [03/Sep/2006:05:31:37 -0400] "GET /manual/mod/mod_autoindex.html HTTP/1.0" 200 39134

From these I need to get /robots.txt and /manual/mod/mod_autoindex.html. With the following simple regex I've been able to extract GET /robots.txt and GET /manual/mod/mod_autoindex.html, but I can't seem to shake the GET.

arr.push(/GET \S+/.match(line))

I've tried a few lookaheads, but I'm pretty much a regex n00b. Any help would be greatly appreciated.

Answer 1

This should do it:

arr.push(/(?<=GET )\S+/.match(line))

If HTTP is guaranteed to follow the URL, you can also do this to "frame" the match further:

arr.push(/(?<=GET )\S+(?= HTTP)/.match(line))

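(?<=...) and (?=...) are called positive lookarounds, by the way: a lookbehind and a lookahead, respectively. They assert that the surrounding text is present without making it part of the match.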
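
One thing to keep in mind: Regexp#match returns a MatchData object (or nil when nothing matches), so the array above ends up holding MatchData rather than plain strings. If strings are what you want, String#[] with a regex is one way to get them. Here is a minimal sketch run against the two sample lines from the question; the helper names request_path and request_path_via_capture are made up for illustration, and the second one shows a capture group as an alternative to the lookarounds.

```ruby
# Hypothetical helper: returns the request path as a String, or nil if the
# line does not contain "GET <path> HTTP".
def request_path(line)
  # Lookbehind requires "GET " before the match, lookahead requires " HTTP"
  # after it; neither becomes part of the returned text.
  line[/(?<=GET )\S+(?= HTTP)/]
end

# Hypothetical alternative without lookarounds: the second argument to
# String#[] selects capture group 1, which holds just the path.
def request_path_via_capture(line)
  line[/GET (\S+)/, 1]
end

lines = [
  '64.34.145.197 - - [03/Sep/2006:05:31:37 -0400] "GET /robots.txt HTTP/1.0" 200 56',
  '64.34.145.197 - - [03/Sep/2006:05:31:37 -0400] "GET /manual/mod/mod_autoindex.html HTTP/1.0" 200 39134'
]

# compact drops the nil entries produced by lines that do not match.
arr = lines.map { |line| request_path(line) }.compact
p arr
```

The capture-group version may be easier to read if lookarounds feel unfamiliar, and it does not depend on lookbehind support in the regex engine.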
