Question

I want to extract data from a log file.

To open the file:

a=open('access.log','rb')
lines = a.readlines()

So suppose lines[0] is:

123.456.678.89 - - [04/Aug/2014:12:01:41 +0530] "GET /123456789_10.10.20.111 HTTP/1.1" 404 537 "-" "Wget/1.14 (linux-gnu)"

From "GET /123456789_10.10.20.111 HTTP/1.1" I want to extract only 123456789 and 10.10.20.111.

The pattern is: a string that starts with /, followed by a run of digits, then an underscore, then an IP address.

I tried the following. It works, but I think it carries unnecessary overhead:

node = re.search(r'\"(.*)\"', line).group(1)
node = node.split(" ")[1]
node,ip = node.split("_")
node = node[1:]
print node,ip

How can I get this with a single pattern?

Answer 1

Do you want to do this in one line?

nodeip = re.search(r'([\d]{9})_([\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3})', line)

Now your node and IP are in groups 1 and 2:

print nodeip.group(1), nodeip.group(2)

Output:

123456789 10.10.20.111
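
Note that re.search returns None for a line that does not match, so calling .group() on the result would raise AttributeError on such lines. Below is a minimal sketch of one way to guard against that, written for Python 3 (the snippets above use the Python 2 print statement). The helper name extract_node_ip and the constant NODE_IP are hypothetical, and the pattern assumes the node is any run of digits directly after the "/" of the request path, not necessarily nine digits.

```python
import re

# Assumption: the request looks like "METHOD /<digits>_<dotted quad> HTTP/x.y".
# Anchoring on the opening quote and the method keeps the client IP at the
# start of the line from being picked up by mistake.
NODE_IP = re.compile(r'"[A-Z]+ /(\d+)_(\d{1,3}(?:\.\d{1,3}){3}) ')

def extract_node_ip(line):
    """Return (node, ip) for one access-log line, or None if it does not match."""
    match = NODE_IP.search(line)
    return match.groups() if match else None

sample = ('123.456.678.89 - - [04/Aug/2014:12:01:41 +0530] '
          '"GET /123456789_10.10.20.111 HTTP/1.1" 404 537 "-" "Wget/1.14 (linux-gnu)"')

print(extract_node_ip(sample))  # ('123456789', '10.10.20.111')
```

To process the whole file, you could open access.log in text mode, loop over it line by line, and skip any line for which the helper returns None. Compiling the pattern once avoids re-parsing it for every line.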

Python regex to extract a string from a log file line
