Finding a specific line in a log file - Python

Time: 2014-11-02 21:15:23

Tags: python regex hadoop

I have the following log output from a Hadoop cluster:

==> namenode_32: 14/11/02 02:19:32 INFO namenode.NNStorage: Storage directory /data/1/dfs/nn has been successfully formatted.
==> namenode_32: 14/11/02 02:19:32 INFO namenode.NNStorage: Storage directory /nfsmount/dfs/nn has been successfully formatted.
==> namenode_32: 14/11/02 02:19:32 INFO namenode.FSImage: Saving image file /nfsmount/dfs/nn/current/fsimage.ckpt_0000000000000000000 using no compression
==> namenode_32: 14/11/02 02:19:32 INFO namenode.FSImage: Saving image file /data/1/dfs/nn/current/fsimage.ckpt_0000000000000000000 using no compression
==> namenode_32: 14/11/02 02:19:32 INFO namenode.FSImage: Image file of size 115 saved in 0 seconds.
==> namenode_32: 14/11/02 02:19:32 INFO namenode.FSImage: Image file of size 115 saved in 0 seconds.
==> namenode_32: 14/11/02 02:19:32 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
==> namenode_32: 14/11/02 02:19:32 INFO util.ExitUtil: Exiting with status 0
==> namenode_32: 14/11/02 02:19:32 INFO namenode.NameNode: SHUTDOWN_MSG: 
==> namenode_32: /************************************************************
==> namenode_32: SHUTDOWN_MSG: Shutting down NameNode at ip-10-45-129-157.ec2.internal/10.45.129.157
==> namenode_32: ************************************************************/
==> namenode_32:  * Starting Hadoop namenode: 
==> namenode_32: starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-ip-10-45-129-157.out
==> namenode_32:  * Starting Hadoop secondarynamenode: 
==> namenode_32: starting secondarynamenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-secondarynamenode-ip-10-45-129-157.out
==> namenode_32:  * Starting Hadoop jobtracker: 
==> namenode_32: starting jobtracker, logging to /var/log/hadoop-0.20-mapreduce/hadoop-hadoop-jobtracker-ip-10-45-129-157.out

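I am trying to find the IP address of this cluster. I know it appears on the line containing SHUTDOWN_MSG: Shutting down NameNode ... What I am looking for is a tuple of (private DNS, private IP). For this particular example, I would get: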

(ip-10-45-129-157.ec2.internal, 10.45.129.157)

So I tried:

>>> import re
>>> expr = "SHUTDOWN_MSG: Shutting down NameNode at"
>>> s = re.search(expr, log)
>>> print(s.group())
SHUTDOWN_MSG: Shutting down NameNode at

This is not what I want: it only returns the text I searched for. How can I use a regular expression to produce such a tuple?

3 Answers:

Answer 0: (score: 2)

Add capture groups after the search string, one for each value you want to extract:

>>> expr = 'SHUTDOWN_MSG:.+at (.+)/(.+)'
>>> re.search(expr, log).groups()
('ip-10-45-129-157.ec2.internal', '10.45.129.157')
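
Note that re.search returns None when nothing matches, so calling .groups() directly raises AttributeError on a log that lacks the shutdown line. A minimal sketch of a guarded version follows; the helper name find_namenode_address and the shortened sample string are illustrative assumptions, not part of the original answer:

```python
import re

# Same pattern as in the answer above, compiled once for reuse.
SHUTDOWN_RE = re.compile(r"SHUTDOWN_MSG:.+at (.+)/(.+)")


def find_namenode_address(log):
    """Hypothetical helper: return (private_dns, private_ip), or None if absent."""
    match = SHUTDOWN_RE.search(log)
    if match is None:
        return None
    return match.groups()


# Shortened sample built from two lines of the log in the question.
sample = (
    "==> namenode_32: 14/11/02 02:19:32 INFO namenode.NameNode: SHUTDOWN_MSG: \n"
    "==> namenode_32: SHUTDOWN_MSG: Shutting down NameNode at "
    "ip-10-45-129-157.ec2.internal/10.45.129.157\n"
)

print(find_namenode_address(sample))
print(find_namenode_address("no shutdown line here"))
```

The first SHUTDOWN_MSG line is skipped because `.` does not match a newline by default, so `.+at ` can only succeed on the line that actually contains "at ".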

Answer 1: (score: 1)

You can use multiple capture groups to extract the parts that follow the known text:

>>> re.search(r'SHUTDOWN_MSG: Shutting down NameNode at (.+)/(.+)', log).groups()
('ip-10-45-129-157.ec2.internal', '10.45.129.157')

You can also write the expression more compactly as:

>>> re.search(r'SHUTDOWN_MSG:.+at (.+)/(.+)', log).groups()
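
Both forms rely on a greedy (.+), which captures everything up to the end of the line, including any trailing whitespace. If that matters, a stricter pattern can help. The following is only a sketch: the group names dns and ip, the IPv4-shaped sub-pattern, and the sample line with trailing spaces are assumptions made for illustration:

```python
import re

# Stricter variant: the host part stops at "/" or whitespace, and the
# address part must look like four dot-separated numbers.
strict = re.compile(
    r"SHUTDOWN_MSG: Shutting down NameNode at "
    r"(?P<dns>[^/\s]+)/(?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
)

# Illustrative line with trailing spaces appended on purpose.
line = (
    "==> namenode_32: SHUTDOWN_MSG: Shutting down NameNode at "
    "ip-10-45-129-157.ec2.internal/10.45.129.157   "
)

loose_result = re.search(r"SHUTDOWN_MSG:.+at (.+)/(.+)", line).groups()
m = strict.search(line)
strict_result = (m.group("dns"), m.group("ip"))

print(loose_result)   # second element keeps the trailing spaces
print(strict_result)  # second element is the bare address
```

The sub-pattern checks only the shape of the address, not that each number is at most 255.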

Answer 2: (score: 0)

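Use () to create capture groups:

import re

# Read the whole log into memory; the with-block closes the file afterwards.
with open('log_file', 'r') as f:
    log = f.read()
re.findall(r"SHUTDOWN_MSG:.+at (.+)/(.+)", log)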

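re.findall() does not stop at the first match. It scans to the end of the input, so it returns a list of all matches, each one a tuple of the captured groups.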
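
The difference from re.search can be seen on a log that contains more than one shutdown line. In the sketch below, the second node (namenode_33), its address, and the log path are invented purely for illustration and do not come from the question:

```python
import re

pattern = r"SHUTDOWN_MSG:.+at (.+)/(.+)"

# Hypothetical log with two nodes; only the first address is from the question.
log = (
    "==> namenode_32: SHUTDOWN_MSG: Shutting down NameNode at "
    "ip-10-45-129-157.ec2.internal/10.45.129.157\n"
    "==> namenode_33: starting namenode, logging to /var/log/example.out\n"
    "==> namenode_33: SHUTDOWN_MSG: Shutting down NameNode at "
    "ip-10-0-0-5.ec2.internal/10.0.0.5\n"
)

first_only = re.search(pattern, log).groups()  # one tuple: the first match
all_matches = re.findall(pattern, log)         # list of tuples: every match

print(first_only)
for dns, ip in all_matches:
    print(dns, ip)
```

If the pattern matches nowhere, re.findall returns an empty list rather than None, so the loop simply does not run.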