Extracting an IP address from a string with Python

Asked: 2013-11-29 13:16:06

Tags: python regex python-3.3 logfile

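I have only just started looking at regular expressions and I am a bit confused. I wrote a program that analyzes the "auth.log" file line by line in real time. Now I need to extract different pieces of information from the entries.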

if "sshd" in line:
    if "Accepted password" in line:
        pass  # regex query to get the username and IP
    elif "session closed" in line:
        pass  # regex query to get the username

These are the entries in the log file:

Nov 29 13:20:33 Debian sshd[4043]: Accepted password for patrick from ::1 port 50864 ssh2
Nov 29 13:20:33 Debian sshd[4043]: pam_unix(sshd:session): session opened for user patrick by (uid=0)
Nov 29 13:21:23 Debian sshd[4043]: pam_unix(sshd:session): session closed for user patrick

Which tool should I use? re.search?

2 answers:

Answer 0: (score: 1)

Because the log entries are well formed, you may not need a regular expression at all:

$ cat t.txt 
Nov 29 13:20:33 Debian sshd[4043]: Accepted password for patrick from ::1 port 50864 ssh2
Nov 29 13:20:33 Debian sshd[4043]: pam_unix(sshd:session): session opened for user patrick by (uid=0)
Nov 29 13:21:23 Debian sshd[4043]: pam_unix(sshd:session): session closed for user patrick
$ cat t.py 
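#!/usr/bin/env python3
# Field positions assume the exact auth.log layout shown above.
with open('t.txt') as log:
    for line in log:
        if "sshd" in line:
            fields = line.split()
            if "Accepted password" in line:
                print("User: ", fields[8])   # token after "for"
                print("IP: ", fields[10])    # token after "from"
            if "session closed" in line:
                print("User: ", fields[10])  # token after "user"
$ python3 t.py 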
User:  patrick
IP:  ::1
User:  patrick

Of course, you need to be more careful with checks like if "sshd" in line:, but you get the idea.
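
Since the question asks about re.search, here is a stricter variant of the same idea: anchor on the sshd[pid]: prefix and capture the fields by name. This is only a sketch. The two patterns and the names ACCEPTED, CLOSED and parse_line are assumptions based on the three sample lines above, not a full description of the auth.log format.

```python
import re

# Both patterns are assumptions derived from the three sample lines,
# not a complete grammar for sshd log messages.
ACCEPTED = re.compile(
    r"sshd\[\d+\]: Accepted password for (?P<user>\S+) from (?P<ip>\S+) port \d+"
)
CLOSED = re.compile(r"sshd\[\d+\]: .*session closed for user (?P<user>\S+)")


def parse_line(line):
    """Return (event, user, ip) for a recognised sshd line, otherwise None."""
    match = ACCEPTED.search(line)
    if match:
        return ("accepted", match.group("user"), match.group("ip"))
    match = CLOSED.search(line)
    if match:
        return ("closed", match.group("user"), None)
    return None


sample = [
    "Nov 29 13:20:33 Debian sshd[4043]: Accepted password for patrick from ::1 port 50864 ssh2",
    "Nov 29 13:20:33 Debian sshd[4043]: pam_unix(sshd:session): session opened for user patrick by (uid=0)",
    "Nov 29 13:21:23 Debian sshd[4043]: pam_unix(sshd:session): session closed for user patrick",
]

for entry in sample:
    print(parse_line(entry))
```

Requiring the sshd[pid]: prefix means a line that merely contains the text "sshd" somewhere else is not treated as an sshd entry, and the named groups avoid depending on token positions.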

Answer 1: (score: 0)

Here is how I find IPv6 and IPv4 addresses:

import re

# re.VERBOSE is passed to findall() below instead of an inline (?x):
# Python 3.11+ rejects global flags that are not at the very start of a pattern.
ip6 = r'''(?:(?:(?:[0-9a-f]{1,4}:){1,1}(?::[0-9a-f]{1,4}){1,6})|
(?:(?:[0-9a-f]{1,4}:){1,2}(?::[0-9a-f]{1,4}){1,5})|
(?:(?:[0-9a-f]{1,4}:){1,3}(?::[0-9a-f]{1,4}){1,4})|
(?:(?:[0-9a-f]{1,4}:){1,4}(?::[0-9a-f]{1,4}){1,3})|
(?:(?:[0-9a-f]{1,4}:){1,5}(?::[0-9a-f]{1,4}){1,2})|
(?:(?:[0-9a-f]{1,4}:){1,6}(?::[0-9a-f]{1,4}){1,1})|
(?:(?:(?:[0-9a-f]{1,4}:){1,7}|:):)|
(?::(?::[0-9a-f]{1,4}){1,7})|
(?:(?:(?:(?:[0-9a-f]{1,4}:){6})(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}))|
(?:(?:(?:[0-9a-f]{1,4}:){5}[0-9a-f]{1,4}:(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}))|
(?:(?:[0-9a-f]{1,4}:){5}:[0-9a-f]{1,4}:(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3})|
(?:(?:[0-9a-f]{1,4}:){1,1}(?::[0-9a-f]{1,4}){1,4}:(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3})|
(?:(?:[0-9a-f]{1,4}:){1,2}(?::[0-9a-f]{1,4}){1,3}:(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3})|
(?:(?:[0-9a-f]{1,4}:){1,3}(?::[0-9a-f]{1,4}){1,2}:(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3})|
(?:(?:[0-9a-f]{1,4}:){1,4}(?::[0-9a-f]{1,4}){1,1}:(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3})|
(?:(?:(?:[0-9a-f]{1,4}:){1,5}|:):(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3})|
(?::(?::[0-9a-f]{1,4}){1,5}:(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}))
'''
# Raw string so the backslashes do not need doubling.
ip4 = r'(?:[12]?\d?\d\.){3}[12]?\d?\d'
ip = re.findall(ip4 + '|' + ip6, "111:111::1 1.1.1.1", re.VERBOSE)

I got the IPv6 regex from another page: Regular expression that matches valid IPv6 addresses
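
If the goal is only to pick addresses out of a log line, the standard-library ipaddress module (available from Python 3.3) can validate candidates in place of a hand-written pattern. This is a minimal sketch, assuming addresses appear as whitespace-separated tokens as they do in the sample log; find_ips is a hypothetical helper name, not part of any library.

```python
import ipaddress


def find_ips(text):
    """Return every whitespace-separated token that parses as an IPv4 or IPv6 address."""
    found = []
    for token in text.split():
        try:
            found.append(ipaddress.ip_address(token))
        except ValueError:
            pass  # token is not a valid address, skip it
    return found


line = "Nov 29 13:20:33 Debian sshd[4043]: Accepted password for patrick from ::1 port 50864 ssh2"
print([str(ip) for ip in find_ips(line)])
```

This trades flexibility for correctness: it will not find an address embedded in a larger token (for example one wrapped in brackets), but every value it returns is one that ipaddress accepts as a valid address.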