Regular expression to get the words before and after a given word

Time: 2019-12-07 08:33:41

Tags: python regex python-3.x

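Can someone provide a regex pattern in Python for the following strings? I have a .log file, and from lines like the ones below I need to extract the user and the IP.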

I want a regex that gives me the one word before "from" and the one word after "from".

Failed password for root from 123.183.209.132 port 39706 ssh2

From the line above I want root and 123.183.209.132.

Failed password for invalid user packer from 13.82.211.217 port 45832 ssh2

From the line above I want packer and 13.82.211.217.

reverse mapping checking getaddrinfo for undefined.datagroup.ua
[93.183.207.5] failed - POSSIBLE BREAK-IN ATTEMPT!

reverse mapping checking getaddrinfo for nsg-static-226.127.71.182.airtel.in [182.71.127.226] failed - POSSIBLE BREAK-IN ATTEMPT!

reverse mapping checking getaddrinfo for 179.185.44.168.static.gvt.net.br [179.185.44.168] failed - POSSIBLE BREAK-IN ATTEMPT!

From these lines I want undefined.datagroup.ua and 93.183.207.5 (using a new regex).

My working code:

import re

def parse(filename, date=None):
    try:
        # string = r'Failed password for ([a-z]*|[a-z]* [a-z]* [a-z]*) from '
        string = r'Failed password for ([a-z]*|[a-z]* [a-z]* [a-z]*) from [0-9]+(?:\.[0-9]+){3}'
        # string_sub = r'for (?<user>[a-zA-Z\.]+).*?(?<ip>(?:\d{1,3}\.){3}\d{1,3})'
        # string_re = re.compile(r"^[^ ]+ - (C[^ ]*) \[([^ ]+)").match
        match_list = []
        with open(filename, 'r') as file:
            for line in file:
                for match in re.finditer(string, line, re.S):
                    match_text = match.group()
                    user_ip = re.search(r'Failed password for .*?(\w+) from (\d+(?:\.\d+){3})', match_text)
                    # Collect every (user, ip) pair instead of keeping only the last user.
                    match_list.append(user_ip.groups())
        return match_list
    except KeyError as e:
        msg = "key %s is missing" % str(e)
        return msg
    except Exception as e:
        return str(e)

I am stuck on the regex.

3 answers:

Answer 0: (score: 0)

A regex may be overkill for your use case... for example, have you tried something simpler:

s1 = "Failed password for root from 123.183.209.132 port 39706 ssh2"
s2 = "Failed password for invalid user packer from 13.82.211.217 port 45832 ssh2"

parsed = s1.split('from',1)
user = parsed[0].split()[-1]
ip = parsed[1].split()[0]

print(f'User is {user} and IP is {ip}')
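The same idea can be wrapped in a small helper so it can be applied to every line of the log. This is only a sketch: the name split_user_ip is made up for illustration, and it assumes the word "from" appears surrounded by spaces, as in the two sample lines.

```python
def split_user_ip(line):
    """Return (user, ip) for a line containing ' from ', or None otherwise."""
    # Split on ' from ' (with spaces) so a user name containing 'from' is not cut.
    before, sep, after = line.partition(' from ')
    if not sep or not before.split() or not after.split():
        return None
    user = before.split()[-1]   # last word before 'from'
    ip = after.split()[0]       # first word after 'from'
    return user, ip


s1 = "Failed password for root from 123.183.209.132 port 39706 ssh2"
s2 = "Failed password for invalid user packer from 13.82.211.217 port 45832 ssh2"

for s in (s1, s2):
    print(split_user_ip(s))
```

Returning None for lines without "from" lets the caller skip unrelated log entries instead of hitting an IndexError.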

Answer 1: (score: -1)

If I understand correctly, you basically want the word after "for" (the user name) and the IP on that line? If so, how about:

for (?P<user>[a-zA-Z\.]+).*?(?P<ip>(?:\d{1,3}\.){3}\d{1,3})

https://regex101.com/r/aojbyS/1. Of course, this is a simplified pattern for the IP; to make it more correct you should use a proper IPv4 regex.
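One way to tighten the IP check without writing a long regex is to keep the loose pattern and validate each candidate afterwards with the standard library ipaddress module. A minimal sketch, where is_ipv4 and LOOSE_IP are hypothetical names chosen for this example:

```python
import ipaddress
import re

# Loose pattern: four dot-separated groups of 1-3 digits (it also matches 999.1.1.1).
LOOSE_IP = re.compile(r'(?:\d{1,3}\.){3}\d{1,3}')


def is_ipv4(text):
    """Return True if text is a valid dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False


line = "Failed password for root from 123.183.209.132 port 39706 ssh2"
candidates = LOOSE_IP.findall(line)
valid = [ip for ip in candidates if is_ipv4(ip)]
print(valid)
```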

Also, your question does not say what should be captured from the line below, which may change the regex above.

Failed password for invalid user packer from 13.82.211.217 port 45832 ssh2.
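A sketch of how this pattern could be driven from Python, where named groups are spelled (?P<name>...). The optional "invalid user " prefix is an assumption added for this example, not part of the pattern above; with it, the line above yields packer rather than invalid.

```python
import re

# Named groups use the (?P<name>...) spelling in Python's re module.
# The optional "invalid user " prefix is an assumption added for this sketch.
pattern = re.compile(
    r'for (?:invalid user )?(?P<user>[a-zA-Z.]+).*?(?P<ip>(?:\d{1,3}\.){3}\d{1,3})'
)

lines = [
    'Failed password for root from 123.183.209.132 port 39706 ssh2',
    'Failed password for invalid user packer from 13.82.211.217 port 45832 ssh2',
]

results = []
for line in lines:
    m = pattern.search(line)
    if m:
        results.append((m.group('user'), m.group('ip')))

print(results)
```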

Answer 2: (score: -1)

import re

inp = [
    'Failed password for root from 123.183.209.132 port 39706 ssh2',
    'Failed password for invalid user packer from 13.82.211.217 port 45832 ssh2',
    '''reverse mapping checking getaddrinfo for undefined.datagroup.ua
[93.183.207.5] failed - POSSIBLE BREAK-IN ATTEMPT!''',
]
for s in inp:
    result = re.search(r'(?:Failed password|reverse mapping.+?) for .*?([\w.]+)\s+(?:from |\[)(\d+(?:\.\d+){3})', s)
    print(result.groups())

Output:

('root', '123.183.209.132')
('packer', '13.82.211.217')
('undefined.datagroup.ua', '93.183.207.5')

Explanation:

(?:                     # non capture group
    Failed password     # literally
  |                   # OR
    reverse mapping     # literally
    .+?                 # 1 or more any character, not greedy
)                       # end group
 for                    # literally
 .*?                    # 0 or more any character
 ([\w.]+)               # group 1, 1 or more word character or dot
 \s+                    # 1 or more spaces
 (?:from |\[)           # non capture group, from OR opening square bracket
(\d+(?:\.\d+){3})       # group 2, IP
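The breakdown above can also live inside the pattern itself by compiling it with re.VERBOSE. The following is a sketch rather than a drop-in replacement: LOG_RE and extract are names invented here, and the "Accepted password" line is a hypothetical entry included only to show that unrelated lines are skipped.

```python
import re

# Same pattern as above, written with re.VERBOSE so the comments sit in the regex.
# In verbose mode unescaped whitespace is ignored, so literal spaces are written "\ ".
LOG_RE = re.compile(r'''
    (?:Failed\ password | reverse\ mapping.+?)  # either kind of log line
    \ for\                                      # literal " for "
    .*?                                         # skip e.g. "invalid user "
    ([\w.]+)                                    # group 1: user or host name
    \s+                                         # 1 or more whitespace characters
    (?:from\  | \[)                             # "from " or an opening bracket
    (\d+(?:\.\d+){3})                           # group 2: IP address
''', re.VERBOSE)


def extract(lines):
    """Yield (name, ip) for every line that matches LOG_RE."""
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            yield m.groups()


sample = [
    'Failed password for root from 123.183.209.132 port 39706 ssh2',
    'Accepted password for alice from 10.0.0.1 port 22 ssh2',  # hypothetical, should not match
    'reverse mapping checking getaddrinfo for undefined.datagroup.ua [93.183.207.5] failed - POSSIBLE BREAK-IN ATTEMPT!',
]
pairs = list(extract(sample))
print(pairs)
```

Because extract accepts any iterable of strings, an open file object can be passed directly in place of the sample list.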