Question

I have the following two strings:

line1 = '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore'

line2 = '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore'

I am trying to capture these two parts:

"GET /file/ HTTP/1.1" 302
"" 400

Basically, I want whatever characters are between the double quotes "", or nothing if there is nothing between them. So far I have tried this:

import re

regex_example = re.search("\".+?\" [0-9]{3}", line1)
print(regex_example.group())

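This works for line1 but raises an error for line2. The reason is that .+? requires at least one character between the quotes; when there is nothing between them, re.search finds no match and returns None, so calling .group() on the result fails.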
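A minimal sketch that reproduces the failure, assuming Python 3 and the two sample lines written as string literals: the pattern matches line1, but re.search returns None for line2.

```python
import re

line1 = '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore'
line2 = '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore'

# .+? needs at least one character between the two quotes
pattern = r'".+?" [0-9]{3}'

m1 = re.search(pattern, line1)
m2 = re.search(pattern, line2)

print(m1.group())  # "GET /file/ HTTP/1.1" 302
print(m2)          # None, so m2.group() would raise AttributeError
```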

Is there any way to match any characters, or nothing at all, between the quotes ""?

Answer 1

Use .*? instead of .+?.

+ means "1 or more"

* means "0 or more"

Regex101 Demo
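A short sketch of the fix, assuming the same two sample lines: with .*? the part between the quotes may be empty, so "" 400 now matches as well.

```python
import re

lines = [
    '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore',
    '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore',
]

# .*? allows zero characters between the quotes, unlike .+?
pattern = r'".*?" [0-9]{3}'

found = [re.search(pattern, line).group() for line in lines]
print(found)  # ['"GET /file/ HTTP/1.1" 302', '"" 400']
```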

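If you want a more efficient regex, use the negated character class [^"]* instead of the lazy .*?. You should also use a raw string (the r prefix) and \d for digits.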

r'"[^"]*" \d{3}'
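A sketch of the negated-class version on the two sample lines. The string sample below is made up for illustration: it shows that [^"]* cannot run past a closing quote, while the lazy .*? can keep extending across quotes until the rest of the pattern fits.

```python
import re

lines = [
    '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore',
    '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore',
]

# [^"]* matches any run of non-quote characters, including an empty one
rx = re.compile(r'"[^"]*" \d{3}')

found = [rx.search(line).group() for line in lines]
print(found)  # ['"GET /file/ HTTP/1.1" 302', '"" 400']

# Hypothetical input with two quoted parts
sample = '"a" x "b" 200'
lazy = re.search(r'".*?" \d{3}', sample).group()
negated = rx.search(sample).group()
print(lazy)     # "a" x "b" 200  (.*? ran across the quotes)
print(negated)  # "b" 200        ([^"]* stayed inside one pair of quotes)
```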

Answer 2

You can use:

import re

lines = ['[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore', '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore']

rx = re.compile(r'''
        "[^"]*" # ", followed by anything not a " and a "
        \       # a space
        \d+     # at least one digit
        ''', re.VERBOSE)

matches = [m.group(0)
           for line in lines
           for m in rx.finditer(line)]

print(matches)
# ['"GET /file/ HTTP/1.1" 302', '"" 400']

See a demo on ideone.com.
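Why the pattern above writes the space as "\ ": under re.VERBOSE, unescaped whitespace in the pattern is ignored. A minimal sketch comparing both forms on the second sample line:

```python
import re

line = '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore'

# The plain space is dropped in VERBOSE mode, so this behaves like '"[^"]*"\d+'
# and needs the digits directly after the closing quote
unescaped = re.compile(r'"[^"]*" \d+', re.VERBOSE)

# An escaped space stays a literal space character
escaped = re.compile(r'"[^"]*"\ \d+', re.VERBOSE)

print(unescaped.search(line))        # None
print(escaped.search(line).group())  # "" 400
```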

Answer 3

A simpler answer:

    import re
    line1= '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore'
    line2='[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore'

    x = re.search(r'\](.+)random', line1).group(1)

    y = re.search(r'\](.+)random', line2).group(1)

    print(x + "\n"+y)

You will get the following output:

     "GET /file/ HTTP/1.1" 302 
     "" 400
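Note that the captured group in this answer includes the space after "]" and the space before "random", and the pattern relies on the literal word "random" following the status code. A small sketch, assuming the same line1, showing the surrounding whitespace and removing it with str.strip():

```python
import re

line1 = '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore'

x = re.search(r'\](.+)random', line1).group(1)
stripped = x.strip()  # drop the leading and trailing spaces

print(repr(x))         # ' "GET /file/ HTTP/1.1" 302 '
print(repr(stripped))  # '"GET /file/ HTTP/1.1" 302'
```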

Answer 4

Try this... Using findall instead of search may give you more control over how you handle the output.

import re

output = []

logs = '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore \
        [16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore'

regex = r'"(.*?)"\s(\d{3})'

value = re.findall(regex, logs)
output.append(value)

print(output)
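Because the pattern has two capture groups, re.findall returns a list of (request, status) tuples rather than whole matches. A sketch, assuming the two log lines are joined with a newline (a variation on the logs string above), showing that shape and one way to rebuild the quoted text plus status code:

```python
import re

logs = ('[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore\n'
        '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore')

# Two capture groups -> each result is a (group1, group2) tuple
pairs = re.findall(r'"(.*?)"\s(\d{3})', logs)
print(pairs)  # [('GET /file/ HTTP/1.1', '302'), ('', '400')]

# Reassemble '"<request>" <status>' from each tuple
joined = ['"{}" {}'.format(request, status) for request, status in pairs]
print(joined)  # ['"GET /file/ HTTP/1.1" 302', '"" 400']
```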

Answer 5

Another option:

import re
re.sub(r'\[.*\] ', '', your_string)

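This replaces the square brackets [], any characters between them, and the space that follows with an empty string "" in your_string, and returns the resulting string.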

For example:

for your_string in [line1, line2]:
    print(re.sub(r'\[.*\] ', '', your_string))

Output:

"GET /file/ HTTP/1.1" 302 random stuff ignore
"" 400 random stuff ignore
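The .* in this pattern is greedy, so it runs to the last "] " on the line. That is harmless for the two sample lines, but a hypothetical request containing "] " would be over-trimmed. A sketch comparing it with a negated class, [^\]]*, which stops at the first closing bracket (the sample string is made up):

```python
import re

# Hypothetical log line whose request text also contains "] "
line = '[16/Aug/2016:06:13:25 -0400] "GET /a] b HTTP/1.1" 200 x'

# Greedy: \[.*\] backtracks from the end to the LAST "] "
greedy = re.sub(r'\[.*\] ', '', line)

# Negated class: [^\]]* cannot cross a "]", so only the timestamp is removed
bounded = re.sub(r'\[[^\]]*\] ', '', line, count=1)

print(greedy)   # b HTTP/1.1" 200 x
print(bounded)  # "GET /a] b HTTP/1.1" 200 x
```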

Regex: match any character or nothing?
