Question

I have a line like this:

unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786

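I want to store everything after HTTP/1.0" (that is, the two numbers) in a list. How would I do that with a regular expression? I have read the documentation on regular expressions, but it confuses me.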

Answer 1

You can use regex101 to build a regular expression that fits your needs.

For your specific example, the following RE will work:

HTTP\/1\.0"(.*)$

Explanation:

The group captures everything after HTTP/1.0".

This gives the output:

` 200 786`

Answer 2

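import re

text = 'unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786'
# The parentheses create a capture group; without one, groups() is empty.
regex = r'HTTP/1\.0"(.*)$'
match = re.search(regex, text)
# Split the captured text on whitespace to get the two numbers as strings.
list_with_numbers = match.group(1).split()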
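
If only the two numbers are wanted, one option is to give each its own capture group so that no split step is needed. The following is a minimal sketch, assuming both trailing fields are always digits and the request always ends with HTTP/1.0"; the names LOG_TAIL and status_and_size are made up for illustration.

```python
import re

# Hypothetical pattern: two runs of digits after the closing quote of the request.
LOG_TAIL = re.compile(r'HTTP/1\.0" (\d+) (\d+)$')

def status_and_size(line):
    """Return the two trailing numbers as a list of ints, or None if the line does not match."""
    match = LOG_TAIL.search(line)
    if match is None:
        return None
    return [int(value) for value in match.groups()]

line = 'unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786'
print(status_and_size(line))  # [200, 786]
```

Checking for None before reading the groups avoids an AttributeError on lines that do not have the expected shape.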

Answer 3

You don't need a regular expression; you can use the built-in str methods. For example,

s = 'unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786'
data = s.partition('HTTP/1.0" ')
nums = data[2].split()
print(nums)

Output

['200', '786']

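You could also use .split() instead of .partition(), but I think .partition() is more natural here. Note that the numbers stored in nums are strings, so if you need to do arithmetic on them you will need to add a conversion step.

Here is an example that uses .split() instead of .partition() and converts the number strings to integers.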

data = s.split('HTTP/1.0"')
nums = [int(u) for u in data[1].split()]
print(nums)

Output

[200, 786]
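
One practical difference between the two methods shows up when a line does not contain the separator: .partition() returns the whole string followed by two empty strings, whereas indexing data[1] on the .split() result raises IndexError. Below is a small sketch of a helper that relies on this. It assumes everything after the marker is made of integers, and the name numbers_after and the second sample string are purely illustrative.

```python
def numbers_after(line, marker='HTTP/1.0"'):
    """Return the whitespace-separated integers that follow marker, or [] if marker is absent."""
    # partition() always returns a 3-tuple; the middle item is '' when marker is not found.
    _, found, tail = line.partition(marker)
    if not found:
        return []
    return [int(token) for token in tail.split()]

s = 'unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786'
print(numbers_after(s))                  # [200, 786]
print(numbers_after('no request here'))  # []
```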

Answer 4

Do you have to use a regular expression? If not, you can do this:

>>> lines = ['unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786']
>>> 
>>> numbers = [line.split()[-2:] for line in lines]
>>> numbers
[['200', '786']]
>>>

This assumes that "the last two whitespace-separated strings" is equivalent to what you want.
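
The same assumption can also be expressed with str.rsplit, which stops after a fixed number of splits from the right instead of splitting the whole line. This is only a sketch and not part of the original answer; the helper name last_two_fields is hypothetical.

```python
def last_two_fields(line):
    """Return the last two whitespace-separated fields of line as strings."""
    # maxsplit=2 splits at most twice from the right: [rest, field, field].
    parts = line.rsplit(maxsplit=2)
    return parts[-2:]

lines = ['unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786']
numbers = [last_two_fields(line) for line in lines]
print(numbers)  # [['200', '786']]
```

The fields are left as strings here because some access logs may record the size as a hyphen rather than a number, in which case int() would raise ValueError.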
