Question

I have 1000 log lines. All I want to do is capture the URL so I can assign it to a variable in Python and then manipulate it with urlparse. Here is one log line:

2015-04-01 01:01:10 0 192.0.0.1 17204100 192.0.0.1 80 words/123 123 WORD http://something-something.domain.com/folder1/folder2/folder/123432523324325_word_word_file.zipuuid=1234533&something=%205920&word=all&_123 - 3 123 "-" "helloworld/1" 1234 "words"; 127.0.0.1, 192.0.0.1; 3"

All I want to capture is: http://something-something.domain.com/folder1/folder2/folder/123432523324325_word_word_file.zipuuid=1234533&something=%205920&word=all&_123

My regex doesn't seem to stop at the space:

(http://.*)[^\s]

My idea was that I could capture anything from http:// up to the space, but for some reason it captures everything after http://.
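A quick check in Python 3 (an assumption; the question only says "python") shows what the pattern above actually captures on the posted line. The `.*` inside the group is greedy and also matches spaces, so it runs to the end of the line; the trailing `[^\s]` then only takes back the final `"`.

```python
import re

# The log line from the question, split only for readability.
LOG_LINE = (
    '2015-04-01 01:01:10 0 192.0.0.1 17204100 192.0.0.1 80 words/123 123 WORD '
    'http://something-something.domain.com/folder1/folder2/folder/'
    '123432523324325_word_word_file.zipuuid=1234533&something=%205920&word=all&_123 '
    '- 3 123 "-" "helloworld/1" 1234 "words"; 127.0.0.1, 192.0.0.1; 3"'
)

# The pattern from the question. ".*" matches any character except a newline,
# spaces included, and as much of it as possible.
greedy = re.search(r'(http://.*)[^\s]', LOG_LINE)

# The captured group runs past the URL, through the rest of the fields,
# and stops one character before the end of the line.
print(greedy.group(1))
```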

Thanks!

Answer 1

To capture anything from http:// up to a space:

https?://\S+

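Use \S, which matches any non-whitespace character, so the match ends right before the first space. In your pattern, .* also matches spaces, and because it is greedy it runs to the end of the line; the trailing [^\s] only requires one final non-space character.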
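A minimal sketch of this pattern in Python 3, assuming the question's `urlparse` refers to `urllib.parse.urlparse` (its Python 3 location). `extract_url` and `extract_urls` are hypothetical helper names for illustration, not part of any library.

```python
import re
from urllib.parse import urlparse

# The log line from the question, split only for readability.
LOG_LINE = (
    '2015-04-01 01:01:10 0 192.0.0.1 17204100 192.0.0.1 80 words/123 123 WORD '
    'http://something-something.domain.com/folder1/folder2/folder/'
    '123432523324325_word_word_file.zipuuid=1234533&something=%205920&word=all&_123 '
    '- 3 123 "-" "helloworld/1" 1234 "words"; 127.0.0.1, 192.0.0.1; 3"'
)

# \S+ matches one or more non-whitespace characters, so the match stops
# right before the first space (or newline) after the URL.
URL_RE = re.compile(r'https?://\S+')


def extract_url(line):
    """Hypothetical helper: return the first http(s) URL in a line, or None."""
    match = URL_RE.search(line)
    return match.group(0) if match else None


def extract_urls(lines):
    """Hypothetical helper: collect the URL from each line that has one."""
    return [url for url in map(extract_url, lines) if url is not None]


url = extract_url(LOG_LINE)
parsed = urlparse(url)
print(url)
print(parsed.netloc)  # the host part
# The URL as posted contains no '?', so everything after the host lands in path.
print(parsed.path)
print(extract_urls([LOG_LINE, 'a line without any URL']))
```

Because iterating over an open file yields one line at a time, `extract_urls` can be passed the file object for the 1000-line log directly; the trailing newline on each line is not included in the match, since `\S` does not match it.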
