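I need to isolate a word in a log file and extract the value that follows it. I have been reading about regular expressions, but I can't get my head around the syntax.
I'm reading from a log file, and I gather I need to use something like re.findall.
I can do this in bash, but I can't translate it to Python.
Current code: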
cat FILE | sed -n -e 's/^.*GET //p' | sed -e 's/,.*//g' |sort | uniq -c | sort -n
Log file excerpt:
109.40.2.10 - - [12/May/2019:06:53:40 +0200] "GET /ddo/livesearch?text=tilkn&format=json&app=android&size=30 HTTP/1.1" 200 96 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
109.40.2.10 - - [12/May/2019:06:53:41 +0200] "GET /ddo/livesearch?text=tilk&format=json&app=android&size=30 HTTP/1.1" 200 464 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
109.40.2.10 - - [12/May/2019:06:53:41 +0200] "GET /ddo/livesearch?text=ti&format=json&app=android&size=30 HTTP/1.1" 200 401 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
109.40.2.10 - - [12/May/2019:06:53:41 +0200] "GET /ddo/livesearch?text=t&format=json&app=android&size=30 HTTP/1.1" 200 12 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
109.40.2.10 - - [12/May/2019:06:53:42 +0200] "GET /ddo/livesearch?text=&format=json&app=android&size=30 HTTP/1.1" 200 12 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
109.40.2.10 - - [12/May/2019:06:53:43 +0200] "GET /ddo/livesearch?text=b&format=json&app=android&size=30 HTTP/1.1" 200 12 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
What I need to extract: the /ddo/* part of each line
Answer 0 (score: 1)
Use re.search with a lookbehind and a lookahead, so that only the text between "GET and the HTTP version is matched.
For example:
import re
s = '''109.40.2.10 - - [12/May/2019:06:53:40 +0200] "GET /ddo/livesearch?text=tilkn&format=json&app=android&size=30 HTTP/1.1" 200 96 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
109.40.2.10 - - [12/May/2019:06:53:41 +0200] "GET /ddo/livesearch?text=tilk&format=json&app=android&size=30 HTTP/1.1" 200 464 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
109.40.2.10 - - [12/May/2019:06:53:41 +0200] "GET /ddo/livesearch?text=ti&format=json&app=android&size=30 HTTP/1.1" 200 401 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
109.40.2.10 - - [12/May/2019:06:53:41 +0200] "GET /ddo/livesearch?text=t&format=json&app=android&size=30 HTTP/1.1" 200 12 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
109.40.2.10 - - [12/May/2019:06:53:42 +0200] "GET /ddo/livesearch?text=&format=json&app=android&size=30 HTTP/1.1" 200 12 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
109.40.2.10 - - [12/May/2019:06:53:43 +0200] "GET /ddo/livesearch?text=b&format=json&app=android&size=30 HTTP/1.1" 200 12 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"'''
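for line in s.splitlines():
    # The lookbehind anchors on '"GET '; the lookahead stops before ' HTTP/1.1"',
    # so the captured path has no trailing space.
    m = re.search(r'(?<="GET )(?P<path>.*?)(?= HTTP/1\.1")', line)
    if m:
        print(m.group("path"))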
Output:
/ddo/livesearch?text=tilkn&format=json&app=android&size=30
/ddo/livesearch?text=tilk&format=json&app=android&size=30
/ddo/livesearch?text=ti&format=json&app=android&size=30
/ddo/livesearch?text=t&format=json&app=android&size=30
/ddo/livesearch?text=&format=json&app=android&size=30
/ddo/livesearch?text=b&format=json&app=android&size=30
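The bash pipeline in the question also counts how often each request occurs (sort | uniq -c | sort -n), which the snippet above does not do. A minimal sketch of that counting step follows, assuming every request of interest looks like "GET <path> HTTP/x.y". The names REQUEST_RE and count_get_paths are made up for illustration, and one sample line is deliberately repeated so that the counts differ.

```python
import re
from collections import Counter

# Hypothetical pattern: capture the path between '"GET ' and ' HTTP/x.y"'.
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/\d\.\d"')


def count_get_paths(lines):
    """Return a Counter mapping each GET path to its number of occurrences."""
    counts = Counter()
    for line in lines:
        m = REQUEST_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


# Two lines taken from the excerpt in the question; the second one is
# repeated here on purpose, purely to illustrate counting.
sample = [
    '109.40.2.10 - - [12/May/2019:06:53:41 +0200] "GET /ddo/livesearch?text=ti&format=json&app=android&size=30 HTTP/1.1" 200 401 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"',
    '109.40.2.10 - - [12/May/2019:06:53:41 +0200] "GET /ddo/livesearch?text=t&format=json&app=android&size=30 HTTP/1.1" 200 12 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"',
    '109.40.2.10 - - [12/May/2019:06:53:41 +0200] "GET /ddo/livesearch?text=t&format=json&app=android&size=30 HTTP/1.1" 200 12 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"',
]

counts = count_get_paths(sample)

# Least frequent first, mirroring `sort | uniq -c | sort -n`.
for path, n in sorted(counts.items(), key=lambda item: item[1]):
    print(n, path)
```

Because count_get_paths accepts any iterable of lines, an open file object can be passed in directly instead of the sample list, so a large log does not have to be read into memory at once.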