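I have a text file that I want to read and extract a certain string from (it can occur several times). Then I want to print the results.
The string I want to extract is the value of Rule MATCH Name.
Sample of the text file: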
201819:34:40Z ubuntu : Info: MODULE: FileScan MESSAGE: Scanning test
201809:34:40Z ubuntu: Alert: MODULE: FileScan MESSAGE: FILE: /test/76.bin SCORE: 140 TYPE: EXE AutoUpdates https://www.test.com/files: **Rule MATCH Name**: this_is_test1 SUBSCORE:100
201819:34:40Z ubuntu : Info: MODULE: FileScan MESSAGE: Scanning test
201809:34:40Z ubuntu: Alert: MODULE: FileScan MESSAGE: FILE: /test/7164.bin SCORE: 140 TYPE: EXE AutoUpdates https://www.test.com/files: **Rule MATCH Name**: this_is_test2 SUBSCORE:90
201819:34:40Z ubuntu : Info: MODULE: FileScan MESSAGE: Scanning test
201809:34:40Z ubuntu: Alert: MODULE: FileScan MESSAGE: FILE: /test/764.bin SCORE: 140 TYPE: EXE AutoUpdates https://www.test.com/files: **Rule MATCH Name**: this_is_test3 SUBSCORE:15
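Answer 0 (score: 0)
You can solve this with a regular expression. RegExr is a great site for building and testing regex rules.
Once you have a rule that fits your problem, load the file, read its text with readlines(), and extract the values with Python's re module.
I put together a quick solution (I'm not sure this is exactly the value you want to extract):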
import re
fl = r'201819:34:40Z ubuntu : Info: MODULE: FileScan MESSAGE: Scanning test 201809:34:40Z ubuntu: Alert: MODULE: FileScan MESSAGE: FILE: /test/76.bin SCORE: 140 TYPE: EXE AutoUpdates https://www.test.com/files: Rule MATCH Name: this_is_test1 SUBSCORE:100 201819:34:40Z ubuntu : Info: MODULE: FileScan MESSAGE: Scanning test 201809:34:40Z ubuntu: Alert: MODULE: FileScan MESSAGE: FILE: /test/7164.bin SCORE: 140 TYPE: EXE AutoUpdates https://www.test.com/files: Rule MATCH Name: this_is_test2 SUBSCORE:90 201819:34:40Z ubuntu : Info: MODULE: FileScan MESSAGE: Scanning test 201809:34:40Z ubuntu: Alert: MODULE: FileScan MESSAGE: FILE: /test/764.bin SCORE: 140 TYPE: EXE AutoUpdates https://www.test.com/files: Rule MATCH Name: this_is_test3 SUBSCORE:15'
re.findall(r'Rule MATCH Name:\s(\w+)\s', fl)
# ['this_is_test1', 'this_is_test2', 'this_is_test3']
If you are reading from a file:
import re

found = []
with open('f.txt') as f:
    # Iterate over the file line by line and collect every rule name found
    for line in f:
        found += re.findall(r'Rule MATCH Name:\s(\w+)\s', line)
print(found)  # ['this_is_test1', 'this_is_test2', 'this_is_test3']
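One caveat with the pattern above: it requires a whitespace character after the value, and it expects the literal text Rule MATCH Name: without the ** markers shown in the question's sample. If either of those might differ in your real file, a slightly more tolerant pattern could look like the sketch below. The helper name extract_rule_names, the optional-asterisk pattern, and the shortened sample lines are my own assumptions for illustration, not something taken from your log format.

```python
import re

# Assumption: the label may or may not be wrapped in ** markers, and the
# value may be the last thing on the line (no trailing whitespace).
RULE_NAME = re.compile(r'\*{0,2}Rule MATCH Name\*{0,2}:\s*(\w+)')

def extract_rule_names(text):
    """Return every value that follows 'Rule MATCH Name:' in text."""
    return RULE_NAME.findall(text)

# Shortened sample lines based on the question's log excerpt
sample = (
    "FILE: /test/76.bin SCORE: 140 **Rule MATCH Name**: this_is_test1 SUBSCORE:100\n"
    "FILE: /test/764.bin SCORE: 140 Rule MATCH Name: this_is_test3"
)
print(extract_rule_names(sample))  # ['this_is_test1', 'this_is_test3']
```

Compiling the pattern once is only a convenience here; re.findall with the raw pattern string would behave the same way.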
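Answer 1 (score: 0)
This is easy with the re module's search function. Something along these lines prints every line that matches a pattern, with the pattern passed as the first command-line argument and the file name as the second: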
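import re
import sys

# Usage: python script.py PATTERN FILE
with open(sys.argv[2], "r") as file:
    for line in file:
        # Print the whole line whenever the pattern occurs anywhere in it
        if re.search(sys.argv[1], line):
            print(line, end="")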
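Note that this prints the whole matching line, while the question asks only for the value. re.search returns a match object, so if the pattern contains a capturing group you can read the value from group(1). A minimal sketch, assuming at most one rule name per line; the helper name first_rule_name_per_line and its default pattern are hypothetical, chosen for illustration:

```python
import re

def first_rule_name_per_line(lines, pattern=r'Rule MATCH Name:\s*(\w+)'):
    """Return group(1) of the first match on each line that has one."""
    names = []
    for line in lines:
        match = re.search(pattern, line)  # None when the line does not match
        if match:
            names.append(match.group(1))
    return names

# Two lines modelled on the question's log excerpt
lines = [
    "ubuntu : Info: MODULE: FileScan MESSAGE: Scanning test\n",
    "ubuntu: Alert: MODULE: FileScan MESSAGE: FILE: /test/7164.bin "
    "SCORE: 140 Rule MATCH Name: this_is_test2 SUBSCORE:90\n",
]
print(first_rule_name_per_line(lines))  # ['this_is_test2']
```

If a single line can contain several rule names, re.search will only see the first one; re.findall or re.finditer would be the better fit in that case.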