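I am writing an automation script in Python that uses another library. The output I get contains the array I need, but it also includes irrelevant log messages as plain strings.
For the script to work, I need to retrieve only the array from that output.
Here is a sample of the output I get: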
Split /adclix.$~image into 2 rules
Split /mediahosting.engine$document,script into 2 rules
[
  {
    "action": {
      "type": "block"
    },
    "trigger": {
      "url-filter": "/adservice\\.",
      "unless-domain": [
        "adservice.io"
      ]
    }
  }
]
Generated a total of 1 rules (1 blocks, 0 exceptions)
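How can I get only the array out of this output?
FWIW, I would rather not base the logic on the strings outside the array, because they may change.
Update: the script I am getting the data from is here: https://github.com/brave/ab2cb/tree/master/ab2cb
My full code is here: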
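import subprocess
import threading

# generate_metadata and ab2cb_dirpath are defined elsewhere in my script.


def pipe_in(process, filter_lists):
    # Feed every filter list body to ab2cb, then close stdin so it sees EOF.
    try:
        for body, _, _ in filter_lists:
            process.stdin.write(body)
    finally:
        process.stdin.close()


def write_block_lists(filter_lists, path, expires):
    block_list = generate_metadata(filter_lists, expires)
    process = subprocess.Popen(('ab2cb',),  # trailing comma makes this a tuple
                               cwd=ab2cb_dirpath,
                               stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # Write to stdin from a separate thread so that reading stdout cannot deadlock.
    threading.Thread(target=pipe_in, args=(process, filter_lists)).start()
    result = process.stdout.read()  # bytes: log lines plus the JSON array

    # The pipe yields bytes, so open the file in binary mode.
    # The with block closes the file; no explicit close() is needed.
    with open('output.json', 'wb') as destination_file:
        destination_file.write(result)

    if process.wait():
        raise Exception('ab2cb returned %s' % process.returncode)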
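Ideally, the output would be modified while it is still in memory (as read from stdout) and only written to the file afterwards, because I still need to modify the data in the array mentioned above.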
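To make the "modify it before writing" idea concrete, here is a minimal sketch of capturing a child process's stdout as a string instead of sending it straight to a file. The child process below is a hypothetical stand-in written for this example (it just echoes its stdin between two log lines); it is not ab2cb, and the sketch assumes Python 3.7+ for capture_output and text.

```python
import subprocess
import sys

# Hypothetical stand-in for the real tool: prints a log line, echoes stdin,
# then prints another log line.
child_code = (
    "import sys\n"
    "print('log line before')\n"
    "sys.stdout.write(sys.stdin.read())\n"
    "print('log line after')\n"
)

completed = subprocess.run(
    [sys.executable, "-c", child_code],
    input='[{"action": {"type": "block"}}]\n',  # sent to the child's stdin
    capture_output=True,  # collect stdout and stderr instead of inheriting them
    text=True,            # decode the pipes to str
    check=True,           # raise CalledProcessError on a non-zero exit code
)

captured = completed.stdout  # a str that can be filtered before being saved
print(captured)
```

With the whole output held in a string, the array can be extracted and edited first, and only the final result written to output.json.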
Answer 0 (score: 0)
I wrote a library for exactly this purpose. I rarely get the chance to plug it!
from jsonfinder import jsonfinder
logs = r"""
Split /adclix.$~image into 2 rules
Split /mediahosting.engine$document,script into 2 rules
[
{
"action": {
"type": "block"
},
"trigger": {
"url-filter": "/adservice\\.",
"unless-domain": [
"adservice.io"
]
}
}
]
Generated a total of 1 rules (1 blocks, 0 exceptions)
Something else that looks like JSON: [1, 2]
"""
for start, end, obj in jsonfinder(logs):
    # Keep only non-empty lists whose first item looks like a content-blocker rule.
    if (
        obj
        and isinstance(obj, list)
        and isinstance(obj[0], dict)
        and {"action", "trigger"} <= obj[0].keys()
    ):
        print(obj)
Demo: https://repl.it/repls/ImperfectJuniorBootstrapping
Library: https://github.com/alexmojaki/jsonfinder
Install it with pip install jsonfinder.
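If adding a dependency is not an option, a similar scan can be sketched with only the standard library. This is not how jsonfinder is implemented; it is a rough alternative that assumes the array you want starts at some "[" and uses json.JSONDecoder.raw_decode to try decoding from each one. The helper names find_json_arrays and looks_like_rules are made up for this example.

```python
import json


def find_json_arrays(text):
    """Yield every JSON array that decodes cleanly starting at a '[' in text."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        start = text.find("[", pos)
        if start == -1:
            return
        try:
            # raw_decode parses one JSON value at `start` and ignores what follows.
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here, try the next '['
            continue
        yield obj
        pos = end  # resume scanning after the decoded value


def looks_like_rules(obj):
    """True for a non-empty list of dicts that all have 'action' and 'trigger'."""
    return (
        isinstance(obj, list)
        and bool(obj)
        and all(
            isinstance(rule, dict) and {"action", "trigger"} <= rule.keys()
            for rule in obj
        )
    )


sample = r"""
Split /adclix.$~image into 2 rules
[
  {"action": {"type": "block"},
   "trigger": {"url-filter": "/adservice\\.", "unless-domain": ["adservice.io"]}}
]
Generated a total of 1 rules (1 blocks, 0 exceptions)
Something else that looks like JSON: [1, 2]
"""

rules = [obj for obj in find_json_arrays(sample) if looks_like_rules(obj)]
print(rules)
```

The shape check is what keeps the stray [1, 2] out of the result, so nothing depends on the wording of the log lines around the array.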
Answer 1 (score: 0)
You can also use a regular expression:
import re
input = """
Split /adclix.$~image into 2 rules
Split /mediahosting.engine$document,script into 2 rules
[
{
"action": {
"type": "block"
},
"trigger": {
"url-filter": "/adservice\\.",
"unless-domain": [
"adservice.io"
]
}
}
]
Generated a total of 1 rules (1 blocks, 0 exceptions)
asd
asd
"""
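# re.M makes ^ and $ match at line boundaries, so ^\]$ only matches a line that is exactly "]".
# The greedy (?:.|\n)* also spans newlines, so the match runs from the first "[" to the last such line.
regex = re.compile(r"\[(?:.|\n)*^\]$", re.M)
x = regex.search(input)
print(x.group(0))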
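Edit:
re.M enables "multi-line matching", so ^ and $ match at the start and end of every line rather than only at the start and end of the whole string.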
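Since the question also asks to modify the array before saving it, here is a small sketch of the follow-up step: parse the matched text with json.loads, change it, and serialize it again with json.dumps. The extra domain "example.com" is only a placeholder for whatever edit is actually needed. Note that this approach assumes no earlier log line contains a "[", otherwise the match would start too early and json.loads would fail.

```python
import json
import re

raw_output = r"""
Split /adclix.$~image into 2 rules
[
  {
    "action": {"type": "block"},
    "trigger": {
      "url-filter": "/adservice\\.",
      "unless-domain": ["adservice.io"]
    }
  }
]
Generated a total of 1 rules (1 blocks, 0 exceptions)
"""

# Match from the first "[" to the last line that consists of "]" only.
match = re.search(r"\[(?:.|\n)*^\]$", raw_output, re.M)
if match is None:
    raise ValueError("no JSON array found in the output")

rules = json.loads(match.group(0))  # now a regular Python list of dicts

# Placeholder modification: add one more domain to the first rule.
rules[0]["trigger"]["unless-domain"].append("example.com")

cleaned = json.dumps(rules, indent=2)  # ready to be written to output.json
print(cleaned)
```

Writing the result is then a plain text-mode write of cleaned, with none of the log lines left in the file.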