Question

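I'm writing an automation script in Python that uses another library. The output I get contains the array I need, but it also includes irrelevant log messages as plain strings.

For the script to work, I only need to retrieve the array from the file.

Here is a sample of the output I'm getting.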

Split /adclix.$~image into 2 rules
Split /mediahosting.engine$document,script into 2 rules
[
    {
        "action": {
            "type": "block"
        }, 
        "trigger": {
            "url-filter": "/adservice\\.", 
            "unless-domain": [
                "adservice.io"
            ]
        }
    }
]
Generated a total of 1 rules (1 blocks, 0 exceptions)

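How can I get just the array out of this file?

FWIW, I'd rather not have logic based on the strings outside the array, since they might change.

Update: the script I'm getting the data from is here: https://github.com/brave/ab2cb/tree/master/ab2cb

My full code is here: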

import subprocess
import threading


def pipe_in(process, filter_lists):
    # Feed every filter list body to the child process, then close stdin.
    try:
        for body, _, _ in filter_lists:
            process.stdin.write(body)
    finally:
        process.stdin.close()


def write_block_lists(filter_lists, path, expires):
    block_list = generate_metadata(filter_lists, expires)
    process = subprocess.Popen(('ab2cb'),
                               cwd=ab2cb_dirpath,
                               stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    threading.Thread(target=pipe_in, args=(process, filter_lists)).start()

    # Everything ab2cb prints ends up here: log lines and the JSON array.
    result = process.stdout.read()
    with open('output.json', 'w') as destination_file:
        destination_file.write(result)
    if process.wait():
        raise Exception('ab2cb returned %s' % process.returncode)

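Ideally, the output would be modified while it is still the captured stdout and only written to the file afterwards, because I still need to modify the data in the array mentioned above.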

Answer 1

I wrote a library for exactly this purpose. I don't often get to plug it!

from jsonfinder import jsonfinder

logs = r"""
Split /adclix.$~image into 2 rules
Split /mediahosting.engine$document,script into 2 rules
[
    {
        "action": {
            "type": "block"
        }, 
        "trigger": {
            "url-filter": "/adservice\\.", 
            "unless-domain": [
                "adservice.io"
            ]
        }
    }
]
Generated a total of 1 rules (1 blocks, 0 exceptions)
Something else that looks like JSON: [1, 2]
"""

for start, end, obj in jsonfinder(logs):
  if (
      obj 
      and isinstance(obj, list)
      and isinstance(obj[0], dict)
      and {"action", "trigger"} <= obj[0].keys()
  ):
    print(obj)

Demo: https://repl.it/repls/ImperfectJuniorBootstrapping

Library: https://github.com/alexmojaki/jsonfinder

Install it with pip install jsonfinder.
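If adding a dependency is not an option, a similar effect can be approximated with the standard library. The following is only a minimal sketch and not how jsonfinder itself is implemented: `find_json_arrays` is a hypothetical helper name, and it assumes the array you want starts at a `[` and is valid JSON. It tries `json.JSONDecoder.raw_decode` at each `[` and skips the ones that do not parse.

```python
import json


def find_json_arrays(text):
    """Yield every JSON array that can be decoded starting at a '[' in text."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        start = text.find("[", pos)
        if start == -1:
            return
        try:
            # raw_decode returns the parsed object and the index where it ended.
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here, try the next '['
            continue
        yield obj
        pos = end  # continue after the array, skipping its nested arrays


# Shortened sample; a raw string keeps the backslashes as the tool prints them.
logs = r"""
Split /adclix.$~image into 2 rules
[
    {
        "action": {"type": "block"},
        "trigger": {"url-filter": "/adservice\\.", "unless-domain": ["adservice.io"]}
    }
]
Generated a total of 1 rules (1 blocks, 0 exceptions)
Something else that looks like JSON: [1, 2]
"""

# Same shape check as above: keep only arrays of rule objects.
rules = [
    arr for arr in find_json_arrays(logs)
    if arr and isinstance(arr[0], dict) and {"action", "trigger"} <= arr[0].keys()
]
print(rules)
```

Because the check looks at the shape of the array and not at the surrounding log lines, it keeps working if the log messages change.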

Answer 2

You can also use a regular expression:

import re

input = r"""
Split /adclix.$~image into 2 rules
Split /mediahosting.engine$document,script into 2 rules
[
    {
        "action": {
            "type": "block"
        }, 
        "trigger": {
            "url-filter": "/adservice\\.", 
            "unless-domain": [
                "adservice.io"
            ]
        }
    }
]
Generated a total of 1 rules (1 blocks, 0 exceptions)
asd
asd
"""

regex = re.compile(r"\[(.|\n)*(?:^\]$)", re.M)
x = re.search(regex, input)
print(x.group(0))

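Edit

re.M enables multi-line matching: ^ and $ then match at the start and end of every line, not only at the start and end of the whole string, so ^\]$ finds a line that consists of just the closing bracket.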
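A small illustration of what the flag changes, using a made-up three-line string (the names `text`, `without_flag` and `with_flag` are only for this example):

```python
import re

text = "a]\n]\nb"

# Without re.M, ^ only matches at the very start of the string.
without_flag = re.findall(r"^\]$", text)

# With re.M, ^ and $ also match around each newline, so the lone "]" line is found.
with_flag = re.findall(r"^\]$", text, re.M)

print(without_flag, with_flag)
```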

https://repl.it/repls/InfantileDopeyLink
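Since the question mentions modifying the array before writing it out, the matched text can be handed to `json.loads`. This is a sketch under a few assumptions: the sample below is shortened, `output` stands in for the captured stdout, and the change to `"type"` is an arbitrary illustration. The sample is a raw string so that the backslashes reach the JSON parser exactly as the tool prints them.

```python
import json
import re

output = r"""Split /adclix.$~image into 2 rules
[
    {
        "action": {"type": "block"},
        "trigger": {"url-filter": "/adservice\\."}
    }
]
Generated a total of 1 rules (1 blocks, 0 exceptions)
"""

# Same pattern as above: from the first "[" to the last line that is just "]".
pattern = re.compile(r"\[(.|\n)*(?:^\]$)", re.M)
match = pattern.search(output)
if match is None:
    raise ValueError("no JSON array found in the output")

rules = json.loads(match.group(0))  # now a normal Python list of dicts

# Arbitrary example edit before saving.
rules[0]["action"]["type"] = "block-cookies"

serialized = json.dumps(rules, indent=4)
print(serialized)
```

To write the result to a file, `json.dump(rules, destination_file, indent=4)` can be used in place of `json.dumps`.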

Get an array from a file that also contains strings
