Question

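I'm writing a script to parse some of our requests, and I need to be able to handle malformed or incomplete requests when I come across them. For example, a typical request comes in the following format: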

log-prefix: {JSON request data}\n

All on one line, and so on...

Then I found out that their writer has a character buffer limit of 1024, so a request can be spread across multiple lines, like this:

log-prefix: {First line of data
log-prefix: Second line of requests data
log-prefix: Final line of log data}\n

I was able to work around this by calling next on the iterator I'm using, then stripping the prefixes, concatenating the pieces of the request, and passing the result to json.loads to get back the dictionary I need to write to file.

I did that as follows:

lines = (line.strip('\n') for line in inf.readlines())
for line in lines:
    if not line.endswith('}'):
        # The request was split: keep pulling lines until the closing brace.
        bad_lines = [line]
        while not line.endswith('}'):
            line = next(lines)
            bad_lines.append(line)
        form_line = malformed_data_handler(bad_lines)
    else:
        form_line = parse_out_json(line)

The functions I use in the code above are:

import json
import logging
from typing import Sequence

logger = logging.getLogger(__name__)


def malformed_data_handler(lines: Sequence) -> dict:
    """
    Takes n malformed lines of bridge log data (where the JSON response has
    been split across n lines, all containing prefixes) and correctly
    delegates the parsing to parse_out_malformed before returning the
    concatenated result as a dictionary.
    :param lines: An iterable with malformed lines as the elements
    :return: A dictionary ready for writing.
    """
    logger.debug('Handling malformed data.')
    parsed = ''
    logger.debug(lines)
    print(lines)
    for line in lines:
        logger.info('{}'.format(line))
        parsed += parse_out_malformed(line)
    logger.debug(parsed)
    # The encoding argument was removed from json.loads in Python 3.9.
    return json.loads(parsed)


def parse_out_json(line: str) -> dict:
    """
    Parses out the JSON response returned from the Apache Bridge logs.
    Takes a line and removes the prefix, returning a dictionary.
    :param line:
    :return:
    """
    data = slice(line.find('{'), None)
    return json.loads(line[data])


def parse_out_malformed(line: str) -> str:
    prefix = 'bridge-rails: '
    data = slice(line.find(prefix), None)
    parsed = line[data].replace(prefix, '')
    return parsed

My problem now is that I've found cases where the log data looks like this:

log-prefix: {First line of data
....
log-prefix: Last line of data (No closing brace)
log-prefix: {New request}

My first thought was to add some sort of check for this case. But since I'm using a generator to process the lines for scalability, I don't know I've hit one of these requests until I've already called next and pulled the line out of the generator. I can't re-append it, and I'm not sure how to efficiently tell my process to start again from that line and carry on as normal.

How can I check for and discard invalid multi-line JSON log requests in a log file?
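One possible direction, sketched here under assumptions rather than offered as a tested fix: instead of pulling extra lines with next, keep a running buffer and treat any line whose payload begins with '{' as the start of a new request, discarding whatever was still buffered. That way no line ever has to be pushed back onto the generator. The names PREFIX, strip_prefix and iter_requests are hypothetical, the prefix value is taken from the examples above, and the sketch assumes a continuation chunk never happens to begin with '{'.

```python
import json
from typing import Iterable, Iterator

PREFIX = 'log-prefix: '  # assumed prefix, as in the examples above


def strip_prefix(line: str) -> str:
    """Return the text after the first PREFIX, or the whole line if absent."""
    _, found, rest = line.partition(PREFIX)
    return rest if found else line


def iter_requests(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per complete request; drop requests that never close."""
    buffer = ''
    for raw in lines:
        payload = strip_prefix(raw.rstrip('\n'))
        if payload.startswith('{'):
            # Assumption: a payload starting with '{' opens a new request,
            # so anything still buffered was never closed and is discarded.
            buffer = ''
        buffer += payload
        if buffer.endswith('}'):
            try:
                request = json.loads(buffer)
            except json.JSONDecodeError:
                # Possibly a nested brace at a chunk boundary: keep buffering.
                continue
            yield request
            buffer = ''


sample = [
    'log-prefix: {"a": 1}\n',
    'log-prefix: {"b": "first \n',
    'log-prefix: second"}\n',
    'log-prefix: {"c": "never closed\n',
    'log-prefix: {"d": 4}\n',
]
results = list(iter_requests(sample))
```

Because iter_requests only ever looks at the current line, it can be fed the open file object directly (for request in iter_requests(inf)), which keeps the processing lazy. The incomplete "c" request in the sample is dropped as soon as the next line opens a new one.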

0 answers: