Question

I have a JSON file of DNS traffic in this format:

{
    "index": {
        "_type": "answer_query", 
        "_id": 0, 
        "_index": "index_name"
    }
}

{
    "answer_section": " ", 
    "query_type": "A", 
    "authority_section": "com. 172 IN SOA a.xxxx-xxxx.net. nstld.xxxx-xxxxcom. 1526440480 1800 900 604800 86400", 
    "record_code": "NXDOMAIN", 
    "ip_src": "xx.xx.xx.xx", 
    "response_ip": "xx.xx.xx.xx", 
    "date_time": "2018-05-16T00:57:20Z", 
    "checksum": "CORRECT", 
    "query_name": "xx.xxxx.com.", 
    "port_src": 50223, 
    "question_section": "xx.xxxx.com. IN A", 
    "answer_count_section": 0
}

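I need to extract the number that follows the first space in authority_section (172 in the example), keep only the records where that number is less than 300, ignore the records that do not meet this requirement, and write the output to another JSON file.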

How can I do this? Thanks.

Answer 1

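Assuming stack1.txt is the file you posted, this writes a new file, stack2.txt, that omits the "authority_section" line if the value after the first space is >= 300. Note that the code also deletes stack2.txt when any such line was dropped. This solution does not need to parse the JSON, but it depends heavily on the data format being consistent.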

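import os

# Copy stack1.txt to stack2.txt line by line, skipping any
# "authority_section" line whose number is >= 300.
with open('stack1.txt') as old_file, open('stack2.txt', 'w') as new_file:
    delete_file = False
    for line in old_file:
        is_authority = line.strip().startswith('"authority_section"')
        # The number is the second whitespace-separated token of the value.
        if is_authority and int(line.split(':')[1].split()[1]) >= 300:
            delete_file = True
        else:
            new_file.write(line)
# If a line was skipped, the record did not qualify, so discard the output.
if delete_file:
    os.remove('stack2.txt')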
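
If the formatting is not consistent enough for line-based filtering, parsing the JSON may be more robust. The following is only a minimal sketch, assuming the file is a sequence of concatenated JSON objects that alternate between an "index" header and its record, as in the question. The function names (iter_json_objects, authority_number, filter_records, filter_file) are made up for illustration.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def authority_number(record):
    """Return the second token of authority_section as an int, or None."""
    tokens = record.get("authority_section", "").split()
    if len(tokens) > 1 and tokens[1].isdigit():
        return int(tokens[1])
    return None


def filter_records(text, limit=300):
    """Keep header/record pairs whose authority number is below limit."""
    objects = list(iter_json_objects(text))
    kept = []
    # Assumption: objects alternate as header, record, header, record, ...
    for header, record in zip(objects[0::2], objects[1::2]):
        number = authority_number(record)
        if number is not None and number < limit:
            kept.extend([header, record])
    return kept


def filter_file(src, dst, limit=300):
    """Read src, filter it, and write the surviving objects to dst."""
    with open(src) as f:
        kept = filter_records(f.read(), limit)
    with open(dst, "w") as out:
        for obj in kept:
            out.write(json.dumps(obj, indent=4) + "\n\n")
```

With the file names from this answer, the call would be filter_file('stack1.txt', 'stack2.txt'). Records whose authority_section has no number (for example " ") are dropped in this sketch; whether that is the desired behavior depends on your data.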

Answer 2

You could try something like this:

extern

I used regular expressions in my answer. Read up on regular expressions for more details.
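
As a rough sketch of how a regular expression could pull the number out of authority_section (this is not the answer's original code; the pattern and the names AUTHORITY_NUMBER, extract_number and is_below are assumptions for illustration):

```python
import re

# Hypothetical pattern: a first token, whitespace, then the digits to capture.
AUTHORITY_NUMBER = re.compile(r"^\S+\s+(\d+)\b")


def extract_number(authority_section):
    """Return the number after the first token, or None if there is none."""
    match = AUTHORITY_NUMBER.match(authority_section)
    return int(match.group(1)) if match else None


def is_below(authority_section, limit=300):
    """True when a number is present and smaller than limit."""
    number = extract_number(authority_section)
    return number is not None and number < limit
```

For the value in the question, extract_number("com. 172 IN SOA ...") gives 172, so is_below returns True and the record would be kept.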
