Question

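I have a Python script that tries to read every .txt file in a directory and determine whether each one returns True or False for any of the conditions in my script. I have thousands of text files in .txt format. However, I get an error message saying the JSON format is invalid. I have verified that my text files are in JSON format. I want the script to determine whether a .txt file matches any of the statements in the code below, and then write the results to a CSV file. Any help is greatly appreciated! I have included the error message and an example .txt file.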

Example .txt file in JSON format

{
    "domain_siblings": [
        "try.wisebuygroup.com.au",
        "www.wisebuygroup.com.au"
    ],
    "resolutions": [
        {
            "ip_address": "34.238.73.135",
            "last_resolved": "2018-04-22 17:59:05"
        },
        {
            "ip_address": "52.0.100.49",
            "last_resolved": "2018-06-24 17:05:06"
        },
        {
            "ip_address": "52.204.226.220",
            "last_resolved": "2018-04-22 17:59:06"
        },
        {
            "ip_address": "52.22.224.230",
            "last_resolved": "2018-06-24 17:05:06"
        }
    ],
    "response_code": 1,
    "verbose_msg": "Domain found in dataset",
    "whois": null
}

Error message

line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Code

import os
import json
import csv

path=r'./output/'
csvpath='C:/Users/xxx/Documents/csvtest'
file_n = 'file.csv'

def vt_result_check(path):
    vt_result = False
    for filename in os.listdir(path):
        with open(path + filename, 'r') as vt_result_file:
            vt_data = json.load(vt_result_file)

        # Look for any positive detected referrer samples
        # Look for any positive detected communicating samples
        # Look for any positive detected downloaded samples
        # Look for any positive detected URLs
        sample_types = ('detected_referrer_samples', 'detected_communicating_samples',
                        'detected_downloaded_samples', 'detected_urls')
        vt_result |= any(sample['positives'] > 0 for sample_type in sample_types
                                                 for sample in vt_data.get(sample_type, []))

        # Look for a Dr. Web category of known infection source
        vt_result |= vt_data.get('Dr.Web category') == "known infection source"

        # Look for a Forcepoint ThreatSeeker category of elevated exposure
        # Look for a Forcepoint ThreatSeeker category of phishing and other frauds
        # Look for a Forcepoint ThreatSeeker category of suspicious content
        threats = ("elevated exposure", "phishing and other frauds", "suspicious content")
        vt_result |= vt_data.get('Forcepoint ThreatSeeker category') in threats

    return str(vt_result)


if __name__ == '__main__':
    with open(file_n, 'w') as output:
        for i in range(vt_result_file):
            output.write(vt_result_file, vt_result_check(path))

Answer 1

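You are trying to decode JSON from an empty file (size 0). Check your file path and the contents of that file.

Note: the example you provided in the question is valid JSON and should load without any problem.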
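To track down which files are empty, a small helper along these lines might help. This is only a sketch: `find_empty_files` is a hypothetical name, not something from the question, and it assumes the files sit directly in one directory. For reference, `json.loads("")` raises the same "Expecting value: line 1 column 1 (char 0)" error, which is why a zero-byte file is the likely cause.

```python
import os


def find_empty_files(directory):
    """Return the names of zero-byte regular files in `directory`, sorted."""
    empty = []
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        # Only regular files are checked; subdirectories are ignored.
        if os.path.isfile(full_path) and os.path.getsize(full_path) == 0:
            empty.append(name)
    return empty
```

Calling `find_empty_files('./output/')` would list the files that `json.load` cannot parse because they contain nothing at all.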

Answer 2

You are not opening a file...

for filename in os.listdir(path):
    with open(path + filename, 'r') as vt_result_file:
        vt_data = json.load(vt_result_file)

os.listdir lists every directory and file in the path, so some of the entries may not be the JSON files you expect.
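One way to make sure only real files are opened is to filter the listing with `os.path.isfile`. The sketch below assumes a flat directory; `list_regular_files` is a hypothetical helper name used for illustration.

```python
import os


def list_regular_files(directory):
    """Return full paths of regular files in `directory`, skipping subdirectories."""
    paths = []
    for name in sorted(os.listdir(directory)):
        # os.path.join works whether or not `directory` ends with a separator.
        full_path = os.path.join(directory, name)
        if os.path.isfile(full_path):
            paths.append(full_path)
    return paths
```

Each returned path can then be passed to `open()` directly, instead of concatenating `path + filename` by hand.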

Answer 3

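I would suggest (1) restricting the script to parse only .txt files, and (2) adding some basic error checking in the form of a try/except statement to catch any JSON errors that do occur. Something like this: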

def vt_result_check(path):
    vt_result = False
    for file in os.listdir(path):
        if not file.endswith(".txt"):  # skip anything that doesn't end in .txt
            continue

        # os.path.join works whether or not path ends with a separator
        with open(os.path.join(path, file), 'r') as vt_result_file:
            try:
                vt_data = json.load(vt_result_file)
                # do whatever you want with the json data
            except json.JSONDecodeError:
                # report the offending file and keep going
                print("Could not parse JSON file " + file)

You can fill in the rest of your code around this.
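For what it's worth, here is one way the pieces could fit together so that each file gets its own row in the CSV. Treat it as a sketch under assumptions rather than a definitive version: `is_flagged` and `write_results` are hypothetical names, the per-file row layout (`filename`, `result`) is my guess at what is wanted, and every file is assumed to hold a single JSON object like the sample in the question. The three checks are copied from the question's code.

```python
import csv
import json
import os

SAMPLE_TYPES = ('detected_referrer_samples', 'detected_communicating_samples',
                'detected_downloaded_samples', 'detected_urls')
THREATS = ("elevated exposure", "phishing and other frauds", "suspicious content")


def is_flagged(vt_data):
    """Apply the question's three checks to one parsed report (a dict)."""
    if any(sample.get('positives', 0) > 0
           for sample_type in SAMPLE_TYPES
           for sample in vt_data.get(sample_type, [])):
        return True
    if vt_data.get('Dr.Web category') == "known infection source":
        return True
    return vt_data.get('Forcepoint ThreatSeeker category') in THREATS


def write_results(directory, csv_file):
    """Write one CSV row per .txt file: its name and True/False/'invalid JSON'."""
    with open(csv_file, 'w', newline='') as output:
        writer = csv.writer(output)
        writer.writerow(['filename', 'result'])
        for name in sorted(os.listdir(directory)):
            if not name.endswith('.txt'):
                continue  # skip anything that is not a .txt file
            try:
                with open(os.path.join(directory, name), 'r') as report:
                    vt_data = json.load(report)
            except json.JSONDecodeError:
                # Empty or malformed files are recorded instead of aborting the run.
                writer.writerow([name, 'invalid JSON'])
                continue
            writer.writerow([name, is_flagged(vt_data)])
```

With the paths from the question this would be called as `write_results('./output/', 'file.csv')`. Recording unparseable files in the CSV, rather than stopping at the first one, makes it easier to see how many of the thousands of files are affected.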

Invalid JSON format error when reading .txt files
