Question

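I'm writing a small script that merges a large number of JSON files from a directory into a single file. The trouble is, I'm not entirely sure when my data is in which state. TypeErrors abound. Here's the script: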

import glob
import json
import codecs

reader = codecs.getreader("utf-8")

for file in glob.glob("/Users/me/Scripts/BagOfJson/*.json"):
    #Aha, as binary here
    with open(file, "rb") as infile:
        data = json.load(reader(infile))
        #If I print(data) here, looks like good ol' JSON

        with open("test.json", "wb") as outfile:
            json.dump(data, outfile, sort_keys = True, indent = 2, ensure_ascii = False)
        #Crash

This script results in the following error:

TypeError: a bytes-like object is required, not 'str'

It is caused by the json.dump line.

Naively, I just removed the 'b' from the 'wb' in the open() call for outfile. That did not do the trick.

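Maybe this is a lesson for me to test in the shell and make use of Python's type() function. Still, I would love it if someone could clear up the logic behind these data exchanges for me. I wish it could all be strings...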

Answer 1

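If this is Python 3, removing the b (binary mode) so that the file is opened in text mode should work just fine. You probably want to specify the encoding explicitly: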

with open("test.json", "w", encoding='utf8') as outfile:
    json.dump(data, outfile, sort_keys = True, indent = 2, ensure_ascii = False)

rather than rely on the default.
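To make the str versus bytes logic concrete, here is a minimal sketch that uses in-memory streams as stand-ins for files opened in "w" and "wb" mode. The sample data and variable names are illustrative assumptions, not part of the original script.

```python
import io
import json

# Illustrative sample data (an assumption, not taken from the question).
data = {"name": "café", "n": 1}

# json.dump() writes str chunks, so a text-mode target accepts them.
text_target = io.StringIO()
json.dump(data, text_target, ensure_ascii=False)
text_result = text_target.getvalue()

# A binary-mode target only accepts bytes, so the same call raises TypeError.
binary_target = io.BytesIO()
try:
    json.dump(data, binary_target, ensure_ascii=False)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# If bytes are really needed, serialize to str first, then encode explicitly.
encoded = json.dumps(data, ensure_ascii=False).encode("utf-8")
```

In short, the json module produces and consumes text; turning that text into bytes is the job of the file object (text mode with an encoding) or of an explicit encode() call.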

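You shouldn't really use codecs.getreader(). The standard open() function can handle UTF-8 files just fine; simply open the file in text mode and specify the encoding again: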

import glob
import json

for file in glob.glob("/Users/me/Scripts/BagOfJson/*.json"):
    with open(file, "r", encoding='utf8') as infile:
        data = json.load(infile)
        with open("test.json", "w", encoding='utf8') as outfile:
            json.dump(data, outfile, sort_keys = True, indent = 2, ensure_ascii = False)

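The above will still re-create test.json for each file in the *.json glob, so in the end it holds only the last file processed. You can't really put multiple JSON documents in the same file (unless you specifically create JSON Lines files, which you are not doing here because you are using indent).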
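If the goal really is one combined file, one possible approach is to collect every parsed document into a list and dump that list once. The sketch below assumes that a top-level JSON array is an acceptable merged layout; the helper name merge_json_files and its parameters are hypothetical.

```python
import glob
import json
import os


def merge_json_files(pattern, out_path):
    """Hypothetical helper: combine all files matching pattern into one JSON array."""
    merged = []
    for path in sorted(glob.glob(pattern)):
        # Skip the output file in case it also matches the pattern.
        if os.path.abspath(path) == os.path.abspath(out_path):
            continue
        with open(path, "r", encoding="utf8") as infile:
            merged.append(json.load(infile))
    # Open the output once, after all inputs are read, so nothing is overwritten.
    with open(out_path, "w", encoding="utf8") as outfile:
        json.dump(merged, outfile, sort_keys=True, indent=2, ensure_ascii=False)
    return len(merged)
```

A call such as merge_json_files("/Users/me/Scripts/BagOfJson/*.json", "test.json") would then write a single valid JSON document containing one array element per input file.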

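If you want to re-format all the JSON files in the glob, you'll need to write to a new filename and then move the new file back over the original file filename.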
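A rough sketch of that write-then-move approach follows. It writes each re-formatted document to a temporary file in the same directory and then replaces the original with it. The helper names reformat_json_file and reformat_all are hypothetical, and the use of tempfile.mkstemp plus os.replace is one possible choice rather than the only way to do it.

```python
import glob
import json
import os
import tempfile


def reformat_json_file(path):
    """Hypothetical helper: rewrite one JSON file with sorted keys and indentation."""
    with open(path, "r", encoding="utf8") as infile:
        data = json.load(infile)

    # Create the temporary file next to the original so the final move
    # stays on the same filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf8") as outfile:
            json.dump(data, outfile, sort_keys=True, indent=2, ensure_ascii=False)
        # Move the new file over the original filename.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        os.remove(tmp_path)
        raise


def reformat_all(pattern):
    """Hypothetical helper: re-format every file matching the glob pattern."""
    for path in glob.glob(pattern):
        reformat_json_file(path)
```

Because the original is only replaced after the new file has been written completely, a failure partway through leaves the original file untouched.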
