Question

我有一个corpus.json文件，需要转换为tsv格式。这是一个巨大的文件，看起来像这样：

{'0': {'metadata': {'id': 'fQ3JoXLXxc4', 'title': '| Board Questions | 12 Maths | Equivalence Class | Equivalence Class Board Questions |', 'tags': ['Board Questions', '12 maths', '12 maths Board Questions', 'Previous Year Board Questions', 'Maths Board Questions', 'Board questions based on Equivalence Classes', 'Equivalence Class', 'Equivalence Classes in hindi'], 'description': 'Board Questions, 12 maths, 12 maths Board Questions, Previous Year Board Questions, Maths Board Questions, Board questions based on Equivalence Classes, Equivalence Class, Equivalence Classes in hindi, Equivalence Class for 12 maths, NCERT CBSE XII Maths,'}}, '1': {'subtitles': ' in this video were going to start taking a look at entropy and tropi and more specifically the kind of entropy we are going to be interested in is information entropy information entropy as opposed to another kind of entropy which you may have heard a probably heard of thermodynamic entropy information entropy comes up in the context of information theory there is actually a direct connection with thermodynamic entropy but were not going to address that here so what is entropy what is information entropy well you can think about it sort of intuitively as the uncertainty uncertainty put that in quotes since we dont really have a definition for uncertainty but you can think about it as the uncertainty in a random variable or random quantity or equivalently you can think about it as the information ....and so on

我使用以下代码：

import json
import csv
with open('Downloads/corpus.json') as json_file:  
    j = json.load(json_file)
with open('output.tsv', 'w') as output_file:
    dw = csv.DictWriter(output_file, sorted(j.keys()), delimiter='\t')
    dw.writeheader()
    dw.writerows(j)

我收到以下错误：

 ---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-110-a9cb3b17fdd1> in <module>()
      2     dw = csv.DictWriter(output_file, sorted(j.keys()), delimiter='\t')
      3     dw.writeheader()
----> 4     dw.writerows(j)

~/anaconda3/lib/python3.6/csv.py in writerows(self, rowdicts)
    156 
    157     def writerows(self, rowdicts):
--> 158         return self.writer.writerows(map(self._dict_to_list, rowdicts))
    159 
    160 # Guard Sniffer's type checking against builds that exclude complex()

~/anaconda3/lib/python3.6/csv.py in _dict_to_list(self, rowdict)
    146     def _dict_to_list(self, rowdict):
    147         if self.extrasaction == "raise":
--> 148             wrong_fields = rowdict.keys() - self.fieldnames
    149             if wrong_fields:
    150                 raise ValueError("dict contains fields not in fieldnames: "

AttributeError: 'str' object has no attribute 'keys'

此代码中应更改的内容。或者还有其他方法可以做到这一点。

Answer 1

j是类似JSON的对象;这是一本字典。在不确切知道你要做什么的情况下，我认为你不需要py_str=json.dumps(j)，因为这会将类似JSON的dict变回一个字符串（没有键）。

Python json documentation

一些示例交互式终端命令：

>>> import json
>>> py_str = json.loads('{ "a": "b", "c": "d"}')
>>> py_str
{'a': 'b', 'c': 'd'}
>>> json.dumps(py_str)
'{"a": "b", "c": "d"}'
>>> py_str.keys()
dict_keys(['a', 'c'])
>>> json.dumps(py_str)[0]
'{'  # This is the cause of the failure

Answer 2

我不确定我是否遗漏了一些东西，但是在这个区块中：

with open('Downloads/corpus.json') as json_file:  
    j = json.load(json_file)

你j是一个包含JSON数据的字典。但在这一行：

py_str=json.dumps(j)

你将该字典转换为字符串（基本上撤消了你刚刚做的事情）。您看到的错误是声明字符串没有键。

在调用j方法时，您应该使用py_str代替keys()。

Answer 3

您的代码是正确的。唯一的问题是你正在尝试将json dict对象转换回str，如另一个答案中提到的那样根本没有意义。

您希望sorted(py_str[0].keys())取得什么成果？不用[0]尝试。

小细节：您可以使用一个with语句而不是两个：

import json
import csv

with open('output.tsv', 'w') as output_file, open('Downloads/corpus.json') as json_file:
    json_dict = json.load(json_file)
    dw = csv.DictWriter(output_file, sorted(json_dict.keys()), delimiter='\t')
    dw.writeheader()
    dw.writerows(json_dict)

使用python将json格式文件转换为tsv

3 个答案: