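In Python, I am trying to take an XML file, process it, and then output the data to JSON. The XML processing works fine, but I can't get the JSON formatted the way I need. The output file looks like a list containing dictionaries, which makes sense, since that is what the code actually does. How can I make it a proper JSON file?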
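import json
from xml.etree import ElementTree as ET

filename = 'data.json'
d = []
# `data` holds the XML text produced earlier in the script
for elem in ET.fromstring(data).findall('.//table/row'):
    field1 = elem.get('field1')
    field2 = elem.get('field2')
    field3 = elem.get('field3')
    field4 = elem.get('field4')
    l = {'field1': field1,
         'field2': field2,
         'field3': field3,
         'field4': field4}
    d.append(l)
# The whole list is serialized in one call, so the file holds a single JSON array
f_out = open(filename, 'w')
json.dump(d, f_out)
f_out.close()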
The output file looks like this:
[{"field1": "field1", "field2": "field2", "field3": "field3", "field4": "field4"}, ... {"field1": "field1", "field2": "field2", "field3": "field3", "field4": "field4"}]
whereas I want it to look like this:
{"field1": "field1", "field2": "field2", "field3": "field3", "field4": "field4"}, ... {"field1": "field1", "field2": "field2", "field3": "field3", "field4": "field4"}
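Answer 0 (score: 1)
According to the AWS docs, the Redshift COPY command expects a series of JSON objects in its input file, and a series of JSON objects in its optional JSONPath file.
To create such a sequence, call json.dump() once per object, writing a newline after each one: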
from xml.etree import ElementTree as ET
import json
data = '''
<root><table>
<row field1="a" field2="b" field3="c" field4="d"/>
<row field1="1" field2="2" field3="3" field4="4"/>
</table></root>'''
filename = 'data.json'
f_out = open(filename, 'w')
for elem in ET.fromstring(data).findall('.//table/row'):
    field1 = elem.get('field1')
    field2 = elem.get('field2')
    field3 = elem.get('field3')
    field4 = elem.get('field4')
    l = {'field1': field1,
         'field2': field2,
         'field3': field3,
         'field4': field4}
    json.dump(l, f_out)
    f_out.write('\n')
f_out.close()
Result:
{"field2": "b", "field3": "c", "field1": "a", "field4": "d"}
{"field2": "2", "field3": "3", "field1": "1", "field4": "4"}
Answer 1 (score: 0)
json.dump() takes an indent parameter and a separators parameter, which you may want to supply if your JSON file should be human-readable.
Example:
json.dump({'1': 2, '3': 4}, f_out, indent=4, separators=(',', ': '))
Result:
{
    "1": 2,
    "3": 4
}
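To see what these parameters change, the compact and indented forms can be compared side by side. The sketch below uses json.dumps() (the string-returning counterpart of json.dump()) so that no file is needed; the variable names are chosen for illustration only.

```python
import json

record = {'1': 2, '3': 4}

# Default formatting: everything on one line
compact = json.dumps(record)

# Indented formatting: one key per line, four spaces deep
pretty = json.dumps(record, indent=4, separators=(',', ': '))

print(compact)
print(pretty)

# Both strings decode to the same data; only the whitespace differs
same_data = json.loads(compact) == json.loads(pretty)
print(same_data)
```

Because the indented form spreads one object over several lines, it suits files meant for people to read; it does not fit a format that expects exactly one object per line.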