I'm trying to store some information as a series of JSON objects, one object per line. Writing them out works fine:
import pandas as pd

# Create a string with two JSON documents on two 'lines':
example = pd.DataFrame(0, index=['first', 'second'], columns=['third', 'fourth'])
file_string = ''
for i in range(2):
    file_string += example.to_json(orient='table') + '\n'
print(file_string)
Output: {"schema": {"fields":[{"name":"index","type":"string"},{"name":"third","type":"integer"},{"name":"fourth","type":"integer"}],"primaryKey":["index"],"pandas_version":"0.20.0"}, "data": [{"index":"first","third":0,"fourth":0},{"index":"second","third":0,"fourth":0}]}
{"schema": {"fields":[{"name":"index","type":"string"},{"name":"third","type":"integer"},{"name":"fourth","type":"integer"}],"primaryKey":["index"],"pandas_version":"0.20.0"}, "data": [{"index":"first","third":0,"fourth":0},{"index":"second","third":0,"fourth":0}]}
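For comparison, each line on its own is a complete orient='table' document, so a single line should round-trip with read_json. The sketch below wraps the string in io.StringIO, on the assumption that you are on a recent pandas where passing a literal JSON string to read_json is deprecated.

```python
import io

import pandas as pd

example = pd.DataFrame(0, index=["first", "second"], columns=["third", "fourth"])
single_line = example.to_json(orient="table")

# One table-schema document parsed by itself restores the original frame:
# the schema supplies column order, dtypes and the index (primaryKey).
restored = pd.read_json(io.StringIO(single_line), orient="table")
print(restored)
```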
Unfortunately, things break when I try to read such a string back with pandas read_json and lines=True. I can read it back the usual way:
# Reading it the usual way works, but the format is wrong:
print(pd.read_json(file_string, lines=True))
Output: data schema
0 [{'index': 'first', 'third': 0, 'fourth': 0}, ... {'fields': [{'name': 'index', 'type': 'string'...
1 [{'index': 'first', 'third': 0, 'fourth': 0}, ... {'fields': [{'name': 'index', 'type': 'string'...
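The "wrong" format above is read_json treating each line's top-level keys ("schema" and "data") as columns, so every row still holds the original records, just nested. A minimal sketch of unpacking them after a single lines=True call is below. Note that this swaps in a different technique from orient='table': dtypes are inferred from the JSON values rather than taken from the schema, and the index keeps the name "index".

```python
import io

import pandas as pd

example = pd.DataFrame(0, index=["first", "second"], columns=["third", "fourth"])
file_string = "".join(example.to_json(orient="table") + "\n" for _ in range(2))

# Without orient='table', each line's top-level keys become columns.
raw = pd.read_json(io.StringIO(file_string), lines=True)
print(sorted(raw.columns))

# Each cell of the "data" column is that line's list of row records;
# rebuild one frame per line and restore the index from the "index" field.
frames = [pd.DataFrame(records).set_index("index") for records in raw["data"]]
print(frames[0])
```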
but I can't read it back into the original DataFrame using orient='table':
# Reading it with orient='table' fails:
reading = pd.read_json(file_string, lines=True, orient='table')
Traceback (most recent call last):
File "<ipython-input-104-f05542cc1431>", line 1, in <module>
reading = pd.read_json(file_string, lines=True, orient='table')
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\json\json.py", line 422, in read_json
result = json_reader.read()
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\json\json.py", line 526, in read
self._combine_lines(data.split('\n'))
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\json\json.py", line 546, in _get_object_parser
obj = FrameParser(json, **kwargs).parse()
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\json\json.py", line 638, in parse
self._parse_no_numpy()
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\json\json.py", line 864, in _parse_no_numpy
precise_float=self.precise_float)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\json\table_schema.py", line 298, in parse_table_schema
col_order = [field['name'] for field in table['schema']['fields']]
TypeError: list indices must be integers or slices, not str
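The traceback goes through _combine_lines and then fails on table['schema']['fields']. Assuming _combine_lines wraps all the lines into one JSON array (which is what this TypeError suggests), the table-schema parser receives a list instead of a single object. The sketch below reproduces the same error with only the standard json module, using a trimmed, hypothetical line in place of the real output.

```python
import json

# Trimmed stand-in for one orient='table' line (hypothetical content).
line = '{"schema": {"fields": [], "primaryKey": ["index"]}, "data": []}'
lines = [line, line]

# Assumption: mimics joining newline-delimited objects into one JSON array.
combined = "[" + ",".join(lines) + "]"
table = json.loads(combined)
print(type(table).__name__)

try:
    # The table-schema parser expects a dict with a "schema" key here.
    table["schema"]["fields"]
except TypeError as exc:
    print(exc)
```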
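Am I doing something wrong? I have a version that works (reading each line and passing it to the JSON reader one line at a time), but it is very slow. I was hoping the lines=True version would be faster.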
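For reference, the per-line approach described above might look like the sketch below. The helper name read_table_lines is hypothetical. It parses each non-empty line with orient='table' and calls pd.concat once at the end, since concatenating inside the loop would copy the accumulated frame on every iteration.

```python
import io

import pandas as pd


def read_table_lines(text):
    """Hypothetical helper: parse newline-delimited orient='table' JSON.

    Each non-empty line is parsed separately with orient='table', and the
    resulting frames are concatenated once at the end.
    """
    frames = [
        pd.read_json(io.StringIO(line), orient="table")
        for line in text.splitlines()
        if line.strip()
    ]
    return pd.concat(frames)


example = pd.DataFrame(0, index=["first", "second"], columns=["third", "fourth"])
file_string = "".join(example.to_json(orient="table") + "\n" for _ in range(2))

combined = read_table_lines(file_string)
print(combined)
```

Most of the cost here is one read_json call per line, so this sketch is about correctness rather than speed.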