pandas read_json does not recognize the orient parameter when lines=True

Time: 2018-09-12 13:46:29

Tags: python json pandas dataframe

I am trying to store some information as a series of JSON objects, one object per line. Writing them works fine:

import pandas as pd

# Create a string with two JSON objects on two lines:
example = pd.DataFrame(0, index=['first', 'second'], columns=['third', 'fourth'])
file_string = ''
for i in range(2):
    file_string += example.to_json(orient='table') + '\n'
print(file_string)

Output: {"schema": {"fields":[{"name":"index","type":"string"},{"name":"third","type":"integer"},{"name":"fourth","type":"integer"}],"primaryKey":["index"],"pandas_version":"0.20.0"}, "data": [{"index":"first","third":0,"fourth":0},{"index":"second","third":0,"fourth":0}]}
{"schema": {"fields":[{"name":"index","type":"string"},{"name":"third","type":"integer"},{"name":"fourth","type":"integer"}],"primaryKey":["index"],"pandas_version":"0.20.0"}, "data": [{"index":"first","third":0,"fourth":0},{"index":"second","third":0,"fourth":0}]}
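Each line of file_string is a complete JSON document on its own, which is the JSON Lines layout that lines=True is meant to consume. A minimal stdlib-only sketch to check that; the string below is a hypothetical, trimmed stand-in for the real output (one data row, no pandas_version), not the exact text pandas writes:

```python
import json

# Hypothetical, trimmed stand-in for file_string: the same table-oriented
# object (one data row, no pandas_version) written twice, one per line.
line = (
    '{"schema": {"fields": [{"name": "index", "type": "string"}, '
    '{"name": "third", "type": "integer"}, {"name": "fourth", "type": "integer"}], '
    '"primaryKey": ["index"]}, '
    '"data": [{"index": "first", "third": 0, "fourth": 0}]}'
)
file_string = line + '\n' + line + '\n'

# Parse every non-empty line on its own, the way a JSON Lines reader would.
objects = [json.loads(text) for text in file_string.splitlines() if text.strip()]

for obj in objects:
    # Each line is a complete dict carrying its own schema and data.
    print(type(obj).__name__, sorted(obj))
```

Each object prints as `dict ['data', 'schema']`, so every line independently carries everything orient='table' needs.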

Unfortunately, things break when I try to read such a string back with pandas read_json and lines=True. Reading it the plain way does work:

# Reading it the usual way works, but the result has the wrong shape:
print(pd.read_json(file_string, lines=True))
Output:                                                     data                                             schema
0  [{'index': 'first', 'third': 0, 'fourth': 0}, ...  {'fields': [{'name': 'index', 'type': 'string'...
1  [{'index': 'first', 'third': 0, 'fourth': 0}, ...  {'fields': [{'name': 'index', 'type': 'string'...
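With lines=True, each line is read as one row, which is why the plain read gives one row per JSON object with `data` and `schema` columns. If changing the stored format is an option, a sketch of the record-per-line layout that lines=True is designed around; this is an alternative format, not the table layout from the question, and it assumes losing the table schema is acceptable:

```python
from io import StringIO

import pandas as pd

example = pd.DataFrame(0, index=['first', 'second'], columns=['third', 'fourth'])

# orient='records' does not store the index, so move it into an ordinary
# 'index' column first. rstrip guards against a trailing newline, which
# some pandas versions add when lines=True.
chunk = example.reset_index().to_json(orient='records', lines=True).rstrip('\n')
file_string = '\n'.join([chunk, chunk]) + '\n'

# One read_json call over the whole string; every line is a single row.
result = pd.read_json(StringIO(file_string), lines=True).set_index('index')
result.index.name = None
print(result)
```

This merges all rows into one frame. If the grouping of rows per original DataFrame matters, an extra column (for example a hypothetical batch number) would have to be written alongside each row.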

but I cannot read it back as the original DataFrame using orient='table':

# Reading it with orient='table' fails:
reading = pd.read_json(file_string, lines=True, orient='table')
Traceback (most recent call last):

  File "<ipython-input-104-f05542cc1431>", line 1, in <module>
    reading = pd.read_json(file_string, lines=True, orient='table')

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\json\json.py", line 422, in read_json
    result = json_reader.read()

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\json\json.py", line 526, in read
    self._combine_lines(data.split('\n'))

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\json\json.py", line 546, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\json\json.py", line 638, in parse
    self._parse_no_numpy()

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\json\json.py", line 864, in _parse_no_numpy
    precise_float=self.precise_float)

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\json\table_schema.py", line 298, in parse_table_schema
    col_order = [field['name'] for field in table['schema']['fields']]

TypeError: list indices must be integers or slices, not str
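Judging by the `_combine_lines` frame in the traceback, lines=True appears to join all lines into one JSON array before parsing, so `parse_table_schema` receives a list of table objects instead of a single dict. A stdlib-only sketch reproducing the same TypeError under that assumption (the combining step below is modeled on pandas' behavior, not copied from it):

```python
import json

# Two single-line JSON objects, shaped like the table-oriented output above
# (heavily trimmed, hypothetical contents).
lines = ['{"schema": {"fields": []}, "data": []}'] * 2

# Assumption: this mirrors what pandas does for lines=True, i.e. drop blank
# lines and wrap the rest into a single JSON array.
combined = "[" + ",".join(line.strip() for line in lines if line.strip()) + "]"
table = json.loads(combined)
print(type(table).__name__)  # list

try:
    # parse_table_schema indexes the parsed object by key, which a list rejects.
    table["schema"]["fields"]
except TypeError as exc:
    print(exc)  # list indices must be integers or slices, not str
```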

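Am I doing something wrong? I have a version that works: it reads each line and passes it to the JSON reader one line at a time, but it is very slow. I hoped the lines=True version would be faster.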
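The line-by-line approach might look roughly like the sketch below; it is a hypothetical reconstruction, since the question does not show that code. Each line is handed to read_json separately with orient='table', so every call parses its own schema:

```python
from io import StringIO

import pandas as pd

example = pd.DataFrame(0, index=['first', 'second'], columns=['third', 'fourth'])
file_string = ''.join(example.to_json(orient='table') + '\n' for _ in range(2))

# Parse each non-empty line as its own table-oriented document. StringIO is
# used because newer pandas versions deprecate passing literal JSON strings.
frames = [
    pd.read_json(StringIO(line), orient='table')
    for line in file_string.splitlines()
    if line.strip()
]
print(len(frames))
print(frames[0])
```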

0 answers:

No answers