I'm trying to read the following JSON into a DataFrame:
[{"col1": 900000000000000000000}]
When I run pd.read_json('sample.json'), I get this error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/site-packages/pandas/io/json/json.py", line 366, in read_json
    return json_reader.read()
  File "/usr/lib/python3.6/site-packages/pandas/io/json/json.py", line 467, in read
    obj = self._get_object_parser(self.data)
  File "/usr/lib/python3.6/site-packages/pandas/io/json/json.py", line 484, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "/usr/lib/python3.6/site-packages/pandas/io/json/json.py", line 576, in parse
    self._parse_no_numpy()
  File "/usr/lib/python3.6/site-packages/pandas/io/json/json.py", line 793, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None)
ValueError: Value is too big
I've tried several different ways of setting the dtype at read time, for example:
df = pd.read_json('sample.json', dtype={'col1': np.dtype('object')})
df = pd.read_json('sample.json', dtype={'col1': np.object})
df = pd.read_json('sample.json', dtype={'col1': str})
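Interestingly, if I change the input to [{"col1": "900000000000000000000"}] (the number quoted as a string), I can set the dtype to float64; unfortunately, that is not the input I have.
Why can't I set the dtype correctly at read time? Thanks.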
Answer 0 (score: 3)
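The traceback shows that the error is raised inside pandas' own JSON parser (the loads call in _parse_no_numpy), so it happens before the dtype you pass is ever applied: 900000000000000000000 is larger than even the largest unsigned 64-bit integer (18446744073709551615), and that parser cannot represent it. Python's standard json module has no such limit, so first parse the data with json.loads and build the DataFrame from every column that causes no trouble (here, everything except col1).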
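import json

import pandas as pd

json_data = '''[{"col1": 900000000000000000000, "col2": "abc"}]'''  # further records elided
data = json.loads(json_data)  # the stdlib parser keeps big ints as Python ints
# Every column except the oversized col1, in the original key order
c = [k for k in data[0] if k != 'col1']
df = pd.DataFrame.from_records(data, columns=c)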
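A quick check, as a minimal sketch: the standard-library parser returns col1 as an ordinary Python int, which has arbitrary precision, so nothing is lost even though the value is beyond the 64-bit range. The name UINT64_MAX is introduced here only for illustration.

```python
import json

raw = '[{"col1": 900000000000000000000, "col2": "abc"}]'
data = json.loads(raw)
value = data[0]["col1"]

# Largest unsigned 64-bit integer (name chosen for this example)
UINT64_MAX = 2**64 - 1

# Python ints are arbitrary precision, so the value survives exactly
print(type(value).__name__)  # int
print(value > UINT64_MAX)    # True
```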
Now extract the col1 values manually, wrap them in a Series with dtype=object, and insert that Series into the DataFrame:
df.insert(0, 'col1', pd.Series([d['col1'] for d in data], dtype=object))
df
                    col1  col2
0  900000000000000000000   abc
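The same steps can be wrapped in a small helper when several columns hold oversized integers. This is a sketch, not part of the original answer: the function name frame_with_bigint_columns and its big_cols parameter are hypothetical, and it assumes every record has the same keys in the same order as the first one.

```python
import json

import pandas as pd


def frame_with_bigint_columns(records, big_cols):
    """Hypothetical helper: build a DataFrame, keeping `big_cols` as exact Python ints.

    Assumes every record has the same keys, in the same order, as records[0].
    """
    # Ordinary columns first, preserving the key order of the first record
    normal_cols = [k for k in records[0] if k not in big_cols]
    df = pd.DataFrame.from_records(records, columns=normal_cols)
    # Re-insert each oversized column at its original position as dtype=object,
    # so pandas stores the Python ints as-is instead of converting them
    for pos, name in enumerate(records[0]):
        if name in big_cols:
            df.insert(pos, name, pd.Series([r[name] for r in records], dtype=object))
    return df


text = '[{"col1": 900000000000000000000, "col2": "abc"}]'
df = frame_with_bigint_columns(json.loads(text), {"col1"})
print(df)
```

Because these columns have object dtype, any arithmetic on them runs on Python ints element by element rather than as vectorised NumPy operations: slower, but exact.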