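Why is there a difference between 32-bit and 64-bit numpy / pandas

Time: 2016-08-18 21:04:04

Tags: python json pandas numpy 32bit-64bit

I use numpy / pandas on a 64-bit Fedora box. In production the code was pushed to a 32-bit CentOS box, where it hit an error in json.dumps. It was throwing repr(0) is not Serializable.

I tried testing on 64-bit CentOS and it runs perfectly fine. But on 32-bit (CentOS 6.8, to be precise) it throws an error. I would like to know whether anyone has run into this before.

Below is 64-bit Fedora: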

Python 2.6.6 (r266:84292, Jun 30 2016, 09:54:10) 
[GCC 5.3.1 20160406 (Red Hat 5.3.1-6)] on linux4
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd

>>> a = pd.DataFrame([{'a':1}])
>>> 
>>> a
   a
0  1
>>> a.to_dict()
{'a': {0: 1}}
>>> import json
>>> json.dumps(a.to_dict())
'{"a": {"0": 1}}'

Below is 32-bit CentOS:

import json
import pandas as pd

a = pd.DataFrame( [ {'a': 1} ] )
json.dumps(a.to_dict())

Traceback (most recent call last):
  File "sample.py", line 5, in <module>
    json.dumps(a.to_dict())
  File "/usr/lib/python2.6/json/__init__.py", line 230, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.6/json/encoder.py", line 367, in encode
    chunks = list(self.iterencode(o))
  File "/usr/lib/python2.6/json/encoder.py", line 309, in _iterencode
    for chunk in self._iterencode_dict(o, markers):
  File "/usr/lib/python2.6/json/encoder.py", line 275, in _iterencode_dict
    for chunk in self._iterencode(value, markers):
  File "/usr/lib/python2.6/json/encoder.py", line 309, in _iterencode
    for chunk in self._iterencode_dict(o, markers):
  File "/usr/lib/python2.6/json/encoder.py", line 268, in _iterencode_dict
    raise TypeError("key {0!r} is not a string".format(key))
TypeError: key 0 is not a string

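What is the usual workaround for this problem? I cannot use a custom JSON encoder, because the library I use to push this data expects a dictionary; it uses the json module internally to serialize the dictionary and push it over the wire.

Update: the Python version on both machines is 2.6.6, and pandas is 0.16.1 on both.

1 Answer:

Answer 0: (score: 3)

I believe this happens because the index values are NumPy integers (numpy.intNN) whose size differs from that of the native Python int, and such values are not converted from one to the other.

For example, on my 64-bit Python 2.7 and NumPy: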

>>> isinstance(numpy.int64(5), int)
True
>>> isinstance(numpy.int32(5), int)
False

Then:

>>> json.dumps({numpy.int64(5): '5'})
'{"5": "5"}'
>>> json.dumps({numpy.int32(5): '5'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
TypeError: keys must be a string
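
To see which integer width counts as the native int on a given box, one option is to check the size of a C long, which is what Python 2's int is built on. This is only a minimal standard-library sketch; the variable names are mine and not from the question:

```python
import struct
import sys

# Python 2's int wraps a C long; its width decides which numpy.intNN
# passes the isinstance(..., int) checks shown above.
native_int_bits = struct.calcsize("l") * 8

# Pointer width tells you whether the interpreter itself is 32- or 64-bit.
pointer_bits = struct.calcsize("P") * 8

print("C long: %d bits, pointer: %d bits, platform: %s"
      % (native_int_bits, pointer_bits, sys.platform))
```

If the 32-bit CentOS box reports a 32-bit C long while the index is stored as 64-bit integers, that would be consistent with the explanation above.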

You can try changing the index to numpy.int32, numpy.int64, or int. With numpy.int32 on my 64-bit machine, the error is reproduced:

>>> df = pd.DataFrame( [ {'a': 1}, {'a': 2} ] )
>>> df.index = df.index.astype(numpy.int32)  # perhaps your index was of these?
>>> json.dumps(df.to_dict())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
TypeError: keys must be a string

So you can try changing the index type to int32, int64, or just a plain Python int:

>>> df.index = df.index.astype(numpy.int64)
>>> json.dumps(df.to_dict())
'{"a": {"0": 1, "1": 2}}'

>>> df.index = df.index.astype(int)
>>> json.dumps(df.to_dict())
'{"a": {"0": 1, "1": 2}}'
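
If changing the index dtype is not convenient, another option that still hands the library an ordinary dictionary is to rebuild the dict with plain Python keys and values before passing it on. The sketch below is only an illustration: `to_plain` and `_plain_scalar` are hypothetical helper names, and it assumes the NumPy scalars involved are registered with the `numbers` abstract base classes, which I believe they are but have not checked on pandas 0.16.1.

```python
import json
import numbers


def _plain_scalar(value):
    """Convert integer-like and float-like objects to built-in int/float."""
    if isinstance(value, bool):
        return value  # keep True/False as they are (bool is also Integral)
    if isinstance(value, numbers.Integral):
        return int(value)
    if isinstance(value, numbers.Real):
        return float(value)
    return value


def to_plain(obj):
    """Recursively rebuild dicts and lists using plain Python keys and values."""
    if isinstance(obj, dict):
        # dict(generator) instead of a dict comprehension, so this also
        # runs on Python 2.6.
        return dict((_plain_scalar(k), to_plain(v)) for k, v in obj.items())
    if isinstance(obj, (list, tuple)):
        return [to_plain(v) for v in obj]
    return _plain_scalar(obj)


payload = to_plain({"a": {0: 1}})
print(json.dumps(payload))
```

With the DataFrame from the question, this would be called as `to_plain(a.to_dict())`, and the result passed to the library in place of the raw `to_dict()` output.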