Question

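I want to solve a problem similar to the one at http://re-cassiopeia.cz.vms6119.globenet.cz/.

My approach is to:

1. drop the NaN values while converting the DataFrame to a dict, then
2. call json.dumps() on the result.

Here is my code and the error: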

In [9]:df

Out[9]:
    101 102
  a 123 NaN
  b 234 234
  c NaN 456

In [10]:def to_dict_dropna(data):
          return dict((k, v.dropna().to_dict()) for k, v in compat.iteritems(data))

In [47]:k2 = to_dict_dropna(df)
In [48]:k2
Out[48]:{101: {'a': 123.0, 'b': 234.0}, 102: {'b': 234.0, 'c': 456.0}}
In [49]:json.dumps(k2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-76-f0159cf5a097> in <module>()
----> 1 json.dumps(k2)

C:\Python27\lib\json\__init__.pyc in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, encoding, default, sort_keys, **kw)
    241         cls is None and indent is None and separators is None and
    242         encoding == 'utf-8' and default is None and not sort_keys and not kw):
--> 243         return _default_encoder.encode(obj)
    244     if cls is None:
    245         cls = JSONEncoder

C:\Python27\lib\json\encoder.pyc in encode(self, o)
    205         # exceptions aren't as detailed.  The list call should be roughly
    206         # equivalent to the PySequence_Fast that ''.join() would do.
--> 207         chunks = self.iterencode(o, _one_shot=True)
    208         if not isinstance(chunks, (list, tuple)):
    209             chunks = list(chunks)

C:\Python27\lib\json\encoder.pyc in iterencode(self, o, _one_shot)
    268                 self.key_separator, self.item_separator, self.sort_keys,
    269                 self.skipkeys, _one_shot)
--> 270         return _iterencode(o, 0)
    271 
    272 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

TypeError: keys must be a string

But if I initialize the dict directly, it works:

In [65]:k = {101: {'a': 123.0, 'b': 234.0}, 102: { 'b': 234.0, 'c': 456.0}}
In [66]:k == k2
Out[66]:True
In [63]:json.dumps(k)
Out[63]:'{"101": {"a": 123.0, "b": 234.0}, "102": {"c": 456.0, "b": 234.0}}'

What is wrong with my code?

Answer 1

The 'integers' in your Pandas DataFrame are not really integers. They are float64 objects; see the Pandas Gotchas documentation.
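One way to see why k == k2 is True while json.dumps(k2) still fails is to build the dict with NumPy scalar keys by hand. This is only a sketch: it assumes the column labels reached the dict as numpy.int64 objects, which the traceback does not confirm, and it leaves pandas out so the behaviour does not depend on the pandas version.

```python
import json

import numpy as np

# Hypothetical reconstruction of k2: NumPy integer scalars as keys (an assumption).
k2 = {
    np.int64(101): {"a": 123.0, "b": 234.0},
    np.int64(102): {"b": 234.0, "c": 456.0},
}
# The hand-written dict from the question, with built-in int keys.
k = {101: {"a": 123.0, "b": 234.0}, 102: {"b": 234.0, "c": 456.0}}

# NumPy integers compare and hash like the equal built-in int,
# so the two dicts are equal even though the key types differ.
same = (k == k2)

# The json module checks the key type, not its value, so it rejects numpy.int64.
try:
    json.dumps(k2)
    dumps_failed = False
except TypeError:
    dumps_failed = True

key_types = sorted(type(key).__name__ for key in k2)
print(same, dumps_failed, key_types)
```

Equality alone therefore does not show that two dicts will serialize the same way; checking type(key) for each key is more telling.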

You will have to convert them back to int() objects, or convert them directly to strings:

from pandas import compat  # Python 2-era helper used in the question

def to_dict_dropna(data):
    return {int(k): v.dropna().astype(int).to_dict() for k, v in compat.iteritems(data)}

This does the former, converting the keys and values back to int.
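For the other option, converting directly to strings, a minimal Python 3 sketch might look like the following. The name to_dict_dropna_strkeys is hypothetical, and DataFrame.items() stands in for compat.iteritems on the assumption that the latter is not available in current pandas releases. Values are passed through float() so that only built-in types reach json.dumps().

```python
import json

import numpy as np
import pandas as pd


def to_dict_dropna_strkeys(data):
    """Drop NaN per column and return nested dicts with plain str keys."""
    return {
        # str() works whether the label is a built-in int or a NumPy scalar.
        str(col): {str(idx): float(val) for idx, val in series.dropna().items()}
        for col, series in data.items()
    }


# Sample frame rebuilt from the question's Out[9] display.
df = pd.DataFrame(
    {101: [123, 234, np.nan], 102: [np.nan, 234, 456]},
    index=["a", "b", "c"],
)

result = to_dict_dropna_strkeys(df)
encoded = json.dumps(result, sort_keys=True)
print(encoded)
```

Since JSON object keys are always strings, converting them up front also makes the round trip through json.loads() return a dict equal to the one that was encoded.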
