numpy.histogram2d raises an exception when passed a subset of a pandas.DataFrame

Posted: 2014-02-04 16:53:24

Tags: python numpy pandas

I'm having trouble getting pandas DataFrames to work with numpy's histogram2d function. Specifically, this code runs fine:

    import numpy
    import pandas
    df = pandas.DataFrame(numpy.random.randn(100, 2), columns=list('AB'))
    hist, xe, ye = numpy.histogram2d(df["A"], df["B"])

but this code, where I build the histogram from a subset of the DataFrame, fails:

    import numpy
    import pandas
    df = pandas.DataFrame(numpy.random.randn(100, 2), columns=list('AB'))
    dfSubset = pandas.DataFrame(df[df["A"] < 0])
    hist, xe, ye = numpy.histogram2d(dfSubset["A"], dfSubset["B"])

with the following exception:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-6-763e2355a7e1> in <module>()
          1 dfSubset = pandas.DataFrame(df[df["A"] < 0])
    ----> 2 hist, xe, ye = numpy.histogram2d(dfSubset["A"], dfSubset["B"])

    /home/mark/.virtualenvs/ipython/lib/python2.6/site-packages/numpy/lib/twodim_base.pyc in histogram2d(x, y, bins, range, normed, weights)
        651         xedges = yedges = asarray(bins, float)
        652         bins = [xedges, yedges]
    --> 653     hist, edges = histogramdd([x, y], bins, range, normed, weights)
        654     return hist, edges[0], edges[1]
        655 

    /home/mark/.virtualenvs/ipython/lib/python2.6/site-packages/numpy/lib/function_base.pyc in histogramdd(sample, bins, range, normed, weights)
        312             smax = ones(D)
        313         else:
    --> 314             smin = atleast_1d(array(sample.min(0), float))
        315             smax = atleast_1d(array(sample.max(0), float))
        316     else:

    /home/mark/.virtualenvs/ipython/lib/python2.6/site-packages/numpy/core/_methods.pyc in _amin(a, axis, out, keepdims)
         19 def _amin(a, axis=None, out=None, keepdims=False):
         20     return um.minimum.reduce(a, axis=axis,
    ---> 21                             out=out, keepdims=keepdims)
         22 
         23 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):

    /home/mark/.virtualenvs/ipython/lib/python2.6/site-packages/pandas/core/generic.pyc in __nonzero__(self)
        663         raise ValueError("The truth value of a {0} is ambiguous. "
        664                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    --> 665                          .format(self.__class__.__name__))
        666 
        667     __bool__ = __nonzero__

    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

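From some searching, I gather that what the truth value of a Python container should be is a contentious question, and that pandas and numpy expect different behavior here. What I don't know is how to work around the problem in practice.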
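To illustrate the ambiguity the error message refers to, here is a minimal sketch (assuming a modern Python 3 and pandas install, not the Python 2.6 environment listed below): calling bool() on a Series with more than one element raises the same ValueError, while the explicit reductions that the message suggests each return a single, unambiguous answer.

```python
import pandas as pd

s = pd.Series([-1.0, 0.5, 2.0])

try:
    bool(s)  # a multi-element Series has no single truth value
except ValueError as exc:
    print("ValueError:", exc)

# The reductions named in the error message are unambiguous
print((s > 0).any())  # True: at least one element is positive
print((s > 0).all())  # False: not every element is positive
print(s.empty)        # False: the Series has elements
```

numpy's reductions (such as the minimum.reduce call in the traceback) end up triggering this check when they are handed Series objects instead of plain numbers, which is where the exception comes from.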

Can anyone suggest a fix for this?

I'm running Python 2.6.6 with the IPython notebook, and my virtual environment contains the following packages:

Babel==0.9.4
Beaker==1.3.1
Jinja2==2.2.1
Magic-file-extensions==0.1
Mako==0.3.4
MarkupSafe==0.9.2
OpenEye-python2.6-redhat-6-x64==2013.10.3
PIL==1.1.6
Pygments==1.1.1
SSSDConfig==1.9.2
Sphinx==0.6.6
argparse==1.2.1
backports.ssl-match-hostname==3.4.0.2
cas==0.15
cups==1.0
cupshelpers==1.0
decorator==3.0.1
docutils==0.6
ethtool==0.6
firstboot==1.110
freeipa==2.0.0.alpha.0
git-remote-helpers==0.1.0
iniparse==0.3.1
iotop==0.3.2
ipapython==3.0.0
ipython==1.1.0
iwlib==1.0
kerberos==1.0
lxml==2.2.3
matplotlib==1.1.1
netaddr==0.7.5
nose==0.10.4
numpy==1.8.0
pandas==0.13.0
paramiko==1.7.5
patsy==0.2.1
pyOpenSSL==0.10
pycrypto==2.0.1
pycurl==7.19.0
pygpgme==0.1
python-dateutil==2.2
python-default-encoding==0.1
python-ldap==2.3.10
python-meh==0.11
python-nss==0.11
pytz==2013.9
pyxdg==0.18
pyzmq==14.0.1
qpid-python==0.14
qpid-tools==0.14
scdate==1.9.60
scikit-learn==0.14.1
scipy==0.13.2
sckdump==2.0.5
scservices==0.99.45
scservices.dbus==0.99.45
six==1.5.2
slip==0.2.20
slip.dbus==0.2.20
slip.gtk==0.2.20
smbc==1.0
stevedore==0.13
sympy==0.7.4.1
tornado==3.2
urlgrabber==3.9.1
virtinst==0.600.0
virtualenv==1.11.1
virtualenv-clone==0.2.4
virtualenvwrapper==4.2
yum-metadata-parser==1.1.2

Thanks!

1 answer

Answer (score: 3)

Change the line:

    hist, xe, ye = numpy.histogram2d(dfSubset["A"], dfSubset["B"])

to:

    hist, xe, ye = numpy.histogram2d(dfSubset["A"].values, dfSubset["B"].values)

This converts each Series to a plain numpy array before it is passed to histogram2d.
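A self-contained sketch of the fix follows, assuming a modern numpy/pandas install; the fixed seed and the checks at the end are additions for illustration only. Boolean filtering keeps the original row labels, so the subset's index typically has gaps. One likely explanation for the original failure (an assumption, not confirmed in this thread) is that numpy 1.8 converted the list of Series by integer position, which pandas 0.13 interpreted as label lookups on that gapped index. Passing .values hands numpy bare ndarrays and sidesteps the index entirely.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # fixed seed so the example is reproducible
df = pd.DataFrame(rng.randn(100, 2), columns=list("AB"))

# Boolean filtering keeps the original row labels, so the subset's
# index usually no longer runs 0, 1, 2, ...
dfSubset = df[df["A"] < 0]

# .values returns the underlying ndarray with no index attached
# (assumption: this is what avoids the label-vs-position confusion)
x = dfSubset["A"].values
y = dfSubset["B"].values
hist, xe, ye = np.histogram2d(x, y)

print(type(x).__name__)                  # ndarray
print(hist.shape)                        # (10, 10): 10 bins per axis by default
print(int(hist.sum()) == len(dfSubset))  # True: every row falls in some bin
```

On pandas 0.24 or later, Series.to_numpy() is the documented equivalent of .values; the pandas 0.13 release used in the question only offers .values.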