Pandas correlation error - Decimal and float type mismatch

Time: 2016-02-02 16:13:11

Tags: python pandas decimal

This question has been raised here, but it has not been answered. I am providing more detail in this post in the hope that it gets some traction.

I have a pandas DataFrame, master_frame, containing time series data:

     SUBMIT_DATE   CRUX_VOL        CRUX_RATE
0     2016-02-01   76.38733173161  0.02832710529
1     2016-01-31   76.68984699154  0.02720243998
2     2016-01-30   75.59094829615  0.02720243998
3     2016-01-29   75.91758975956  0.02720243998
4     2016-01-28   76.31809997200  0.02671927211
...          ...   ...            ...

I want the correlation between the CRUX_VOL and CRUX_RATE columns. Both are of type Decimal:

In [3]: print type(master_frame["CRUX_VOL"][0]), type(master_frame["CRUX_RATE"][0])
Out[3]: <class 'decimal.Decimal'> <class 'decimal.Decimal'>

When I use the corr function, I get a nasty error related to the input types.

print master_frame['CRUX_VOL'].corr(master_frame['CRUX_RATE'])

Traceback (most recent call last):
  File "U:/Programming/VolPathReport/VolPath.py", line 52, in <module>
    print master_frame['CRUX_VOL'].corr(master_frame['CRUX_RATE'])
  File "C:\Anaconda2\lib\site-packages\pandas\core\series.py", line 1312, in corr
    min_periods=min_periods)
  File "C:\Anaconda2\lib\site-packages\pandas\core\nanops.py", line 47, in _f
    return f(*args, **kwargs)
  File "C:\Anaconda2\lib\site-packages\pandas\core\nanops.py", line 644, in nancorr
    return f(a, b)
  File "C:\Anaconda2\lib\site-packages\pandas\core\nanops.py", line 652, in _pearson
    return np.corrcoef(a, b)[0, 1]
  File "C:\Anaconda2\lib\site-packages\numpy\lib\function_base.py", line 2145, in corrcoef
    c = cov(x, y, rowvar)
  File "C:\Anaconda2\lib\site-packages\numpy\lib\function_base.py", line 2065, in cov
    avg, w_sum = average(X, axis=1, weights=w, returned=True)
  File "C:\Anaconda2\lib\site-packages\numpy\lib\function_base.py", line 599, in average
    scl = np.multiply(avg, 0) + scl
TypeError: unsupported operand type(s) for +: 'Decimal' and 'float'

I have messed around with the types and cannot get this to work. Help me, wizards of the internet!

1 answer:

Answer 0: (score: 1)

The last line of the error message points to

np.multiply(avg, 0) + scl

as the cause of
TypeError: unsupported operand type(s) for +: 'Decimal' and 'float'
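
The same incompatibility can be reproduced without pandas or numpy. The following is a minimal sketch in Python 3 syntax (the question itself uses Python 2). The names avg and scl only mirror the traceback, and the values are illustrative assumptions, not taken from numpy's internals.

```python
from decimal import Decimal

# Illustrative stand-ins for the operands named in the traceback.
avg = Decimal("76.38733173161")  # a Decimal, like the values in CRUX_VOL
scl = 1.0                        # a plain Python float

try:
    # Decimal * int works, but adding a float to a Decimal raises TypeError.
    result = avg * 0 + scl
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Decimal deliberately refuses implicit arithmetic with float, so one side has to be converted explicitly before the addition can succeed.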

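I think numpy has no native Decimal type, so its arithmetic ends up mixing a float with a Decimal, and Decimal does not cooperate with float under the + operator. Since pandas relies on numpy, you are probably best off converting the DataFrame columns to float dtype, using either

master_frame.loc[:, ['CRUX_VOL', 'CRUX_RATE']].astype(float)

or

master_frame.convert_objects(convert_numeric=True)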
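
A sketch of the float conversion, assuming a small frame rebuilt from the five sample rows shown in the question (Python 3 syntax). The variable names converted and correlation are hypothetical. Only astype is used here, because convert_objects was deprecated and later removed in newer pandas versions.

```python
from decimal import Decimal

import pandas as pd

# Rebuild the sample rows from the question, with Decimal values (object dtype).
master_frame = pd.DataFrame({
    "SUBMIT_DATE": pd.to_datetime([
        "2016-02-01", "2016-01-31", "2016-01-30", "2016-01-29", "2016-01-28",
    ]),
    "CRUX_VOL": [
        Decimal("76.38733173161"), Decimal("76.68984699154"),
        Decimal("75.59094829615"), Decimal("75.91758975956"),
        Decimal("76.31809997200"),
    ],
    "CRUX_RATE": [
        Decimal("0.02832710529"), Decimal("0.02720243998"),
        Decimal("0.02720243998"), Decimal("0.02720243998"),
        Decimal("0.02671927211"),
    ],
})

# astype returns a new frame; the result must be assigned to take effect.
converted = master_frame.astype({"CRUX_VOL": float, "CRUX_RATE": float})

# With float64 columns, corr no longer mixes Decimal and float.
correlation = converted["CRUX_VOL"].corr(converted["CRUX_RATE"])

print(converted.dtypes)
print(correlation)
```

Converting to float gives up Decimal's exact representation, which is usually acceptable for a statistic such as a correlation coefficient.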
