Correlation between two Pandas DataFrame columns: why doesn't it work?

Asked: 2017-11-02 11:40:16

Tags: python pandas dataframe correlation

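I've run into a problem computing a correlation between two columns. For this assignment we are supposed to use the Pandas .corr method.

I searched around but couldn't find a suitable solution.

Here is the code.

Top15 is a Pandas DataFrame.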

    import pandas as pd

    def answer_nine():
        # answer_one() returns the Top15 DataFrame
        Top15 = answer_one()

        # for testing purposes: works fine :-(
        df = pd.DataFrame({'A': range(4), 'B': [2*i for i in range(4)]})
        print(df['A'].corr(df['B']))

        Top15['Population'] = Top15['Energy Supply'] / Top15['Energy Supply per capita']

        Top15['Citable docs per Capita'] = Top15['Citable documents'] / Top15['Population']

        # check my data
        print(Top15['Energy Supply per capita'])
        print(Top15['Citable docs per Capita'])

        correlation = Top15['Citable docs per Capita'].corr(Top15['Energy Supply per capita'])
        print(correlation)
        return correlation

By all rights this should work. But no, it doesn't :-(

This is the output I get (the 1.0 comes from the df['A'] test):

1.0
Country
China                  93
United States         286
Japan                 149
United Kingdom        124
Russian Federation    214
Canada                296
Germany               165
India                  26
France                166
South Korea           221
Italy                 109
Spain                 106
Iran                  119
Australia             231
Brazil                 59
Name: Energy Supply per capita, dtype: object
Country
China                   9.269e-05
United States         0.000298307
Japan                 0.000237714
United Kingdom        0.000318721
Russian Federation    0.000127533
Canada                0.000500002
Germany                0.00020942
India                 1.16242e-05
France                 0.00020322
South Korea           0.000239392
Italy                 0.000180175
Spain                  0.00020089
Iran                   0.00011442
Australia             0.000374206
Brazil                4.17453e-05
Name: Citable docs per Capita, dtype: object
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-124-942c0cf8a688> in <module>()
     22     return correlation
     23 
---> 24 answer_nine()

<ipython-input-124-942c0cf8a688> in answer_nine()
     15     Top15['Citable docs per Capita']=np.float64(Top15['Citable docs per Capita'])
     16 
---> 17     correlation=Top15['Citable docs per Capita'].corr(Top15['Energy Supply per capita'])
     18 
     19 

/opt/conda/lib/python3.5/site-packages/pandas/core/series.py in corr(self, other, method, min_periods)
   1392             return np.nan
   1393         return nanops.nancorr(this.values, other.values, method=method,
-> 1394                               min_periods=min_periods)
   1395 
   1396     def cov(self, other, min_periods=None):

/opt/conda/lib/python3.5/site-packages/pandas/core/nanops.py in _f(*args, **kwargs)
     42                                     f.__name__.replace('nan', '')))
     43             try:
---> 44                 return f(*args, **kwargs)
     45             except ValueError as e:
     46                 # we want to transform an object array

/opt/conda/lib/python3.5/site-packages/pandas/core/nanops.py in nancorr(a, b, method, min_periods)
    676 
    677     f = get_corr_func(method)
--> 678     return f(a, b)
    679 
    680 

/opt/conda/lib/python3.5/site-packages/pandas/core/nanops.py in _pearson(a, b)
    684 
    685     def _pearson(a, b):
--> 686         return np.corrcoef(a, b)[0, 1]
    687 
    688     def _kendall(a, b):

/opt/conda/lib/python3.5/site-packages/numpy/lib/function_base.py in corrcoef(x, y, rowvar, bias, ddof)
   2149         # nan if incorrect value (nan, inf, 0), 1 otherwise
   2150         return c / c
-> 2151     return c / sqrt(multiply.outer(d, d))
   2152 
   2153 

AttributeError: 'float' object has no attribute 'sqrt'

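Sorry, but so far I have no idea what went wrong or why it doesn't work.

Can anyone point me toward a solution?

Thanks.

Edit: the underlying DataFrame looks like this (header plus the first few rows):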

Rank    Documents   Citable documents   Citations   Self-citations  Citations per document  H index 2006    2007    2008    2009    2010    2011    2012    2013    2014    2015    Energy Supply   Energy Supply per capita    % Renewable
Country                                                                             
China   1   127050  126767  597237  411683  4.70    138 3.992331e+12    4.559041e+12    4.997775e+12    5.459247e+12    6.039659e+12    6.612490e+12    7.124978e+12    7.672448e+12    8.230121e+12    8.797999e+12    1.271910e+11    93  19.754910
United States   2   96661   94747   792274  265436  8.20    230 1.479230e+13    1.505540e+13    1.501149e+13    1.459484e+13    1.496437e+13    1.520402e+13    1.554216e+13    1.577367e+13    1.615662e+13    1.654857e+13    9.083800e+10    286 11.570980
Japan   3   30504   30287   223024  61554   7.31    134 5.496542e+12    5.617036e+12    5.558527e+12    5.251308e+12    5.498718e+12    5.473738e+12    5.569102e+12    5.644659e+12    5.642884e+12    5.669563e+12    1.898400e+10    149 10.232820

1 Answer:

Answer 0 (score: 0)

This did it:

# Cast both columns from object to float64 before correlating
correlation = Top15['Citable docs per Capita'].astype('float64').corr(
    Top15['Energy Supply per capita'].astype('float64'))
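Why the cast helps: both columns in the printed output report `dtype: object`, so the values are held as generic Python objects rather than as a float array. On the pandas/NumPy versions in the traceback, `np.corrcoef` then tries to call a `sqrt` method on a plain Python `float`, which is where the `AttributeError` comes from. The following is a minimal sketch on made-up toy data (the frame `toy` and its values are hypothetical, not the assignment's Top15); newer pandas versions may convert such columns automatically, so the error itself may not reproduce there.

```python
import pandas as pd

# Hypothetical toy data: numeric values deliberately stored with dtype=object
toy = pd.DataFrame(
    {
        "Energy Supply per capita": pd.Series([93, 286, 149, 124], dtype=object),
        "Citable docs per Capita": pd.Series([1.0, 3.0, 2.0, 1.5], dtype=object),
    }
)

# Inspect the dtypes first: both columns report "object"
print(toy.dtypes)

# Cast explicitly to float64, then correlate the two float Series
energy = toy["Energy Supply per capita"].astype("float64")
docs = toy["Citable docs per Capita"].astype("float64")
correlation = docs.corr(energy)
print(correlation)
```

Checking `.dtypes` before calling `.corr` is a cheap way to catch this kind of problem early.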

Thanks to @Shpionus for pointing out the other post.
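If a column ended up as `object` because it contains non-numeric placeholders, `astype('float64')` would raise instead of converting. One possible alternative, sketched here on hypothetical data (the `"..."` placeholder and the Series `raw` and `other` are assumptions for illustration, not taken from the question), is `pd.to_numeric` with `errors="coerce"`, which turns unparseable entries into NaN.

```python
import pandas as pd

# Hypothetical raw column: numbers stored as strings, with a "..." placeholder
raw = pd.Series(["93", "286", "...", "124"], name="Energy Supply per capita")

# Entries that cannot be parsed as numbers become NaN; the result is float64
clean = pd.to_numeric(raw, errors="coerce")
print(clean.dtype)

# Hypothetical second column to correlate against
other = pd.Series([1.0, 3.0, 2.0, 1.5], name="Citable docs per Capita")

# Series.corr skips pairs where either value is NaN
correlation = clean.corr(other)
print(correlation)
```

Whether coercing to NaN is appropriate depends on the data; silently dropping rows can change the result, so it is worth counting the NaNs afterwards.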