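I'm having trouble computing a correlation. For this assignment we are supposed to use the Pandas .corr method.
I have searched around but could not find a suitable solution.
Here is the code.
Top15 is a Pandas DataFrame.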
import pandas as pd

def answer_nine():
    Top15 = answer_one()
    # for testing purposes: works fine :-(
    df = pd.DataFrame({'A': range(4), 'B': [2*i for i in range(4)]})
    print(df['A'].corr(df['B']))
    # derive population, then citable documents per person
    Top15['Population'] = Top15['Energy Supply'] / Top15['Energy Supply per capita']
    Top15['Citable docs per Capita'] = Top15['Citable documents'] / Top15['Population']
    # check my data
    print(Top15['Energy Supply per capita'])
    print(Top15['Citable docs per Capita'])
    correlation = Top15['Citable docs per Capita'].corr(Top15['Energy Supply per capita'])
    print(correlation)
    return correlation
This really should work. But no, it doesn't :-(
This is the output I get (the 1.0 comes from the df['A'] test):
1.0
Country
China 93
United States 286
Japan 149
United Kingdom 124
Russian Federation 214
Canada 296
Germany 165
India 26
France 166
South Korea 221
Italy 109
Spain 106
Iran 119
Australia 231
Brazil 59
Name: Energy Supply per capita, dtype: object
Country
China 9.269e-05
United States 0.000298307
Japan 0.000237714
United Kingdom 0.000318721
Russian Federation 0.000127533
Canada 0.000500002
Germany 0.00020942
India 1.16242e-05
France 0.00020322
South Korea 0.000239392
Italy 0.000180175
Spain 0.00020089
Iran 0.00011442
Australia 0.000374206
Brazil 4.17453e-05
Name: Citable docs per Capita, dtype: object
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-124-942c0cf8a688> in <module>()
22 return correlation
23
---> 24 answer_nine()
<ipython-input-124-942c0cf8a688> in answer_nine()
15 Top15['Citable docs per Capita']=np.float64(Top15['Citable docs per Capita'])
16
---> 17 correlation=Top15['Citable docs per Capita'].corr(Top15['Energy Supply per capita'])
18
19
/opt/conda/lib/python3.5/site-packages/pandas/core/series.py in corr(self, other, method, min_periods)
1392 return np.nan
1393 return nanops.nancorr(this.values, other.values, method=method,
-> 1394 min_periods=min_periods)
1395
1396 def cov(self, other, min_periods=None):
/opt/conda/lib/python3.5/site-packages/pandas/core/nanops.py in _f(*args, **kwargs)
42 f.__name__.replace('nan', '')))
43 try:
---> 44 return f(*args, **kwargs)
45 except ValueError as e:
46 # we want to transform an object array
/opt/conda/lib/python3.5/site-packages/pandas/core/nanops.py in nancorr(a, b, method, min_periods)
676
677 f = get_corr_func(method)
--> 678 return f(a, b)
679
680
/opt/conda/lib/python3.5/site-packages/pandas/core/nanops.py in _pearson(a, b)
684
685 def _pearson(a, b):
--> 686 return np.corrcoef(a, b)[0, 1]
687
688 def _kendall(a, b):
/opt/conda/lib/python3.5/site-packages/numpy/lib/function_base.py in corrcoef(x, y, rowvar, bias, ddof)
2149 # nan if incorrect value (nan, inf, 0), 1 otherwise
2150 return c / c
-> 2151 return c / sqrt(multiply.outer(d, d))
2152
2153
AttributeError: 'float' object has no attribute 'sqrt'
Sorry, but so far I have no idea what went wrong or why it doesn't work.
Can anyone point me to a solution?
Thanks.
Edit: the underlying DataFrame looks like this (first few rows plus the header):
               Rank  Documents  Citable documents  Citations  Self-citations  Citations per document  H index          2006          2007          2008          2009          2010          2011          2012          2013          2014          2015  Energy Supply  Energy Supply per capita  % Renewable
Country
China             1     127050             126767     597237          411683                    4.70      138  3.992331e+12  4.559041e+12  4.997775e+12  5.459247e+12  6.039659e+12  6.612490e+12  7.124978e+12  7.672448e+12  8.230121e+12  8.797999e+12   1.271910e+11                        93    19.754910
United States     2      96661              94747     792274          265436                    8.20      230  1.479230e+13  1.505540e+13  1.501149e+13  1.459484e+13  1.496437e+13  1.520402e+13  1.554216e+13  1.577367e+13  1.615662e+13  1.654857e+13   9.083800e+10                       286    11.570980
Japan             3      30504              30287     223024           61554                    7.31      134  5.496542e+12  5.617036e+12  5.558527e+12  5.251308e+12  5.498718e+12  5.473738e+12  5.569102e+12  5.644659e+12  5.642884e+12  5.669563e+12   1.898400e+10                       149    10.232820
Answer 0 (score: 0)
This did it:
correlation = Top15['Citable docs per Capita'].astype('float64').corr(
    Top15['Energy Supply per capita'].astype('float64'))
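For context, both columns in the question print with dtype: object, and that appears to be the cause: inside np.corrcoef, NumPy cannot take the square root of plain Python objects. Here is a minimal sketch of the failure and of the cast. The Series a and b are stand-ins I made up for illustration, reusing the first four values from the question's output rather than the real Top15 frame. Depending on the NumPy version, the error is an AttributeError or a TypeError.

```python
import numpy as np
import pandas as pd

# Stand-in data stored as generic Python objects, like the columns in the question.
a = pd.Series([93, 286, 149, 124], dtype=object)
b = pd.Series([9.269e-05, 0.000298307, 0.000237714, 0.000318721], dtype=object)
print(a.dtype, b.dtype)  # object object

# On an object array, np.sqrt looks for a .sqrt() method on each element,
# which a plain Python float does not have.
try:
    np.sqrt(np.array([4.0, 9.0], dtype=object))
    sqrt_failed = False
except (AttributeError, TypeError) as exc:
    sqrt_failed = True
    print(type(exc).__name__, exc)

# After the cast both Series are float64, so corr() can do ordinary float math.
correlation = a.astype('float64').corr(b.astype('float64'))
print(correlation)
```

Newer pandas versions may convert to float inside corr() themselves, so the original error might not reproduce there. The explicit cast should be harmless either way.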
Thanks to @Shpionus for pointing me to the other post.
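A possibly tidier alternative is to convert the columns once, right after loading, so that every later calculation sees numeric data. This is only a sketch: demo is a hypothetical stand-in frame that reuses three rows from the question's output, not the real Top15. pd.to_numeric with errors='coerce' turns entries it cannot parse into NaN instead of raising.

```python
import pandas as pd

# Hypothetical stand-in for Top15, deliberately built with object dtype.
demo = pd.DataFrame(
    {
        'Energy Supply per capita': [93, 286, 149],
        'Citable docs per Capita': [9.269e-05, 0.000298307, 0.000237714],
    },
    index=pd.Index(['China', 'United States', 'Japan'], name='Country'),
    dtype=object,
)

# Convert each affected column to a numeric dtype in one place.
for col in ['Energy Supply per capita', 'Citable docs per Capita']:
    demo[col] = pd.to_numeric(demo[col], errors='coerce')

print(demo.dtypes)

# No per-call astype() is needed any more.
correlation = demo['Citable docs per Capita'].corr(demo['Energy Supply per capita'])
print(correlation)
```

With the real data, it is probably worth checking afterwards for NaN values that errors='coerce' may have introduced.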