The data file is here.
I just want to compute the pairwise correlations between the columns of two DataFrames:
In [7]: import os
In [8]: import pandas as pd
In [9]: import numpy as np
In [10]: from pandas import Series, DataFrame
In [12]: blog_dat = pd.read_table("blogdata.txt", index_col="Blog")
In [13]: blog_dat = blog_dat.astype(float)
In [14]: all(blog_dat.notnull())
Out[14]: True
In [15]: x = DataFrame(np.random.randn(99*4).reshape((99, 4)))
In [16]: pd.expanding_corr(blog_dat.iloc[:, :4], blog_dat.iloc[:, :4], pairwise=True)[-1, :, :]
Out[16]:
          china      kids     music     yahoo
china  1.000000  0.053069  0.026599  0.246957
kids   0.053069  1.000000  0.409978  0.094636
music  0.026599  0.409978  1.000000  0.055923
yahoo  0.246957  0.094636  0.055923  1.000000
In [17]: pd.expanding_corr(blog_dat.iloc[:, :4], x, pairwise=True)[-1, :, :]
/usr/local/lib/python3.4/site-packages/pandas/core/index.py:1240: RuntimeWarning: unorderable types: str() < int(), sort order is undefined for incomparable objects
"incomparable objects" % e, RuntimeWarning)
/usr/local/lib/python3.4/site-packages/pandas/core/index.py:1240: RuntimeWarning: unorderable types: int() < str(), sort order is undefined for incomparable objects
"incomparable objects" % e, RuntimeWarning)
/usr/local/lib/python3.4/site-packages/pandas/core/index.py:1254: RuntimeWarning: unorderable types: str() > int(), sort order is undefined for incomparable objects
"incomparable objects" % e, RuntimeWarning)
/usr/local/lib/python3.4/site-packages/pandas/core/index.py:1254: RuntimeWarning: unorderable types: int() > str(), sort order is undefined for incomparable objects
"incomparable objects" % e, RuntimeWarning)
Out[17]:
        0   1   2   3
china NaN NaN NaN NaN
kids  NaN NaN NaN NaN
music NaN NaN NaN NaN
yahoo NaN NaN NaN NaN
Even if I give x an index and column names, the NaNs do not go away.
Answer 0 (score: 2)
Give x and blog_dat the same index:
import pandas as pd
import numpy as np
np.random.seed(1)
blog_dat = pd.read_table("data", sep=r'\s+')
x = pd.DataFrame(np.random.randn(4*4).reshape((4, 4)),
                 index=blog_dat.index)
pd.expanding_corr(blog_dat.iloc[:, :4], x, pairwise=True)[-1, :, :]
yields
              0         1         2         3
china  0.684896  0.260795 -0.990586  0.281298
kids   0.077209 -0.871448  0.702822  0.241313
music -0.203808  0.071436  0.581267 -0.783753
yahoo -0.630744  0.373339 -0.060623  0.258728
Giving x just any index labels is not enough; they must match the index of blog_dat.
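
The effect of index alignment can be reproduced without the original data file. The following is a minimal sketch on synthetic data: the blog labels, the frame names, and the `cross_corr` helper are all made up for illustration. It uses `Series.corr`, which aligns its two inputs on their index labels before correlating, instead of `pd.expanding_corr`, which later pandas releases no longer provide at the top level.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical blog names standing in for the real "Blog" index.
labels = ["blog_a", "blog_b", "blog_c", "blog_d", "blog_e", "blog_f"]
left = pd.DataFrame(rng.standard_normal((6, 2)),
                    index=labels, columns=["china", "kids"])

# The same random values twice: once with the default integer index,
# once with the index copied from `left`.
values = rng.standard_normal((6, 2))
right_default = pd.DataFrame(values)
right_aligned = pd.DataFrame(values, index=left.index)


def cross_corr(a, b):
    """Correlate every column of `a` with every column of `b`.

    Series.corr keeps only the rows whose index labels appear in both
    inputs, so frames with no labels in common produce NaN.
    """
    return pd.DataFrame(
        {cb: {ca: a[ca].corr(b[cb]) for ca in a.columns} for cb in b.columns}
    )


nan_result = cross_corr(left, right_default)  # no shared labels
ok_result = cross_corr(left, right_aligned)   # identical index

print(nan_result)
print(ok_result)
```

With the default integer index there are no labels in common, so every entry comes back as NaN; once the index is shared, each entry is the ordinary Pearson correlation of the two columns.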