TypeError when computing a Pearson correlation with Pandas

Date: 2016-04-29 14:15:12

Tags: python pandas

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
8,0,3,"Palsson, Master. Gosta Leonard",male,2,3,1,349909,21.075,,S

I am trying to compute the Pearson correlation between the columns SibSp and Parch using

corr = data.corr(data['SibSp'], data['Parch'])

but it raises TypeError: invalid type comparison

What am I doing wrong?

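2 answers:

Answer 0: (score: 0)

corr does not work that way. You call it on a DataFrame (df.corr()) and it computes the correlation coefficient for every pair of columns. If you only want the correlation between SibSp and Parch, select just those two columns and call corr on the resulting DataFrame: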

df[["SibSp", "Parch"]].corr()
Out[7]: 
         SibSp    Parch
SibSp  1.00000  0.87831
Parch  0.87831  1.00000
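
A minimal, self-contained sketch of the same approach, assuming only the eight sample rows shown in the question (the frame is rebuilt by hand here rather than read from the asker's file). A plausible reading of the error is that the first positional parameter of DataFrame.corr is `method`, so the Series passed in the question is treated as a method name rather than as a column.

```python
import pandas as pd

# The eight sample rows from the question, reduced to the two columns of interest.
data = pd.DataFrame({
    "SibSp": [1, 1, 0, 1, 0, 0, 0, 3],
    "Parch": [0, 0, 0, 0, 0, 0, 0, 1],
})

# DataFrame.corr takes no column arguments; its first parameter is `method`.
corr_matrix = data[["SibSp", "Parch"]].corr(method="pearson")

# Pull the single off-diagonal coefficient out of the 2x2 matrix.
r = corr_matrix.loc["SibSp", "Parch"]
print(round(r, 5))
```

On these eight rows the printed value should agree with the 0.87831 in the output above.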

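Answer 1: (score: 0)

The corr method on a DataFrame does produce a correlation for every combination of columns, but the corr method on a Series expects another Series to correlate with. So this works too:

Solution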

data.SibSp.corr(data.Parch)
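
As a rough cross-check of the Series form, the sketch below compares Series.corr against the textbook Pearson formula written out by hand. It assumes the same eight sample rows from the question; the names `sibsp`, `parch`, `r_series` and `r_manual` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample values taken from the eight rows in the question.
sibsp = pd.Series([1, 1, 0, 1, 0, 0, 0, 3], name="SibSp")
parch = pd.Series([0, 0, 0, 0, 0, 0, 0, 1], name="Parch")

# Series.corr takes the other Series as its argument and returns a scalar.
r_series = sibsp.corr(parch)

# Pearson r by definition: sum of products of deviations, divided by
# the square root of the product of the summed squared deviations.
dx = sibsp - sibsp.mean()
dy = parch - parch.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(r_series, r_manual)
```

The Series form returns a single number rather than a 2x2 matrix, which is convenient when only one pair of columns matters.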