Compute separate correlations by column value

Time: 2015-06-29 21:50:32

Tags: python pandas

Given two pandas DataFrames,

A = pd.DataFrame({'one':['a','a','a','b','b','b'], 'two':[1,2,3,3,2,1]})
B = pd.DataFrame({'one':['a','a','a','b','b','b'], 'two':[4,3,2,2,3,4]})

A

  one  two
0   a    1
1   a    2
2   a    3
3   b    3
4   b    2
5   b    1

B

  one  two
0   a    4
1   a    3
2   a    2
3   b    2
4   b    3
5   b    4

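how can I compute the correlations A[A['one']=='a']['two'].corr(B[B['one']=='a']['two']) and A[A['one']=='b']['two'].corr(B[B['one']=='b']['two']) in one go? The end goal is to plot the correlation as a function of the values in column 'one' ('a' and 'b'), i.e.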

  corr
a  -1.0
b  -1.0

2 answers:

Answer 0: (score: 1)

One way to iterate over the two sets of groups is:

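# group both frames by the values in column 'one'
x, y = A.groupby('one'), B.groupby('one')

# for each group of A, correlate its 'two' column with the matching group of B
res = {name: grp['two'].corr(y.get_group(name)['two']) for name, grp in x}

# list() keeps this working on Python 3, where dict.items() returns a view
pd.DataFrame(list(res.items()))
#    0    1
# 0  a -1.0
# 1  b -1.0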
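When A and B are row-aligned (same index, same order of values in 'one'), the loop above can also be written as a single groupby. The following is a minimal sketch under that assumption; the names paired, a_two and b_two are made up for illustration and do not come from the answer.

```python
import pandas as pd

A = pd.DataFrame({'one': ['a', 'a', 'a', 'b', 'b', 'b'], 'two': [1, 2, 3, 3, 2, 1]})
B = pd.DataFrame({'one': ['a', 'a', 'a', 'b', 'b', 'b'], 'two': [4, 3, 2, 2, 3, 4]})

# Assumption: row i of A belongs with row i of B (same index, same 'one' value).
paired = pd.DataFrame({'one': A['one'], 'a_two': A['two'], 'b_two': B['two']})

# One Pearson correlation per value of 'one'; the result is a Series indexed by 'one'.
corr = (
    paired.groupby('one')[['a_two', 'b_two']]
    .apply(lambda g: g['a_two'].corr(g['b_two']))
    .rename('corr')
)
print(corr)
```

Because the result is indexed by 'one', something like corr.plot(kind='bar') should give the plot the question asks for, assuming matplotlib is installed.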

Answer 1: (score: 1)

import pandas as pd
import numpy as np

A = pd.DataFrame({'one':['a','a','a','b','b','b'], 'two':[1,2,3,3,2,1]})
B = pd.DataFrame({'one':['a','a','a','b','b','b'], 'two':[4,3,2,2,3,4]})

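# index both frames by 'one' and sort so that rows of the same group line up
A = A.set_index('one').sort_index()
B = B.set_index('one').sort_index()
# both frames must have the same number of rows for 'a' and for 'b',
# so they can be concatenated side by side
df = pd.concat([A, B], keys=['A', 'B'], axis=1)

def cal_corr(group):
    # group.A and group.B are one-column frames; corrwith returns one value per column
    return pd.Series({'corr': group.A.corrwith(group.B).values[0]})

df.groupby(level='one').apply(cal_corr)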

Out[211]: 
     corr
one      
a      -1
b      -1
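The side-by-side concat above pairs rows purely by their order within each group. A variant that makes this pairing explicit is to index each row by its group label plus its position inside the group, so the two frames align on a unique index. This is only a sketch: with_position and the level name 'pos' are hypothetical, and it still assumes both frames have the same number of rows per group.

```python
import pandas as pd

A = pd.DataFrame({'one': ['a', 'a', 'a', 'b', 'b', 'b'], 'two': [1, 2, 3, 3, 2, 1]})
B = pd.DataFrame({'one': ['a', 'a', 'a', 'b', 'b', 'b'], 'two': [4, 3, 2, 2, 3, 4]})

def with_position(df):
    # Hypothetical helper: index rows by (group label, position within the group).
    pos = df.groupby('one').cumcount().rename('pos')
    return df.set_index(['one', pos])['two']

# Both Series share the same unique (one, pos) index, so concat aligns them row by row.
df = pd.concat([with_position(A), with_position(B)], axis=1, keys=['A', 'B'])

# One correlation per value of the 'one' index level, returned as a one-column frame.
result = (
    df.groupby(level='one')
    .apply(lambda g: g['A'].corr(g['B']))
    .to_frame('corr')
)
print(result)
```

The unique index means the pairing no longer depends on the two frames being sorted the same way beforehand, only on the order of rows within each group.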