Given two pandas DataFrames,
A = pd.DataFrame({'one':['a','a','a','b','b','b'], 'two':[1,2,3,3,2,1]})
B = pd.DataFrame({'one':['a','a','a','b','b','b'], 'two':[4,3,2,2,3,4]})
A
one two
0 a 1
1 a 2
2 a 3
3 b 3
4 b 2
5 b 1
B
one two
0 a 4
1 a 3
2 a 2
3 b 2
4 b 3
5 b 4
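how can I compute the correlations A[A['one']=='a']['two'].corr(B[B['one']=='a']['two'])
and A[A['one']=='b']['two'].corr(B[B['one']=='b']['two'])
at the same time? The end goal is to plot the correlation as a function of the values 'a' and 'b' in column 'one', i.e.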
corr
a -1.0
b -1.0
Answer 0 (score: 1)
One way to iterate over the two sets of groups is:
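# Group both frames by column 'one'
x, y = A.groupby('one'), B.groupby('one')
# For each (name, group) in A, correlate 'two' with the matching group in B
res = {name: grp.two.corr(y.get_group(name).two) for name, grp in x}
pd.DataFrame(list(res.items()))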
# 0 1
#0 a -1
#1 b -1
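The same idea can be wrapped in a small helper that returns a Series indexed by group, which is closer to the output shape asked for in the question. This is only a sketch: the function name `groupwise_corr` and its `key`/`value` parameters are made up for illustration, and it assumes every group in `A` also exists in `B`. Note that `Series.corr` aligns the two Series on their index, so matching rows in `A` and `B` must share the same row labels (as they do here).

```python
import pandas as pd

A = pd.DataFrame({'one': ['a', 'a', 'a', 'b', 'b', 'b'], 'two': [1, 2, 3, 3, 2, 1]})
B = pd.DataFrame({'one': ['a', 'a', 'a', 'b', 'b', 'b'], 'two': [4, 3, 2, 2, 3, 4]})


def groupwise_corr(left, right, key='one', value='two'):
    """Hypothetical helper: correlate `value` between matching `key` groups."""
    right_groups = right.groupby(key)
    result = {
        # Series.corr aligns on the index before computing the correlation
        name: group[value].corr(right_groups.get_group(name)[value])
        for name, group in left.groupby(key)
    }
    return pd.Series(result, name='corr')


corr = groupwise_corr(A, B)
print(corr)
```

Because the result is a Series indexed by 'a' and 'b', something like `corr.plot(kind='bar')` should then give the plot mentioned in the question (matplotlib required).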
Answer 1 (score: 1)
import pandas as pd
import numpy as np
A = pd.DataFrame({'one':['a','a','a','b','b','b'], 'two':[1,2,3,3,2,1]})
B = pd.DataFrame({'one':['a','a','a','b','b','b'], 'two':[4,3,2,2,3,4]})
A = A.set_index('one').sort_index()
B = B.set_index('one').sort_index()
# both frames must have the same number of rows per group ('a', 'b'), so concatenate them side by side
df = pd.concat([A, B], keys=['A', 'B'], axis=1)
def cal_corr(group):
    return pd.Series({'corr': group.A.corrwith(group.B).values[0]})
df.groupby(level='one').apply(cal_corr)
Out[211]:
corr
one
a -1
b -1
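A variation on this answer that avoids a custom function, offered as a sketch rather than a definitive solution: put both value columns in one frame and let the grouped `corr()` compute a correlation matrix per group. It assumes the rows of `A` and `B` correspond one to one through their index; the column name `two_b` is introduced here purely for illustration.

```python
import pandas as pd

A = pd.DataFrame({'one': ['a', 'a', 'a', 'b', 'b', 'b'], 'two': [1, 2, 3, 3, 2, 1]})
B = pd.DataFrame({'one': ['a', 'a', 'a', 'b', 'b', 'b'], 'two': [4, 3, 2, 2, 3, 4]})

# Attach B's values as a second column (assign aligns on the row index)
df = A.assign(two_b=B['two'])

# Per-group 2x2 correlation matrices, indexed by (group, column name)
mats = df.groupby('one')[['two', 'two_b']].corr()

# Keep the 'two' row of each matrix and read off its correlation with 'two_b'
corr = mats.xs('two', level=1)['two_b'].rename('corr')
print(corr)
```

The result is again a Series indexed by the values of 'one', so it can be plotted directly.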