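Building a cross-tabulated DataFrame like corr() in Python

Date: 2019-11-28 23:30:00

Tags: python pandas dataframe

I want to generate a DataFrame laid out like the output of corr(), but with different contents.

For example, suppose I have this DataFrame: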

import pandas as pd
pa = pd.DataFrame()

pa['john']=[2,3,4,5,6]
pa['june']=[4,6,7,8,2]
pa['kate']=[3,2,3,4,5]

Pandas has a built-in corr() function that produces a new DataFrame of pairwise correlations. So if I call pa.corr(), it returns

        john    june    kate 
john    1.000000    -0.131306   0.832050 
june    -0.131306    1.000000   -0.437014 
kate    0.832050    -0.437014   1.000000

I want to generate a new DataFrame with the same row and column labels, but with each cell computed differently, for example

        john                        june                        kate 
john    formula(john)*formula(john) formula(june)*formula(john) formula(kate)*formula(john)
june    formula(john)*formula(june) formula(june)*formula(june) formula(kate)*formula(june)
kate    formula(john)*formula(kate) formula(june)*formula(kate) formula(kate)*formula(kate)

where formula() performs a calculation on a single DataFrame column (for example, formula(pa['john'])). How can I do this?

1 answer:

Answer 0: (score: 1)

Here is one way to do it, though I am not sure it is the simplest:

import numpy as np

# arbitrary example function: sum of the element-wise products of two columns
def formula(x, y):
    return sum(x * y)

# build a list of tuples covering every pairing of columns
l = [(x, y) for x in pa.columns for y in pa.columns]
#[('john', 'john'),
# ('john', 'june'),
# ('john', 'kate'),
# ('june', 'john'),
# ('june', 'june'),
# ('june', 'kate'),
# ('kate', 'john'),
# ('kate', 'june'),
# ('kate', 'kate')]

# create a long-format dataframe with one row per pairing
# x = first element of the tuple = a column name of pa
# y = second element of the tuple = a column name of pa
# values = formula(pa[x], pa[y])
df = pd.DataFrame({'x': [el[0] for el in l], 
                   'y': [el[1] for el in l] ,
                   'values':[formula(pa[x],pa[y]) for x,y in l]} )

#   x   y   values
#0  john    john    90
#1  john    june    106
#2  john    kate    74
#3  june    john    106
#4  june    june    169
#5  june    kate    87
#6  kate    john    74
#7  kate    june    87
#8  kate    kate    63


# pivot df to obtain the format you want
# (aggfunc='sum' gives the same result as np.sum and avoids the FutureWarning in recent pandas)
table = pd.pivot_table(df, values='values', index='x', columns='y', aggfunc='sum').reset_index()


# y    x    john    june    kate
#0  john    90      106     74
#1  june    106     169     87
#2  kate    74      87      63
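
If all you need is the square table labelled by column name on both axes, as in the question, the intermediate long-format DataFrame and the pivot can probably be skipped. The sketch below covers two readings of the question. `column_formula` is a hypothetical single-column formula (the mean is only a placeholder, not something from the question), and `pair_formula` is the same two-column example used above. Treat this as one possible approach under those assumptions rather than the definitive one.

```python
import numpy as np
import pandas as pd

pa = pd.DataFrame({
    "john": [2, 3, 4, 5, 6],
    "june": [4, 6, 7, 8, 2],
    "kate": [3, 2, 3, 4, 5],
})

def column_formula(col):
    # hypothetical single-column formula; the mean is only a placeholder
    return col.mean()

def pair_formula(x, y):
    # same two-column example as above: sum of element-wise products
    return (x * y).sum()

# Reading 1: each cell is column_formula(row column) * column_formula(column column).
# Compute the formula once per column, then take the outer product.
per_column = pa.apply(column_formula)  # Series indexed by column name
outer = pd.DataFrame(
    np.outer(per_column, per_column),
    index=pa.columns,
    columns=pa.columns,
)

# Reading 2: each cell is an arbitrary function of the two columns.
cross = pd.DataFrame(
    [[pair_formula(pa[r], pa[c]) for c in pa.columns] for r in pa.columns],
    index=pa.columns,
    columns=pa.columns,
)

print(outer)
print(cross)
```

Both results keep the column names as the index, so a single cell can be read with `cross.loc['john', 'june']`. Reading 1 evaluates the formula only once per column, which matters if the formula is expensive.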