I want to compute the covariance between every pair of columns listed in my_list. The formula is in the function covariance_formula(...).
My code is as follows:
#!/usr/bin/python3
import pandas as pd
import numpy as np

my_list = ['A', 'B', 'C', 'D', 'E']

def create_df():
    return pd.DataFrame(np.random.randint(0, 100, size=(5, 5)).astype(float), columns=my_list)

def iterate_list(df):
    for i in range(len(my_list)):
        for j in range(i + 1, len(my_list)):
            column_one = my_list[i]
            column_two = my_list[j]
            col_name = column_one + " vs." + column_two
            column_1_value = df[df.columns[df.columns.str.startswith(column_one)]]
            column_2_value = df[df.columns[df.columns.str.startswith(column_two)]]
            column_1_mean = df[df.columns[df.columns.str.startswith(column_one)]].mean(axis=0)
            column_2_mean = df[df.columns[df.columns.str.startswith(column_two)]].mean(axis=0)
            df2[col_name] = covariance_formula(column_1_value, column_2_value, column_1_mean, column_2_mean)
    return df2

def covariance_formula(a, b, mean_a, mean_b):
    covar = (a - mean_a) * (b - mean_b)
    return covar

def main():
    df = create_df()
    # print(df)  ## see OUTPUT A
    df2 = iterate_list(df)  ## <<< THIS IS WHERE I AM HAVING MY PROBLEM
    # print(df2)  ## see EXPECTED OUTPUT B
    print(df2)

if __name__ == "__main__":
    main()
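Question:
How do I create a new DataFrame, df2, whose contents match EXPECTED OUTPUT B? Is there a faster way to do it?
Current problem:
The problem I currently face is that I can't get rid of this error:
NameError: name 'df2' is not defined
Things I've tried: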
Output A:
      A     B     C     D     E
0  87.0  92.0  66.0   8.0  67.0
1  84.0  18.0   9.0  80.0  41.0
2  38.0  24.0  53.0  25.0  14.0
3  87.0  25.0  19.0   5.0   0.0
4  91.0  69.0  55.0  14.0  90.0
Expected output B:
    A vs.B   A vs.C   A vs.D   A vs.E   B vs.C   B vs.D   B vs.E   C vs.D   C vs.E   D vs.E
0    445.4    245.8   -176.6    236.2   1187.8   -853.8   1141.4   -471.0    629.8   -452.6
1   -182.2   -207.2    353.8     -9.2    866.6  -1479.4     38.6  -1683.0     44.0    -75.0
2    851.0   -496.4     55.2   1119.0   -272.2     30.2    613.4    -17.6   -357.8     39.8
3   -197.8   -205.4   -205.4   -407.0    440.8    440.8    873.4    458.0    907.4    907.4
4    318.2    198.6   -168.6    647.4    341.6   -290.2   1113.8   -181.0    695.0   -590.2
Answer 0 (score: 2)
If you use itertools.combinations() and a dict comprehension to build the columns, you can do this more easily:
def build_covars(covar_df):
    columns = {i + " vs." + j: covariance_formula(covar_df[i], covar_df[j])
               for i, j in it.combinations(covar_df.columns, 2)}
    return pd.concat(columns, axis=1)
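To see what it.combinations contributes here, the following minimal sketch compares it with the nested index loops from the question. It reuses my_list from the question; the names pairs and loop_pairs are introduced only for this illustration.

```python
import itertools as it

my_list = ['A', 'B', 'C', 'D', 'E']

# Every unordered pair of column names, each pair produced exactly once.
pairs = list(it.combinations(my_list, 2))

# The question's nested index loops, written as a comprehension for comparison.
loop_pairs = [(my_list[i], my_list[j])
              for i in range(len(my_list))
              for j in range(i + 1, len(my_list))]

print(len(pairs))           # 10
print(pairs[:4])            # [('A', 'B'), ('A', 'C'), ('A', 'D'), ('A', 'E')]
print(pairs == loop_pairs)  # True
```

Both approaches visit the same ten pairs in the same order, so the resulting columns come out in the order shown in EXPECTED OUTPUT B.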
import itertools as it
import numpy as np
import pandas as pd

def build_covars(covar_df):
    columns = {i + " vs." + j: covariance_formula(covar_df[i], covar_df[j])
               for i, j in it.combinations(covar_df.columns, 2)}
    return pd.concat(columns, axis=1)

def covariance_formula(a, b):
    # Per-row product of each column's deviation from its own mean.
    return (a - a.mean()) * (b - b.mean())

my_list = ['A', 'B', 'C', 'D', 'E']

def create_df():
    return pd.DataFrame(
        np.random.randint(0, 100, size=(5, 5)).astype(float),
        columns=my_list)

df = create_df()
print(build_covars(df))
    A vs.B   A vs.C   A vs.D   A vs.E   B vs.C   B vs.D   B vs.E   C vs.D   C vs.E  \
0    52.48    49.92   -43.52   323.84    63.96   -55.76   414.92   -53.04   394.68
1   127.68   123.12   184.68    18.24   120.96   181.44    17.92   174.96    17.28
2   175.48   124.12   -17.12    98.44    47.56    -6.56    37.72    -4.64    26.68
3    10.08  -127.68   -57.12  -280.56   -18.24    -8.16   -40.08   103.36   507.68
4  1370.88   437.92    85.68  1113.84   264.96    51.84   673.92    16.56   215.28

    D vs.E
0  -344.08
1    25.92
2    -3.68
3   227.12
4    42.12
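As for the NameError itself: df2 is assigned to inside iterate_list but is never created there. Below is a minimal sketch of a fix that keeps the question's nested loops. It rests on two assumptions that are not in the original code. First, it starts from an empty frame that shares df's index. Second, it selects each column by name with df[name], which yields a single Series, whereas the str.startswith selection returns a DataFrame. The data is Output A from the question, so the result can be compared with EXPECTED OUTPUT B.

```python
import numpy as np
import pandas as pd

my_list = ['A', 'B', 'C', 'D', 'E']

def covariance_formula(a, b):
    # Per-row product of each column's deviation from its own mean.
    return (a - a.mean()) * (b - b.mean())

def iterate_list(df):
    # Assumption: create the result frame here; the original never defined
    # df2 before assigning into it, which is what raises the NameError.
    df2 = pd.DataFrame(index=df.index)
    for i in range(len(my_list)):
        for j in range(i + 1, len(my_list)):
            column_one, column_two = my_list[i], my_list[j]
            # Assumption: select by exact name so each operand is a Series.
            df2[column_one + " vs." + column_two] = covariance_formula(
                df[column_one], df[column_two])
    return df2

# Output A from the question, used as fixed sample data.
df = pd.DataFrame(
    [[87, 92, 66, 8, 67],
     [84, 18, 9, 80, 41],
     [38, 24, 53, 25, 14],
     [87, 25, 19, 5, 0],
     [91, 69, 55, 14, 90]],
    columns=my_list, dtype=float)

df2 = iterate_list(df)
print(df2.shape)                         # (5, 10)
print(round(df2.loc[0, "A vs.B"], 2))    # 445.44

# Sanity check: the mean of a product column is the population covariance.
pop_cov_ab = np.cov(df["A"].to_numpy(), df["B"].to_numpy(), bias=True)[0, 1]
print(np.isclose(df2["A vs.B"].mean(), pop_cov_ab))  # True
```

Note that each column holds the per-row products of deviations, not a single covariance value. Averaging a column gives the population covariance of that pair, which is what the last check compares against np.cov.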