Question

I have looked through previous similar questions, but could not find any applicable leads:

I have a dataframe, called df, which is structured roughly as follows:

    Income  Income_Quantile Score_1 Score_2 Score_3
0   100000  5                75        75    100
1   97500   5                80        76    94
2   80000   5                79        99    83
3   79000   5                88        78    91
4   70000   4                55        77    80
5   66348   4                65        63    57
6   67931   4                60        65    57
7   69232   4                65        59    62
8   67948   4                64        64    60
9   50000   3                66        50    60
10  49593   3                58        51    50
11  49588   3                58        54    50
12  48995   3                59        59    60
13  35000   2                61        50    53
14  30000   2                66        35    77
15  12000   1                22        60    30
16  10000   1                15        45    12

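Using the Income_Quantile column and the for-loop below, I split the dataframe into a list of 5 subset dataframes (each containing the observations from the same income quantile):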

dfs = []

for level in df.Income_Quantile.unique():
    df_temp = df.loc[df.Income_Quantile == level]
    dfs.append(df_temp)

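Now, I would like to apply the following function, which calculates the Spearman correlation, p-value and t-statistic for a dataframe (FYI: scipy.stats functions are used in the main function):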

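# Score columns compared against Income (assumed from the dataframe shown above)
cols = ['Score_1', 'Score_2', 'Score_3']

def create_list_of_scores(df):
    # One row per statistic, one column per score column
    df_result = pd.DataFrame(columns=cols)
    df_result.loc['t-statistic'] = [ttest_ind(df['Income'], df[x])[0] for x in cols]
    df_result.loc['p-value'] = [ttest_ind(df['Income'], df[x])[1] for x in cols]
    # spearmanr returns (correlation, p-value); index 0 is the correlation
    df_result.loc['correlation'] = [spearmanr(df['Income'], df[x])[0] for x in cols]

    return df_result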

The functions that create_list_of_scores uses, i.e. ttest_ind and spearmanr, can be imported from scipy.stats as follows:

from scipy.stats import ttest_ind
from scipy.stats import spearmanr

I tested the function on one subset of the dataframe:

data = dfs[1]
result = create_list_of_scores(data)

It works as expected.

However, problems arise when I apply the function to the entire list of dataframes, dfs. If I apply it to the list as follows:

result = pd.concat([create_list_of_scores(d) for d in dfs], axis=1)

I get an output in which the columns Score_1, Score_2 and Score_3 are repeated five times (once per subset).

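I would like to:

Have only three columns: Score_1, Score_2 and Score_3.
Index the output with the t-statistic, p-value and correlation as the first index level, and Income_Quantile as the second index level.

Here is what I have in mind: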

                  Score_1  Score_2  Score_3
t-statistic 1           
            2           
            3           
            4           
            5           
p-value     1           
            2           
            3           
            4           
            5           
correlation 1           
            2           
            3           
            4           
            5

Any ideas on how I can merge the output of my function as requested?

Answer 1

I think it is best to use GroupBy.apply.
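The code that originally accompanied this answer is not preserved here, so what follows is only a minimal sketch of the GroupBy.apply idea, not the answerer's exact solution. It assumes the sample data from the question, and the names `stats` and `per_group` are introduced purely for illustration. The function is applied once per Income_Quantile group, and the per-statistic slices are then stacked with `pd.concat(..., keys=...)` so that the statistic becomes the first index level.

```python
import pandas as pd
from scipy.stats import spearmanr, ttest_ind

# Sample data copied from the question
df = pd.DataFrame({
    'Income': [100000, 97500, 80000, 79000, 70000, 66348, 67931, 69232,
               67948, 50000, 49593, 49588, 48995, 35000, 30000, 12000, 10000],
    'Income_Quantile': [5, 5, 5, 5, 4, 4, 4, 4, 4, 3, 3, 3, 3, 2, 2, 1, 1],
    'Score_1': [75, 80, 79, 88, 55, 65, 60, 65, 64, 66, 58, 58, 59, 61, 66, 22, 15],
    'Score_2': [75, 76, 99, 78, 77, 63, 65, 59, 64, 50, 51, 54, 59, 50, 35, 60, 45],
    'Score_3': [100, 94, 83, 91, 80, 57, 57, 62, 60, 60, 50, 50, 60, 53, 77, 30, 12],
})

cols = ['Score_1', 'Score_2', 'Score_3']
stats = ['t-statistic', 'p-value', 'correlation']  # illustrative name, desired row order


def create_list_of_scores(group):
    """Return a (statistic x score column) frame for one income quantile."""
    out = pd.DataFrame(index=stats, columns=cols, dtype=float)
    for c in cols:
        t_res = ttest_ind(group['Income'], group[c])
        s_res = spearmanr(group['Income'], group[c])
        # t_res[0] = t-statistic, t_res[1] = p-value, s_res[0] = Spearman correlation
        out[c] = [t_res[0], t_res[1], s_res[0]]
    return out


# Apply per group; the result is indexed by (Income_Quantile, statistic)
per_group = (
    df.groupby('Income_Quantile')[['Income'] + cols]
      .apply(create_list_of_scores)
)

# Stack one slice per statistic so the statistic becomes the first index level
result = pd.concat([per_group.xs(s, level=1) for s in stats], keys=stats)
print(result)
```

The result has only the three score columns and 15 rows, with the statistic as the first index level and Income_Quantile as the second. Note that quantiles with only two observations (1 and 2 in the sample) give statistics of limited meaning.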

