Question

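I have two DataFrames, sarc and non. After running describe() on both, I want to compare the mean of a particular column across the two. I used .loc and tried to store the value as a float, but it is stored as a DataFrame instead, so I cannot compare the two values with the > operator. Here is my code: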

sarc.describe()
        label        c_len    c_s_l_len        score
count  5092.0  5092.000000  5092.000000  5092.000000
mean      1.0    54.876277    33.123527     6.919874
std       0.0    37.536986    22.566558    43.616977
min       1.0     0.000000     0.000000   -96.000000
25%       1.0    29.000000    18.000000     1.000000
50%       1.0    47.000000    28.000000     2.000000
75%       1.0    71.000000    43.000000     5.000000
max       1.0   466.000000   307.000000  2381.000000

non.describe()
        label        c_len    c_s_l_len        score
count  4960.0  4960.000000  4960.000000  4960.000000
mean      0.0    55.044153    33.100806     6.912298
std       0.0    47.873732    28.738776    39.216049
min       0.0     0.000000     0.000000  -119.000000
25%       0.0    23.000000    14.000000     1.000000
50%       0.0    43.000000    26.000000     2.000000
75%       0.0    74.000000    44.000000     4.000000
max       0.0   594.000000   363.000000  1534.000000

non_c_len_mean = non.describe().loc[['mean'], ['c_len']].astype(np.float64) 
sarc_c_len_mean = sarc.describe().loc[['mean'], ['c_len']].astype(np.float64)

if sarc_c_len_mean > non_c_len_mean:
    # do stuff

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

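The variables really are of type <class 'pandas.core.frame.DataFrame'>, and each one prints as a labeled 1-row, 1-column DataFrame rather than just the value. How can I select only the numeric value as a float?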

Answer 1

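Drop the [] around the row and column labels when you select with .loc. Passing lists such as ['mean'] keeps both axes and returns a DataFrame; passing the bare labels returns the single value: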

non.describe().loc['mean', 'c_len']
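
To make the difference concrete, here is a minimal sketch. The two small frames are made-up stand-ins (the real sarc and non data is not shown in the question), so only the shapes and types matter, not the numbers.

```python
import pandas as pd

# Hypothetical stand-in data; the real sarc and non frames are not in the question.
sarc = pd.DataFrame({"c_len": [29, 47, 71], "score": [1, 2, 5]})
non = pd.DataFrame({"c_len": [23, 43, 74], "score": [1, 2, 4]})

# List labels keep both axes, so this is still a 1x1 DataFrame.
as_frame = sarc.describe().loc[["mean"], ["c_len"]]
print(type(as_frame), as_frame.shape)

# Scalar labels drop both axes, so each result is a single number.
sarc_c_len_mean = sarc.describe().loc["mean", "c_len"]
non_c_len_mean = non.describe().loc["mean", "c_len"]

# A plain comparison now works because both sides are scalars.
if sarc_c_len_mean > non_c_len_mean:
    print("sarc has the larger mean c_len")

# If only the mean is needed, it can be computed without the summary table.
direct_mean = sarc["c_len"].mean()
```

If the summary table is not needed for anything else, sarc["c_len"].mean() is probably the simpler choice, since it avoids computing every statistic that describe() returns.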

