我有两个数据帧,即sarc和non。在两者上都运行describe()
之后,我想比较两个数据帧中特定列的平均值。我使用了.loc()
并尝试将值保存为浮点型,但它另存为数据帧,这使我无法使用>
运算符比较两个值。这是我的代码:
sarc.describe()
label c_len c_s_l_len score
count 5092.0 5092.000000 5092.000000 5092.000000
mean 1.0 54.876277 33.123527 6.919874
std 0.0 37.536986 22.566558 43.616977
min 1.0 0.000000 0.000000 -96.000000
25% 1.0 29.000000 18.000000 1.000000
50% 1.0 47.000000 28.000000 2.000000
75% 1.0 71.000000 43.000000 5.000000
max 1.0 466.000000 307.000000 2381.000000
non.describe()
label c_len c_s_l_len score
count 4960.0 4960.000000 4960.000000 4960.000000
mean 0.0 55.044153 33.100806 6.912298
std 0.0 47.873732 28.738776 39.216049
min 0.0 0.000000 0.000000 -119.000000
25% 0.0 23.000000 14.000000 1.000000
50% 0.0 43.000000 26.000000 2.000000
75% 0.0 74.000000 44.000000 4.000000
max 0.0 594.000000 363.000000 1534.000000
non_c_len_mean = non.describe().loc[['mean'], ['c_len']].astype(np.float64)
sarc_c_len_mean = sarc.describe().loc[['mean'], ['c_len']].astype(np.float64)
if sarc_c_len_mean > non_c_len_mean:
# do stuff
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
变量确实是<class 'pandas.core.frame.DataFrame'>
类型的变量,每个变量都打印为带标签的1行,1-col df,而不仅仅是值。如何只选择数值作为浮点数?
答案 0 :(得分:1)
当您选择[]
和.loc
时,请删除columns
中的index
non.describe().loc['mean', 'c_len']