Question

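I have a dataframe that looks like this, and I want to calculate the percentage of columnB values that match columnA. In this example, 3 values in columnB are the same as the value in columnA on the same row: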

   columnA   columnB 
0  A         None    
1  H         H           <---
2  A         A           <---
3  H         H           <---
4  A         H

Expected result:

   columnB 
0  75%

Stay healthy!

Edit: I just noticed that in my use case I want to ignore rows that contain a "None" value. I want the result to be 75%.

Answer 1

To get the percentage (note that this counts every row, including the one with None, in the denominator):

perc = df["columnA"].eq(df["columnB"]).sum() / len(df) * 100
print(perc)

Prints:

60.0

As a dataframe:

df_out = pd.DataFrame(
    {"ColumnB": [df["columnA"].eq(df["columnB"]).sum() / len(df) * 100]}
)
print(df_out)

Prints:

   ColumnB
0     60.0
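
The 60.0 above is 3 matches out of all 5 rows. To follow the question's edit and ignore rows where columnB is missing, one possible adjustment is to filter those rows out first. This is only a minimal sketch: it assumes the missing entry is a real Python None (or NaN) rather than the string 'None', and `valid` is just an illustrative name.

```python
import pandas as pd

# Sample data from the question. Assumption: the missing entry is a real
# Python None, not the string 'None'.
df = pd.DataFrame({'columnA': ['A', 'H', 'A', 'H', 'A'],
                   'columnB': [None, 'H', 'A', 'H', 'H']})

# Keep only the rows where columnB actually has a value.
valid = df[df['columnB'].notna()]

# Count matching rows and divide by the number of remaining rows.
perc = valid['columnA'].eq(valid['columnB']).sum() / len(valid) * 100
print(perc)
```

With the sample data this leaves 4 rows, 3 of which match, so the denominator no longer includes the None row.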

Answer 2

To get the output in exactly that format, use:

new_df = df.replace({'None': None}).dropna()
result = (
    new_df[['columnB']].eq(new_df['columnA'], axis=0)
        .mean().mul(100)
        .to_frame().T.applymap('{:.0f}%'.format)
)

If the None values are already Python None or NaN, and not the string 'None', use:

new_df = df.dropna()
result = (
    new_df[['columnB']].eq(new_df['columnA'], axis=0)
        .mean().mul(100)
        .to_frame().T.applymap('{:.0f}%'.format)
)

result:

  columnB
0     75%

If only the value is needed:

new_df = df.replace({'None': None}).dropna()
result = new_df['columnB'].eq(new_df['columnA']).mean() * 100

75.0

Complete working example:

import pandas as pd

df = pd.DataFrame({'columnA': ['A', 'H', 'A', 'H', 'A'],
                   'columnB': ['None', 'H', 'A', 'H', 'H']})

new_df = df.replace({'None': None}).dropna()
result = (
    new_df[['columnB']].eq(new_df['columnA'], axis=0)
        .mean().mul(100)
        .to_frame().T.applymap('{:.0f}%'.format)
)

print(result)
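
On pandas 2.1 and later, `applymap` emits a deprecation warning in favour of `DataFrame.map`. As a rough alternative that depends on neither, the sketch below computes the percentage as a plain number and builds the one-cell frame from a formatted string. It swaps in `mask` to blank out the string 'None', which is a different technique from the `replace` call used above, and `pct` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'columnA': ['A', 'H', 'A', 'H', 'A'],
                   'columnB': ['None', 'H', 'A', 'H', 'H']})

# Turn every cell equal to the string 'None' into a missing value,
# then drop the rows that contain one.
new_df = df.mask(df.eq('None')).dropna()

# The mean of a boolean Series is the fraction of True values.
pct = new_df['columnB'].eq(new_df['columnA']).mean() * 100

# Format as a whole-number percentage and wrap it in a one-cell frame.
result = pd.DataFrame({'columnB': [f'{pct:.0f}%']})
print(result)
```

Formatting the number with an f-string keeps the numeric value (`pct`) available separately if it is needed later.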

Compare columnA with columnB and get the percentage for a specific column
