How do I get percentage counts based on multiple columns in a pandas DataFrame?

Time: 2017-12-18 11:11:08

Tags: python pandas numpy matrix

I have 20 columns in my DataFrame. Here are 4 of them as an example:

is_guarantee: 0 or 1
hotel_star: 0, 1, 2, 3, 4, 5
order_status: 40, 60, 80
journey (label): 0, 1, 2

    is_guarantee  hotel_star  order_status  journey
0              0           5            60        0
1              1           5            60        0
2              1           5            60        0
3              0           5            60        1
4              0           4            40        0
5              0           4            40        1
6              0           4            40        1
7              0           3            60        0
8              0           2            60        0
9              1           5            60        0
10             0           2            60        0
11             0           2            60        0


However, the system needs an occurrence matrix as input, in the following format:

[Image: required occurrence matrix format]

Can anyone help?

import numpy as np
import pandas as pd

# Random sample data with the four example columns
df1 = pd.DataFrame(index=range(0, 20))
df1['is_guarantee'] = np.random.choice([0, 1], df1.shape[0])
df1['hotel_star'] = np.random.choice([0, 1, 2, 3, 4, 5], df1.shape[0])
df1['order_status'] = np.random.choice([40, 60, 80], df1.shape[0])
df1['journey'] = np.random.choice([0, 1, 2], df1.shape[0])

1 answer:

Answer 0: (score: 1)

I think you need:

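  • melt to reshape to long format, count with groupby and size, then unstack to move journey into the columns
  • then divide each row by its sum and join the MultiIndex levels into a single index
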
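To make the intermediate shapes concrete, here is a minimal sketch of the same three steps on a hypothetical three-row frame. The name `toy` and its values are made up for illustration and are not taken from the question.

```python
import pandas as pd

# Hypothetical toy data, not taken from the question.
toy = pd.DataFrame({
    "is_guarantee": [0, 1, 1],
    "hotel_star":   [5, 5, 4],
    "journey":      [0, 0, 1],
})

# Step 1: melt keeps 'journey' as the id column and stacks the other
# columns into (variable, value) pairs: 3 rows x 2 columns -> 6 rows.
long = toy.melt("journey")

# Step 2: count rows per (variable, value, journey) combination, then
# move 'journey' from the index into the columns.
counts = (long.groupby(["variable", "value", "journey"])
              .size()
              .unstack("journey", fill_value=0))

# Step 3: divide each row by its own total to get percentages.
percent = counts.div(counts.sum(axis=1), axis=0).mul(100)

print(long)
print(counts)
print(percent)
```

Printing each stage shows the frame going from wide, to long, to a count table, to row-wise percentages.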
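# df is the original DataFrame (df1 in the question).
# Reshape to long format, count each (variable, value, journey)
# combination, then move journey into the columns.
df = (df.melt('journey')
        .astype(str)
        .groupby(['variable', 'journey', 'value'])
        .size()
        .unstack(1, fill_value=0))

# Convert counts to row-wise percentages and flatten the MultiIndex
# into 'variable = value' labels.
df = (df.div(df.sum(axis=1), axis=0)
        .mul(100)
        .add_prefix('journey_')
        .set_index(df.index.map(' = '.join))
        .rename_axis(None, axis=1))

print(df)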

                    journey_0  journey_1
hotel_star = 2     100.000000   0.000000
hotel_star = 3     100.000000   0.000000
hotel_star = 4      33.333333  66.666667
hotel_star = 5      80.000000  20.000000
is_guarantee = 0    66.666667  33.333333
is_guarantee = 1   100.000000   0.000000
order_status = 40   33.333333  66.666667
order_status = 60   88.888889  11.111111
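
As a cross-check, the same percentages can probably also be obtained with `pd.crosstab` and `normalize="index"`, one column at a time. This is an alternative sketch, not the answer's method: the helper `journey_percentages` is a hypothetical name, and `sample` is the question's 12-row table typed in by hand.

```python
import pandas as pd

# The 12 sample rows from the question's table, typed in by hand.
sample = pd.DataFrame({
    "is_guarantee": [0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    "hotel_star":   [5, 5, 5, 5, 4, 4, 4, 3, 2, 5, 2, 2],
    "order_status": [60, 60, 60, 60, 40, 40, 40, 60, 60, 60, 60, 60],
    "journey":      [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
})


def journey_percentages(frame, label="journey"):
    """Row-wise percentage of each label value per 'column = value' pair.

    Hypothetical helper written for this illustration.
    """
    parts = []
    for col in frame.columns.drop(label):
        # normalize="index" scales every row of the crosstab to sum to 1.
        pct = pd.crosstab(frame[col], frame[label], normalize="index") * 100
        pct.index = [f"{col} = {value}" for value in pct.index]
        parts.append(pct)
    out = pd.concat(parts)
    out.columns = [f"{label}_{c}" for c in out.columns]
    return out


result = journey_percentages(sample)
print(result.round(2))
```

The rows come out in the frame's column order rather than sorted by variable name, so the ordering differs from the output above while the values should agree.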