Finding the top categories of a column based on other columns
DataFrame:
nationality age card category amount
India Young AAA Garment 200
India Young AAA Dining 100
India Young BBB Garment 400
Aus Adult BBB Grocery 200
US Adult CCC Beverage 100
India Student CCC Beverage 50
India Adult AAA Grocery 1000
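I want to rank the categories by the amount column, highest first. If category, card, nationality, and age are the same, the amounts should also be summed before the top categories are returned.
Below is a sample DataFrame showing the output I want.
Output: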
nationality age card Top1 category Top2 category Top3 category
India Young AAA Garment Dining NaN
India Adult AAA Grocery NaN NaN
India Student CCC Beverage NaN NaN
Aus Adult BBB Grocery NaN NaN
US Adult CCC Beverage NaN NaN
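For India, Young, AAA, the Garment category has the higher amount, so it becomes the top category. The remaining rows follow the same logic.
Answer 0 (score: 1)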
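# Rank rows within each (nationality, age, card) group, largest amount first.
df['sort_order'] = (df.sort_values(['nationality', 'age', 'card', 'amount'], ascending=False)
                      .groupby(['nationality', 'age', 'card'])
                      .cumcount())
# Pivot the ranks into columns, one row per group.
df.set_index(['nationality', 'age', 'card', 'sort_order'])['category'].unstack().reset_index()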
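By sorting and then using cumcount, you get the order of the categories (by amount) within each group. Then unstack pivots the table the way you want. Of course, you can rename the columns afterwards if needed.
Output: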
#sort_order nationality age card 0 1
#0 Aus Adult BBB Grocery NaN
#1 India Adult AAA Grocery NaN
#2 India Student CCC Beverage NaN
#3 India Young AAA Garment Dining
#4 India Young BBB Garment NaN
#5 US Adult CCC Beverage NaN
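The code above ranks rows as they are and does not sum amounts when the same category appears more than once within a group, which the question also asks for. A minimal sketch of one way to add that step is shown below, assuming the sample data from the question; the names keys, totals, rank and top_n are illustrative, not part of the original answer, and the column labels follow the question's desired output.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "nationality": ["India", "India", "India", "Aus", "US", "India", "India"],
    "age": ["Young", "Young", "Young", "Adult", "Adult", "Student", "Adult"],
    "card": ["AAA", "AAA", "BBB", "BBB", "CCC", "CCC", "AAA"],
    "category": ["Garment", "Dining", "Garment", "Grocery",
                 "Beverage", "Beverage", "Grocery"],
    "amount": [200, 100, 400, 200, 100, 50, 1000],
})

keys = ["nationality", "age", "card"]  # grouping columns (illustrative name)
top_n = 3                              # number of "TopN category" columns wanted

# Sum the amounts of rows that share the same keys and category.
totals = df.groupby(keys + ["category"], as_index=False)["amount"].sum()

# Within each group, order categories by summed amount, largest first,
# and number them 0, 1, 2, ...
totals = totals.sort_values(keys + ["amount"],
                            ascending=[True, True, True, False])
totals["rank"] = totals.groupby(keys).cumcount()

# Keep the first top_n ranks and pivot them into columns.
top = (totals[totals["rank"] < top_n]
       .set_index(keys + ["rank"])["category"]
       .unstack()
       .reindex(columns=range(top_n))  # ensure all top_n columns exist
       .rename(columns=lambda r: f"Top{r + 1} category")
       .reset_index())
top.columns.name = None

print(top)
```

With the sample data no category repeats within a group, so the summing step leaves the ranking unchanged; it only matters once duplicate (nationality, age, card, category) rows exist. Ties in amount are broken by whatever order the sort leaves them in.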