Question

I have the following pandas DataFrame:

id  val city    
4   78  a   
4   12  b   
4   50  c   

9   20  d   
9   8   e   
9   30  f   
9   17  g

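I want to transform it into the following shape: within each "id" group, take the n rows with the largest "val" (n = 2 in this case). For example, 78 and 50 for the group with id 4, and 30 and 20 for the group with id 9.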

id  val city    
4   78  a   
4   50  c   

9   30  f   
9   20  d

Finally, pivot the table as shown below:

id  c_1stLrgst  c_1Lrgst_val    c_2ndLrgst  c_2Lrgst_val...c_nLrgst c_nLrgst_val
4   a           78              c           50
9   f           30              d           20

I can use df.groupby('id').nlargest(2, 'val') to get the groups, but I don't know what to do next.

import pandas as pd
df_dict = {'id': [4, 4, 4, 9, 9, 9, 9],
           'val': [78, 12, 50, 20, 8, 30, 17],
           'city': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
           }
df = pd.DataFrame(df_dict)

Answer 1

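You can use sort_values + groupby.head, then a second groupby that aggregates each column into a list. Then split the lists into columns and concatenate the results.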

# sort by "val" descending and extract first 2 rows from each group
df_filtered = df.sort_values('val', ascending=False)\
                .groupby('id').head(2)

groupvars = ['city', 'val']

# group by "id" and collect "city" and "val" into one list per group
g = df_filtered.groupby('id')[groupvars].agg(list)

# split the lists into columns, building one dataframe per variable in groupvars
L = [pd.DataFrame(g[x].values.tolist(), index=g.index).add_prefix(x) for x in groupvars]

# concatenate results
res = pd.concat(L, axis=1)

print(res)

   city0 city1  val0  val1
id                        
4      a     c    78    50
9      f     d    30    20
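
If you want the columns interleaved per rank, as in the layout the question asks for (city, then value, for the largest, then for the second largest, and so on), one possible variation is to number the rows within each group with cumcount and then unstack that number into the columns. This is only a sketch: the helper column "rank", the variable names top and wide, and the output labels city_1, val_1, ... are illustrative choices, not names taken from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [4, 4, 4, 9, 9, 9, 9],
    'val': [78, 12, 50, 20, 8, 30, 17],
    'city': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
})

n = 2  # number of rows to keep per id

# keep the n rows with the largest "val" in each id group
top = df.sort_values('val', ascending=False).groupby('id').head(n).copy()

# hypothetical helper column: 1 = largest, 2 = second largest, ...
top['rank'] = top.groupby('id').cumcount() + 1

# move rank into the columns; columns become a MultiIndex of (field, rank)
wide = top.set_index(['id', 'rank'])[['city', 'val']].unstack('rank')

# reorder to (rank, field) and sort so city/val alternate for each rank
wide = wide.swaplevel(axis=1).sort_index(axis=1)

# flatten the column labels (illustrative naming) and restore "id" as a column
wide.columns = [f'{field}_{rank}' for rank, field in wide.columns]
wide = wide.reset_index()

print(wide)
```

Changing n keeps more rows per group; groups with fewer than n rows should end up with missing values in the extra columns.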
