Pandas groupby: keep rows according to a ranking

Time: 2020-08-12 13:15:03

Tags: python pandas dataframe pandas-groupby

I have this dataframe:

    date        value       source
0   2020-02-14  0.438767    L8-SR
1   2020-02-15  0.422867    S2A-SR
2   2020-03-01  0.657453    L8-SR
3   2020-03-01  0.603989    S2B-SR
4   2020-03-11  0.717264    S2B-SR
5   2020-04-02  0.737118    L8-SR

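I want to group by the date column and, within each date, keep rows according to a ranking/importance that I choose for the values in the source column. For example, my ranking is L8-SR > S2B-SR > GP6_r, which means that for all rows with the same date, keep the row where source == L8-SR; if none of them has L8-SR, keep the row where source == S2B-SR, and so on. How can I do this with pandas groupby?

The output should contain one row per date; for example, of the two rows dated 2020-03-01, only the L8-SR row (index 2) should be kept.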
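One way to express this ranking rule directly is to map each source to a numeric priority, sort by date and priority, and keep the first row of each date group. This is a minimal sketch using the data from the question; the helper name keep_best_source and the temporary _rank column are hypothetical, and sources missing from the ranking (such as S2A-SR) are assumed to rank last.

```python
import pandas as pd


def keep_best_source(df, ranking):
    """Keep one row per date, preferring sources that appear earlier in `ranking`."""
    # Lower number = higher priority; sources not in the ranking get the lowest priority
    priority = {src: i for i, src in enumerate(ranking)}
    rank = df["source"].map(priority).fillna(len(ranking))
    # Sort so the best-ranked row comes first within each date, then take that row
    return (
        df.assign(_rank=rank)
        .sort_values(["date", "_rank"])
        .groupby("date")
        .head(1)
        .drop(columns="_rank")
        .sort_index()
    )


df = pd.DataFrame({
    "date": ["2020-02-14", "2020-02-15", "2020-03-01", "2020-03-01", "2020-03-11", "2020-04-02"],
    "value": [0.438767, 0.422867, 0.657453, 0.603989, 0.717264, 0.737118],
    "source": ["L8-SR", "S2A-SR", "L8-SR", "S2B-SR", "S2B-SR", "L8-SR"],
})

print(keep_best_source(df, ["L8-SR", "S2B-SR", "GP6_r"]))
```

Because the priority is just a dictionary, changing the ranking (or adding more sources to it) does not require touching the grouping logic.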

2 answers:

Answer 0 (score: 1)

Let's try the category dtype and drop_duplicates:

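import pandas as pd

# Ranking of sources, highest priority first
orders = ['L8-SR', 'S2B-SR', 'GP6_r']

# Append sources missing from the ranking (e.g. S2A-SR) at the lowest priority;
# otherwise set_categories would turn them into NaN
extra = [s for s in df['source'].unique() if s not in orders]

# set_categories returns a new Series, so assign the result back
df['source'] = df['source'].astype('category').cat.set_categories(orders + extra, ordered=True)

# Sort by date, then by rank, and keep the first (best-ranked) row for each date
df.sort_values(['date', 'source']).drop_duplicates(['date'])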

Output:

         date     value  source
0  2020-02-14  0.438767   L8-SR
1  2020-02-15  0.422867  S2A-SR
2  2020-03-01  0.657453   L8-SR
4  2020-03-11  0.717264  S2B-SR
5  2020-04-02  0.737118   L8-SR
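Two details of set_categories matter for this approach: it returns a new Series instead of modifying the column in place, and any value that is not among the new categories becomes NaN. A small sketch on a made-up three-value Series (the variable names are illustrative only):

```python
import pandas as pd

# A made-up Series with one source (S2A-SR) that is not in the ranking
s = pd.Series(["L8-SR", "S2A-SR", "S2B-SR"]).astype("category")

# set_categories returns a new Series; values outside the new categories become NaN
ranked = s.cat.set_categories(["L8-SR", "S2B-SR", "GP6_r"], ordered=True)
print(ranked.isna().tolist())  # [False, True, False]

# The original Series is untouched: it still has unordered, alphabetical categories
print(s.cat.ordered)  # False

# When sorting, NaN is placed last by default, after every ranked source
print(ranked.sort_values().tolist())
```

So if the result is not assigned back, the sort silently falls back to alphabetical order, and if it is assigned without listing every source, the unlisted sources lose their labels.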

Answer 1 (score: 0)

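Try the group-by operation with the code below. To order the result after this operation, you can then sort it (e.g. with sort_values).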

# Import pandas library
import pandas as pd

# Declare a dictionary containing the data shown in the table
pandasdata_dict = {'date':['2020-02-14', '2020-02-15', '2020-03-01', '2020-03-01', '2020-03-11', '2020-04-02'],  
        'value':[0.438767, 0.422867, 0.657453, 0.603989, 0.717264, 0.737118],  
        'source':['L8-SR', 'S2A-SR', 'L8-SR', 'S2B-SR', 'S2B-SR', 'L8-SR']}  

# Convert the dictionary above to a DataFrame
df = pd.DataFrame(pandasdata_dict)

# Display the data frame
print(df)

# Convert the date field to datetime
df["date"] = pd.to_datetime(df["date"])

# Once converted, group the data frame by the calendar date
grouped = df.groupby(df['date'].dt.date)
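The grouping above only creates a GroupBy object; it does not yet pick a row from each date. A minimal sketch of one way to finish it, assuming a hypothetical rank column derived from the ranking in the question (lower is preferred, and unranked sources such as S2A-SR are assumed to rank last): within each date group, idxmin returns the index label of the best-ranked row, which can then be selected with loc.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2020-02-14", "2020-02-15", "2020-03-01", "2020-03-01", "2020-03-11", "2020-04-02"],
    "value": [0.438767, 0.422867, 0.657453, 0.603989, 0.717264, 0.737118],
    "source": ["L8-SR", "S2A-SR", "L8-SR", "S2B-SR", "S2B-SR", "L8-SR"],
})
df["date"] = pd.to_datetime(df["date"])

# Hypothetical numeric rank per source (lower = preferred); unranked sources rank last
ranking = {"L8-SR": 0, "S2B-SR": 1, "GP6_r": 2}
df["rank"] = df["source"].map(ranking).fillna(len(ranking))

# For each calendar date, idxmin gives the index label of the lowest-rank row
best_idx = df.groupby(df["date"].dt.date)["rank"].idxmin()

# Select those rows and drop the helper column
result = df.loc[best_idx].drop(columns="rank")
print(result)
```

If two rows on the same date share the best rank, idxmin keeps the first one in the original row order.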