Compare dates and filter rows within groups based on a condition

Time: 2019-03-20 10:34:53

Tags: python pandas

Sample DF:

ID   Name        Price     Date             Fit_Test
1    Apple         10      2018-01-15          Super_Fit
2    Apple         10      2018-01-15          Super_Fit
3    Apple         10      2019-01-15          Super_Fit

4    Orange        12      2019-02-15          Not_Fit
5    Orange        12      2018-09-25          Not_Fit
6    Orange        12      NaT                 Not_Fit
7    Orange        12      2028-01-25          Not_Fit

8    Banana        15      2019-12-25          Medium_Fit
9    Banana        15      NaT                 Medium_Fit

10   Cherry         30     2021-06-23          Super_Fit

11   PineAp         30     2023-02-03          Medium_Fit
12   PineAp         30     2020-12-13          Medium_Fit
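To reproduce the sample, one option is to parse the table text directly. This is only a sketch: the variable names `raw` and `df` are illustrative, and it assumes the columns are whitespace-separated exactly as shown, with missing dates written as the literal string NaT.

```python
import io

import pandas as pd

# The sample table, pasted verbatim (blank lines are skipped by read_csv).
raw = """
ID   Name        Price     Date             Fit_Test
1    Apple         10      2018-01-15          Super_Fit
2    Apple         10      2018-01-15          Super_Fit
3    Apple         10      2019-01-15          Super_Fit

4    Orange        12      2019-02-15          Not_Fit
5    Orange        12      2018-09-25          Not_Fit
6    Orange        12      NaT                 Not_Fit
7    Orange        12      2028-01-25          Not_Fit

8    Banana        15      2019-12-25          Medium_Fit
9    Banana        15      NaT                 Medium_Fit

10   Cherry         30     2021-06-23          Super_Fit

11   PineAp         30     2023-02-03          Medium_Fit
12   PineAp         30     2020-12-13          Medium_Fit
"""

# Treat the literal "NaT" as missing, then convert the column to datetime64.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", na_values=["NaT"])
df["Date"] = pd.to_datetime(df["Date"])
```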

Expected DF:

ID   Name        Price     Date             Fit_Test
1    Apple         10      2018-01-15          Super_Fit
2    Apple         10      2018-01-15          Super_Fit
3    Apple         10      2019-01-15          Super_Fit

7    Orange        12      2028-01-25          Not_Fit

8    Banana        15      2019-12-25          Medium_Fit
9    Banana        15      NaT                 Medium_Fit

10   Cherry         30     2021-06-23          Super_Fit

11   PineAp         30     2023-02-03          Medium_Fit

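Problem statement:

I want to group by Name and Price, then filter on Date, using Fit_Test as the condition column.

  1. If Fit_Test is Super_Fit, no action is needed. (Rows 1, 2, 3 and 10 are identical in the input and expected DF.)

  2. Within a Name/Price group, if Fit_Test is not Super_Fit and the group contains no NaT, compare the dates and keep the row with the latest date. (IDs 11 and 12; ID 12 is removed in the expected DF.)

  3. Within a Name/Price group, if Fit_Test is not Super_Fit and the group contains a NaT:

    3.1 If the group has more than 2 rows, compare the dates and keep the row with the latest date. (IDs 4, 5, 6, 7; IDs 4, 5, 6 are removed in the expected DF.)

    3.2 If the group has exactly 2 rows, keep both rows. (IDs 8, 9; both are kept in the expected DF.)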
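The rules above can be sketched as a plain per-group function. This is only an illustration of the intended logic, not a recommended implementation: `filter_group` and `demo` are hypothetical names, and it assumes Fit_Test is constant within each Name/Price group, as it is in the sample.

```python
import pandas as pd


def filter_group(g: pd.DataFrame) -> pd.DataFrame:
    """Apply rules 1 to 3 to a single (Name, Price) group."""
    # Rule 1: a Super_Fit group is left untouched.
    if g["Fit_Test"].eq("Super_Fit").all():
        return g
    # Rule 3.2: exactly two rows and at least one NaT, so keep both.
    if g["Date"].isna().any() and len(g) == 2:
        return g
    # Rules 2 and 3.1: keep the row(s) with the latest date (max skips NaT).
    return g[g["Date"].eq(g["Date"].max())]


# Small demo using only the Orange and Banana groups from the sample.
demo = pd.DataFrame({
    "ID": [4, 5, 6, 7, 8, 9],
    "Name": ["Orange"] * 4 + ["Banana"] * 2,
    "Price": [12] * 4 + [15] * 2,
    "Date": pd.to_datetime(
        ["2019-02-15", "2018-09-25", None, "2028-01-25", "2019-12-25", None]
    ),
    "Fit_Test": ["Not_Fit"] * 4 + ["Medium_Fit"] * 2,
})

# Iterate over the groups explicitly and stitch the surviving rows together.
result = pd.concat(
    filter_group(g) for _, g in demo.groupby(["Name", "Price"], sort=False)
)
print(result["ID"].tolist())  # [7, 8, 9]
```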

1 Answer:

Answer 0: (score: 2)

Use:

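import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])

# m1: rows in groups where every Fit_Test is Super_Fit (rule 1)
m1 = df['Fit_Test'].eq('Super_Fit').groupby([df['Name'],df['Price']]).transform('all')

# m2: rows in groups that contain no NaT
m2 = df['Date'].notna().groupby([df['Name'],df['Price']]).transform('all')

# m22: rows holding the latest date of their group
m22 = df['Date'].eq(df.groupby(['Name', 'Price'])['Date'].transform('max'))

# m3: rows in groups of exactly 2 rows
m3 = df.groupby(['Name', 'Price'])['Date'].transform('size').eq(2)

df = df[m1 | (m2 & m22) | (~m2 & m3) | (~m2 & m22)]
# (m2 & m22) | (~m2 & m22) reduces to m22, so the filter can be simplified to:
#df = df[m1 | m22 | (~m2 & m3)]
print(df)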
    ID    Name  Price       Date    Fit_Test
0    1   Apple     10 2018-01-15   Super_Fit
1    2   Apple     10 2018-01-15   Super_Fit
2    3   Apple     10 2019-01-15   Super_Fit
6    7  Orange     12 2028-01-25     Not_Fit
7    8  Banana     15 2019-12-25  Medium_Fit
8    9  Banana     15        NaT  Medium_Fit
9   10  Cherry     30 2021-06-23   Super_Fit
10  11  PineAp     30 2023-02-03  Medium_Fit
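As a sanity check, the following self-contained sketch rebuilds the sample data and compares the full filter with the simplified one. The names `keys`, `full` and `simple` are illustrative; the masks themselves are the ones from the answer.

```python
import pandas as pd

# Rebuild the 12-row sample from the question.
df = pd.DataFrame({
    "ID": range(1, 13),
    "Name": ["Apple"] * 3 + ["Orange"] * 4 + ["Banana"] * 2
            + ["Cherry"] + ["PineAp"] * 2,
    "Price": [10] * 3 + [12] * 4 + [15] * 2 + [30] + [30] * 2,
    "Date": pd.to_datetime([
        "2018-01-15", "2018-01-15", "2019-01-15",
        "2019-02-15", "2018-09-25", None, "2028-01-25",
        "2019-12-25", None,
        "2021-06-23",
        "2023-02-03", "2020-12-13",
    ]),
    "Fit_Test": ["Super_Fit"] * 3 + ["Not_Fit"] * 4 + ["Medium_Fit"] * 2
                + ["Super_Fit"] + ["Medium_Fit"] * 2,
})

keys = [df["Name"], df["Price"]]
m1 = df["Fit_Test"].eq("Super_Fit").groupby(keys).transform("all")
m2 = df["Date"].notna().groupby(keys).transform("all")
m22 = df["Date"].eq(df.groupby(["Name", "Price"])["Date"].transform("max"))
m3 = df.groupby(["Name", "Price"])["Date"].transform("size").eq(2)

full = m1 | (m2 & m22) | (~m2 & m3) | (~m2 & m22)
simple = m1 | m22 | (~m2 & m3)

print(full.equals(simple))            # True
print(df.loc[simple, "ID"].tolist())  # [1, 2, 3, 7, 8, 9, 10, 11]
```

Both masks select the same rows because `(m2 & m22) | (~m2 & m22)` is just `m22`, whatever the data.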