Finding the maximum of a combination of columns

Time: 2019-10-21 13:52:03

Tags: python pandas dataframe

I have a DataFrame with id, date, and hour columns (the sample data is reproduced in the answer below).

I need to find the latest date and hour for each id. For example, for id = 1 I want 2019-10-21 and 4, but I get the correct date with hour = 5.
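The question's own code is not shown, so the following is only a guess at the cause: a minimal sketch, assuming the maxima were taken per column independently (for example with groupby and max). Under that assumption the date and the hour come from different rows, which would reproduce the hour = 5 result. The three rows are a subset of the id = 1 data from the answer, and the name per_column_max is illustrative.

```python
import pandas as pd

# Three of the id = 1 rows from the sample data used in the answer
df = pd.DataFrame({
    'id': ['1', '1', '1'],
    'date': pd.to_datetime(['2019-10-21', '2019-10-21', '2019-10-19']),
    'hour': [3, 4, 5],
})

# Each column is maximised on its own, so the two values
# need not belong to the same row
per_column_max = df.groupby('id').agg({'date': 'max', 'hour': 'max'})
print(per_column_max)
```

Here the date comes from a 2019-10-21 row, while the hour comes from the 2019-10-19 row.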

1 answer:

Answer 0: (score: 1)

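Use DataFrame.sort_values on all 3 columns (id ascending, date and hour descending), then drop duplicates on the id column with DataFrame.drop_duplicates, which keeps the first row for each id: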

import pandas as pd

L = [{'date': '2019-10-21', 'hour': 3, 'id': '1'},
     {'date': '2019-10-21', 'hour': 4, 'id': '1'},
     {'date': '2019-10-20', 'hour': 0, 'id': '1'},
     {'date': '2019-10-20', 'hour': 1, 'id': '1'},
     {'date': '2019-10-21', 'hour': 0, 'id': '1'},
     {'date': '2019-10-20', 'hour': 0, 'id': '1'},
     {'date': '2019-10-19', 'hour': 5, 'id': '1'},
     {'date': '2019-10-20', 'hour': 0, 'id': '2'},
     {'date': '2019-10-20', 'hour': 0, 'id': '3'}]

df = pd.DataFrame(L)
df['date'] = pd.to_datetime(df['date'])

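# Sort so that the newest date and latest hour come first within each id,
# then keep only the first row per id
df = df.sort_values(['id', 'date', 'hour'], ascending=[True, False, False]).drop_duplicates('id')
print(df)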
        date  hour id
1 2019-10-21     4  1
7 2019-10-20     0  2
8 2019-10-20     0  3
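
As an alternative to sorting, the date and hour can be combined into a single timestamp so that one maximum per id is well defined. This is a sketch of a different technique, not the method from the answer above; the names rows, ts, and latest are illustrative, and the data is the same sample.

```python
import pandas as pd

rows = [{'date': '2019-10-21', 'hour': 3, 'id': '1'},
        {'date': '2019-10-21', 'hour': 4, 'id': '1'},
        {'date': '2019-10-20', 'hour': 0, 'id': '1'},
        {'date': '2019-10-20', 'hour': 1, 'id': '1'},
        {'date': '2019-10-21', 'hour': 0, 'id': '1'},
        {'date': '2019-10-20', 'hour': 0, 'id': '1'},
        {'date': '2019-10-19', 'hour': 5, 'id': '1'},
        {'date': '2019-10-20', 'hour': 0, 'id': '2'},
        {'date': '2019-10-20', 'hour': 0, 'id': '3'}]

df = pd.DataFrame(rows)
df['date'] = pd.to_datetime(df['date'])

# Combine date and hour into one timestamp per row
ts = df['date'] + pd.to_timedelta(df['hour'], unit='h')

# idxmax gives the index label of the latest timestamp within each id;
# on ties it returns the first occurrence
latest = df.loc[ts.groupby(df['id']).idxmax()]
print(latest)
```

On this sample it selects the same rows (index 1, 7, and 8) as the sort_values approach, without reordering the whole frame.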