I want to get the latest rows among those where columns I and II are the same.
So 1/30/2017 should win over 1/27/2017.
I II III IV
A X 1/30/2017 9:33:00 AM some_data
A Y 1/30/2017 9:33:00 AM some_data
A Z 1/30/2017 9:33:00 AM some_data
A X 1/27/2017 4:53:00 PM some_data
A Y 1/27/2017 4:53:00 PM some_data
A Z 1/27/2017 4:53:00 PM some_data
B X 1/30/2017 9:33:00 AM some_data
B Y 1/30/2017 9:33:00 AM some_data
B Z 1/30/2017 9:33:00 AM some_data
B X 1/27/2017 4:53:00 PM some_data
B Y 1/27/2017 4:53:00 PM some_data
B Z 1/27/2017 4:53:00 PM some_data
This is the result I want:
I II III IV
A X 1/30/2017 9:33:00 AM some_data
A Y 1/30/2017 9:33:00 AM some_data
A Z 1/30/2017 9:33:00 AM some_data
B X 1/30/2017 9:33:00 AM some_data
B Y 1/30/2017 9:33:00 AM some_data
B Z 1/30/2017 9:33:00 AM some_data
Can someone help me figure out how to extract these rows?
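Column III has to be parsed as a datetime before the "latest" row can be picked reliably. These timestamps are M/D/YYYY strings, and comparing strings is lexicographic, not chronological. The sketch below uses a hypothetical earlier date, 12/1/2016, which is not in the data above, to show the difference; the explicit `format` string is an assumption chosen to match the layout in the table.

```python
import pandas as pd

# Two timestamps in the same M/D/YYYY h:mm:ss AM/PM layout as the question
earlier = '12/1/2016 4:53:00 PM'  # hypothetical, not in the question's data
later = '1/30/2017 9:33:00 AM'

# String comparison is lexicographic: '12/...' sorts after '1/3...'
# because '2' > '/', so the older date looks "bigger"
print(earlier > later)  # True (wrong chronologically)

# Parsing with an explicit format makes the comparison chronological
fmt = '%m/%d/%Y %I:%M:%S %p'
print(pd.to_datetime(earlier, format=fmt) > pd.to_datetime(later, format=fmt))  # False
```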
Answer 0 (score: 1)
It looks like what you want is groupby(), transform(), and max():
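The key detail is that `transform('max')` differs from a plain aggregation: it returns a Series aligned with the original rows, so it can be compared element-wise against the column. A minimal sketch on a hypothetical toy frame (the `key` and `val` columns are made up for illustration):

```python
import pandas as pd

# Hypothetical toy frame: group A has two rows, group B has one
df = pd.DataFrame({'key': ['A', 'A', 'B'], 'val': [1, 3, 2]})

# A plain max() collapses each group to a single value (one entry per group)
print(df.groupby('key')['val'].max().tolist())  # [3, 2]

# transform('max') broadcasts each group's max back onto every row,
# so the result has the same length and index as df
group_max = df.groupby('key')['val'].transform('max')
print(group_max.tolist())  # [3, 3, 2]

# Comparing against the original column gives a boolean mask of the max rows
print((df['val'] == group_max).tolist())  # [False, True, True]
```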
Code:
data = [
('I', 'II', 'III', 'IV'),
('A', 'X', '1/30/2017 9:33:00 AM', 'some_data'),
('A', 'Y', '1/30/2017 9:33:00 AM', 'some_data'),
('A', 'Z', '1/30/2017 9:33:00 AM', 'some_data'),
('A', 'X', '1/27/2017 4:53:00 PM', 'some_data'),
('A', 'Y', '1/27/2017 4:53:00 PM', 'some_data'),
('A', 'Z', '1/27/2017 4:53:00 PM', 'some_data'),
('B', 'X', '1/30/2017 9:33:00 AM', 'some_data'),
('B', 'Y', '1/30/2017 9:33:00 AM', 'some_data'),
('B', 'Z', '1/30/2017 9:33:00 AM', 'some_data'),
('B', 'X', '1/27/2017 4:53:00 PM', 'some_data'),
('B', 'Y', '1/27/2017 4:53:00 PM', 'some_data'),
('B', 'Z', '1/27/2017 4:53:00 PM', 'some_data'),
]
import pandas as pd
df = pd.DataFrame(data[1:], columns=data[0])
# parse column III so it compares chronologically rather than as text
df['III'] = pd.to_datetime(df['III'], format='%m/%d/%Y %I:%M:%S %p')
# group by the first two columns, then broadcast each group's latest timestamp back onto every row
is_latest = df.groupby(['I', 'II'])['III'].transform('max') == df['III']
# use the boolean mask to select the matching rows
print(df[is_latest])
Result:
I II III IV
0 A X 2017-01-30 09:33:00 some_data
1 A Y 2017-01-30 09:33:00 some_data
2 A Z 2017-01-30 09:33:00 some_data
6 B X 2017-01-30 09:33:00 some_data
7 B Y 2017-01-30 09:33:00 some_data
8 B Z 2017-01-30 09:33:00 some_data
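A possible alternative, not part of the original answer, is `groupby(...).idxmax()` combined with `.loc`. It returns exactly one row per (I, II) group (the first row holding the latest timestamp), whereas the `transform('max')` mask keeps every row tied for the maximum. On this data both give the same rows. A sketch, assuming the same 12 rows in the same order as the `data` list above:

```python
import pandas as pd

# Rebuild the question's 12 rows in the same order as the answer's data list
times = ('1/30/2017 9:33:00 AM', '1/27/2017 4:53:00 PM')
rows = [(i, ii, t, 'some_data') for i in 'AB' for t in times for ii in 'XYZ']
df = pd.DataFrame(rows, columns=['I', 'II', 'III', 'IV'])
df['III'] = pd.to_datetime(df['III'], format='%m/%d/%Y %I:%M:%S %p')

# idxmax gives, per (I, II) group, the index label of the first row
# with the latest timestamp; .loc then selects exactly those rows
latest = df.loc[df.groupby(['I', 'II'])['III'].idxmax()]
print(latest)
```

Use the `transform` mask if tied rows should all be kept, and `idxmax` if exactly one row per group is required.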