How do I get the rows with the maximum date in a specific column?

Time: 2017-03-15 17:45:45

Tags: python pandas

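I want to keep only the latest row, judged by the date in column III, whenever columns I and II are the same. For example, 1/30/2017 beats 1/27/2017.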

I II III                   IV
A X 1/30/2017  9:33:00 AM  some_data
A Y 1/30/2017  9:33:00 AM  some_data
A Z 1/30/2017  9:33:00 AM  some_data
A X 1/27/2017  4:53:00 PM  some_data
A Y 1/27/2017  4:53:00 PM  some_data
A Z 1/27/2017  4:53:00 PM  some_data
B X 1/30/2017  9:33:00 AM  some_data
B Y 1/30/2017  9:33:00 AM  some_data
B Z 1/30/2017  9:33:00 AM  some_data
B X 1/27/2017  4:53:00 PM  some_data
B Y 1/27/2017  4:53:00 PM  some_data
B Z 1/27/2017  4:53:00 PM  some_data

This is the result I want:

I II III                   IV
A X 1/30/2017  9:33:00 AM  some_data
A Y 1/30/2017  9:33:00 AM  some_data
A Z 1/30/2017  9:33:00 AM  some_data
B X 1/30/2017  9:33:00 AM  some_data
B Y 1/30/2017  9:33:00 AM  some_data
B Z 1/30/2017  9:33:00 AM  some_data

Can someone help me figure out how to extract these rows?

1 answer:

Answer 0 (score: 1)

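It looks like what you want is a combination of groupby(), transform() and max(): group by columns I and II, compute each group's latest date in column III, and keep the rows whose date equals that maximum.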

Code:

data = [
    ('I', 'II', 'III', 'IV'),
    ('A', 'X', '1/30/2017 9:33:00 AM', 'some_data'),
    ('A', 'Y', '1/30/2017 9:33:00 AM', 'some_data'),
    ('A', 'Z', '1/30/2017 9:33:00 AM', 'some_data'),
    ('A', 'X', '1/27/2017 4:53:00 PM', 'some_data'),
    ('A', 'Y', '1/27/2017 4:53:00 PM', 'some_data'),
    ('A', 'Z', '1/27/2017 4:53:00 PM', 'some_data'),
    ('B', 'X', '1/30/2017 9:33:00 AM', 'some_data'),
    ('B', 'Y', '1/30/2017 9:33:00 AM', 'some_data'),
    ('B', 'Z', '1/30/2017 9:33:00 AM', 'some_data'),
    ('B', 'X', '1/27/2017 4:53:00 PM', 'some_data'),
    ('B', 'Y', '1/27/2017 4:53:00 PM', 'some_data'),
    ('B', 'Z', '1/27/2017 4:53:00 PM', 'some_data'),
]

import pandas as pd
df = pd.DataFrame(data[1:], columns=data[0])
df['III'] = pd.to_datetime(df['III'])

# group by the first two columns and broadcast each group's latest
# date in the third column back onto every row of that group
is_latest = df.groupby(['I', 'II'])['III'].transform('max') == df['III']

# use the boolean mask to keep only the rows holding the latest date
print(df[is_latest])

Result:

   I II                 III         IV
0  A  X 2017-01-30 09:33:00  some_data
1  A  Y 2017-01-30 09:33:00  some_data
2  A  Z 2017-01-30 09:33:00  some_data
6  B  X 2017-01-30 09:33:00  some_data
7  B  Y 2017-01-30 09:33:00  some_data
8  B  Z 2017-01-30 09:33:00  some_data
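
For comparison, here is a minimal sketch of a different approach that is not part of the answer above: sort by the date and keep the first row seen for each (I, II) pair. The small frame below is hypothetical sample data that reuses the column names from the question. One difference to keep in mind: if several rows in a group share the same latest date, the transform() comparison keeps all of them, while drop_duplicates() keeps only one.

```python
import pandas as pd

# Hypothetical sample data with the same column names as in the question.
df = pd.DataFrame(
    [
        ('A', 'X', '1/30/2017 9:33:00 AM', 'some_data'),
        ('A', 'X', '1/27/2017 4:53:00 PM', 'some_data'),
        ('B', 'Y', '1/27/2017 4:53:00 PM', 'some_data'),
        ('B', 'Y', '1/30/2017 9:33:00 AM', 'some_data'),
    ],
    columns=['I', 'II', 'III', 'IV'],
)
df['III'] = pd.to_datetime(df['III'])

# Sort newest first, keep the first row for each (I, II) pair,
# then restore the original row order.
latest = (
    df.sort_values('III', ascending=False)
      .drop_duplicates(subset=['I', 'II'], keep='first')
      .sort_index()
)
print(latest)
```

Which version to use probably depends on whether you want ties kept or collapsed to a single row.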