My df looks like this:
index db id age score
1 1 1 1 2
2 1 1 2 1.5
3 1 2 2 3
4 1 2 3 4
5 2 1 2 3
6 2 1 1 1
7 2 2 3 2
8 2 2 5 3.5
9 3 1 4 4
...
For each unique (db, id) pair, I want to get the row with the maximum age. Expected result:
index db id age score
2 1 1 2 1.5
4 1 2 3 4
5 2 1 2 3
8 2 2 5 3.5
9 3 1 4 4
I used this function, but it is very slow:
import numpy as np
import pandas as pd

def get_age_rel(main_df, age):
    data = []
    # Keep only rows whose age does not exceed the cutoff
    age_rel_df = main_df[main_df['age'] <= age]
    for db_index in np.unique(age_rel_df['db']):
        db_rel_df = age_rel_df[age_rel_df['db'] == db_index]
        for some_id in np.unique(db_rel_df['id']):
            data.append(max_rows(db_rel_df[db_rel_df['id'] == some_id], 'age', 1))
    # Stack the per-pair rows vertically (axis=0); axis=1 would place them side by side
    return pd.concat(data, axis=0)

def max_rows(df, col, n):
    # Index labels of the n largest values in col
    max_indexes = df[col].nlargest(n)
    max_indexes = list(max_indexes.index)
    return df.loc[max_indexes]
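
One possible way to avoid the nested Python loops is to let pandas do the grouping. The following is only a sketch under some assumptions: the index labels are unique, the sample frame is rebuilt by hand from the rows shown above (the rows hidden behind "..." are left out), and max_age_rows is a hypothetical name, not something from the code above. It groups by (db, id), uses idxmax to find the label of the first max-age row in each group, and selects those labels with .loc.

```python
import pandas as pd

# Sample data copied from the question (rows behind "..." are omitted).
df = pd.DataFrame(
    {
        "db":    [1, 1, 1, 1, 2, 2, 2, 2, 3],
        "id":    [1, 1, 2, 2, 1, 1, 2, 2, 1],
        "age":   [1, 2, 2, 3, 2, 1, 3, 5, 4],
        "score": [2, 1.5, 3, 4, 3, 1, 2, 3.5, 4],
    },
    index=pd.RangeIndex(1, 10, name="index"),
)


def max_age_rows(main_df, age=None):
    """Hypothetical helper: return one max-age row per (db, id) pair.

    If age is given, rows with a larger age are dropped first,
    mirroring the cutoff in get_age_rel.
    Assumes the index labels of main_df are unique.
    """
    if age is not None:
        main_df = main_df[main_df["age"] <= age]
    # For each (db, id) group, idxmax gives the index label of the
    # first row holding the largest age.
    best = main_df.groupby(["db", "id"])["age"].idxmax()
    # Select those rows; they come out ordered by (db, id).
    return main_df.loc[best]


result = max_age_rows(df)
print(result)
```

On the sample data this selects the rows labeled 2, 4, 5, 8 and 9, which matches the expected result. When several rows in a group share the maximum age, idxmax keeps the first one, the same as nlargest(1) does by default.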