Pandas: create pairs based on duplicated values

Asked: 2020-03-28 16:21:26

Tags: python-3.x pandas dataframe duplicates

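I have a Pandas DataFrame containing player data for a sport across several years. Note: a player can play in more than one league during the same season. Here is a sample of the DataFrame: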

import pandas as pd
from io import StringIO
s = '''\
PlayerName,Year,League,Points
Player1,2010,LeagueA,10
Player1,2010,LeagueB,20
Player1,2011,LeagueC,30
'''
df = pd.read_csv(StringIO(s))

It looks like this:

  PlayerName  Year   League  Points
0    Player1  2010  LeagueA      10
1    Player1  2010  LeagueB      20
2    Player1  2011  LeagueC      30

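Now I want to create a new DataFrame, or reshape the existing one, so that it holds pairwise comparisons of the leagues each player took part in. The two rows in a comparison must come from the same year or from years at most one apart, and no pair may appear twice. For example, the DataFrame I want to end up with looks like this: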

Player Name      Year 1     League 1      Points 1     Year 2     League 2     Points 2
Player 1          2010      League A        10          2010      League B       20
Player 1          2010      League A        10          2011      League C       30
Player 1          2010      League B        20          2011      League C       30

My current idea is:

# df is the DataFrame built above (column names follow the sample data)
df1 = df.drop_duplicates(subset=['PlayerName', 'Year'], keep='first')
df2 = df.drop_duplicates(subset=['PlayerName', 'Year'], keep='last')

merged_df1 = df.merge(df1, on='PlayerName')
merged_df2 = df.merge(df2, on='PlayerName')

temp = [merged_df1, merged_df2]
combined_df = pd.concat(temp)
# Note: this keeps only the first row for each player
combined_df = combined_df.drop_duplicates(subset='PlayerName', keep='first')
combined_df['Year Difference'] = combined_df['Year_x'] - combined_df['Year_y']
combined_df = combined_df.loc[(combined_df['Year Difference'] >= -1) & (combined_df['Year Difference'] <= 1)]

Is there a better way to do this? My code feels bulky and error-prone. Any help would be greatly appreciated.
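
For reference, here is a minimal sketch of one possible direction, not a confirmed solution: merge the frame with itself on the player, then keep each unordered pair once and only when the years differ by at most one. The `row_id` helper column and the `_1`/`_2` suffixes are names made up for this illustration, and the sketch assumes the column names of the sample data above.

```python
import pandas as pd
from io import StringIO

s = '''\
PlayerName,Year,League,Points
Player1,2010,LeagueA,10
Player1,2010,LeagueB,20
Player1,2011,LeagueC,30
'''
df = pd.read_csv(StringIO(s))

# Hypothetical helper column: a unique id per row, so that each
# unordered pair of rows can be kept exactly once.
left = df.reset_index().rename(columns={'index': 'row_id'})

# Self-merge on the player: every row is paired with every row
# belonging to the same player (including itself).
pairs = left.merge(left, on='PlayerName', suffixes=('_1', '_2'))

# row_id_1 < row_id_2 drops self-pairs and mirrored duplicates;
# the second condition keeps only years at most one apart.
mask = (pairs['row_id_1'] < pairs['row_id_2']) & (
    (pairs['Year_1'] - pairs['Year_2']).abs() <= 1
)

result = (
    pairs.loc[mask]
    .drop(columns=['row_id_1', 'row_id_2'])
    .reset_index(drop=True)
)
print(result)
```

On the sample data this leaves three rows (A/B, A/C, B/C). The self-merge grows quadratically with the number of rows per player, so it may need rethinking for very large inputs.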

0 Answers:

No answers