Filtering data in pandas across multiple string columns

Time: 2020-07-11 12:27:00

Tags: python pandas dataframe

I have a DataFrame that looks like this:

team_1  score_1 team_2  score_2
AUS     2       SCO     1
ENG     1       ARG     0
JPN     0       ENG     2

I can retrieve all rows for a single team like this:

# list specifying the teams of interest
team = ['ENG']

# slice the DataFrame to keep only the rows where 'team_1' or 'team_2' is in the list `team`
df.loc[df['team_1'].isin(team) | df['team_2'].isin(team)]
team_1  score_1 team_2  score_2
ENG     1       ARG     0
JPN     0       ENG     2
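
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame from the table above and applies the same filter. The name `matches` is my own, and I am assuming the team columns are plain strings and the score columns are integers.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame(
    {
        "team_1": ["AUS", "ENG", "JPN"],
        "score_1": [2, 1, 0],
        "team_2": ["SCO", "ARG", "ENG"],
        "score_2": [1, 0, 2],
    }
)

team = ["ENG"]  # list of teams of interest

# Keep rows where either team column holds one of the listed teams
matches = df.loc[df["team_1"].isin(team) | df["team_2"].isin(team)]
print(matches)
```

The filtered rows keep their original index labels, which matters later when comparing outputs.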

How can I now return only the scores for the teams in `team`, for example:

team    score
ENG     1
ENG     2

Maybe create an index for each team to filter on? Or encode the team_1 and team_2 columns to filter on?

2 Answers:

Answer 0: (score: 3)

# rows where ENG is listed first, with the columns renamed to the target names
new_df_1 = df[df.team_1 == 'ENG'][['team_1', 'score_1']]
new_df_1 = new_df_1.rename(columns={"team_1": "team", "score_1": "score"})
#   team  score
# 1  ENG      1

# rows where ENG is listed second
new_df_2 = df[df.team_2 == 'ENG'][['team_2', 'score_2']]
new_df_2 = new_df_2.rename(columns={"team_2": "team", "score_2": "score"})
#   team  score
# 2  ENG      2

Then concatenate the two DataFrames:

pd.concat([new_df_1, new_df_2])

The output is:

  team  score
1  ENG      1
2  ENG      2
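
The same idea can be wrapped up so it handles a list of teams rather than a single hard-coded one. This is only a sketch: the helper name `scores_for` is hypothetical, and it assumes the columns follow the `team_N` / `score_N` naming from the question. Passing `ignore_index=True` to `pd.concat` renumbers the rows from 0 instead of keeping the original labels.

```python
import pandas as pd


def scores_for(df, teams):
    """Return one (team, score) row per appearance of any team in `teams`."""
    parts = []
    for side in ("1", "2"):
        team_col, score_col = f"team_{side}", f"score_{side}"
        # Select this side's team/score pair for the matching rows only
        part = df.loc[df[team_col].isin(teams), [team_col, score_col]]
        parts.append(part.rename(columns={team_col: "team", score_col: "score"}))
    # Stack both sides and renumber the rows 0..n-1
    return pd.concat(parts, ignore_index=True)


# Sample data copied from the question
df = pd.DataFrame(
    {
        "team_1": ["AUS", "ENG", "JPN"],
        "score_1": [2, 1, 0],
        "team_2": ["SCO", "ARG", "ENG"],
        "score_2": [1, 0, 2],
    }
)

result = scores_for(df, ["ENG"])
print(result)
```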

Answer 1: (score: 0)

Melt the columns, filter for the values in team, compute the sum of the score columns, and keep only team and score:

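team = ["ENG"]

# `cols` was not defined in the original answer; assumed here to be the
# score columns, which stay fixed while the team columns are melted
cols = ["score_1", "score_2"]

(
    df
    .melt(id_vars=cols, value_name="team")
    .query("team in @team")
    # note: this sums both score columns, so it equals the team's own
    # score only when the opponent scored 0, as in the sample rows
    .assign(score=lambda x: x.filter(like="score").sum(axis=1))
    .loc[:, ["team", "score"]]
)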

    team    score
1   ENG        1
5   ENG        2
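
One caveat with summing `score_1` and `score_2`: the result equals the team's own score only because the opponent scored 0 in both ENG rows of the sample. A possible alternative, sketched below, melts the team columns and the score columns separately so each team stays paired with its own score. The names `teams`, `scores` and `long_df` are my own, and the sketch assumes the two melts come back in the same row order (all `_1` rows first, then all `_2` rows), which is how `melt` stacks the `value_vars`.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "team_1": ["AUS", "ENG", "JPN"],
        "score_1": [2, 1, 0],
        "team_2": ["SCO", "ARG", "ENG"],
        "score_2": [1, 0, 2],
    }
)

team = ["ENG"]

# Stack team_1 then team_2 into a single "team" column
teams = df.melt(value_vars=["team_1", "team_2"], value_name="team")
# Stack score_1 then score_2 in the same order into a single "score" column
scores = df.melt(value_vars=["score_1", "score_2"], value_name="score")

# Both melts share the index 0..5, so the columns line up row by row
long_df = pd.DataFrame({"team": teams["team"], "score": scores["score"]})

result = long_df[long_df["team"].isin(team)]
print(result)
```

On the sample data this gives the same two rows (index 1 and 5) as the chain above, but it would still return each team's own score if the opponent had scored.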