Replacing and mapping string values in a Python DataFrame with pandas

Date: 2018-10-12 20:47:04

Tags: python python-3.x pandas dataframe

Hi, I've been trying to replace string values in a DataFrame (the strings are NFL team abbreviations). I have something like this:

Index   IDMatch Usr1    Usr2    Usr3    Usr4    Usr5
0       1       Phi     Atl     Phi     Phi     Phi
1       2       Bal     Bal     Bal     Buf     Bal
2       3       Ind     Ind     Cin     Cin     Ind
3       4       NE      NE      Hou     NE      NE
4       5       Jax     Jax     NYG     NYG     NYG

and a DataFrame holding the mapping, like this:

Index  TEAM_YH  TeamID
0      ARI       1
1      ATL       2
2      BAL       3
...
31     WAS       32

I want to replace each string with its TeamID so I can compute basic statistics (frequencies). I tried the following:

## Dataframe with strings and Team ID
dfDicTeams = dfTeams[['TEAM_YH','TeamID']].to_dict('dict')

## Dataframe with selections by users
dfW1.replace(dfDicTeams[['TEAM_YH']],dfDicTeams[['TeamID']]) ## Error: unhashable type: 'list'

dfW1.replace(dfDicTeams) ## Error: Replacement not allowed with overlapping keys and values

What am I doing wrong? Is this possible?

I'm using Python 3, and I want something like this:

Index   IDMatch Usr1    Usr2    Usr3    Usr4    Usr5
0       1       26      2       26      26      26
1       2       3       3       3       4       3
2       3       14      14      7       7       14
3       4       21      21      13      21      21
4       5       15      15      23      23      23
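A minimal sketch of one way to get that result, assuming the mapping can be turned into a flat abbreviation-to-ID dict. Two things stand out in the attempts above: `to_dict('dict')` returns a nested dict keyed by column name (`{'TEAM_YH': {0: 'ARI', ...}, 'TeamID': {0: 1, ...}}`), not a flat `{'ARI': 1, ...}` lookup, and indexing a plain dict with a list (`dfDicTeams[['TEAM_YH']]`) raises `unhashable type: 'list'`. The picks are also mixed case (`Phi`) while the mapping is upper case (`PHI`). The frame names `df_teams` and `df_w1` and the small data subset below are illustrative; the IDs are taken from the desired output.

```python
import pandas as pd

# Illustrative subsets of the two frames from the question
df_teams = pd.DataFrame({'TEAM_YH': ['ATL', 'BAL', 'BUF', 'PHI'],
                         'TeamID': [2, 3, 4, 26]})
df_w1 = pd.DataFrame({'IDMatch': [1, 2],
                      'Usr1': ['Phi', 'Bal'],
                      'Usr2': ['Atl', 'Buf']})

# Flat dict: abbreviation -> TeamID (unlike the nested dict from to_dict('dict'))
team_to_id = dict(zip(df_teams['TEAM_YH'], df_teams['TeamID']))

usr_cols = [c for c in df_w1.columns if c.startswith('Usr')]
# Upper-case each pick so 'Phi' matches 'PHI', then look it up column by column;
# abbreviations missing from the dict would become NaN
df_w1[usr_cols] = df_w1[usr_cols].apply(lambda col: col.str.upper().map(team_to_id))
print(df_w1)
```

`Series.map` with a dict leaves unmatched values as NaN, which makes gaps in the mapping easy to spot; add `.fillna(...)` if a sentinel value is preferred.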

Optionally, a summary like this:

IDMatch  ATeam  ATeamCount  HTeam  HTeamCount
1        26     4           2      1
2        3      4           4      1
3        14     3           7      2
4        21     4           13     1
5        15     2           23     3
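For the frequency part, a sketch that reshapes the numeric picks to long format and counts picks per team in each match. It assumes the picks have already been mapped to TeamIDs (values copied from the desired output above). It yields one row per (match, team) rather than the two-column ATeam/HTeam layout, because which team is home or away is not recorded in the picks themselves.

```python
import pandas as pd

# Numeric picks, copied from the desired output in the question
picks = pd.DataFrame({
    'IDMatch': [1, 2, 3, 4, 5],
    'Usr1': [26, 3, 14, 21, 15],
    'Usr2': [2, 3, 14, 21, 15],
    'Usr3': [26, 3, 7, 13, 23],
    'Usr4': [26, 4, 7, 21, 23],
    'Usr5': [26, 3, 14, 21, 23],
})

# Long format: one row per (match, user pick)
long = picks.melt(id_vars='IDMatch', value_name='TeamID')
# Number of users who picked each team in each match
counts = (long.groupby(['IDMatch', 'TeamID'])
              .size()
              .rename('Count')
              .reset_index())
print(counts)
```

If the wide layout is needed, the team-to-side assignment would have to come from a separate schedule table and be merged in on `IDMatch`.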

1 answer:

Answer 0 (score: 1)

Given a main input DataFrame df and a mapping DataFrame df_map, you can build a Series to act as the mapping, then use pd.DataFrame.applymap with a custom function:

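import pandas as pd
# Series mapping each abbreviation (TEAM_YH) to its TeamID
s = df_map.set_index('TEAM_YH')['TeamID']
# Usr1..Usr5 start at column 2; upper-case each pick before lookup, -1 if not found
df.iloc[:, 2:] = df.iloc[:, 2:].applymap(lambda x: s.get(x.upper(), -1))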

print(df)

   Index  IDMatch  Usr1  Usr2  Usr3  Usr4  Usr5
0      0        1     7     2     7     7     7
1      1        2     3     3     3     4     3
2      2        3     5     5    -1    -1     5
3      3        4    -1    -1    -1    -1    -1
4      4        5     6     6    -1    -1    -1

Sample df_map used to compute the result above:

Index  TEAM_YH  TeamID
0      ARI       1
1      ATL       2
2      BAL       3
3      BUF       4
4      IND       5
5      JAX       6
6      PHI       7
32     WAS       32
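A self-contained reproduction of the answer's approach using the question's picks and the sample df_map above. It keeps the same `s.get(x.upper(), -1)` lookup but applies it column by column with `Series.map`, because `DataFrame.applymap` was deprecated in pandas 2.1 in favour of `DataFrame.map`. The `-1` entries are teams (Cin, NE, Hou, NYG) that the sample mapping simply does not contain.

```python
import pandas as pd

# The question's picks, with Index kept as a column as in the answer's output
df = pd.DataFrame({
    'Index': range(5),
    'IDMatch': [1, 2, 3, 4, 5],
    'Usr1': ['Phi', 'Bal', 'Ind', 'NE', 'Jax'],
    'Usr2': ['Atl', 'Bal', 'Ind', 'NE', 'Jax'],
    'Usr3': ['Phi', 'Bal', 'Cin', 'Hou', 'NYG'],
    'Usr4': ['Phi', 'Buf', 'Cin', 'NE', 'NYG'],
    'Usr5': ['Phi', 'Bal', 'Ind', 'NE', 'NYG'],
})
# The sample mapping from the answer (not all 32 teams)
df_map = pd.DataFrame({
    'TEAM_YH': ['ARI', 'ATL', 'BAL', 'BUF', 'IND', 'JAX', 'PHI', 'WAS'],
    'TeamID': [1, 2, 3, 4, 5, 6, 7, 32],
})

s = df_map.set_index('TEAM_YH')['TeamID']
cols = df.columns[2:]  # Usr1..Usr5
# Same per-value lookup as the answer: upper-case, then TeamID or -1 if missing
df[cols] = df[cols].apply(lambda col: col.map(lambda x: s.get(x.upper(), -1)))
print(df)
```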