How do I compare substrings in a pandas DataFrame to create a new column?

Asked: 2019-12-09 12:53:20

Tags: python pandas dataframe

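I currently have a DataFrame that I use to analyze sports data. One column, "Team", holds the team each player belongs to, and another column, "Game Info", holds information about the game. A value in the Game Info column looks like this: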

SAC @ HOU 12/09/2019 08:00 PM ET

The "Team" column can contain either "SAC" or "HOU". I am trying to create a new column that contains the opponent. What I have tried so far is:

df.insert(7, "Opp", '', True)
df["Opp"][df['Game Info'].str[:3].str.contains(df['Team'])] = df['Game Info'].str[4:7]
df["Opp"][df['Opp'].empty] = df['Team']

This gives me the following error:

'Series' objects are mutable, thus they cannot be hashed

I have also tried:

df['Opp'] = np.where(df['Team'].str != df['Game Info'].str[:3], df['Game Info'].str[:3], df['Game Info'].str[4:7])

df['Opp'] = df['Game Info'].str[:3] if df['Team'].str != df['Game Info'].str[:3] else df['Game Info'].str[4:7]

but both give me the following error:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

How can I compare these substrings correctly?

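1 Answer:

Answer 0: (score: 1)

Compare the two columns directly as Series (no .str on the Team side) and let np.where choose the opponent row by row: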

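import numpy as np
import pandas as pd
# Two players on opposite sides of the same game
df = pd.DataFrame({'Team': ['SAC', 'HOU'], 'Game Info': ['SAC@HOU 12/09/2019 08:00PM ET', 'SAC@HOU 12/09/2019 08:00PM ET']})
# If Team equals the first code, the opponent is the second code, and vice versa.
# Note: str[4:7] assumes the compact 'SAC@HOU' form with no spaces around '@'.
df['Opp'] = np.where(df['Team'] == df['Game Info'].str[:3], df['Game Info'].str[4:7], df['Game Info'].str[:3])
df
  Team                      Game Info  Opp
0  SAC  SAC@HOU 12/09/2019 08:00PM ET  HOU
1  HOU  SAC@HOU 12/09/2019 08:00PM ET  SAC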
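
The sample value in the question is written with spaces around the "@" ("SAC @ HOU ..."), in which case the fixed slice str[4:7] would return "@ H" instead of the second team code. A minimal sketch of a position-independent variant follows, assuming every Game Info value starts with "AWAY @ HOME"; the group names "away" and "home" and the regular expression are illustrative choices, not part of the original answer. It also sidesteps both errors from the question: str.contains expects a single pattern rather than a Series (hence the "cannot be hashed" message), and a plain Python if/else needs a single boolean rather than a Series (hence the "truth value ... is ambiguous" message).

```python
import numpy as np
import pandas as pd

# Sample rows using the spaced "AWAY @ HOME date time" format from the question.
df = pd.DataFrame({
    "Team": ["SAC", "HOU"],
    "Game Info": ["SAC @ HOU 12/09/2019 08:00 PM ET"] * 2,
})

# Pull out both team codes by pattern instead of by fixed character positions.
# The named groups become the columns "away" and "home" of a new DataFrame.
teams = df["Game Info"].str.extract(r"^\s*(?P<away>\w+)\s*@\s*(?P<home>\w+)")

# Element-wise comparison of two Series: if the player's team is the away
# side, the opponent is the home side, and vice versa.
df["Opp"] = np.where(df["Team"] == teams["away"], teams["home"], teams["away"])

print(df[["Team", "Opp"]])
```

Because the pattern allows optional whitespace around "@", the same code should also handle the compact "SAC@HOU" form used in the answer above.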