Why won't these two tables join in Python?

Time: 2018-01-01 21:43:39

Tags: python python-3.x pandas

I am using the following code to import these sample Kaggle data sets for Python practice:

# importing everything 
import pandas as pd
df_events = pd.DataFrame()
df_ginf = pd.DataFrame()
df_events = pd.read_csv('./events.csv')
df_ginf = pd.read_csv('./ginf.csv')

# creating a match table
eventsList = pd.Series(['On Target', 'Off Target', 'Blocked', 'Hit the Bar'])
eventListKey = pd.Series(['1', '2', '3', '4'])
eventsMatchTable = pd.concat([eventListKey, eventsList], axis = 1)
eventsMatchTable.columns = ['eventKey', 'eventName']
eventsMatchTable['eventKey'] = eventsMatchTable['eventKey'].astype(int)

# trimming the initial dataframe down to something more manageable
df_eventsPlayer = pd.DataFrame()
df_eventsPlayer = df_events[['player', 'event_team', 'opponent', 'shot_place', 'shot_outcome', 'is_goal']]
df_eventsPlayer = df_eventsPlayer.dropna()
df_eventsPlayer['shot_outcome'] = 
df_eventsPlayer['shot_outcome'].astype(int)

# attempting the 'merge', here is where the error occurs
df_eventPlayerFinal = pd.DataFrame()
df_eventPlayerFinal = pd.merge(df_eventsPlayer, eventsMatchTable, how = 'left', on = ['shot_outcome','eventKey'])
df_eventPlayerFinal

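The error says roughly: KeyError: 'shot_outcome'     # validate the merge keys dtypes. We may need to coerce

Since the columns I am merging on are both int, this error makes no sense to me.

What am I missing?

3 Answers: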

Answer 0: (score: 2)

The key columns have different names, so you cannot use on. Instead, specify which dataset contains which column:

pd.merge(df_eventsPlayer, eventsMatchTable, how = 'left',
         left_on = 'shot_outcome', right_on='eventKey')

Use the on parameter only when both datasets contain the specified column.
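As a small illustration of the point above, here is a sketch on made-up data. The frames `shots` and `outcomes` are hypothetical stand-ins for df_eventsPlayer and eventsMatchTable, not the real Kaggle data. It shows the left_on/right_on form, plus an alternative that renames the lookup key so that on becomes usable:

```python
import pandas as pd

# Hypothetical stand-in for the trimmed events table (not the real Kaggle data).
shots = pd.DataFrame({
    "player": ["a", "b", "c"],
    "shot_outcome": [1, 3, 2],
})

# Lookup table analogous to eventsMatchTable in the question.
outcomes = pd.DataFrame({
    "eventKey": [1, 2, 3, 4],
    "eventName": ["On Target", "Off Target", "Blocked", "Hit the Bar"],
})

# The key columns have different names, so name the key on each side explicitly.
merged = shots.merge(outcomes, how="left",
                     left_on="shot_outcome", right_on="eventKey")

# Alternative: rename the lookup key so both frames share one column name;
# `on` then works and no redundant eventKey column is left in the result.
merged_on = shots.merge(outcomes.rename(columns={"eventKey": "shot_outcome"}),
                        how="left", on="shot_outcome")

print(merged)
print(merged_on)
```

With left_on/right_on the result keeps both key columns; the rename variant keeps only shot_outcome, which may be more convenient if eventKey is not needed afterwards.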

Answer 1: (score: 0)

I ran into a similar problem a while ago. It was caused by some NA values, and my solution was to call DataFrame.fillna.

Answer 2: (score: 0)
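For reference, a minimal sketch of handling NA values in a key column before casting it to int. The data and the sentinel value 0 are assumptions made for illustration. Note that the question's code already calls dropna(), so NA values are probably not the cause of the KeyError there:

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing value; NaN forces the dtype to float64.
shots = pd.DataFrame({"shot_outcome": [1.0, np.nan, 3.0]})

# Option A (what the question does): drop rows with missing values, then cast.
dropped = shots.dropna().copy()
dropped["shot_outcome"] = dropped["shot_outcome"].astype(int)

# Option B (fillna): replace NaN with a sentinel that matches no lookup key.
# The sentinel 0 is an assumption for this sketch, not a value from the data set.
filled = shots.copy()
filled["shot_outcome"] = filled["shot_outcome"].fillna(0).astype(int)

print(dropped)
print(filled)
```

Option B keeps every row, so a later left merge would leave the name column empty for the sentinel rows instead of losing them.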

df_eventsPlayer['shot_outcome'] =

You have to put something after the = on that line.
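A brief sketch of the two valid ways to write such an assignment, using a hypothetical one-column frame `df`. A statement that ends with a bare = is a syntax error in Python, so the right-hand side either stays on the same line or is wrapped in parentheses:

```python
import pandas as pd

# Hypothetical one-column frame standing in for df_eventsPlayer.
df = pd.DataFrame({"shot_outcome": [1.0, 2.0]})

# Whole assignment on one line.
df["shot_outcome"] = df["shot_outcome"].astype(int)

# To split across lines, wrap the right-hand side in parentheses.
df["shot_outcome"] = (
    df["shot_outcome"].astype(int)
)

print(df.dtypes)
```

Since the asker reports a KeyError and not a SyntaxError, the line break in the question may simply be a formatting slip in the post.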