I'm using the following code to import these sample Kaggle data sets for Python practice:
# importing everything
import pandas as pd
df_events = pd.DataFrame()
df_ginf = pd.DataFrame()
df_events = pd.read_csv('./events.csv')
df_ginf = pd.read_csv('./ginf.csv')
# creating a match table
eventsList = pd.Series(['On Target', 'Off Target', 'Blocked', 'Hit the Bar'])
eventListKey = pd.Series(['1', '2', '3', '4'])
eventsMatchTable = pd.concat([eventListKey, eventsList], axis = 1)
eventsMatchTable.columns = ['eventKey', 'eventName']
eventsMatchTable['eventKey'] = eventsMatchTable['eventKey'].astype(int)
# trimming the initial dataframe down to something more manageable
df_eventsPlayer = pd.DataFrame()
df_eventsPlayer = df_events[['player', 'event_team', 'opponent', 'shot_place', 'shot_outcome', 'is_goal']]
df_eventsPlayer = df_eventsPlayer.dropna()
df_eventsPlayer['shot_outcome'] =
df_eventsPlayer['shot_outcome'].astype(int)
# attempting the 'merge', here is where the error occurs
df_eventPlayerFinal = pd.DataFrame()
df_eventPlayerFinal = pd.merge(df_eventsPlayer, eventsMatchTable, how = 'left', on = ['shot_outcome','eventKey'])
df_eventPlayerFinal
The error says roughly: KeyError: 'shot_outcome' # validate the merge keys dtypes. We may need to coerce
Since the columns I'm merging on are both int, this error doesn't make sense to me.
What am I missing?
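A minimal reproduction on hypothetical toy frames (same column names as the question, made-up rows) shows where the KeyError comes from. With on=, every listed key must exist in both frames, but shot_outcome exists only in the left frame and eventKey only in the right one.

```python
import pandas as pd

# Hypothetical toy frames that reuse the question's column names
df_eventsPlayer = pd.DataFrame({'player': ['player a', 'player b'], 'shot_outcome': [1, 2]})
eventsMatchTable = pd.DataFrame({
    'eventKey': [1, 2, 3, 4],
    'eventName': ['On Target', 'Off Target', 'Blocked', 'Hit the Bar'],
})

keys = ['shot_outcome', 'eventKey']
# `on` needs every key in BOTH frames; report which side has each key
for key in keys:
    print(key, 'left:', key in df_eventsPlayer.columns,
          'right:', key in eventsMatchTable.columns)

raised = False
try:
    pd.merge(df_eventsPlayer, eventsMatchTable, how='left', on=keys)
except KeyError as exc:
    # pandas cannot find a key column in one of the frames
    raised = True
    print('KeyError:', exc)
```

The dtypes never come into play here: the lookup fails on the column name before pandas gets to compare key types.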
Answer 0 (score: 2)
The columns have different names, so you can't use on. Instead, specify which data set contains which column:
pd.merge(df_eventsPlayer, eventsMatchTable, how = 'left',
left_on = 'shot_outcome', right_on='eventKey')
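Here is a self-contained sketch of that call on a few hypothetical rows standing in for the Kaggle data. left_on names the key in the first frame and right_on names it in the second.

```python
import pandas as pd

# Hypothetical sample rows standing in for the Kaggle data
df_eventsPlayer = pd.DataFrame({
    'player': ['player a', 'player b', 'player c'],
    'shot_outcome': [1, 3, 2],
})
eventsMatchTable = pd.DataFrame({
    'eventKey': [1, 2, 3, 4],
    'eventName': ['On Target', 'Off Target', 'Blocked', 'Hit the Bar'],
})

# The key has a different name in each frame, so name each side explicitly
df_eventPlayerFinal = pd.merge(df_eventsPlayer, eventsMatchTable, how='left',
                               left_on='shot_outcome', right_on='eventKey')
print(df_eventPlayerFinal)
```

Both key columns survive the merge. If eventKey is redundant after the lookup, df_eventPlayerFinal.drop(columns='eventKey') removes it.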
Use the on parameter when both data sets contain the specified column(s).
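To use on anyway, one option is to rename the lookup table's key so the same column name exists in both frames. This is a minimal sketch on hypothetical rows; the name lookup is illustrative only.

```python
import pandas as pd

# Hypothetical sample rows standing in for the Kaggle data
df_eventsPlayer = pd.DataFrame({'player': ['player a', 'player b'], 'shot_outcome': [2, 4]})
eventsMatchTable = pd.DataFrame({
    'eventKey': [1, 2, 3, 4],
    'eventName': ['On Target', 'Off Target', 'Blocked', 'Hit the Bar'],
})

# Rename the right-hand key so 'shot_outcome' exists in both frames
lookup = eventsMatchTable.rename(columns={'eventKey': 'shot_outcome'})
df_eventPlayerFinal = pd.merge(df_eventsPlayer, lookup, how='left', on='shot_outcome')
print(df_eventPlayerFinal)
```

Unlike left_on/right_on, this approach leaves a single key column in the result.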
Answer 1 (score: 0)
I ran into a similar problem a while ago. It was caused by some NA values, and my solution was to call DataFrame.fillna.
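Here is a minimal sketch of the fillna approach, assuming the problem is a key column that still contains NaN when it is cast to int. The fill value 0 is an arbitrary placeholder chosen for illustration. (In the question itself, dropna already removes such rows.)

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing value (stored as float because of the NaN)
df = pd.DataFrame({'shot_outcome': [1.0, np.nan, 3.0]})

# astype(int) raises on NaN, so fill the gaps first.
# 0 is a placeholder assumption, not a real event key.
df['shot_outcome'] = df['shot_outcome'].fillna(0).astype(int)
print(df['shot_outcome'].tolist())
```

Rows filled with 0 match no eventKey in the match table, so a later left merge leaves their eventName as NaN.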
Answer 2 (score: 0)
df_eventsPlayer['shot_outcome'] =
You have to … at the =
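Assuming this answer refers to the line break right after = in the question's code, here is a minimal sketch of why that break matters in Python and two ways to avoid it. The one-row frame is hypothetical.

```python
import pandas as pd

df_eventsPlayer = pd.DataFrame({'shot_outcome': [1.0, 2.0]})

# An assignment that ends with a bare `=` is incomplete, so Python rejects it
broken = "df_eventsPlayer['shot_outcome'] =\ndf_eventsPlayer['shot_outcome'].astype(int)\n"
try:
    compile(broken, '<example>', 'exec')
    syntax_ok = True
except SyntaxError:
    syntax_ok = False

# Fix 1: keep the whole assignment on one line
df_eventsPlayer['shot_outcome'] = df_eventsPlayer['shot_outcome'].astype(int)

# Fix 2: wrap the right-hand side in parentheses (implicit line continuation)
df_eventsPlayer['shot_outcome'] = (
    df_eventsPlayer['shot_outcome'].astype(int)
)
```

A trailing backslash after = also joins the lines, but parentheses are generally considered the more robust choice.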