When I try to left-merge two DataFrames on the Name column, I get the error in the title. Here is the DataFrame.info() output for both DataFrames.
Int64Index: 145 entries, 1 to 162
Data columns (total 13 columns):
Quarter 145 non-null float64
Time 145 non-null object
Down 145 non-null float64
ToGo 145 non-null float64
Location 145 non-null object
Detail 145 non-null object
SEA 145 non-null float64
TAM 145 non-null float64
EPB 145 non-null float64
EPA 145 non-null float64
Win% 145 non-null float64
Name 145 non-null object
Yards 145 non-null int64
dtypes: float64(8), int64(1), object(4)
memory usage: 15.9+ KB
None
Int64Index: 1567 entries, 0 to 1566
Data columns (total 4 columns):
Name 1567 non-null object
Team 1567 non-null object
Position 1150 non-null object
Age 1567 non-null int64
dtypes: int64(1), object(3)
memory usage: 61.2+ KB
None
And here is the full Python traceback.
Traceback (most recent call last):
File "weighted_random_forest.py", line 478, in <module>
main()
File "weighted_random_forest.py", line 473, in main
game = get_third_down_conversion_rate(team, game, files)
File "weighted_random_forest.py", line 339, in get_third_down_conversion_rate
df = df.merge(roster, on='Name', how='left')
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 5370, in merge
copy=copy, indicator=indicator, validate=validate)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.py", line 58, in merge
return op.get_result()
File "/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.py", line 582, in get_result
join_index, left_indexer, right_indexer = self._get_join_info()
File "/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.py", line 748, in _get_join_info
right_indexer) = self._get_join_indexers()
File "/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.py", line 727, in _get_join_indexers
how=self.how)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.py", line 1050, in _get_join_indexers
llab, rlab, shape = map(list, zip(* map(fkeys, left_keys, right_keys)))
TypeError: type object argument after * must be an iterable, not itertools.imap
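I managed to start the post-mortem debugger and inspect the arguments passed to the last map call inside pandas, but I don't see anything wrong with the data. Could it be because the two indexes don't both start at 0 or 1? Do both DataFrames need indexes that start at the same offset?
I also suspected that map might be shadowed by itertools.imap, so I removed import itertools and used from itertools import tee instead. That did not fix the problem.
I also tried dropping the NaN values from both tables before merging, but it still fails. What else can I try to resolve this?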
Braden.
Update: I checked the value of left_keys. One of its cell values is a list ([]) instead of a string.
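A check like the one described above can be done without the debugger. The following is only a minimal sketch, assuming Python 3 (where str replaces Python 2's basestring). The helper name find_non_string_keys and the sample rows are made up for illustration and are not from my script.

```python
import pandas as pd


def find_non_string_keys(df, column):
    """Return the rows whose value in `column` is not a string.

    Hypothetical helper: a merge key column is expected to hold hashable
    scalars, so a list hiding in one cell is worth finding before merging.
    """
    is_str = df[column].map(lambda value: isinstance(value, str))
    return df[~is_str]


# Made-up sample data: the second row carries an empty list as its Name.
plays = pd.DataFrame({
    "Name": ["A. Smith", [], "B. Jones"],
    "Yards": [12, 0, 7],
})

bad_rows = find_non_string_keys(plays, "Name")
print(bad_rows)
```

Any row this returns is a candidate for the value that breaks the merge key.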
Update
def name_matcher(Detail):
    res = re.findall('[A-Za-z\.\'\-]+ [A-Za-z\'\-]+', Detail)
    if len(res) != 0:
        return res[0]
    else:
        return []  # bug: an empty list ends up in the Name column
There you have it.
I can't...
def name_matcher(Detail):
    res = re.findall('[A-Za-z\.\'\-]+ [A-Za-z\'\-]+', Detail)
    if len(res) != 0:
        return res[0]
    else:
        return ''
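That is the fix. I have an apply call that uses this function to fill the Name column. The problem is now obvious: when no name was found, the function returned an empty list instead of an empty string, so the merge key column contained list values.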
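For anyone hitting the same thing, here is a small sketch of the fixed function used with apply and followed by the left merge. It assumes Python 3 and a recent pandas. The sample Detail strings and roster row are invented for illustration, and the regex is written as a raw string, which matches the same text as the one above.

```python
import re

import pandas as pd


def name_matcher(detail):
    """Return the first 'First Last'-style name in `detail`, or '' if none."""
    res = re.findall(r"[A-Za-z.'\-]+ [A-Za-z'\-]+", detail)
    if len(res) != 0:
        return res[0]
    return ''  # a string, so every merge key stays hashable


# Made-up play descriptions; the second one contains no name.
plays = pd.DataFrame({
    "Detail": ["J. Doe pass complete for 12 yards", "Timeout #1"],
})
# Made-up roster with the same columns as my second DataFrame.
roster = pd.DataFrame({
    "Name": ["J. Doe"],
    "Team": ["SEA"],
    "Position": ["QB"],
    "Age": [28],
})

plays["Name"] = plays["Detail"].apply(name_matcher)
merged = plays.merge(roster, on="Name", how="left")
print(merged)
```

Rows with an empty Name simply find no roster match and get NaN in the roster columns, which is what a left merge is expected to do.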
Thanks for your help!