Question

I get the error in the title when I try to left-merge two DataFrames on the Name column. Here is the output of DataFrame.info() for both DataFrames.

Int64Index: 145 entries, 1 to 162 
Data columns (total 13 columns):
Quarter     145 non-null float64
Time        145 non-null object
Down        145 non-null float64
ToGo        145 non-null float64
Location    145 non-null object
Detail      145 non-null object
SEA         145 non-null float64
TAM         145 non-null float64
EPB         145 non-null float64
EPA         145 non-null float64
Win%        145 non-null float64
Name        145 non-null object
Yards       145 non-null int64
dtypes: float64(8), int64(1), object(4)
memory usage: 15.9+ KB
None

Int64Index: 1567 entries, 0 to 1566
Data columns (total 4 columns):
Name        1567 non-null object
Team        1567 non-null object
Position    1150 non-null object
Age         1567 non-null int64
dtypes: int64(1), object(3)
memory usage: 61.2+ KB
None

And here is the full Python traceback.

Traceback (most recent call last):
  File "weighted_random_forest.py", line 478, in <module>
    main()
  File "weighted_random_forest.py", line 473, in main
    game = get_third_down_conversion_rate(team, game, files)
  File "weighted_random_forest.py", line 339, in get_third_down_conversion_rate
    df = df.merge(roster, on='Name', how='left')
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 5370, in merge
    copy=copy, indicator=indicator, validate=validate)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.py", line 58, in merge
    return op.get_result()
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.py", line 582, in get_result
    join_index, left_indexer, right_indexer = self._get_join_info()
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.py", line 748, in _get_join_info
    right_indexer) = self._get_join_indexers()
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.py", line 727, in _get_join_indexers
    how=self.how)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.py", line 1050, in _get_join_indexers
    llab, rlab, shape = map(list, zip(* map(fkeys, left_keys, right_keys)))
TypeError: type object argument after * must be an iterable, not itertools.imap

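I managed to get the post-mortem debugger running so I could inspect the arguments going into that last call to map inside pandas. However, I don't see anything wrong with the data. Could it be because the two indexes don't both start at 0 or 1? Do they both need indexes that start at the same offset?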
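On the index question: a merge with on='Name' matches rows by the values in that column, not by index labels, so differing index offsets should not matter. A minimal sketch, assuming two small made-up frames (the names, teams, and yardage are hypothetical, not taken from the real data):

```python
import pandas as pd

# Hypothetical stand-ins for the two frames; note the unrelated index labels.
plays = pd.DataFrame(
    {"Name": ["A. Smith", "B. Jones"], "Yards": [5, 12]},
    index=[1, 162],
)
roster = pd.DataFrame(
    {"Name": ["B. Jones", "A. Smith"], "Team": ["TAM", "SEA"]},
    index=[0, 1],
)

# Rows are matched on the values in "Name"; index labels play no part.
merged = plays.merge(roster, on="Name", how="left")
print(merged)
```

The result keeps the row order of the left frame and gets a fresh integer index, which suggests the index offsets are not the cause of the error.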

I also suspected that map might be shadowed by itertools.imap, so I removed import itertools and used from itertools import tee instead. However, that did not fix the problem.

I also tried dropping the NaN values from the tables before the merge, but it still fails. What else can I try to resolve this?

Braden

Update: I inspected the value of left_keys. It contains a list ([]) element as one of the cell values.
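A quick way to locate such cells without the debugger is to test the type of every value in the key column. This is only a sketch on a hypothetical frame; on Python 2.7, as in the traceback, the check would be against basestring rather than str:

```python
import pandas as pd

# Hypothetical frame whose "Name" column contains stray empty lists.
df = pd.DataFrame(
    {"Name": ["Russell Wilson", [], "Doug Baldwin", []]},
    index=[1, 2, 5, 9],
)

# True where the cell holds text, False for anything else (lists, numbers, ...).
is_text = df["Name"].map(lambda value: isinstance(value, str))

# Rows whose merge key is not a string; these are the ones to fix.
bad_rows = df[~is_text]
print(bad_rows)
```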

Update

def name_matcher(Detail):
        res = re.findall('[A-Za-z\.\'\-]+ [A-Za-z\'\-]+', Detail)

        if len(res) != 0:
                return res[0]
        else:
                return []

There you have it.

I can't even...

def name_matcher(Detail):
        res = re.findall('[A-Za-z\.\'\-]+ [A-Za-z\'\-]+', Detail)

        if len(res) != 0:
                return res[0]
        else:
                return ''

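That is the fix. I use this function in an apply call. The problem is now obvious: a Detail with no name match returned an empty list instead of an empty string.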
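For anyone landing here later, a self-contained sketch of the fixed function, run on a few made-up Detail strings (the sample strings are hypothetical). It uses a compiled raw-string pattern and re.search, which returns the same first match as re.findall(...)[0] here because the pattern has no groups:

```python
import re

# Same character classes as the original pattern, written as a raw string.
NAME_PATTERN = re.compile(r"[A-Za-z.'\-]+ [A-Za-z'\-]+")


def name_matcher(detail):
    """Return the first 'First Last' style match, or '' when there is none."""
    match = NAME_PATTERN.search(detail)
    # Always return a string so every cell in the Name column is hashable.
    return match.group(0) if match else ""


# Hypothetical Detail values, including ones with no name in them.
details = ["Russell Wilson pass complete short right", "Timeout #1", ""]
names = [name_matcher(d) for d in details]
print(names)
```

Because every result is now a string, the Name column produced by apply holds only hashable values. On a left merge, rows with an empty name simply get missing values in the roster columns, assuming the roster has no empty names.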

Thanks for your help!

TypeError: type object argument after * must be an iterable, not itertools.imap

0 answers: