I get an error with the following code:
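```python
def cleaning(CURRENT, STRING, NEXT):
    # Where column NEXT contains STRING, copy NEXT's value into column CURRENT.
    # (.ix was removed in pandas 1.0; .loc does the same masked assignment.)
    data.loc[data[NEXT].str.contains(STRING, na=False), CURRENT] = \
        data[NEXT][data[NEXT].str.contains(STRING, na=False)]

d = ['lower', 'Less']
c = a[5:]
for x, y in zip(range(len(c)), d):
    cleaning(c[x], d, c[x + 1])
    cleaning(c[x], d, c[x + 2])
```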
Here, `data` is a pandas DataFrame. With the same function, though, the following code runs without error:
```python
a = ['UBC', 'LBC', 'HC', 'FC', 'P:C/F', 'P', 'A', 'Sex']
b = ['upper', 'lower', 'hair', 'footwear']
for x, y in zip(range(len(a)), b):
    cleaning(a[x], y, a[x + 1])
    cleaning(a[x], y, a[x + 2])
```
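I know this happens because a list can't be used as a dictionary key. What I don't understand is where that happens here, or why it works in one loop but not the other.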
Answer 0 (score: 1)
You are passing the whole `d` list as the `STRING` argument:
```python
d = ['lower', 'Less']
# ...
    cleaning(c[x], d, c[x + 1])
    #              ^
```
Your second example works because there you pass in `y`, which is a single element of the `b` list:
```python
b = ['upper', 'lower', 'hair', 'footwear']
for x, y in zip(range(len(a)), b):
#      ^ one element from b    ^
    cleaning(a[x], y, a[x + 1])
    #              ^
```
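The following plain-Python sketch needs no DataFrame. It shows what the loop variables actually hold, reusing the `a` and `b` lists from the question. On every pass `y` is one string, while `d` is always the whole list.

```python
a = ['UBC', 'LBC', 'HC', 'FC', 'P:C/F', 'P', 'A', 'Sex']
b = ['upper', 'lower', 'hair', 'footwear']
d = ['lower', 'Less']

# zip() stops at the shorter input, so x only covers len(b) indices,
# and y is bound to a single string from b on each pass.
pairs = list(zip(range(len(a)), b))
for x, y in pairs:
    print(x, repr(y), type(y).__name__)

# d is never unpacked by the loop: it is the same whole list every time.
print(repr(d), type(d).__name__)
```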
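By default, the `pandas.Series.str.contains` method treats its pattern as a regular expression and compiles it with `re.compile`. `re.compile` keeps already-compiled patterns in a dictionary cache keyed on the pattern. A list is unhashable, so it can't be part of that key, and passing one in gives you this error: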
```python
>>> pandas.Series(['aa', 'bb', 'cc']).str.contains(['a'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mjpieters/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/pandas/core/strings.py", line 1458, in contains
    regex=regex)
  File "/Users/mjpieters/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/pandas/core/strings.py", line 222, in str_contains
    regex = re.compile(pat, flags=flags)
  File "/Users/mjpieters/Development/venvs/stackoverflow-2.7/lib/python2.7/re.py", line 194, in compile
    return _compile(pattern, flags)
  File "/Users/mjpieters/Development/venvs/stackoverflow-2.7/lib/python2.7/re.py", line 237, in _compile
    p, loc = _cache[cachekey]
TypeError: unhashable type: 'list'
```
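You can reproduce the same failure with the standard library alone. The sketch below assumes Python 3 rather than the Python 2.7 in the traceback above. It shows three things: a list cannot be hashed, `re.compile` rejects a list with a `TypeError`, and a single string like the ones `y` takes compiles normally. The exact error message may differ between Python versions, so only the exception type is relied on.

```python
import re

pattern_list = ['lower', 'Less']   # what d is
pattern_str = 'lower'              # what y is on one iteration

# Lists are mutable, hence unhashable, so they can't be dict keys.
try:
    hash(pattern_list)
    list_hashable = True
except TypeError:
    list_hashable = False

# re.compile rejects a list pattern with TypeError as well.
try:
    re.compile(pattern_list)
    list_compiles = True
except TypeError as exc:
    list_compiles = False
    print('re.compile(list) ->', type(exc).__name__, exc)

# A single string is a valid pattern and compiles normally.
compiled = re.compile(pattern_str)
print(compiled.search('lower body') is not None)
```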
The fix is to pass in `y` instead of `d`:
```python
for x, y in zip(range(len(c)), d):
    cleaning(c[x], y, c[x + 1])
    cleaning(c[x], y, c[x + 2])
```
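One more thing is worth checking, assuming `a` is the same eight-element list shown in the question. In that case `c = a[5:]` is `['P', 'A', 'Sex']`, so on the second pass (`x = 1`) the expression `c[x + 2]` asks for `c[3]` and raises `IndexError`. The sketch below replays the fixed loop without a DataFrame to show where it stops. Instead of calling `cleaning`, it collects the argument triples the loop would pass.

```python
a = ['UBC', 'LBC', 'HC', 'FC', 'P:C/F', 'P', 'A', 'Sex']
d = ['lower', 'Less']
c = a[5:]  # ['P', 'A', 'Sex']: only three columns

# Record the (CURRENT, STRING, NEXT) arguments the fixed loop would pass
# to cleaning(), stopping at the first out-of-range column index.
triples = []
error = None
try:
    for x, y in zip(range(len(c)), d):
        triples.append((c[x], y, c[x + 1]))
        triples.append((c[x], y, c[x + 2]))
except IndexError as exc:
    error = exc

print(triples)
print(repr(error))
```

Whether this matters depends on which columns `c` is really meant to cover.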
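You may want to choose better variable names. Single-letter names are hard to tell apart, which makes it easy to pass the wrong one, as happened here.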
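Here is the second loop rewritten with descriptive names, as a sketch of that suggestion. A hypothetical stand-in replaces `cleaning`; it only checks and records its arguments, so the example runs without a DataFrame. The names `columns`, `keywords`, and `calls` are illustrative, not from the question. In this case `enumerate(keywords)` yields the same index/keyword pairs as `zip(range(len(a)), b)`, because `b` is the shorter list.

```python
# Hypothetical stand-in for the question's cleaning(): it only validates
# and records its arguments instead of touching a DataFrame.
calls = []

def cleaning(current_col, pattern, next_col):
    # The real function hands `pattern` to Series.str.contains, which needs
    # a single string, so reject anything else up front.
    if not isinstance(pattern, str):
        raise TypeError(f"pattern must be a str, got {type(pattern).__name__}")
    calls.append((current_col, pattern, next_col))

columns = ['UBC', 'LBC', 'HC', 'FC', 'P:C/F', 'P', 'A', 'Sex']
keywords = ['upper', 'lower', 'hair', 'footwear']

for i, keyword in enumerate(keywords):
    cleaning(columns[i], keyword, columns[i + 1])
    cleaning(columns[i], keyword, columns[i + 2])
```

With readable names, a call like `cleaning(columns[i], keywords, columns[i + 1])` stands out more clearly as passing the whole list. The stand-in's type check turns that mistake into an immediate, descriptive error.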