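I am working with a dataset of tweets and trying to find mentions of other users in each tweet. A tweet can mention no users, one user, or several.
Here is the head of the DataFrame:
Here is the function I wrote to extract the list of mentions from a single tweet: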
import re

def getMention(text):
    mention = re.findall('(^|[^@\w])@(\w{1,15})', text)
    if len(mention) > 0:
        return [x[1] for x in mention]
    else:
        return None
I am trying to create a new column in the DataFrame by applying the function with the following code:
df['mention'] = df['text'].apply(getMention)
When I run this code, I get the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-43-426da09a8770> in <module>
----> 1 df['mention'] = df['text'].apply(getMention)
~/anaconda3_501/lib/python3.6/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
3192 else:
3193 values = self.astype(object).values
-> 3194 mapped = lib.map_infer(values, f, convert=convert_dtype)
3195
3196 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()
<ipython-input-42-d27373022afd> in getMention(text)
1 def getMention(text):
2
----> 3 mention = re.findall('(^|[^@\w])@(\w{1,15})', text)
4 if len(mention) > 0:
5 return [x[1] for x in mention]
~/anaconda3_501/lib/python3.6/re.py in findall(pattern, string, flags)
220
221 Empty matches are included in the result."""
--> 222 return _compile(pattern, flags).findall(string)
223
224 def finditer(pattern, string, flags=0):
TypeError: expected string or bytes-like object
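Answer 0 (score: 1)
I can't comment (not enough reputation), so I am posting my suggestion for tracking down this error as an answer. It looks like findall raises the exception because text is not a string, so you may want to check the actual type of text with: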
import re

def getMention(text):
    print(type(text))  # show what apply() actually passes in
    mention = re.findall(r'(^|[^@\w])@(\w{1,15})', text)
    if len(mention) > 0:
        return [x[1] for x in mention]
    else:
        return None
(or use a debugger, if you know how)
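Rather than printing a type for every row, the types in the column can also be counted in one go. This is only a minimal sketch: the sample data is made up, and the NaN in it is a guess at what the non-string value might be (a missing tweet), which the question does not confirm.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the NaN stands in for a missing tweet.
df = pd.DataFrame({"text": ["hello @alice", np.nan, "no mentions here"]})

# Count how many values of each Python type the column holds.
type_counts = df["text"].map(type).value_counts()
print(type_counts)

# Rows whose value is not a str are the ones re.findall rejects.
non_strings = df[~df["text"].map(lambda v: isinstance(v, str))]
print(non_strings)
```

If anything other than str shows up in the counts, those rows are the likely source of the TypeError.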
If text can be converted to a string, could you try this?
def getMention(text):
    mention = re.findall(r'(^|[^@\w])@(\w{1,15})', str(text))
    if len(mention) > 0:
        return [x[1] for x in mention]
    else:
        return None
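PS: Don't forget the r'...' prefix on the regular expression, so that backslash sequences such as \w are not interpreted as string escapes.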
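One caveat with str(text): it turns every non-string value into text before searching, so a missing value is searched as the string 'nan'. An alternative sketch skips non-string values instead. It assumes the offending values are missing tweets (NaN or None), which the question does not confirm, and the names get_mentions_safe and MENTION_RE are made up for illustration.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
MENTION_RE = re.compile(r'(^|[^@\w])@(\w{1,15})')

def get_mentions_safe(text):
    """Return the list of mentioned usernames, or None if there are none."""
    # Skip anything that is not a string (e.g. NaN for a missing tweet).
    if not isinstance(text, str):
        return None
    # findall returns (prefix, username) tuples; keep only the username.
    mentions = [m[1] for m in MENTION_RE.findall(text)]
    return mentions if mentions else None

print(get_mentions_safe("thanks @alice and @bob_99"))
print(get_mentions_safe("no mentions here"))
print(get_mentions_safe(float("nan")))
```

It can be passed to df['text'].apply(...) in the same way as getMention in the question.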