This is part of my code. It reads from an Excel file. I get a type error that says "TypeError: sequence item 0: expected str instance, list found".
text=df.loc[page,["rev"]]
def remove_punct(text):
    text=''.join([ch for ch in text if ch not in exclude])
    tokens = re.split('\W+', text),
    tex = " ".join([word for word in tokens if word not in cachedStopWords]),
    return tex
s=df.loc[page,["rev"]].apply(lambda x:remove_punct(x))
This is the error.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-16-4f3c29307e88> in <module>()
26 return tokens
27
---> 28 s=df.loc[page,["rev"]].apply(lambda x:remove_punct(x))
29
30 with open('FileName.csv', 'a', encoding="utf-8") as f:
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
3190 else:
3191 values = self.astype(object).values
-> 3192 mapped = lib.map_infer(values, f, convert=convert_dtype)
3193
3194 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/src\inference.pyx in pandas._libs.lib.map_infer()
<ipython-input-16-4f3c29307e88> in <lambda>(x)
26 return tokens
27
---> 28 s=df.loc[page,["rev"]].apply(lambda x:remove_punct(x))
29
30 with open('FileName.csv', 'a', encoding="utf-8") as f:
<ipython-input-16-4f3c29307e88> in remove_punct(text)
23 text=''.join([ch for ch in text if ch not in exclude])
24 tokens = re.split('\W+', text),
---> 25 tex = " ".join([ch for ch in tokens if ch not in cachedStopWords]),
26 return tokens
27
TypeError: sequence item 0: expected str instance, list found
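Answer 0 (score: 0)
I think the commas at the end of these two lines wrap the values you are working with in a one-element tuple. As a result, tokens is a tuple whose only item is the list returned by re.split, and " ".join(...) fails on that list item.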
tokens = re.split('\W+', text), # <---- These commas at the end
tex = " ".join([word for word in tokens if word not in cachedStopWords]), # <----
The effect is roughly the same as if you did the following (edited for a better example):
x = 12 * 24,
y = x * 10,
z = 40
print(f"X = {x}\n"
f"Y = {y}\n"
f"Z = {z}\n")
Output:
X = (288,)
Y = ((288, 288, 288, 288, 288, 288, 288, 288, 288, 288),)
Z = 40
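A trailing comma packs the value on the right-hand side into a tuple, so removing the commas at the end of both lines should make the error go away.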
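For reference, here is a minimal sketch of the function with the trailing commas removed. The question does not show how exclude and cachedStopWords are defined, so the values below are assumed stand-ins for illustration only, and the sample sentence is made up.

```python
import re
import string

# Assumed stand-ins: the question does not show how these are defined.
exclude = set(string.punctuation)
cachedStopWords = {"the", "a", "is", "of"}


def remove_punct(text):
    """Strip punctuation and stop words from a single string."""
    text = "".join(ch for ch in text if ch not in exclude)
    # No trailing comma here, so tokens is a plain list of strings.
    tokens = re.split(r"\W+", text)
    # Skip stop words and any empty strings re.split leaves at the edges.
    return " ".join(w for w in tokens if w and w not in cachedStopWords)


# With the comma, the same expression becomes a one-element tuple.
buggy_tokens = re.split(r"\W+", "some text"),
print(type(buggy_tokens).__name__)  # tuple

cleaned = remove_punct("Hello, world! This is the end.")
print(cleaned)  # Hello world This end
```

The function returns a plain str, so it can be passed to Series.apply as in the question, for example df["rev"].apply(remove_punct), assuming the "rev" column holds strings.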