sequence item 0: expected str instance, list found

Date: 2019-03-02 11:32:45

Tags: python-3.x string nltk

This is part of my code. It reads from an Excel file. I get a type error: "TypeError: sequence item 0: expected str instance, list found".

text = df.loc[page, ["rev"]]

def remove_punct(text):
  text = ''.join([ch for ch in text if ch not in exclude])
  tokens = re.split('\W+', text),
  tex = " ".join([word for word in tokens if word not in cachedStopWords]),
  return tex

s = df.loc[page, ["rev"]].apply(lambda x: remove_punct(x))

Here is the error.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-4f3c29307e88> in <module>()
     26   return tokens
     27 
---> 28  s=df.loc[page,["rev"]].apply(lambda x:remove_punct(x))
     29 
     30  with open('FileName.csv', 'a', encoding="utf-8") as f:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   3190             else:
   3191                 values = self.astype(object).values
-> 3192                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3193 
   3194         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src\inference.pyx in pandas._libs.lib.map_infer()

<ipython-input-16-4f3c29307e88> in <lambda>(x)
     26   return tokens
     27 
---> 28  s=df.loc[page,["rev"]].apply(lambda x:remove_punct(x))
     29 
     30  with open('FileName.csv', 'a', encoding="utf-8") as f:

<ipython-input-16-4f3c29307e88> in remove_punct(text)
     23   text=''.join([ch for ch in text if ch not in exclude])
     24   tokens = re.split('\W+', text),
---> 25   tex = " ".join([ch for ch in tokens if ch not in cachedStopWords]),
     26   return tokens
     27 

TypeError: sequence item 0: expected str instance, list found

1 answer:

Answer 0 (score: 0)

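I think the trailing commas at the end of these two lines are the problem. Each one wraps the assigned value in a one-element tuple. tokens therefore becomes a tuple holding the list returned by re.split, so the comprehension yields that list itself and " ".join receives a list where it expects strings.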

  tokens = re.split('\W+', text), # <---- These commas at the end
  tex = " ".join([word for word in tokens if word not in cachedStopWords]), # <----

The effect is roughly the same as doing the following (edited for a better example):

x = 12 * 24,
y = x * 10,
z = 40

print(f"X = {x}\n"
      f"Y = {y}\n"
      f"Z = {z}\n")

Output:

X = (288,)
Y = ((288, 288, 288, 288, 288, 288, 288, 288, 288, 288),)
Z = 40

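A trailing comma packs the assigned value into a tuple. Removing both commas leaves tokens as a list of strings and tex as a plain string.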
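
For reference, here is a minimal sketch of the function with both trailing commas removed. The question does not show how exclude and cachedStopWords are defined, so the definitions below are assumptions: exclude is taken to be the set of ASCII punctuation, and cachedStopWords is a small hypothetical stand-in for the real stop-word list. The filter for empty strings is also an addition, not part of the original code.

```python
import re
import string

# Assumptions: the question does not show these two definitions.
exclude = set(string.punctuation)
# Hypothetical stand-in for the real (probably NLTK) stop-word list.
cachedStopWords = {"a", "an", "the", "is", "of"}


def remove_punct(text):
    # Drop punctuation characters one by one.
    text = "".join(ch for ch in text if ch not in exclude)
    # No trailing comma here, so tokens is a plain list of strings.
    tokens = re.split(r"\W+", text)
    # Skip empty strings and stop words, then rejoin with single spaces.
    return " ".join(
        word for word in tokens if word and word not in cachedStopWords
    )


cleaned = remove_punct("the quality of the product is great, really!")
print(cleaned)  # quality product great really
```

Because the function now returns a plain str, it can be passed to apply directly, without the lambda wrapper.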