Question

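I am trying to use NLTK functions to convert text data into a numeric form for scikit-learn. The data I am working with is basically short text strings.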

Input

NO 6 JALAN ASTAKA U8/82  SEKSYEN U8  BUKIT JELUTONG
MST GOLF PLAZA  NO 8  JALAN SS13/5

Expected output

no  jalan astaka u seksyen u  bukit jelutong
    mst golf plaza  no   jalan ss

My code

user_defined_stop_words = ['kwun','tong']
i = nltk.corpus.stopwords.words('english')
j = list(string.punctuation) + user_defined_stop_words
newstopwords = set(i).union(j)

def preprocess(x):
    x = re.sub('[^a-z\s]', '', x.lower())                  # get rid of noise
    x = [w for w in x.split() if w not in set(newstopwords)]  # remove stopwords
    return ' '.join(x)

data['Clean_addr'] = data['Adj_Addr'].apply(preprocess)

Error

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   2353             else:
   2354                 values = self.asobject
-> 2355                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   2356 
   2357         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src\inference.pyx in pandas._libs.lib.map_infer()

<ipython-input-55-3e3b1d8472ed> in preprocess(x)
      5 
      6 def preprocess(x):
----> 7     x = re.sub('[^a-z\s]', '', x.lower())                  # get rid of noise
      8     x = [w for w in x.split() if w not in set(newstopwords)]  # remove stopwords
      9     return ' '.join(x)

AttributeError: 'float' object has no attribute 'lower'

How can I fix this?
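One likely cause, assuming the `Adj_Addr` column contains missing values: pandas represents empty cells as NaN, which is a float, so `x.lower()` fails on those rows. Below is a minimal sketch of a guarded `preprocess` that returns an empty string for anything that is not a `str`. The small `english_stop_words` set is a stand-in I made up so the snippet runs without downloading NLTK data, and `rows` is a plain list standing in for the DataFrame column; neither is from the original code.

```python
import re
import string

# Hypothetical stand-in for nltk.corpus.stopwords.words('english'),
# used only so this sketch runs without the NLTK corpus download.
english_stop_words = {'the', 'a', 'an', 'of', 'and'}
user_defined_stop_words = ['kwun', 'tong']
newstopwords = set(english_stop_words).union(string.punctuation,
                                             user_defined_stop_words)

def preprocess(x):
    # Missing cells arrive as float NaN, which has no .lower() method,
    # so anything that is not a string is mapped to an empty string.
    if not isinstance(x, str):
        return ''
    x = re.sub(r'[^a-z\s]', '', x.lower())               # keep letters and whitespace only
    words = [w for w in x.split() if w not in newstopwords]  # remove stopwords
    return ' '.join(words)

# Plain list standing in for data['Adj_Addr'], with one missing value.
rows = [
    'NO 6 JALAN ASTAKA U8/82  SEKSYEN U8  BUKIT JELUTONG',
    float('nan'),
    'MST GOLF PLAZA  NO 8  JALAN SS13/5',
]
cleaned = [preprocess(r) for r in rows]
print(cleaned)
```

With a real DataFrame, the same function could be passed to `data['Adj_Addr'].apply(preprocess)` unchanged. An alternative would be to call `fillna('')` on the column before `apply`, or to drop the missing rows with `dropna`, depending on whether empty addresses should be kept. Note also that `' '.join` collapses repeated spaces, so the result will not preserve the double spaces shown in the expected output.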

Getting "'float' object has no attribute 'lower'" error when using NLTK

0 answers: