Question

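Python newbie here. I'd like a function that preprocesses some data and joins it back together: it takes a string as input and outputs a single preprocessed string.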

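import re
from nltk.corpus import stopwords

def message_to_words(message):
    letters = re.sub("[^a-zA-Z]", " ", message)              # keep letters only
    words = letters.lower().split()                           # lower-case and tokenize
    stops = set(stopwords.words("english"))                   # NLTK English stop words
    meaningful_words = [w for w in words if w not in stops]   # drop stop words
    return " ".join(meaningful_words)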
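For illustration, this is what the function does to a sample message. It is a dependency-free sketch: it assumes a small hypothetical stop-word set `STOPS` in place of NLTK's `stopwords.words("english")`, so the exact words removed will differ from the real list.

```python
import re

# Hypothetical stand-in for NLTK's English stop-word list (assumption).
STOPS = {"i", "the", "a", "is", "to", "and"}

def message_to_words(message):
    # Replace every non-letter character with a space.
    letters = re.sub("[^a-zA-Z]", " ", message)
    # Lower-case and split on whitespace.
    words = letters.lower().split()
    # Keep only words that are not stop words.
    meaningful_words = [w for w in words if w not in STOPS]
    # Join the remaining words back into one string.
    return " ".join(meaningful_words)

print(message_to_words("I love the new phone, and it's GREAT!!"))
# prints: love new phone it s great
```

Note that the apostrophe in "it's" is replaced by a space, so it splits into "it" and "s".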

When I call the function like this:

clean_messages = []
for i in xrange(0, df["Message"].size):
        clean_messages.append( message_to_words( df["Message"][i] ) )

I get this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-156-061399cb4dfd> in <module>()
      3 for i in xrange(0, df["Message"].size):
----> 5         clean_messages.append( message_to_words( df["Message"][i] ) )

---> 12     letters = re.sub("[^a-zA-Z]", " ", message )
..../python2.7/re.pyc in sub(pattern, repl, string, count, flags)
    153     a callable, it's passed the match object and must return
    154     a replacement string to be used."""
--> 155     return _compile(pattern, flags).sub(repl, string, count)
    156 
    157 def subn(pattern, repl, string, count=0, flags=0):

TypeError: expected string or buffer
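The error can be reproduced without pandas: `re.sub` only accepts text for its `string` argument, so passing a float such as NaN raises `TypeError`. The sketch below assumes Python 3, where the message wording differs from Python 2's "expected string or buffer" (for example "expected string or bytes-like object"), but it is the same failure. The helper `try_sub` is hypothetical, written only for this demonstration.

```python
import math
import re

def try_sub(value):
    """Run the same substitution as message_to_words and report what happens."""
    try:
        return re.sub("[^a-zA-Z]", " ", value)
    except TypeError as exc:
        return "TypeError: %s" % exc

print(try_sub("Hello, world"))  # prints: Hello  world
print(try_sub(math.nan))        # prints a TypeError message (wording depends on Python version)
```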

When the data has up to 500 rows, print df["Message"][i] shows a string and the code runs without errors. However, once the data grows beyond 500 rows, print df["Message"][i] shows a float. This confuses me.
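A likely cause, though the question does not confirm it: pandas stores empty cells as NaN, which is a float, so once the data includes a row with a blank message, df["Message"][i] is no longer a string. One option is to treat any non-string value as an empty message. This sketch assumes Python 3 (under Python 2 the check would be `isinstance(message, basestring)`) and again uses a small hypothetical `STOPS` set in place of NLTK's list; the `messages` list simulates the DataFrame column. On the pandas side, calling `df["Message"].fillna("")` before the loop would have a similar effect.

```python
import math
import re

# Hypothetical stand-in for NLTK's English stop-word list (assumption).
STOPS = {"i", "the", "a", "is", "to", "and"}

def message_to_words(message):
    # Missing cells arrive as NaN (a float); treat any non-string as empty text.
    if not isinstance(message, str):
        return ""
    letters = re.sub("[^a-zA-Z]", " ", message)
    words = letters.lower().split()
    return " ".join(w for w in words if w not in STOPS)

# Simulated column: the third entry mimics a blank cell read as NaN.
messages = ["Call me now!", "The meeting is at 5", math.nan]
clean_messages = [message_to_words(m) for m in messages]
print(clean_messages)  # prints: ['call me now', 'meeting at', '']
```

Returning an empty string rather than skipping the row keeps one output per input, so clean_messages still lines up with the rows of df.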

Python regex error: TypeError: expected string or buffer
