I have read several posts about this error, but I still can't figure it out. It happens when I try to loop over my function:
def fix_Plan(location):
    letters_only = re.sub("[^a-zA-Z]",  # Search for all non-letters
                          " ",          # Replace all non-letters with spaces
                          location)     # Column and row to search
    words = letters_only.lower().split()
    stops = set(stopwords.words("english"))
    meaningful_words = [w for w in words if not w in stops]
    return (" ".join(meaningful_words))

col_Plan = fix_Plan(train["Plan"][0])
num_responses = train["Plan"].size
clean_Plan_responses = []
for i in range(0,num_responses):
    clean_Plan_responses.append(fix_Plan(train["Plan"][i]))
Here is the error:
Traceback (most recent call last):
  File "C:/Users/xxxxx/PycharmProjects/tronc/tronc2.py", line 48, in <module>
    clean_Plan_responses.append(fix_Plan(train["Plan"][i]))
  File "C:/Users/xxxxx/PycharmProjects/tronc/tronc2.py", line 22, in fix_Plan
    location) # Column and row to search
  File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36\lib\re.py", line 191, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
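Answer 0 (score: 37)
As you noted in the comments, some of the values appear to be floats rather than strings. You need to convert them to strings before passing them to re.sub. The simplest way is to change location to str(location) in the re.sub call. If the value is already a str, this has no effect.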
letters_only = re.sub("[^a-zA-Z]",  # Search for all non-letters
                      " ",          # Replace all non-letters with spaces
                      str(location))
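The fix can be tried in isolation with a minimal sketch like the one below. It uses only the standard library, so the stop-word step is left out. The helper name `clean_text` and the sample values are hypothetical. One caveat: `str(float("nan"))` is the text "nan", so a missing value would survive as the word "nan". Mapping NaN to an empty string, as done here, is an assumption about the desired behaviour rather than part of the answer above.

```python
import math
import re

def clean_text(value):
    """Lowercase `value` and keep only ASCII letters, separated by single spaces."""
    # Assumption: a float NaN (how pandas usually marks missing data) means "no text".
    if isinstance(value, float) and math.isnan(value):
        return ""
    # str() leaves real strings unchanged and converts floats, ints, etc.
    letters_only = re.sub("[^a-zA-Z]", " ", str(value))
    return " ".join(letters_only.lower().split())

samples = ["Plan A: 2 nights!", 3.5, float("nan")]
cleaned = [clean_text(s) for s in samples]
print(cleaned)
```

Passing a float such as 3.5 no longer raises TypeError; it simply contributes no letters.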
Answer 1 (score: 0)
I think a better approach is to use the re.match() function. Here is an example that may help you.
import re
import nltk
from nltk.tokenize import word_tokenize

nltk.download('punkt')  # Tokenizer data used by word_tokenize

sentences = word_tokenize("I love to learn NLP \n 'a :(")
# Keep only the tokens that start with a letter, lowercased
sentences = [word.lower() for word in sentences if re.match('^[a-zA-Z]+', word)]
print(sentences)
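Two caveats about the pattern above. First, re.match raises the same TypeError as re.sub when it is given a non-string, so the values still have to be strings. Second, re.match only anchors at the start of the string, so a token passes as soon as it begins with a letter. If the goal is to keep tokens made up entirely of letters, re.fullmatch is the stricter check. A small sketch with a hand-written, hypothetical token list (so it does not need the NLTK download):

```python
import re

tokens = ["I", "love", "NLP", "'a", ":(", "b2b", "n't"]

# re.match anchors only at the start: "b2b" and "n't" pass because they begin with a letter.
starts_with_letter = [t.lower() for t in tokens if re.match(r"[a-zA-Z]+", t)]

# re.fullmatch requires the whole token to consist of letters.
letters_only = [t.lower() for t in tokens if re.fullmatch(r"[a-zA-Z]+", t)]

print(starts_with_letter)
print(letters_only)
```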
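Answer 2 (score: 0)
The simplest solution is to apply Python's built-in str function to the column you are iterating over.
If you are using pandas, this can be done as:
dataframe['column_name'] = dataframe['column_name'].apply(str)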
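A short sketch of the column-wide conversion, assuming pandas is installed. The column name `Plan` mirrors the question, but the data is made up. Note that plain `str` turns a missing value into literal text such as "nan" or "None". Mapping missing values to an empty string, as below, is one possible way around that and is an assumption about what you want.

```python
import pandas as pd

# Made-up data: a string, a float, and a missing value in the same column.
train = pd.DataFrame({"Plan": ["Fly to Rome", 12.5, None]})

# Convert every value to str, but map missing values to "" instead of "nan"/"None".
train["Plan"] = train["Plan"].map(lambda v: "" if pd.isna(v) else str(v))

print(train["Plan"].tolist())
```

After this step every element of the column is a str, so re.sub can be called on each one without the TypeError.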
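Answer 3 (score: 0)
I ran into the same problem. Interestingly, nothing I tried fixed it until I realized that the string contained two special, invisible characters.
In my case the text contained these two characters: U+200E (Left-to-Right Mark) and U+200C (Zero-Width Non-Joiner).
My solution was to remove both characters, and the problem went away.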
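import re

# Escape sequences make the invisible characters explicit
mystring = "\u200eSome Time W\u200ce"
mystring = re.sub("\u200e", "", mystring)  # Left-to-Right Mark
mystring = re.sub("\u200c", "", mystring)  # Zero-Width Non-Joiner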
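If other invisible characters might be present as well, a more general sketch is to drop every character in the Unicode category Cf (format), which includes both U+200E and U+200C. The helper name `strip_format_chars` is hypothetical. Be aware that this also removes characters such as the zero-width joiner (U+200D), which can matter for some scripts and for emoji sequences.

```python
import unicodedata

def strip_format_chars(text):
    """Return `text` without Unicode format (category Cf) characters."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

mystring = "\u200eSome Time W\u200ce"
cleaned = strip_format_chars(mystring)
print(repr(cleaned))
```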
I hope this helps anyone who runs into the same problem.