I have looked through several sources on Stack Overflow about this problem but could not solve it, so I am posting it here. Please help.
# Combining all the above statements
from tqdm import tqdm
Other_skill = []
# tqdm is for printing the status bar
for sentance in tqdm(project_data['Other skills'].values):
    sent = decontracted(sentance)
    sent = sent.replace('\\r', ' ')
    sent = sent.replace('\\"', ' ')
    sent = sent.replace('\\n', ' ')
    sent = re.sub('[^A-Za-z0-9]+', ' ', sent)
    sent = ' '.join(e for e in sent.split() if e not in stopwords)
    Other_skill.append(sent.lower().strip())
Error:
TypeError                                 Traceback (most recent call last)
<ipython-input-12-30687b6f17e1> in <module>()
      4 # tqdm is for printing the status bar
      5 for sentance in tqdm(project_data['Other skills'].values):
----> 6     sent = decontracted(sentance)
      7     sent = sent.replace('\\r', ' ')
      8     sent = sent.replace('\\"', ' ')

<ipython-input-7-a344e4b38b78> in decontracted(phrase)
      4 def decontracted(phrase):
      5     # specific
----> 6     phrase = re.sub(r"won't", "will not", phrase)
      7     phrase = re.sub(r"can\'t", "can not", phrase)
      8

C:\ProgramData\Anaconda3\lib\re.py in sub(pattern, repl, string, count, flags)
    189     a callable, it's passed the match object and must return
    190     a replacement string to be used."""
--> 191     return _compile(pattern, flags).sub(repl, string, count)
    192
    193 def subn(pattern, repl, string, count=0, flags=0):

TypeError: expected string or bytes-like object
Answer 0 (score: 0)
I think the parentheses in values() are required. If I am right, project_data is a dictionary and you left out the parentheses after values.
from tqdm import tqdm
Other_skill = []
# tqdm is for printing the status bar
# the values must have round brackets
for sentance in tqdm(project_data['Other skills'].values()):
    sent = decontracted(sentance)
    sent = sent.replace('\\r', ' ')
    sent = sent.replace('\\"', ' ')
    sent = sent.replace('\\n', ' ')
    sent = re.sub('[^A-Za-z0-9]+', ' ', sent)
    sent = ' '.join(e for e in sent.split() if e not in stopwords)
    Other_skill.append(sent.lower().strip())
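
One way to check the dictionary assumption is to see what happens when the parentheses are left off a plain dict. The sketch below uses made-up data (skills_dict is hypothetical, not taken from the question). If project_data turns out to be a pandas DataFrame instead, which the question does not say, then .values is an attribute and takes no parentheses.

```python
# Made-up stand-in for project_data['Other skills'], assuming it is a dict.
skills_dict = {"a": "I won't stop", "b": "can't wait"}

# Calling the method yields the stored strings.
called = list(skills_dict.values())

# Without the parentheses we get the method object itself,
# and trying to loop over it fails before any element is read.
try:
    for _ in skills_dict.values:
        pass
    error_name, message = None, ""
except TypeError as exc:
    error_name, message = type(exc).__name__, str(exc)

print(called)
print(error_name, message)
```

Note that in the dict case the loop fails on the for line itself with a "not iterable" message, whereas the traceback in the question fails inside re.sub, so it is worth printing type(project_data) before changing anything.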
Answer 1 (score: 0)
Looking at the stack trace, we can see that sub, called from decontracted, is being given the wrong input.
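
The failure can be reproduced in isolation, which may help confirm the diagnosis. This is only a sketch: try_sub is a hypothetical helper, and the float NaN and None inputs are guesses at what a missing value might look like, not something the question shows.

```python
import re

def try_sub(value):
    """Return the substituted string, or the name of the exception raised."""
    try:
        return re.sub(r"won't", "will not", value)
    except TypeError as exc:
        return type(exc).__name__

print(try_sub("I won't go"))   # a str is accepted
print(try_sub(float("nan")))   # a float is rejected
print(try_sub(None))           # so is None
```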
The best approach is to put a breakpoint where the arrow is, so you can inspect the value of phrase:
4 def decontracted(phrase):
5 # specific
----> 6 phrase = re.sub(r"won't", "will not", phrase)
If you don't know how to do that, you can add some debugging code like this:
def decontracted(phrase):
    # specific
    print(f'phrase: {phrase}\ttype: {type(phrase)}')
    phrase = re.sub(r"won't", "will not", phrase)
    [...]
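This prints every phrase passed to decontracted, along with its type. The type can only be bytes or str (and since the patterns here are str, in practice it must be str). If it is anything else, you have found the error and can fix it accordingly.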
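
If printing every phrase is too noisy, a variation is to list only the offending entries and then decide how to handle them. The sketch below is an assumption-laden example: find_non_strings and to_text are hypothetical helpers, the sample list is made up to stand in for the column, and turning missing values into an empty string is just one possible choice.

```python
import re

def find_non_strings(values):
    """Return (index, value, type name) for every entry that is not a str."""
    return [
        (i, v, type(v).__name__)
        for i, v in enumerate(values)
        if not isinstance(v, str)
    ]

def to_text(value):
    """Hypothetical helper: map None/NaN to '' and anything else to str."""
    if value is None or (isinstance(value, float) and value != value):
        return ""  # NaN is the only float that is not equal to itself
    return value if isinstance(value, str) else str(value)

# Made-up stand-in for project_data['Other skills'].values
sample = ["I can't code in C", float("nan"), None, 42]

bad = find_non_strings(sample)
cleaned = [re.sub(r"can\'t", "can not", to_text(v)) for v in sample]

print(bad)
print(cleaned)
```

In the original loop this would amount to calling decontracted(to_text(sentance)), or skipping the row entirely if an empty entry is not wanted.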
Hope this helps. We don't have enough information to help you further.