with open('text.txt','r') as f:
    for i in f:
        trantab = str.maketrans({key: None for key in string.punctuation})
        j = i.translate(trantab)
        result1.append(j)
shortword = re.compile(r'\W*\b\w{1,4}\b')
shortword.sub('', result1)
f = result1
The error is:
line 13, in shortword.sub('', result1)
TypeError: expected string or bytes-like object
How can I fix this?
Answer 0 (score: 0)
You get this error because you are calling .sub() on a list (result1), while .sub() expects a string or bytes-like object.
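The cause can be reproduced in a few lines. This is only a minimal sketch: the list `lines` and its contents are made-up sample data, not taken from the question.

```python
import re

shortword = re.compile(r'\W*\b\w{1,4}\b')
lines = ["a tiny example sentence"]  # hypothetical sample data

try:
    shortword.sub('', lines)  # passing the list itself raises TypeError
except TypeError as exc:
    error_message = str(exc)

# Passing a single string from the list works
cleaned = shortword.sub('', lines[0])
```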
I solved what you need with this script:
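import re

t = []
t.append("THIS IS A SIMPLE DUMMY TEXT")
t.append("ANOTHER INDEX BLA BLA")
# Compile the pattern once: optional non-word characters followed by a 1-4 character word
shortword = re.compile(r'\W*\b\w{1,4}\b')
# str(t) turns the whole list into a single string, which .sub() accepts
t = shortword.sub('', str(t))
print(t)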
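You only need to assign the result of shortword.sub('', result1) back to result1, and make sure to use str(), which converts the whole list into a single string that .sub() accepts: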
result1 = shortword.sub('', str(result1))
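If result1 should stay a list of lines rather than become one string, a possible variant is to apply the substitution to each line as it is read. The following is a sketch under that assumption; `raw_lines` is hypothetical sample data standing in for the contents of text.txt, and the trailing `.strip()` is an addition to tidy leftover whitespace.

```python
import re
import string

# Hypothetical sample standing in for the lines read from text.txt
raw_lines = [
    "Hello, this is a simple example!\n",
    "Another line: with short and longer words.\n",
]

# Translation table that deletes every punctuation character
trantab = str.maketrans('', '', string.punctuation)
shortword = re.compile(r'\W*\b\w{1,4}\b')

result1 = []
for line in raw_lines:
    no_punct = line.translate(trantab)
    # .sub() receives one string at a time, so no TypeError is raised
    result1.append(shortword.sub('', no_punct).strip())
```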
I believe this will help you!
Answer 1 (score: 0)
This assumes each word is on its own line; otherwise you will have to break up content with .split().
with open('something.txt') as f:
    content = [line.strip() for line in f]
# Keep only the entries that are at least 4 characters long
res = list(filter(lambda x: len(x) >= 4, content))
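For a file where several words share a line, the .split() step mentioned above could look roughly like this. It is a sketch only: `lines` is made-up sample data standing in for something.txt, and `min_length` is a hypothetical name for the same threshold used in the filter.

```python
# Hypothetical sample standing in for the lines of something.txt
lines = ["the quick brown fox\n", "jumps over lazy dogs\n"]

min_length = 4  # same threshold as the filter above

# split() with no arguments breaks on any whitespace and drops the newline
words = [word for line in lines for word in line.split()]
res = [word for word in words if len(word) >= min_length]
```

To match the question's regex, which removes words of 1 to 4 characters, the threshold would need to be 5 instead.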