Question

with open('text.txt','r') as f:
    for i in f:
        trantab = str.maketrans({key: None for key in string.punctuation})
        j = i.translate(trantab)
        result1.append(j)
shortword = re.compile(r'\W*\b\w{1,4}\b')
shortword.sub('', result1)
f = result1

The error is:

  line 13, in shortword.sub('', result1)
TypeError: expected string or bytes-like object

How can I fix this?

Answer 1

You get this error because you are passing a list ([]) to .sub(), which expects a string or bytes-like object.
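The cause can be reproduced in a few lines. This is a minimal sketch: the list `lines` is a stand-in for `result1`, and the sample sentence is made up for illustration.

```python
import re

shortword = re.compile(r'\W*\b\w{1,4}\b')
lines = ["THIS IS A SIMPLE DUMMY TEXT"]  # stand-in for result1 (assumed sample data)

try:
    shortword.sub('', lines)             # a list is not accepted here
except TypeError as exc:
    error_message = str(exc)             # "expected string or bytes-like object..."

cleaned = shortword.sub('', lines[0])    # a single str works
```

Passing one element of the list succeeds, which suggests the fix is to call .sub() once per line.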

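This script does what you need:

import re

# Compile once: matches a word of 1 to 4 characters plus any non-word characters before it
shortword = re.compile(r'\W*\b\w{1,4}\b')

t = []
t.append("THIS IS A SIMPLE DUMMY TEXT")
t.append("ANOTHER INDEX BLA BLA")

# sub() takes a single string, so apply it to each element of the list
t = [shortword.sub('', line) for line in t]

print(t)

In your code, apply shortword.sub() to each string in result1 and assign the result back to result1. Wrapping the whole list in str() would also avoid the TypeError, but it turns the list into a single string, brackets and quotes included:

result1 = [shortword.sub('', line) for line in result1]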
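Putting the pieces from the question together, the whole task might look like the sketch below. The helper name `remove_short_words` and the sample lines are assumptions for illustration, not part of the original code. Note that the pattern `\w{1,4}` also removes 4-letter words; use `\w{1,3}` if only words with fewer than 4 characters should go.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans('', '', string.punctuation)
# A word of 1 to 4 characters, plus any non-word characters before it.
SHORTWORD = re.compile(r'\W*\b\w{1,4}\b')

def remove_short_words(lines):
    """Strip punctuation and short words from each line (hypothetical helper)."""
    result = []
    for line in lines:
        no_punct = line.translate(PUNCT_TABLE)
        # sub() is called on one string at a time, never on the list itself.
        result.append(SHORTWORD.sub('', no_punct).strip())
    return result

# With a real file, the open file object can be passed in directly:
# with open('text.txt') as f:
#     result1 = remove_short_words(f)
sample = ["Hello, this is a simple dummy text!\n", "Another index bla bla.\n"]
result1 = remove_short_words(sample)
```

Building the translation table and compiling the pattern once, outside the loop, avoids repeating that work for every line.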

I hope this helps!

Answer 2

This assumes each word is on its own line; otherwise you will have to break the content up with .split().

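# Read the file, one word per line, dropping surrounding whitespace
with open('something.txt') as f:
    content = [line.strip() for line in f]

# Keep only the words with at least 4 characters
res = list(filter(lambda x: len(x) >= 4, content))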
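If a line can hold several words, the .split() variant mentioned above could be sketched as follows. The helper name `keep_long_words`, its `min_len` parameter, and the sample lines are assumptions for illustration.

```python
def keep_long_words(lines, min_len=4):
    """Return every word with at least min_len characters (hypothetical helper)."""
    # str.split() with no arguments splits on any run of whitespace,
    # so trailing newlines are discarded as well.
    return [word for line in lines for word in line.split() if len(word) >= min_len]

sample = ["this is a simple dummy text\n", "bla another index\n"]
res = keep_long_words(sample)
```

This version returns one flat list of words; punctuation attached to a word counts toward its length, so strip it first if that matters.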

Remove words with fewer than 4 characters (Python)
