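I have a file containing sentences. I want to extract these sentences into a list and remove the words whose length is <= 3.
This is what I have so far: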
with open("./data/pos/train-pos.txt", "r", encoding="utf8") as f:
    train_pos = [line.strip().lower() for line in f]
newDoc = [word for word in train_pos if len(word) >= 3]
print(newDoc)
train_pos = ['i like apples', 'apples are my favorite fruits']
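I want to get: ['like apples', 'apples favorite fruits']
but I get the same list back. What is the problem? I would like to do this in a very optimized way, because train-pos.txt
contains thousands of sentences, so it is no problem if your solution is different from my wrong one.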
Answer 0: (score: 2)
You can do it like this:
>>> newDoc = [' '.join(word for word in sentence.split() if len(word) >= 3) for sentence in train_pos]
>>> newDoc
['like apples', 'apples are favorite fruits']
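
Two details are worth spelling out. In the question's code, the loop variable `word` iterates over whole lines of `train_pos`, so `len(word)` measures the length of each sentence and every sentence passes the filter; that is why the list comes back unchanged. Also, the condition `len(word) >= 3` keeps three-letter words such as "are", while the question asks to remove words of length <= 3. Below is a minimal sketch with the stricter test. The helper name `drop_short_words` and its `max_short_len` parameter are made up for illustration, and the sample list is assumed to match the question's data.

```python
def drop_short_words(sentences, max_short_len=3):
    """Return each sentence with words of length <= max_short_len removed."""
    return [
        # Split on whitespace, keep only the longer words, then rejoin.
        " ".join(word for word in sentence.split() if len(word) > max_short_len)
        for sentence in sentences
    ]


# Sample data assumed from the question (already stripped and lowercased).
train_pos = ["i like apples", "apples are my favorite fruits"]
new_doc = drop_short_words(train_pos)
print(new_doc)  # ['like apples', 'apples favorite fruits']
```

Each sentence is split and scanned once, so the work grows linearly with the total number of words, which should be fine for a file with thousands of sentences.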