I have a question about how to store the target words in a list.
I have a text file:
apple tree apple_tree
banana juice banana_juice
dinner time dinner_time
divorce lawyer divorce_lawyer
breakfast table breakfast_table
I want to read this file and store only the nouns... but I'm struggling with the Python code.
words = []
file = open("text.txt","r")
for f in file.readlines():
    words.append(f.split(" "))
I don't know how to split each line on spaces and drop the compounds that contain "_". The list I want is:
['apple', 'tree', 'banana', 'juice', 'dinner', 'time', ...]
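The nested lists come from list.append, which adds the whole result of split() as one element; list.extend (or +=) adds the individual words instead. A minimal illustration on one sample line:

```python
line = "apple tree apple_tree\n"

nested = []
nested.append(line.split(" "))   # appends the list itself as a single element
print(nested)                    # [['apple', 'tree', 'apple_tree\n']]

flat = []
flat.extend(line.split())        # adds each word; split() also drops the "\n"
print(flat)                      # ['apple', 'tree', 'apple_tree']
```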
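Answer 0 (score: 2)
Try this code; it works.
Split the whole string and keep only the values that are not compound words (i.e. the words that contain no "_").
Code: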
temp = """apple tree apple_tree
banana juice banana_juice
dinner time dinner_time
divorce lawyer divorce_lawyer
breakfast table breakfast_table"""
new_arr = [i for i in temp.split() if '_' not in i]
print(new_arr)
Output:
['apple', 'tree', 'banana', 'juice', 'dinner', 'time', 'divorce', 'lawyer', 'breakfast', 'table']
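The same comprehension can read straight from text.txt instead of a hardcoded string. A sketch, assuming the file fits in memory; the helper name read_simple_words is hypothetical, and the demo writes the sample data to a temporary file so it runs on its own:

```python
import os
import tempfile

def read_simple_words(path):
    """Return every whitespace-separated word in the file that has no '_'."""
    # 'with' closes the file even if an error occurs while reading
    with open(path, encoding="utf-8") as fh:
        return [w for w in fh.read().split() if "_" not in w]

# Demo only: write the sample data to a temporary file and read it back
sample = """apple tree apple_tree
banana juice banana_juice
dinner time dinner_time
divorce lawyer divorce_lawyer
breakfast table breakfast_table
"""
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write(sample)
    path = tmp.name

result = read_simple_words(path)
os.remove(path)
print(result)
```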
Answer 1 (score: 0)
This code stores only the words without underscores, and it puts them all in one flat list instead of a nested list.
words = []
with open("text.txt", "r") as file:   # the file is closed automatically
    for f in file:
        # split() splits on any whitespace and drops the trailing newline
        words += [i for i in f.split() if '_' not in i]
print(words)
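Why split() rather than split(" "): split(" ") breaks only on single spaces, so a trailing newline stays attached to the last word and consecutive spaces produce empty strings. In the sample data the last word on every line is a compound, so the newline happens to get filtered out anyway. A quick comparison on a hypothetical line with a double space:

```python
line = "apple  tree\n"   # hypothetical line: double space, plain word at the end

print(line.split(" "))   # ['apple', '', 'tree\n']
print(line.split())      # ['apple', 'tree']
```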
Answer 2 (score: 0)
import re
file = ["apple tree apple_tree apple_tree_tree apple_tree_ _",
"banana juice banana_juice",
"dinner time dinner_time",
"divorce lawyer divorce_lawyer",
"breakfast table breakfast_table"]
# Approach 1: list comprehension
words = []
for f in file:
    words += [x for x in f.split(" ") if '_' not in x]
print(words)
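# Approach 2: regular expressions
words = []
for f in file:
    # Delete every whitespace-delimited token that contains "_".
    # Only the token is matched, not the spaces around it, so
    # neighbouring words are never glued together.
    f = re.sub(r"\S*_\S*", "", f)
    words += f.split()   # split() also skips the gaps left behind
print(words)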
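Both approaches above work. IMO the first is better: regular expressions can be comparatively expensive, and the list comprehension is also more Pythonic.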
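Whether the regex version is actually slower depends on the input, so it is worth measuring rather than assuming. A rough sketch using timeit on the sample lines; the regex is the token-level pattern \S*_\S* (an assumption, chosen so both versions return the same list), and the timings will vary by machine and Python version:

```python
import re
import timeit

lines = [
    "apple tree apple_tree",
    "banana juice banana_juice",
    "dinner time dinner_time",
    "divorce lawyer divorce_lawyer",
    "breakfast table breakfast_table",
]
COMPOUND = re.compile(r"\S*_\S*")  # any whitespace-delimited token containing "_"

def with_comprehension():
    return [w for line in lines for w in line.split() if "_" not in w]

def with_regex():
    return [w for line in lines for w in COMPOUND.sub("", line).split()]

# Both approaches must agree before their speed is compared
assert with_comprehension() == with_regex()

for fn in (with_comprehension, with_regex):
    seconds = timeit.timeit(fn, number=10_000)
    print(f"{fn.__name__}: {seconds:.3f} s for 10,000 runs")
```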