I have a csv file in which each row contains a list of adjectives.
For example, the first two rows look like this:
["happy","sad","colorful"]
["horrible","sad","cheerful","happy"]
I want to extract all the data from this file to get a single list containing each adjective once. For the rows above, that list would be:
["happy","sad","colorful","horrible","cheerful"]
I am doing this in Python:
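import csv

# Read every row of the CSV file into a list of rows.
with open('adj.csv', 'r') as f:
    reader = csv.reader(f)
    adj_list = list(reader)

# Keep each row only once. Note: this compares whole rows, not individual words.
filtered_list = []
for l in adj_list:
    if l not in filtered_list:
        filtered_list.append(l)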
Answer 0 (score: 1)
Assuming memory is not a concern and you are looking for a one-liner:
from itertools import chain
from csv import reader
print(list(set(chain(*reader(open('file.csv'))))))
With 'file.csv' containing:
happy, sad, colorful
horrible, sad, cheerful, happy
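Output:
['horrible', ' colorful', ' sad', ' cheerful', ' happy', 'happy']
The fields keep the space that follows each comma, which is why 'happy' shows up twice. If you don't mind receiving a Python set instead of a list, you can drop the list() call.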
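With the sample file above, the fields are separated by a comma plus a space, so csv.reader treats ' happy' and 'happy' as different strings. A minimal sketch of one way around this, assuming the csv module's skipinitialspace option suits your data; the in-memory sample_data string is a hypothetical stand-in for 'file.csv':

```python
import csv
import io
from itertools import chain

# Hypothetical in-memory stand-in for 'file.csv' (same content as above).
sample_data = "happy, sad, colorful\nhorrible, sad, cheerful, happy\n"

# skipinitialspace=True drops the whitespace that follows each delimiter.
rows = csv.reader(io.StringIO(sample_data), skipinitialspace=True)
unique_adjectives = set(chain.from_iterable(rows))

# Sorting is only here to make the printed order reproducible.
print(sorted(unique_adjectives))
# ['cheerful', 'colorful', 'happy', 'horrible', 'sad']
```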
Answer 1 (score: 0)
Assuming you are only interested in a list of unique words and order does not matter:
# Option A1
import csv

with open("adj.csv", "r") as f:
    seen = set()
    reader = csv.reader(f)
    for line in reader:
        for word in line:
            seen.add(word)

list(seen)
# ['cheerful', 'colorful', 'horrible', 'happy', 'sad']
More concisely:
# Option A2
with open("adj.csv", "r") as f:
    reader = csv.reader(f)
    unique_words = {word for line in reader for word in line}

list(unique_words)
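The with statement safely opens and closes the file. We simply add each word to a set, which discards duplicates. We then convert the filtered result with list() to get a list of unique (unordered) words.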
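Since a set has no defined order, list(unique_words) may come back in a different order from one run to the next. If a reproducible result is wanted, one small sketch, assuming alphabetical order is acceptable, is to sort instead; the rows list here is a hypothetical stand-in for what csv.reader yields:

```python
# Hypothetical stand-in for the rows produced by csv.reader(f).
rows = [["happy", "sad", "colorful"], ["horrible", "sad", "cheerful", "happy"]]

unique_words = {word for line in rows for word in line}

# sorted() accepts any iterable and returns a new list in ascending order.
sorted_words = sorted(unique_words)
print(sorted_words)
# ['cheerful', 'colorful', 'happy', 'horrible', 'sad']
```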
Alternative
If order does matter, implement the unique_everseen itertools recipe.
From the itertools recipes:
import itertools as it

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in it.filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
You can implement this function yourself or install a third-party library that implements it for you, such as more_itertools, e.g. pip install more_itertools
# Option B
import csv
import more_itertools as mit

with open("adj.csv", "r") as f:
    reader = csv.reader(f)
    words = (word for line in reader for word in line)
    unique_words = list(mit.unique_everseen(words))

unique_words
# ['happy', 'sad', 'colorful', 'horrible', 'cheerful']
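For comparison, and not part of the recipe above: on Python 3.7+, where dicts preserve insertion order, a similar order-preserving result can be sketched with only the standard library via dict.fromkeys. The rows list is a hypothetical stand-in for csv.reader output:

```python
from itertools import chain

# Hypothetical stand-in for the rows produced by csv.reader(f).
rows = [["happy", "sad", "colorful"], ["horrible", "sad", "cheerful", "happy"]]

# dict keys are unique and keep first-insertion order (Python 3.7+),
# so later duplicates of a word are dropped.
unique_words = list(dict.fromkeys(chain.from_iterable(rows)))
print(unique_words)
# ['happy', 'sad', 'colorful', 'horrible', 'cheerful']
```

Unlike unique_everseen, this builds the whole result at once rather than yielding items lazily, and it has no key parameter.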