Extracting data from a CSV file

Time: 2017-09-01 03:02:26

Tags: python csv

I have a CSV file in which each row contains a list of adjectives.

For example, the first two rows look like this:

["happy","sad","colorful"]
["horrible","sad","cheerful","happy"]

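I want to extract all the data from this file to get a single list that contains each adjective exactly once. For the two rows above, that list would be: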

["happy","sad","colorful","horrible","cheerful"]

I am doing this in Python:

import csv
with open('adj.csv', 'rb') as f: 
    reader = csv.reader(f) 
    adj_list = list(reader) 
    filtered_list = [] 
    for l in adj_list: 
        if l not in filtered_list:
            filtered_list.append(l)

2 Answers:

Answer 0 (score: 1)

Assuming memory is not a concern and you are looking for a one-liner:

from itertools import chain
from csv import reader

print(list(set(chain(*reader(open('file.csv'))))))

With 'file.csv' containing:

happy, sad, colorful
horrible, sad, cheerful, happy

Output:

['horrible', ' colorful', ' sad', ' cheerful', 'happy', ' happy']

If you don't mind getting a Python set instead of a list, you can drop the list() part.
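Because the sample file has a space after each comma, csv.reader keeps that space as part of the field, so ' happy' and 'happy' are treated as different words. Here is a minimal sketch of one way around this, assuming the csv module's skipinitialspace option suits your data. The helper name unique_words and the temporary sample file are illustrative only and not part of the answer above.

```python
import csv
import os
import tempfile
from itertools import chain


def unique_words(path):
    """Return the set of distinct fields in a CSV file, ignoring spaces after commas."""
    # newline="" is what the csv documentation recommends for files passed to csv.reader
    with open(path, newline="") as f:
        # skipinitialspace=True drops whitespace that directly follows a delimiter
        return set(chain.from_iterable(csv.reader(f, skipinitialspace=True)))


# Hypothetical sample data mirroring the file shown above
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("happy, sad, colorful\nhorrible, sad, cheerful, happy\n")

words = unique_words(tmp.name)
os.remove(tmp.name)

# Sorted only to make the printed order reproducible
print(sorted(words))
```

Using chain.from_iterable instead of chain(*...) also avoids unpacking every row into function arguments at once, which may help a little with larger files.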

Answer 1 (score: 0)

Assuming you are only interested in a list of unique words and their order does not matter:

# Option A1
import csv


with open("adj.csv", "r") as f:
    seen = set()
    reader = csv.reader(f)
    for line in reader:
        for word in line:
            seen.add(word)
list(seen)
# ['cheerful', 'colorful', 'horrible', 'happy', 'sad']

More concisely:

# Option A2
with open("adj.csv", "r") as f:
    reader = csv.reader(f)
    unique_words = {word for line in reader for word in line}

list(unique_words)

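The with statement safely opens and closes the file. We simply add each word to a set, which discards duplicates. Passing the result to list() then gives a list of unique (unordered) words.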
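Since a set has no guaranteed order, the list can come out in a different order from run to run. If you want a reproducible result, one option is to sort it. The following is a small sketch under the assumption that alphabetical order is acceptable; the in-memory sample stands in for adj.csv and is purely illustrative.

```python
import csv
import io

# Hypothetical in-memory stand-in for adj.csv
sample = io.StringIO("happy,sad,colorful\nhorrible,sad,cheerful,happy\n")

# Same set comprehension as Option A2: duplicates collapse automatically
unique_words = {word for line in csv.reader(sample) for word in line}

# sorted() returns a new list in alphabetical order
sorted_words = sorted(unique_words)
print(sorted_words)
# ['cheerful', 'colorful', 'happy', 'horrible', 'sad']
```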

Alternative

If order does matter, implement the unique_everseen itertools recipe.

From the itertools recipes:

import itertools as it  # the recipe body below refers to itertools as `it`


def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in it.filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

You can implement this function by hand, or install a third-party library that implements it for you, such as more_itertools (pip install more_itertools).

# Option B
import csv

import more_itertools as mit


with open("adj.csv", "r") as f:
    reader = csv.reader(f)
    words = (word for line in reader for word in line)
    unique_words = list(mit.unique_everseen(words))

unique_words
# ['happy', 'sad', 'colorful', 'horrible', 'cheerful']
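If you would rather avoid a third-party dependency, an order-preserving alternative can be sketched with dict.fromkeys. This assumes Python 3.7 or later, where dictionaries are guaranteed to keep insertion order. It is a swapped-in technique, not the unique_everseen recipe itself, and the in-memory sample again stands in for adj.csv.

```python
import csv
import io

# Hypothetical in-memory stand-in for adj.csv
sample = io.StringIO("happy,sad,colorful\nhorrible,sad,cheerful,happy\n")

# Flatten the rows into a single stream of words
words = (word for line in csv.reader(sample) for word in line)

# dict keys are unique and keep first-seen order, so this de-duplicates in order
ordered_unique = list(dict.fromkeys(words))
print(ordered_unique)
# ['happy', 'sad', 'colorful', 'horrible', 'cheerful']
```

Unlike unique_everseen, this builds the whole dictionary before returning anything, so it is less suitable when you want to consume results lazily.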