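Writing intersection data to a new CSV

Time: 2016-09-22 10:19:56

Tags: python csv

I have 2 CSV files, each containing a list of unique words. After computing their intersection I get the result, but when I try to write it to a new file, it creates a very large file, about 155MB, when it should be well under 2MB.

Code: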

import csv

alist, blist = [], []

with open("SetA-unique.csv", "r") as fileA:
    reader = csv.reader(fileA, delimiter=',')
    for row in reader:
        alist += row

with open("SetB-unique.csv", "r") as fileB:
    reader = csv.reader(fileB, delimiter=',')
    for row in reader:
        blist += row

first_set = set(alist)
second_set = set(blist)

res = (first_set.intersection(second_set))

writer = csv.writer(open("SetA-SetB.csv", 'w'))

for row in res:
    writer.writerow(res)  # passes the whole set res, not the current row
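To see why the file balloons, here is a minimal Python 3 sketch that writes a made-up set of 1,000 words into an in-memory buffer (io.StringIO) instead of SetA-SetB.csv, once with the question's loop and once with the fix. The sample words and the write_rows helper are hypothetical, for illustration only.

```python
import csv
import io

# Hypothetical sample: pretend the intersection produced 1,000 words.
res = {"word%d" % i for i in range(1000)}

def write_rows(words, buggy):
    """Write `words` to an in-memory CSV and return the resulting text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in words:
        if buggy:
            writer.writerow(words)   # the question's loop: the whole set on every iteration
        else:
            writer.writerow([row])   # the fix: one word per row
    return buf.getvalue()

buggy_out = write_rows(res, buggy=True)
fixed_out = write_rows(res, buggy=False)

print(len(buggy_out.splitlines()), "lines,", len(buggy_out), "chars (buggy)")
print(len(fixed_out.splitlines()), "lines,", len(fixed_out), "chars (fixed)")
```

Both versions write 1,000 lines, but every buggy line holds all 1,000 words, so here the buggy output is 7,891,000 characters versus 8,890 for the fixed one. In general, n common words produce n × n fields, so the file size grows quadratically with the size of the intersection.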

2 answers:

Answer 0 (score: 2)

You are writing the entire set res to the file on every iteration. You probably want to write each row instead:

for row in res:
    writer.writerow([row])
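The list around row matters: csv.writer's writerow expects a sequence of fields, and a plain string is itself a sequence of characters. A small in-memory check (Python 3, with io.StringIO standing in for the output file):

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow("apple")    # a bare string is treated as a sequence of 1-character fields
writer.writerow(["apple"])  # a one-element list is written as a single field
lines = buf.getvalue().splitlines()
print(lines)  # ['a,p,p,l,e', 'apple']
```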

Answer 1 (score: 0)

Besides writing the whole set on every iteration, you also don't need to build multiple lists and sets; you can use itertools.chain:

import csv
from itertools import chain

with open("SetA-unique.csv") as file_a, open("SetB-unique.csv") as file_b, open("SetA-SetB.csv", 'w') as inter:
    r1 = csv.reader(file_a)
    r2 = csv.reader(file_b)
    # flatten each file's rows into one stream of words, then intersect
    for word in set(chain.from_iterable(r1)).intersection(chain.from_iterable(r2)):
        inter.write(word + "\n")

If you are only writing words, you don't need csv.writer either; just use file.write as above.
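Here is a Python 3 sketch of the same chain-based approach, split into two hypothetical helpers (common_words and write_words) so it can be tried on in-memory data; with real files you would pass in the objects returned by open("SetA-unique.csv", newline="") and so on.

```python
import csv
import io
from itertools import chain

def common_words(file_a, file_b):
    """Return the set of words (CSV fields) that appear in both files."""
    words_a = set(chain.from_iterable(csv.reader(file_a)))
    # intersection() accepts any iterable, so file_b's words need not be a set
    return words_a.intersection(chain.from_iterable(csv.reader(file_b)))

def write_words(words, out):
    """Write one word per line with plain file.write, no csv.writer."""
    for word in sorted(words):  # sorted only to make the output deterministic
        out.write(word + "\n")

a = io.StringIO("apple,banana\ncherry\n")
b = io.StringIO("banana\ncherry,date\n")
out = io.StringIO()
write_words(common_words(a, b), out)
print(out.getvalue())
```

Only the first file is materialized as a set; the second is streamed through intersection(), which is the point of the chain version.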

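If you are actually trying to compare whole rows, then you should not flatten everything into a single iterable of words; instead you can imap each row to a tuple: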

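import csv
from itertools import imap  # Python 2; on Python 3 use the built-in map instead

with open("SetA-unique.csv") as file_a, open("SetB-unique.csv") as file_b, open("SetA-SetB.csv", 'w') as inter:
    r1 = csv.reader(file_a)
    r2 = csv.reader(file_b)
    writer = csv.writer(inter)
    # rows are lists (unhashable), so convert each one to a tuple before building the set
    for row in set(imap(tuple, r1)).intersection(imap(tuple, r2)):
        writer.writerow(row)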
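On Python 3, itertools.imap no longer exists and the built-in map is already lazy. A minimal sketch of the row-comparison version under that assumption, using a hypothetical common_rows helper and in-memory sample data:

```python
import csv
import io

def common_rows(file_a, file_b):
    """Return the rows (as tuples) that appear in both CSV files."""
    # csv.reader yields lists, which are unhashable; tuples can go into a set.
    rows_a = set(map(tuple, csv.reader(file_a)))
    return rows_a.intersection(map(tuple, csv.reader(file_b)))

a = io.StringIO("1,apple\n2,banana\n3,cherry\n")
b = io.StringIO("2,banana\n3,cherry\n4,date\n")
out = io.StringIO()
writer = csv.writer(out)
for row in sorted(common_rows(a, b)):  # sorted only for deterministic output
    writer.writerow(row)
print(out.getvalue())
```

Here csv.writer is still worth using, because each row has several fields that may need quoting.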

If each line contains only one word, you don't need the csv lib at all.

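from itertools import imap  # Python 2; on Python 3 use the built-in map instead

with open("SetA-unique.csv") as file_a, open("SetB-unique.csv") as file_b, open("SetA-SetB.csv", 'w') as inter:
    # strip the trailing newline from each line so equal words compare equal
    for word in set(imap(str.strip, file_a)).intersection(imap(str.strip, file_b)):
        inter.write(word + "\n")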
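The same idea on Python 3, as a minimal sketch with a hypothetical common_lines helper and in-memory files; it assumes one word per line, exactly as the answer does.

```python
import io

def common_lines(file_a, file_b):
    """Intersect two one-word-per-line files without the csv module."""
    # str.strip removes the trailing newline (and any surrounding whitespace).
    words_a = set(map(str.strip, file_a))
    return words_a.intersection(map(str.strip, file_b))

a = io.StringIO("apple\nbanana\ncherry\n")
b = io.StringIO("banana\ncherry\ndate\n")
result = common_lines(a, b)
print(sorted(result))
```

One caveat: if both files contain a blank line, the empty string ends up in the result, so you may want to discard it before writing.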