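I have 2 CSV files, each containing a list of unique words. After computing their intersection I get the expected result, but when I try to write it to a new file, it creates a very large file of about 155MB, when it should be well under 2MB.
Code: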
import csv

alist, blist = [], []
with open("SetA-unique.csv", "r") as fileA:
    reader = csv.reader(fileA, delimiter=',')
    for row in reader:
        alist += row
with open("SetB-unique.csv", "r") as fileB:
    reader = csv.reader(fileB, delimiter=',')
    for row in reader:
        blist += row
first_set = set(alist)
second_set = set(blist)
res = (first_set.intersection(second_set))
writer = csv.writer(open("SetA-SetB.csv", 'w'))
for row in res:
    writer.writerow(res)
Answer 0: (score: 2)
You are writing the entire set res to the file on every iteration. You probably want to write one word per row instead:
for row in res:
    writer.writerow([row])
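To see why the file balloons, here is a small self-contained sketch. The sample words and the in-memory `io.StringIO` buffer are assumptions made for illustration and are not part of the original question. Writing the whole set on every iteration emits n rows of n words each, so the output grows roughly quadratically with the size of the intersection:

```python
import csv
import io

# Hypothetical sample data standing in for the real intersection.
res = {"word%d" % i for i in range(100)}

def written_size(write_whole_set):
    """Return the number of characters csv.writer emits for `res`."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in res:
        # Buggy variant writes all of `res`; fixed variant writes one word.
        writer.writerow(res if write_whole_set else [row])
    return len(buf.getvalue())

buggy = written_size(True)
fixed = written_size(False)
print(buggy, fixed)  # the buggy output is many times larger than the fixed one
```

With only 100 words the difference is already large; with tens of thousands of words it could plausibly account for a jump from under 2MB to 155MB.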
Answer 1: (score: 0)
Besides writing the entire set on every iteration, you also don't need to create multiple sets and lists; you can use itertools.chain:
import csv
from itertools import chain

with open("SetA-unique.csv") as file_a, open("SetB-unique.csv") as file_b, open("SetA-SetB.csv", 'w') as inter:
    r1 = csv.reader(file_a)
    r2 = csv.reader(file_b)
    # Flatten the rows of each file into words, then intersect.
    for word in set(chain.from_iterable(r1)).intersection(chain.from_iterable(r2)):
        inter.write(word + "\n")
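If you are only writing words, you don't need csv.writer either; just use file.write as above.
If you are actually trying to compare whole rows, then you should not flatten them into a single iterable of words; you can imap each row to a tuple instead: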
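As a rough illustration of the flattening approach, the sketch below runs it on in-memory data under Python 3. The helper name `common_words`, the sample contents, and the use of `io.StringIO` in place of the real files are all assumptions for the example; the output is sorted only to make it deterministic.

```python
import csv
import io
from itertools import chain

def common_words(file_a, file_b):
    """Return the set of words that appear in both CSV streams."""
    rows_a = csv.reader(file_a)
    rows_b = csv.reader(file_b)
    # chain.from_iterable flattens each stream of rows into a stream of words.
    words_a = set(chain.from_iterable(rows_a))
    return words_a.intersection(chain.from_iterable(rows_b))

# Hypothetical in-memory stand-ins for SetA-unique.csv and SetB-unique.csv.
file_a = io.StringIO("apple,banana\ncherry\n")
file_b = io.StringIO("banana,date\ncherry,elderberry\n")

out = io.StringIO()
for word in sorted(common_words(file_a, file_b)):
    out.write(word + "\n")  # one word per line, no csv.writer needed
```

Only one set is built here: `intersection` accepts any iterable, so the second file is streamed rather than loaded into its own set.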
import csv
from itertools import imap  # Python 2 only; on Python 3 use the built-in map

with open("SetA-unique.csv") as file_a, open("SetB-unique.csv") as file_b, open("SetA-SetB.csv", 'w') as inter:
    r1 = csv.reader(file_a)
    r2 = csv.reader(file_b)
    writer = csv.writer(inter)
    # Rows are lists (unhashable), so convert them to tuples before building the set.
    for row in set(imap(tuple, r1)).intersection(imap(tuple, r2)):
        writer.writerow(row)
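For readers on Python 3, where `itertools.imap` no longer exists, a minimal sketch of the same row-wise comparison using the built-in `map` might look like this. The helper name `common_rows` and the sample data are hypothetical, and the rows are sorted only so the output order is predictable:

```python
import csv
import io

def common_rows(file_a, file_b):
    """Return the set of whole rows (as tuples) present in both CSV streams."""
    # csv.reader yields lists, which are unhashable; tuples can go into a set.
    rows_a = set(map(tuple, csv.reader(file_a)))
    return rows_a.intersection(map(tuple, csv.reader(file_b)))

# Hypothetical in-memory stand-ins for the two input files.
file_a = io.StringIO("a,b\nc,d\ne,f\n")
file_b = io.StringIO("c,d\na,x\ne,f\n")

out = io.StringIO()
writer = csv.writer(out)
for row in sorted(common_rows(file_a, file_b)):
    writer.writerow(row)  # each tuple is written back as one CSV row
```

Note that the row `a,b` does not match `a,x` even though both contain `a`, which is the difference from the word-level intersection.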
If there is only one word per line, you don't need the csv lib at all:
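from itertools import imap  # Python 2 only; on Python 3 use the built-in map

with open("SetA-unique.csv") as file_a, open("SetB-unique.csv") as file_b, open("SetA-SetB.csv", 'w') as inter:
    # Strip the trailing newline from each line so the words compare equal.
    for word in set(imap(str.strip, file_a)).intersection(imap(str.strip, file_b)):
        inter.write(word + "\n")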
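The one-word-per-line variant can be wrapped in a small function and tried on throwaway files, as sketched below for Python 3. The function name `write_common_lines`, the temporary file names, and the choices to skip blank lines and sort the output are assumptions added for the example, not part of the answer above.

```python
import os
import tempfile

def write_common_lines(path_a, path_b, path_out):
    """Write the lines common to both files, one per line, and return the count."""
    with open(path_a) as file_a, open(path_b) as file_b:
        common = set(map(str.strip, file_a)).intersection(map(str.strip, file_b))
    common.discard("")  # assumption: blank lines are not meaningful words
    with open(path_out, "w") as out:
        for word in sorted(common):
            out.write(word + "\n")
    return len(common)

# Hypothetical sample files in a temporary directory.
tmp = tempfile.mkdtemp()
path_a, path_b, path_out = (os.path.join(tmp, n) for n in ("a.txt", "b.txt", "out.txt"))
with open(path_a, "w") as f:
    f.write("apple\nbanana\ncherry\n")
with open(path_b, "w") as f:
    f.write("banana \ncherry\ndate\n")

count = write_common_lines(path_a, path_b, path_out)
```

Because `str.strip` removes surrounding whitespace, the entry `banana ` with a trailing space in the second file still matches `banana` in the first.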