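I am splitting a CSV into two CSVs based on the values in one column of the original file. The code works, but it takes about an hour to run on a CSV with roughly 10,000 records. I have tried using enumerate on the list, but I don't think that is the right way to speed this up.
I am very new to programming, and I would appreciate it if someone could explain where to focus my next efforts to make this faster. I know that fewer lines of code is generally better, but I don't understand how to loop while creating two separate CSVs. Is the loop even the problem?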
import csv

myList = ['2','12','20','33'...]

with open(originalCSV, 'rb') as f:
    reader = csv.DictReader(f)
    rows = [row for row in reader if row['Column 10'] in myList]
    for row in rows:
        with open(inmylistCSV, 'wb') as w:
            fieldnames = ['Column 1', 'Column 2', 'Column 5', 'Column 10']
            csvwriter = csv.DictWriter(w, fieldnames=fieldnames)
            csvwriter.writeheader()
            csvwriter.writerows(rows)

with open(originalCSV, 'rb') as f:
    reader = csv.DictReader(f)
    rows = [row for row in reader if row['Column 10'] not in myList]
    for row in rows:
        with open(notinmylistCSV, 'wb') as w:
            fieldnames = ['Column 1', 'Column 2', 'Column 5', 'Column 10']
            csvwriter = csv.DictWriter(w, fieldnames=fieldnames)
            csvwriter.writeheader()
            csvwriter.writerows(rows)
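Answer 0 (score: 2)
The main problem here is that you loop through the 10,000 records twice, so you are effectively processing 20,000 records (twice as many as necessary).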
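# This is what you're doing
def is_odd(x): return x % 2 == 1    # helper for the illustration
def is_even(x): return x % 2 == 0   # helper for the illustration

for x in range(10000):
    if is_odd(x):
        print('I am odd')
for x in range(10000):
    if is_even(x):
        print('I am even')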
A simple fix is to combine your logic into this form:
# This is what you should be doing
for x in range(10000):
    if is_odd(x):
        print('I am odd')
    else:
        print('I am even')
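So, in summary, there are two things you should do now.
1) Logically merge the following two lines, so the file is read and filtered in a single pass: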
rows = [row for row in reader if row['Column 10'] in myList]
rows = [row for row in reader if row['Column 10'] not in myList]
2) Optimize the writing part, so each output file is opened and written once:
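# `a | b` is not real Python here; it stands for "whichever output file these rows belong to"
with open(notinmylistCSV | inmylistCSV, 'wb') as w:
    fieldnames = ['Column 1', 'Column 2', 'Column 5', 'Column 10']
    csvwriter = csv.DictWriter(w, fieldnames=fieldnames)
    csvwriter.writeheader()
    csvwriter.writerows(rows)
Note that this is pseudocode.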
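As a rough, runnable sketch of those two steps (not the answerer's exact code): split the rows in one pass, then open each output file once and write all of its rows in a single call. The helper names `partition_rows` and `write_rows` are hypothetical, and the sketch assumes Python 3 (text mode with `newline=''`) rather than the Python 2 `'rb'`/`'wb'` modes in the question. It also passes `extrasaction='ignore'` on the assumption that the source may contain columns that are not wanted in the output. Worth noting: the question's listing appears to place the `with open(...)` block under `for row in rows:`, which would rewrite the whole output file once per row; writing once per file avoids that.

```python
import csv


def partition_rows(rows, column, wanted):
    """Split rows into (matching, non_matching) in a single pass."""
    matching, non_matching = [], []
    for row in rows:
        if row[column] in wanted:
            matching.append(row)
        else:
            non_matching.append(row)
    return matching, non_matching


def write_rows(path, fieldnames, rows):
    """Open the output once, write the header, then write every row."""
    # newline='' is what the csv module documentation recommends in Python 3.
    with open(path, 'w', newline='') as w:
        # extrasaction='ignore' drops keys that are not in fieldnames
        # instead of raising ValueError (an assumption about the data).
        writer = csv.DictWriter(w, fieldnames=fieldnames,
                                extrasaction='ignore')
        writer.writeheader()
        writer.writerows(rows)
```

With the rows loaded from `csv.DictReader`, this would be called once as `partition_rows(rows, 'Column 10', myList)` and then `write_rows` once per output file.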
Answer 1 (score: 0)
Why not read the original CSV just once and distribute each row to one of the other two CSVs as you go?
import csv

myList = ['2','12','20','33'...]
fieldnames = ['Column 1', 'Column 2', 'Column 5', 'Column 10']

# 'rb'/'wb' are the Python 2 modes used in the question;
# on Python 3, open the files in text mode with newline='' instead.
with open(originalCSV, 'rb') as f, \
        open(inmylistCSV, 'wb') as in_list, \
        open(notinmylistCSV, 'wb') as not_in_list:
    reader = csv.DictReader(f)
    in_list_csvwriter = csv.DictWriter(in_list, fieldnames=fieldnames)
    in_list_csvwriter.writeheader()
    not_in_list_csvwriter = csv.DictWriter(not_in_list, fieldnames=fieldnames)
    not_in_list_csvwriter.writeheader()

    # Single pass: route each row to exactly one of the two writers.
    for row in reader:
        if row['Column 10'] in myList:
            in_list_csvwriter.writerow(row)
        else:
            not_in_list_csvwriter.writerow(row)
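For readers on Python 3, the same streaming idea might look like the sketch below. The function name `split_csv` and its parameters are hypothetical, not something from the thread. Two assumptions are baked in: the lookup values are converted to a `set`, so each membership test does not scan a list, and `extrasaction='ignore'` is passed so that source columns missing from `fieldnames` are dropped rather than raising `ValueError`.

```python
import csv


def split_csv(source_path, in_path, out_path, column, wanted, fieldnames):
    """Stream source_path once, routing each row to in_path or out_path."""
    wanted = set(wanted)  # average O(1) membership tests
    with open(source_path, newline='') as src, \
            open(in_path, 'w', newline='') as in_file, \
            open(out_path, 'w', newline='') as out_file:
        reader = csv.DictReader(src)
        in_writer = csv.DictWriter(in_file, fieldnames=fieldnames,
                                   extrasaction='ignore')
        out_writer = csv.DictWriter(out_file, fieldnames=fieldnames,
                                    extrasaction='ignore')
        in_writer.writeheader()
        out_writer.writeheader()
        for row in reader:
            # Pick the writer for this row, then write it exactly once.
            target = in_writer if row[column] in wanted else out_writer
            target.writerow(row)
```

Because rows are written as they are read, memory use stays flat regardless of file size, and all three files are closed when the `with` block exits.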