Slow Python script when looping through a list

Time: 2016-03-09 16:52:58

Tags: python list python-2.7 csv

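I am splitting a csv into two csvs based on the value in one column of the original csv. This code works, but it takes about an hour to run on a csv with roughly 10,000 records. I have tried enumerating the list, but I don't think that is the right way to speed this up.

I am very slow at this and very new to programming, and I would appreciate it if someone could explain where to focus my next efforts to make it faster. I know that the fewest lines is best, but I don't understand how to loop when creating two separate csvs. Is the loop even the problem?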

myList = ['2','12','20','33'...]
with open(originalCSV, 'rb') as f:
   reader = csv.DictReader(f)
   rows = [row for row in reader if row['Column 10'] in myList]
for row in rows:
   with open(inmylistCSV, 'wb') as w:
       fieldnames = ['Column 1', 'Column 2', 'Column 5', 'Column 10']
       csvwriter = csv.DictWriter(w, fieldnames=fieldnames)
       csvwriter.writeheader()
       csvwriter.writerows(rows)

with open(originalCSV, 'rb') as f:
   reader = csv.DictReader(f)
   rows = [row for row in reader if row['Column 10'] not in myList]
for row in rows:
   with open(notinmylistCSV, 'wb') as w:
       fieldnames = ['Column 1', 'Column 2', 'Column 5', 'Column 10']
       csvwriter = csv.DictWriter(w, fieldnames=fieldnames)
       csvwriter.writeheader()
       csvwriter.writerows(rows)

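2 answers:

Answer 0: (score: 2)

The main problem here is that you loop through the 10,000 records twice. So you are basically processing 20,000 records (twice as many as necessary).

# This is what you're doing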

for x in range(10000):
    if is_odd(x):
       print('I am odd')

for x in range(10000):
    if is_even(x):
       print('I am even')

A simple fix is to combine your logic into the following form

# This is what you should be doing

for x in range(10000):
    if is_odd(x):
       print('I am odd')
    else:
       print('I am even')

So, in summary, there are two things you should do now

1) Logically merge the following lines

rows = [row for row in reader if row['Column 10'] in myList]
rows = [row for row in reader if row['Column 10'] not in myList]
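
One way the two list comprehensions could be merged is sketched below. This is only an illustration, not the answerer's code: `partition_rows` and the sample rows are hypothetical names and data, and converting the lookup values to a `set` is an extra assumption made here because set membership tests are cheaper than list membership tests.

```python
def partition_rows(rows, column, wanted):
    """Split rows into (matching, non_matching) in a single pass."""
    wanted = set(wanted)  # set lookup is O(1) on average; a list lookup is O(n)
    matching, non_matching = [], []
    for row in rows:
        if row[column] in wanted:
            matching.append(row)
        else:
            non_matching.append(row)
    return matching, non_matching


# Hypothetical sample data standing in for the rows from csv.DictReader
sample = [
    {'Column 1': 'a', 'Column 10': '2'},
    {'Column 1': 'b', 'Column 10': '7'},
    {'Column 1': 'c', 'Column 10': '33'},
]
in_rows, out_rows = partition_rows(sample, 'Column 10', ['2', '12', '20', '33'])
```

Each row is tested once and lands in exactly one of the two lists, so the source only needs to be read a single time.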

2) Optimize the writing part

with open(notinmylistCSV | inmylistCSV, 'wb') as w:
   fieldnames = ['Column 1', 'Column 2', 'Column 5', 'Column 10']
   csvwriter = csv.DictWriter(w, fieldnames=fieldnames)
   csvwriter.writeheader()
   csvwriter.writerows(rows)

Note that this is pseudocode.
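
A runnable take on the writing step might look like the sketch below. In the question's code the `with open(...)` block sits inside `for row in rows:`, so the output file is reopened and every row rewritten once per row; that repeated work may well account for much of the run time. The helper name `write_rows` is hypothetical, the code targets Python 3 (the question used Python 2.7, where the mode would be `'wb'`), and `extrasaction='ignore'` is an assumption about the intent, namely that columns outside `fieldnames` should simply be dropped.

```python
import csv


def write_rows(path, fieldnames, rows):
    """Open path once, write the header, then write every row."""
    # newline='' is what the Python 3 csv docs recommend for csv files
    with open(path, 'w', newline='') as w:
        # extrasaction='ignore' drops keys not listed in fieldnames
        # instead of raising ValueError (assumed to be the desired behavior)
        writer = csv.DictWriter(w, fieldnames=fieldnames, extrasaction='ignore')
        writer.writeheader()
        writer.writerows(rows)
```

It would be called once per output file, for example `write_rows(inmylistCSV, fieldnames, in_rows)`, with no surrounding loop over the rows.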

Answer 1: (score: 0)

Why not just read the original CSV once and distribute the rows to the other CSVs?

myList = ['2','12','20','33'...]

fieldnames = ['Column 1', 'Column 2', 'Column 5', 'Column 10']

# Open both output files once, up front
in_list = open(inmylistCSV, 'wb')
in_list_csvwriter = csv.DictWriter(in_list, fieldnames=fieldnames)
in_list_csvwriter.writeheader()

not_in_list = open(notinmylistCSV, 'wb')
not_in_list_csvwriter = csv.DictWriter(not_in_list, fieldnames=fieldnames)
not_in_list_csvwriter.writeheader()

# Read the original once and send each row to the matching writer
with open(originalCSV, 'rb') as f:
   reader = csv.DictReader(f)
   for row in reader:
       if row['Column 10'] in myList:
           in_list_csvwriter.writerow(row)
       else:
           not_in_list_csvwriter.writerow(row)

# Close the outputs so buffered rows are flushed to disk
in_list.close()
not_in_list.close()
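
For reference, the same streaming idea could be written for Python 3 roughly as follows. This is a sketch under assumptions, not the answerer's code: `split_csv` and its parameters are hypothetical, all three files are managed by one `with` statement so they are closed automatically, the lookup values are turned into a `set`, and `extrasaction='ignore'` assumes that columns outside `fieldnames` should be dropped.

```python
import csv


def split_csv(src, in_path, out_path, column, wanted, fieldnames):
    """Stream src once, routing each row to one of two output files."""
    wanted = set(wanted)  # faster membership tests than a list
    with open(src, newline='') as f, \
         open(in_path, 'w', newline='') as fin, \
         open(out_path, 'w', newline='') as fout:
        reader = csv.DictReader(f)
        in_writer = csv.DictWriter(fin, fieldnames=fieldnames,
                                   extrasaction='ignore')
        out_writer = csv.DictWriter(fout, fieldnames=fieldnames,
                                    extrasaction='ignore')
        in_writer.writeheader()
        out_writer.writeheader()
        for row in reader:
            # Pick the destination writer, then write the row exactly once
            writer = in_writer if row[column] in wanted else out_writer
            writer.writerow(row)
```

Because rows are written as they are read, memory use stays flat regardless of file size, and each record is touched only once.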