Python: improving list comparison speed

Time: 2014-06-01 09:58:00

Tags: python

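I need help improving the speed of my script. It works fine, but the longer it runs, the slower it gets, and I always have to restart it to get back to full speed. So I really need to find a way to speed it up.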

How the script works:

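  1. It opens the saved .txt files skus_local (~100-400k lines) and keywords_local (~2 million+ lines).
  2. It reads URLs and categories (a ~10k-line file) and loops over each URL and category, repeating steps 3, 5 and 6 for each one.
  3. The script scrapes 2 lists: new_skus (400 values) and new_keywords (1k values at most).
  4. The script checks new_skus against old_skus and builds a new upload_skus list from the unique values.
  5. new_keywords + old_keywords
  6. Same.
  7. The script appends upload_skus and upload_keywords to the files.
  8. I can see that the comparisons in steps 4, 5 (and maybe 6) are causing the speed problem.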

            try:
                f = open(settings['skus_local'],"r")
                old_skus=f.read().split("\n")[:-1]
                f.close()
                del f
            except:
                old_skus=[]
                f = open(settings['skus_local'],"w")
                f.close()
                del f
            skus_local_file = open(settings['skus_local'],"a")
    
            try:
                f = open(settings['keywords_local'], "r")
                old_keywords=f.read().split("\n")[:-1]
                f.close()
                del f
            except:
                old_keywords=[]
                f = open(settings['keywords_local'], "w")
                f.close()
                del f
            keywords_local_file = open(settings['keywords_local'],"a")
    
    
            csv_reader_counter = 0
            for category, url in csv.reader(fp):
                if not (csv_reader_counter == fp_counter):
                  csv_reader_counter = csv_reader_counter + 1
                  continue
    
                print url,category
    
                new_skus, new_keywords = ScraperJP.main(url)
    
                upload_skus=[]
    
                for sku in new_skus:
                    if sku not in old_skus:
                        upload_skus.append(sku)
    
                del new_skus
    
                if upload_skus!=[]:
                    insert_products.main(settings['admin_url'],settings['username'],settings['password'],upload_skus,category)
                    for sku in upload_skus:
                        skus_local_file.write(sku+"\n")
                        old_skus.append(sku)
                    skus_local_file.flush()
                    del upload_skus
    
                upload_keywords=[]
    
                for urls in new_keywords:
                    if urls not in old_keywords:
                        upload_keywords.append(urls)
                del new_keywords
    
                if upload_keywords!=[]:
                    for keyword in upload_keywords:
                        keywords_local_file.write(keyword+"\n")
                        old_keywords.append(keyword)
                    keywords_local_file.flush()
                del upload_keywords
    
                csv_reader_counter = csv_reader_counter + 1
                fp_counter = fp_counter + 1
                fl = open('lineno.txt',"w")
                fl.write(str(fp_counter))
                fl.close()
                gc.collect()
    
            os.remove('lineno.txt')
            skus_local_file.close()
            keywords_local_file.close()
            fp.close()
            del skus_local_file
            del keywords_local_file
            del fp
    if __name__=='__main__':
        main()
    

1 answer:

Answer 0 (score: 1)

Store the information in sets.
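As a rough illustration of why this helps: `x in some_list` walks the list element by element, while `x in some_set` is a hash lookup. The sketch below times both on made-up SKU strings; the sizes and names are assumptions, not the asker's data, and the exact timings will vary by machine.

```python
import timeit

# Hypothetical data: 100k known SKUs and 200 freshly scraped ones,
# half of which are already known.
old_list = ["sku%d" % i for i in range(100000)]
old_set = set(old_list)
new_skus = ["sku%d" % i for i in range(99900, 100100)]

def check_with_list():
    # every "not in" scans old_list from the start
    return [s for s in new_skus if s not in old_list]

def check_with_set():
    # every "not in" is a single hash lookup
    return [s for s in new_skus if s not in old_set]

list_time = timeit.timeit(check_with_list, number=1)
set_time = timeit.timeit(check_with_set, number=1)
print("list: %.4fs, set: %.6fs" % (list_time, set_time))
```

Both functions return the same values; only the cost of each lookup differs, and with a list that cost grows as old_skus grows, which matches the slowdown described in the question.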

To find what is new, you can simply compute new_skus - old_skus

So instead of lines like these:

for sku in new_skus:
    if sku not in old_skus:
        upload_skus.append(sku)

you can use new_skus.difference(old_skus), which gives the elements that are in new_skus but not in old_skus.
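A minimal sketch of that replacement, using made-up SKU values. In the question the scraper returns new_skus as a list, so it is converted with `set()` first. A set has no order, so if the upload order matters, a list comprehension tested against a set keeps the scraped order while still doing fast lookups.

```python
old_skus = {"A1", "B2", "C3"}          # hypothetical already-uploaded SKUs
new_skus = ["C3", "D4", "A1", "E5"]    # hypothetical freshly scraped SKUs

# Set difference: elements of new_skus that are not in old_skus (unordered).
upload_skus = set(new_skus).difference(old_skus)

# The same elements, kept in the order they were scraped.
upload_in_order = [sku for sku in new_skus if sku not in old_skus]

print(sorted(upload_skus))
print(upload_in_order)
```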

If you want to store the set, you can use pickle.

import pickle

s = {1,2,3,4}
with open("s.pick","wb") as f: # pickle it to file
    pickle.dump(s,f)

with open("s.pick","rb") as f1:
    un_p = pickle.loads(f1.read()) # unpickle and use

print un_p  # prints: set([1, 2, 3, 4])

You can also append objects to a file:

s2 = {4,5,6,7}

import pickle

with open("s.pick","ab") as f:
    pickle.dump(s2,f)


with open("s.pick","rb") as f1:
    s1 = pickle.load(f1)
    s2 = pickle.load(f1)
    print s1,s2  # prints: set([1, 2, 3, 4]) set([4, 5, 6, 7])
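When the number of objects appended to the file is not known in advance, one common pattern is to call `pickle.load` repeatedly until it raises `EOFError`. The sketch below does that and merges everything into a single set. The helper name `load_all_sets` and the temporary file location are made up for illustration.

```python
import os
import pickle
import tempfile

def load_all_sets(path):
    """Union every pickled set stored back to back in the file at path."""
    merged = set()
    with open(path, "rb") as f:
        while True:
            try:
                merged.update(pickle.load(f))
            except EOFError:  # no more pickled objects in the file
                break
    return merged

# Write two sets to a temporary file, the second one in append mode.
path = os.path.join(tempfile.mkdtemp(), "s.pick")
with open(path, "wb") as f:
    pickle.dump({1, 2, 3, 4}, f)
with open(path, "ab") as f:
    pickle.dump({4, 5, 6, 7}, f)

merged = load_all_sets(path)
print(sorted(merged))
```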

Examples of using sets:

s1={1, 2, 3, 4}
s2={4, 5, 6, 7}
s3={8,9,10,11}
print s1.difference(s2)  # set([1, 2, 3]): in s1 but not in s2
print s1.union(s2,s3)  # set([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]): all elements in s1, s2 and s3

You can add the contents of one set to another with:

s1.update(s2) #  add contents of s2 to s1
print "updated s1 with contents of s2", s1
# prints: updated s1 with contents of s2 set([1, 2, 3, 4, 5, 6, 7])
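Putting the pieces together, the SKU part of the loop in the question could look roughly like the sketch below. It keeps the plain-text, one-value-per-line file from the question rather than pickle. `load_lines` and `record_new` are hypothetical helper names, the scraped values are made up, and a temporary directory stands in for settings['skus_local'].

```python
import os
import tempfile

def load_lines(path):
    """Read one value per line into a set; a missing file gives an empty set."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(line.rstrip("\n") for line in f if line.strip())

def record_new(new_values, seen, out_file):
    """Return the values not seen before, append them to out_file, update seen."""
    upload = set(new_values) - seen
    for value in sorted(upload):
        out_file.write(value + "\n")
    out_file.flush()
    seen.update(upload)  # later iterations will treat these as already known
    return upload

skus_path = os.path.join(tempfile.mkdtemp(), "skus_local.txt")
old_skus = load_lines(skus_path)

with open(skus_path, "a") as skus_file:
    first = record_new(["A1", "B2"], old_skus, skus_file)
    second = record_new(["B2", "C3"], old_skus, skus_file)

print(sorted(first))
print(sorted(second))
```

The same two helpers would serve for the keywords file, since the question applies the identical check to new_keywords and old_keywords.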