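I need help improving the speed of my script. It works fine to begin with, but the longer it runs, the slower it gets, and I always have to restart it to get back to full speed. So I really need to find a way to speed it up.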
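How the script works:
Load skus_local (~100-400k lines) + keywords_local (about 2 million+ lines)
Scrape new_skus (400 values) + new_keywords (1k max)
Check new_skus against old_skus and create a new upload_skus from the unique values
Do the same check for new_keywords + old_keywords
Write out upload_skus and upload_keywords
I can see that steps 4, 5 (and possibly 6), where the comparing happens, cause the speed problem.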
def main():
    # settings, fp and fp_counter are set up elsewhere in the script (not shown)
    # Load the SKUs seen so far; create the file if it does not exist yet.
    try:
        f = open(settings['skus_local'],"r")
        old_skus=f.read().split("\n")[:-1]
        f.close()
        del f
    except:
        old_skus=[]
        f = open(settings['skus_local'],"w")
        f.close()
        del f
    skus_local_file = open(settings['skus_local'],"a")
    # Same for the keywords seen so far.
    try:
        f = open(settings['keywords_local'], "r")
        old_keywords=f.read().split("\n")[:-1]
        f.close()
        del f
    except:
        old_keywords=[]
        f = open(settings['keywords_local'], "w")
        f.close()
        del f
    keywords_local_file = open(settings['keywords_local'],"a")
    csv_reader_counter = 0
    for category, url in csv.reader(fp):
        # Skip rows that were already processed in an earlier run.
        if not (csv_reader_counter == fp_counter):
            csv_reader_counter = csv_reader_counter + 1
            continue
        print url,category
        new_skus, new_keywords = ScraperJP.main(url)
        # Keep only SKUs that are not known yet (list lookup: slow).
        upload_skus=[]
        for sku in new_skus:
            if sku not in old_skus:
                upload_skus.append(sku)
        del new_skus
        if upload_skus!=[]:
            insert_products.main(settings['admin_url'],settings['username'],settings['password'],upload_skus,category)
            for sku in upload_skus:
                skus_local_file.write(sku+"\n")
                old_skus.append(sku)
            skus_local_file.flush()
        del upload_skus
        # Keep only keywords that are not known yet (list lookup: slow).
        upload_keywords=[]
        for urls in new_keywords:
            if urls not in old_keywords:
                upload_keywords.append(urls)
        del new_keywords
        if upload_keywords!=[]:
            for keyword in upload_keywords:
                keywords_local_file.write(keyword+"\n")
                old_keywords.append(keyword)
            keywords_local_file.flush()
        del upload_keywords
        csv_reader_counter = csv_reader_counter + 1
        fp_counter = fp_counter + 1
        # Remember how far we got so a restart can resume from here.
        fl = open('lineno.txt',"w")
        fl.write(str(fp_counter))
        fl.close()
        gc.collect()
    os.remove('lineno.txt')
    skus_local_file.close()
    keywords_local_file.close()
    fp.close()
    del skus_local_file
    del keywords_local_file
    del fp

if __name__=='__main__':
    main()
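Answer 0 (score: 1)
Store the information in sets.
To check for new items you can simply use new_skus - old_skus (with both stored as sets).
So instead of lines like the following:
for sku in new_skus:
    if sku not in old_skus:
        upload_skus.append(sku)
you can use new_skus.difference(old_skus) to get the elements that are in new_skus but not in old_skus.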
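As a rough sketch of how the loop from the question could look with sets (Python 3 syntax; the helper name filter_new and the sample SKU strings are made up for illustration and are not part of the original script): a plain set difference does not keep the scraped order, so this version keeps old_skus as a set and filters in a loop, which preserves order and also drops duplicates inside the batch.

```python
def filter_new(new_items, seen):
    """Return the items of new_items that are not in seen, in original order.

    seen must be a set, so each membership test is a hash lookup instead
    of a scan over the whole collection. filter_new is a hypothetical
    helper name, not something from the original script.
    """
    upload = []
    batch = set()  # items already accepted from this batch
    for item in new_items:
        if item not in seen and item not in batch:
            upload.append(item)
            batch.add(item)
    return upload


old_skus = {"sku-1", "sku-2", "sku-3"}  # made-up sample data
new_skus = ["sku-3", "sku-9", "sku-2", "sku-7", "sku-9"]

upload_skus = filter_new(new_skus, old_skus)
print(upload_skus)  # ['sku-9', 'sku-7']

# When order does not matter, the set difference gives the same elements.
assert set(upload_skus) == set(new_skus).difference(old_skus)
```

If the order of the uploaded SKUs is irrelevant, set(new_skus).difference(old_skus) on its own is enough.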
If you want to store the set, you can use pickle.
import pickle
s = {1,2,3,4}
with open("s.pick","wb") as f: # pickle it to file
    pickle.dump(s,f)
with open("s.pick","rb") as f1:
    un_p = pickle.load(f1) # unpickle and use
print un_p
set([1, 2, 3, 4])
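You can also append objects to a file:
import pickle
s2 = {4,5,6,7}
with open("s.pick","ab") as f: # "ab" appends a second pickle to the file
    pickle.dump(s2,f)
with open("s.pick","rb") as f1:
    s1 = pickle.load(f1) # each load returns the next pickled object
    s2 = pickle.load(f1)
print s1,s2
set([1, 2, 3, 4]) set([4, 5, 6, 7])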
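When the number of appended objects is not known in advance, one possible approach is to keep calling pickle.load until it raises EOFError. The sketch below (Python 3 syntax) assumes that pattern; the helper name load_all, the temporary file, and the sample sets are illustrative and not part of the original answer.

```python
import os
import pickle
import tempfile


def load_all(path):
    """Return every object pickled into path, in the order it was written."""
    objects = []
    with open(path, "rb") as f:
        while True:
            try:
                objects.append(pickle.load(f))
            except EOFError:  # raised once the end of the file is reached
                break
    return objects


# Demo on a throwaway file (path and contents are made up).
path = os.path.join(tempfile.mkdtemp(), "s.pick")
for batch in ({1, 2, 3, 4}, {4, 5, 6, 7}, {8, 9}):
    with open(path, "ab") as f:  # "ab" appends one more pickle
        pickle.dump(batch, f)

# Merge all stored batches back into a single set.
merged = set().union(*load_all(path))
print(sorted(merged))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Appending one small pickle per batch avoids rewriting the whole set on every iteration, at the cost of merging the pieces when the script starts.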
Example of using sets:
s1={1, 2, 3, 4}
s2={4, 5, 6, 7}
s3={8,9,10,11}
print s1.difference(s2)
print s1.union(s2,s3)
set([1, 2, 3]) # in set 1 but not in set 2
set([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]) # all elements in s1, s2 and s3
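Why this helps with the script in the question: `in` on a list compares against the elements one by one, while `in` on a set is a hash lookup that takes roughly constant time on average. The following sketch (Python 3 syntax) times both on made-up data of about the size the question mentions for skus_local; the variable names and the lookup value are illustrative, and the actual numbers depend on the machine.

```python
import timeit

n = 200000  # roughly the size of skus_local in the question
as_list = [str(i) for i in range(n)]
as_set = set(as_list)
missing = "not-a-sku"  # worst case for the list: every element is compared

# Time 20 membership tests against each container.
list_time = timeit.timeit(lambda: missing in as_list, number=20)
set_time = timeit.timeit(lambda: missing in as_set, number=20)

print("list: %.6f s for 20 lookups" % list_time)
print("set:  %.6f s for 20 lookups" % set_time)
```

In the question's loop this lookup runs once per scraped SKU and keyword for every URL, so the per-lookup cost is multiplied many times over.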
You can add the contents of one set to another using:
s1.update(s2) # add contents of s2 to s1
print "updated s1 with contents of s2", s1
updated s1 with contents of s2 set([1, 2, 3, 4, 5, 6, 7])
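Putting the pieces together, here is a minimal sketch (Python 3 syntax) of one way the per-URL bookkeeping could work while keeping the newline-delimited file from the question instead of pickle. The helper names load_seen and record_new, the temporary path, and the sample values are assumptions made for illustration; sorting the new items is only there to make the write order predictable.

```python
import os
import tempfile


def load_seen(path):
    """Read a newline-delimited file into a set; a missing file gives an empty set."""
    try:
        with open(path) as f:
            return set(line.rstrip("\n") for line in f)
    except IOError:
        return set()


def record_new(new_items, seen, out_file):
    """Append unseen items to out_file, add them to seen, and return them sorted."""
    upload = sorted(set(new_items).difference(seen))
    for item in upload:
        out_file.write(item + "\n")
    out_file.flush()
    seen.update(upload)  # later batches will treat these items as known
    return upload


# Demo on a throwaway file (path and values are made up).
path = os.path.join(tempfile.mkdtemp(), "skus_local.txt")
old_skus = load_seen(path)
with open(path, "a") as skus_local_file:
    first = record_new(["a", "b", "a"], old_skus, skus_local_file)
    second = record_new(["b", "c"], old_skus, skus_local_file)

# first == ['a', 'b'], second == ['c']
print(first)
print(second)
```

The same two helpers could be reused for the keywords file, since the logic does not depend on what the strings represent.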