A faster way to create a list of unique items

Time: 2017-10-02 20:45:23

Tags: python python-3.x

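I implemented code that builds a vocabulary, shown below. Here q and a are lists of strings, and s is a list of lists of strings. Although it works, it is really slow because data is very large. I don't think this code is very smart.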

Is there a more elegant way to implement this?

 vocab = functools.reduce(lambda x, y: x | y, (set(list(chain.from_iterable(s)) + q + a) for s, q, a in data))
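A likely cost in this one-liner, offered as a hedged observation rather than a measured result: x | y inside reduce allocates a brand-new set on every step and copies everything accumulated so far into it, and each record also gets its own temporary set. A minimal sketch of an alternative, assuming data is an iterable of (s, q, a) tuples shaped as described above, keeps one set and grows it in place with set.update, which accepts several iterables at once. The function name build_vocab_update and the Record alias are hypothetical.

```python
from itertools import chain
from typing import Iterable, List, Set, Tuple

# Hypothetical record type: s is a list of lists of strings, q and a are lists of strings.
Record = Tuple[List[List[str]], List[str], List[str]]


def build_vocab_update(data: Iterable[Record]) -> Set[str]:
    """Hypothetical helper: collect every distinct token into one set, updated in place."""
    vocab: Set[str] = set()
    for s, q, a in data:
        # chain.from_iterable(s) flattens the nested lists lazily (no temporary list);
        # update() adds the tokens to the existing set instead of building a new one.
        vocab.update(chain.from_iterable(s), q, a)
    return vocab


if __name__ == "__main__":
    s1 = [['a', 'b', 'fwa'], ['foo', 'ixb', 'fwa'], ['fj', 'fab', 'fwa']]
    q1 = ['fwa', 'fawh']
    a1 = ['fjj', 'jfaw']
    print(sorted(build_vocab_update([(s1, q1, a1)] * 3)))
    # ['a', 'b', 'fab', 'fawh', 'fj', 'fjj', 'foo', 'fwa', 'ixb', 'jfaw']
```

Because the accumulated set is mutated rather than rebuilt, nothing already collected is copied again as the vocabulary grows; whether this beats the timings in the test program on the real dataset has not been measured here.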

I wrote a simple program to test this code, shown below. In the actual dataset, data is very long.

 import time
 import functools
 from itertools import chain

 s1 = [
     ['a', 'b', 'fwa'], # actual length is around 10
     ['foo', 'ixb', 'fwa'],
     ['fj', 'fab', 'fwa']
 ]

 q1 = ['fwa', 'fawh'] # actual length is around 10

 a1 = ['fjj', 'jfaw'] # actual length is around 3

 data = []
 for i in range(10000000):
     data.append((s1, q1, a1))

 start = time.time()
 vocab = functools.reduce(lambda x, y: x | y, (set(list(chain.from_iterable(s)) + q + a) for s, q, a in data)) # my way
 elapsed_time = time.time() - start
 print(elapsed_time) # 11.522738695144653
 print(vocab)

 start = time.time()
 vocab = functools.reduce(lambda x, y: x | y, (set(chain(chain.from_iterable(s), q, a)) for s, q, a in data)) # @cowbert
 elapsed_time = time.time() - start
 print(elapsed_time) # 9.918306350708008
 print(vocab)
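For comparison, a sketch that keeps the one-expression style but drops reduce entirely: a single generator flattens every record into one lazy stream of tokens, and set() consumes that stream once, so no per-record sets are built and no pairwise unions are performed. It is checked against the @cowbert reduce variant on a hypothetical, much smaller n so the two can be confirmed to agree before timing them on the full data; no timings are claimed here. The names vocab_reduce, vocab_single_pass and n are hypothetical.

```python
import functools
import time
from itertools import chain

s1 = [['a', 'b', 'fwa'], ['foo', 'ixb', 'fwa'], ['fj', 'fab', 'fwa']]
q1 = ['fwa', 'fawh']
a1 = ['fjj', 'jfaw']

n = 100_000  # hypothetical size, far smaller than the 10,000,000 records in the test above
data = [(s1, q1, a1)] * n  # same references as the append loop, built in one step


def vocab_reduce(data):
    # The @cowbert variant: one set per record, then pairwise unions via reduce.
    return functools.reduce(lambda x, y: x | y,
                            (set(chain(chain.from_iterable(s), q, a)) for s, q, a in data))


def vocab_single_pass(data):
    # Flatten all records into one lazy token stream and build the set exactly once.
    return set(chain.from_iterable(chain(chain.from_iterable(s), q, a)
                                   for s, q, a in data))


for fn in (vocab_reduce, vocab_single_pass):
    start = time.perf_counter()
    result = fn(data)
    print(fn.__name__, time.perf_counter() - start, len(result))

# Both approaches must produce the same vocabulary.
assert vocab_reduce(data) == vocab_single_pass(data)
```

The [(s1, q1, a1)] * n line yields the same list as the append loop in the test program, since both store references to the same three lists. Unlike reduce without an initializer, vocab_single_pass also returns an empty set for empty data instead of raising TypeError.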

0 answers:

No answers