Indexing into a search list to improve performance in Python

Time: 2019-01-07 09:19:27

Tags: python python-3.x

I have a large list of sentences and a large list of concepts. I want to identify the concepts that occur in each sentence, in order of appearance. I am currently performing this task with for loops and multithreading, as follows.

The following is the most efficient code I have so far. However, it is still very slow on my real dataset.

```python
import queue
import threading

sentences = [
    'data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning statistics and database systems',
    'data mining is an interdisciplinary subfield of computer science and statistics with an overall goal to extract information from a data set and transform the information into a comprehensible structure for further use',
    'data mining is the analysis step of the knowledge discovery in databases process or kdd',
]

concepts = ['data mining', 'database systems', 'databases process',
            'interdisciplinary subfield', 'information', 'knowledge discovery',
            'methods', 'machine learning', 'patterns', 'process']

def func(sentence):
    sentence_tokens = []
    for item in concepts:
        index = sentence.find(item)
        if index >= 0:
            sentence_tokens.append((index, item))
    sentence_tokens = [e[1] for e in sorted(sentence_tokens, key=lambda x: x[0])]
    return sentence_tokens

def do_find_all_concepts(q_in, l_out):
    while True:
        sentence = q_in.get()
        l_out.append(func(sentence))
        q_in.task_done()

# Queue with default maxsize of 0, infinite queue size
sentences_q = queue.Queue()
output = []
counting = 0

# any reasonable number of workers
num_threads = 4
for i in range(num_threads):
    worker = threading.Thread(target=do_find_all_concepts,
                              args=(sentences_q, output))
    # once there's nothing but daemon threads left, Python exits the program
    worker.daemon = True
    worker.start()

# put all the input on the queue
for s in sentences:
    sentences_q.put(s)
    counting = counting + 1
    print(counting)

# wait for the entire queue to be processed
sentences_q.join()
print(output)
```

My concepts list is sorted alphabetically. So I would like to know whether Python has any indexing mechanism that would use the characters of the first words in a sentence to search only the relevant part of the concepts list, instead of scanning the entire concepts list.
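Since the concepts list is already sorted alphabetically, one mechanism along these lines (not from the original post, just a sketch using the standard-library `bisect` module) is a prefix range lookup: all entries that start with a given prefix form a contiguous slice of a sorted list, and `bisect` can locate that slice in O(log n) comparisons. The `'\uffff'` sentinel assumes the concepts are ordinary text strings.

```python
import bisect

concepts = sorted(['data mining', 'database systems', 'databases process',
                   'interdisciplinary subfield', 'information',
                   'knowledge discovery', 'methods', 'machine learning',
                   'patterns', 'process'])

def prefix_range(sorted_list, prefix):
    # Entries starting with `prefix` are contiguous in a sorted list.
    # bisect_left finds the first such entry; appending a high sentinel
    # character finds the position just past the last one.
    lo = bisect.bisect_left(sorted_list, prefix)
    hi = bisect.bisect_left(sorted_list, prefix + '\uffff')
    return sorted_list[lo:hi]

print(prefix_range(concepts, 'data'))
# → ['data mining', 'database systems', 'databases process']
```

Each word of a sentence can then be checked against only the concepts sharing its prefix, rather than against the whole list.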

My main concern is time complexity (based on my current timing estimates, running the full data would take nearly 1.5 weeks). Space complexity is not a problem.
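To reduce the per-sentence cost from scanning every concept, another common approach (again a sketch, not from the post) is to bucket the concepts by their first word in a dict, so each sentence word triggers a lookup against only a handful of candidates. Note this matches on whole-word boundaries, slightly stricter than `str.find`, which also matches inside words.

```python
from collections import defaultdict

concepts = ['data mining', 'database systems', 'databases process',
            'interdisciplinary subfield', 'information', 'knowledge discovery',
            'methods', 'machine learning', 'patterns', 'process']

# Map each concept's first word to the tokenized concepts starting with it.
index = defaultdict(list)
for concept in concepts:
    cwords = tuple(concept.split())
    index[cwords[0]].append(cwords)

def find_concepts(sentence):
    words = sentence.split()
    hits = []
    for i, w in enumerate(words):
        # Only concepts whose first word equals `w` are candidates here.
        for cwords in index.get(w, ()):
            if tuple(words[i:i + len(cwords)]) == cwords:
                hits.append((i, ' '.join(cwords)))
    # Sort by position in the sentence, keep only the concept text.
    return [c for _, c in sorted(hits)]

sentence = ('data mining is the analysis step of the knowledge '
            'discovery in databases process or kdd')
print(find_concepts(sentence))
# → ['data mining', 'knowledge discovery', 'databases process', 'process']
```

Building the index is a one-time cost; afterwards each sentence costs roughly O(number of words) dictionary lookups rather than O(number of concepts) substring searches.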

I am happy to provide more details if needed.

0 answers:

No answers yet.