Question

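I am trying to find duplicate words in Google's word2vec, meaning words that differ only in letter case. For example, word2vec has two separate word embeddings for 'Hello' and 'hello'. My current code for this is simple, but not efficient.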


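However, the pretrained Google word2vec model has 3 million words, and my computer has been running for 18 hours without finishing. Is there a more efficient way to find the duplicate words?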

Answer 1

The `in` membership test on the list `ready_have` takes O(N) time per call, so running it once for every word makes the whole loop O(N²). It is not surprising that this approach is slow. You can speed it up simply by making `ready_have` a set and using `ready_have.add(word.lower())` (assuming the order of `ready_have` does not matter), because a membership test on a set takes O(1) time on average. Alternatively, and perhaps more clearly, you can use `collections.Counter`:

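from collections import Counter

# Count how many vocabulary entries share each lowercase form.
# load_w2v() is the asker's own loader and is assumed to yield every vocabulary word.
my_counter = Counter(word.lower() for word in load_w2v())

ready_have, duplicated_words = [], []
for word, count in my_counter.items():
    ready_have.append(word)
    if count != 1:
        # Several entries collapse to this lowercase form, e.g. 'Hello' and 'hello'.
        duplicated_words.append(word)

Note that for the `Counter` solution I am assuming you only want to append each duplicated word once, though this could easily be changed.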
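The set-based option mentioned in the answer is described only in prose. A minimal sketch of it follows, assuming the vocabulary is available as any iterable of strings; the function names and the sample `vocab` list are hypothetical and not part of the original code. A second helper keeps the original spellings, because both the `Counter` version and the set version report only lowercase forms.

```python
from collections import defaultdict


def find_case_duplicates(words):
    """Return the lowercase forms shared by two or more words.

    Set-based version of the loop: `ready_have` is a set instead of a list,
    so each membership test is O(1) on average instead of O(N).
    """
    ready_have = set()
    duplicated_words = set()  # a set, so each duplicate is recorded only once
    for word in words:
        key = word.lower()
        if key in ready_have:
            duplicated_words.add(key)
        else:
            ready_have.add(key)
    return duplicated_words


def group_case_variants(words):
    """Map each lowercase form to every original spelling that shares it,
    keeping only forms with two or more spellings."""
    groups = defaultdict(list)
    for word in words:
        groups[word.lower()].append(word)
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


# Hypothetical sample standing in for the 3 million word2vec vocabulary entries.
vocab = ["Hello", "hello", "world", "HELLO", "Paris", "paris", "dog"]
print(sorted(find_case_duplicates(vocab)))  # ['hello', 'paris']
print(group_case_variants(vocab))  # {'hello': ['Hello', 'hello', 'HELLO'], 'paris': ['Paris', 'paris']}
```

Both helpers make a single pass over the vocabulary, so their running time grows linearly with its size. `group_case_variants` tells you which spellings collide (for example 'Hello' and 'hello'), at the cost of holding every word in memory a second time.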
