Question

I was doing some preprocessing on a nested list before trying out a small word2vec model, and ran into the following problem:


[['he', 'is', 'a', 'brave', 'king'], ['she', 'is', 'a', 'kind', 'queen'], ['he', 'is', 'a', 'young', 'boy'], ['she', 'is', 'a', 'gentle', 'girl']]

So the output above is a nested list, and I want to remove stopwords such as 'is' and 'a'.

Output

[['he', 'a', 'brave', 'king'], ['she', 'a', 'kind', 'queen'], ['he', 'a', 'young', 'boy'], ['she', 'a', 'gentle', 'girl']]

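The output seems to indicate that after removing 'is' from each sublist, the loop jumps to the next sublist instead of iterating over the whole sublist.

What is the reason behind this? Is it the indexing? If so, how can I fix it, given that I want to keep the nested structure?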
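The original loop is not shown above, so the following is a minimal reproduction under the assumption that it removed words from each sublist while iterating over that same sublist. It records which words the loop actually visits: after 'is' is removed, every later element shifts one slot to the left, so the iterator's next index lands on 'brave' and 'a' is never examined.

```python
# Hypothetical reproduction: remove stopwords while iterating over the same list.
sentence = ['he', 'is', 'a', 'brave', 'king']
visited = []

for word in sentence:              # the list iterator walks forward by index
    visited.append(word)
    if word == 'is' or word == 'a':
        sentence.remove(word)      # shifts the later elements one slot left

print(visited)   # ['he', 'is', 'brave', 'king']  ('a' was skipped)
print(sentence)  # ['he', 'a', 'brave', 'king']
```

So the loop does not jump to the next sublist; it skips the element immediately after each removed one.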

Answer 1

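All of your code is correct except for one small change: iterate over a copy of the list, made with [:], so that you are not modifying the list you are looping over. Specifically, lst_copy = lst[:] creates a shallow copy of lst; this is one of several ways to copy a list (see here for details). When you iterate over the original list and modify it by removing items, the iterator's internal index keeps advancing while the remaining items shift one position to the left, so the item right after each removed one is skipped. That is the problem you observed.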
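A small sketch of what lst[:] does (the variable names are illustrative): the slice is a new list object with the same elements, so removing items from the original does not disturb a loop running over the copy. It is a shallow copy, which is fine here because the elements are strings.

```python
original = ['he', 'is', 'a', 'brave', 'king']
snapshot = original[:]          # new list object holding the same elements

print(snapshot is original)     # False: a different list object
print(snapshot == original)     # True: same contents

original.remove('is')           # mutating the original...
print(snapshot)                 # ...leaves the copy unchanged: ['he', 'is', 'a', 'brave', 'king']
```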

for sentence in corpus:
    for x in sentence[:]:  # <--- iterate over a copy of the list made with [:]
        if x == 'is' or x == 'a':
            sentence.remove(x)  # remove from the original sublist

Output

[['he', 'brave', 'king'],
 ['she', 'kind', 'queen'],
 ['he', 'young', 'boy'],
 ['she', 'gentle', 'girl']]
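If you also want to avoid the copy-and-remove pattern, one alternative (not part of the answer above) is to build each filtered sublist with a comprehension and write it back with slice assignment. This keeps the nested structure and updates the existing sublist objects in place, so any other references to them see the change. The set of stopwords is an assumption for illustration.

```python
corpus = [
    ['he', 'is', 'a', 'brave', 'king'],
    ['she', 'is', 'a', 'kind', 'queen'],
    ['he', 'is', 'a', 'young', 'boy'],
    ['she', 'is', 'a', 'gentle', 'girl'],
]
stopwords = {'is', 'a'}  # a set gives fast membership tests

for sentence in corpus:
    # Build the filtered list first, then replace the sublist's contents in place.
    sentence[:] = [word for word in sentence if word not in stopwords]

print(corpus)
# [['he', 'brave', 'king'], ['she', 'kind', 'queen'], ['he', 'young', 'boy'], ['she', 'gentle', 'girl']]
```

If in-place mutation is not needed, corpus = [[w for w in s if w not in stopwords] for s in corpus] produces the same nested result as new lists.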

Answer 2

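Perhaps you could define a custom function that rejects the elements matching a given condition, similar in spirit to the functions in itertools (note that itertools.dropwhile only drops a leading run of matching elements, so it is not a drop-in replacement here).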

def reject_if(predicate, iterable):
    # Yield only the elements for which predicate(element) is false.
    for element in iterable:
        if not predicate(element):
            yield element

Once you have that function, you can use it like this:

stopwords = ['is', 'and', 'a']
[ list(reject_if(lambda x: x in stopwords, ary)) for ary in corpus ]
#=> [['he', 'brave', 'king'], ['she', 'kind', 'queen'], ['he', 'young', 'boy'], ['she', 'gentle', 'girl']]
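For comparison, the standard library's itertools.filterfalse(predicate, iterable) yields exactly the elements for which the predicate is false, so it can stand in for the custom reject_if. The sketch also shows why dropwhile behaves differently: it stops dropping at the first element that does not match.

```python
from itertools import dropwhile, filterfalse

corpus = [
    ['he', 'is', 'a', 'brave', 'king'],
    ['she', 'is', 'a', 'kind', 'queen'],
    ['he', 'is', 'a', 'young', 'boy'],
    ['she', 'is', 'a', 'gentle', 'girl'],
]
stopwords = {'is', 'and', 'a'}

# filterfalse keeps the elements for which the predicate returns False.
cleaned = [list(filterfalse(lambda word: word in stopwords, sentence)) for sentence in corpus]
print(cleaned)
# [['he', 'brave', 'king'], ['she', 'kind', 'queen'], ['he', 'young', 'boy'], ['she', 'gentle', 'girl']]

# dropwhile only drops a leading run: 'he' is not a stopword, so nothing is dropped.
contrast = list(dropwhile(lambda word: word in stopwords, ['he', 'is', 'a', 'brave']))
print(contrast)  # ['he', 'is', 'a', 'brave']
```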

Answer 3

nested = [input()]

nested = [i.split() for i in nested]
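The snippet above only splits input into tokens. A sketch of how that idea could be combined with stopword removal, assuming the sentences are already available as plain strings (raw_sentences is a hypothetical name standing in for what input() would return), keeps one sublist per sentence:

```python
# Assumption: sentences arrive as plain strings instead of via input().
raw_sentences = [
    'he is a brave king',
    'she is a kind queen',
    'he is a young boy',
    'she is a gentle girl',
]
stopwords = {'is', 'a'}

# Split each sentence into tokens, one sublist per sentence.
nested = [sentence.split() for sentence in raw_sentences]

# Filter stopwords from every sublist; the nesting is preserved.
nested = [[word for word in tokens if word not in stopwords] for tokens in nested]
print(nested)
# [['he', 'brave', 'king'], ['she', 'kind', 'queen'], ['he', 'young', 'boy'], ['she', 'gentle', 'girl']]
```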

