Question

I'm fairly new to this, so I may be making a very basic mistake, but as I understand it, you iterate over the tokens in a list of lists in Python like this:

for each_list in full_list:
  for each_token in each_list:
    pass  # do whatever you want to do with each token

However, when using spaCy, the first for loop seems to iterate over tokens rather than over lists.

The code:

for eachlist in alice:
  if len(eachlist) > 5:
    print(eachlist)

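(where alice is a list of lists, and each list is a sentence made up of tokenized words)

actually prints every word longer than 5 letters, rather than every sentence longer than 5 words (which is what it should do if it really were the "first-level" for loop).

And the code: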

newalice = []
for eachlist in alice:
  for eachword in eachlist:
    #make a new list of lists where each list contains only words that are classified as nouns, adjectives, or verbs (with a few more specific stipulations)
    if (eachword.pos_ == 'NOUN' or eachword.pos_ == 'VERB' or eachword.pos_ == 'ADJ') and (eachword.dep_ != 'aux') and (eachword.dep_ != 'conj'):
        newalice.append([eachword])

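returns the error: "TypeError: 'spacy.tokens.token.Token' object is not iterable."

The reason I want to do this in a nested for loop is that I want newalice to be a list of lists (I still want to be able to iterate over the sentences; I just want to get rid of the words I don't care about).

I don't know whether I'm making some basic mistake in my code or whether spaCy is doing something odd, but either way I'd really appreciate help with iterating over the items in a list-in-a-list in spaCy while keeping the original list structure intact.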

Answer 1

Here is code for iterating over the elements of a nested list:

list_inst = [ ["this", " ", "is", " ", "a", " ", "sentence"], ["another", " ", "one"]]
for sentence in list_inst:
    for token in sentence:
        print(token, end="")
    print("")

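I think your confusion comes from the fact that each spaCy sentence is not stored in a list but in a doc object. The doc object is iterable and contains the tokens, but it also carries some additional information.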
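This would also explain the symptom in the question. If alice is really one flat, doc-like sequence rather than a list of sentences, the outer loop yields words, and len() of a word counts its characters. Here is a minimal sketch of that difference, using plain strings as stand-ins for spaCy tokens (the sample words are made up for illustration):

```python
# Plain strings stand in for spaCy tokens here (an assumption for illustration).

# A flat sequence of words, comparable to iterating over a single doc.
flat = ["Alice", "was", "beginning", "to", "get", "very", "tired"]

# A genuine list of lists: each inner list is one sentence.
nested = [
    ["Alice", "was", "beginning", "to", "get", "very", "tired"],
    ["Down", "the", "rabbit", "hole"],
]

# Looping over the flat sequence yields words, so len() counts characters.
long_words = [item for item in flat if len(item) > 5]

# Looping over the nested sequence yields sentences, so len() counts words.
long_sentences = [item for item in nested if len(item) > 5]

print(long_words)      # ['beginning']
print(long_sentences)  # only the first, seven-word sentence
```

If the first result matches what you see, alice is probably not a list of lists at the point where you loop over it.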

Example code:

# iterate over sentences after spaCy preprocessing
import spacy
nlp = spacy.load('en_core_web_sm')
doc1 = nlp("this is a sentence")
doc2 = nlp("another one")
list_inst = [doc1, doc2]
for doc in list_inst:
    for token in doc:
        print(token, end=" ")
    print("")

The output is the same, apart from a trailing space at the end of each line.
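For the filtering step in the question, one possible approach is to build a new inner list per sentence, instead of appending a one-element list for every word. The sketch below is only an illustration: FakeToken is a hypothetical stand-in that mimics the pos_ and dep_ attributes of a spaCy token, the tags are assigned by hand rather than produced by a parser, and the helper names keep and filter_sentences are made up here.

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy token; only the attributes used below.
FakeToken = namedtuple("FakeToken", ["text", "pos_", "dep_"])

KEEP_POS = {"NOUN", "VERB", "ADJ"}   # parts of speech to keep
DROP_DEP = {"aux", "conj"}           # dependency labels to discard


def keep(token):
    """Same condition as in the question, written with set membership."""
    return token.pos_ in KEEP_POS and token.dep_ not in DROP_DEP


def filter_sentences(sentences):
    """Return one filtered list per sentence, preserving the nesting."""
    return [[tok for tok in sent if keep(tok)] for sent in sentences]


# Hand-tagged sample data (not real parser output).
alice = [
    [FakeToken("Alice", "PROPN", "nsubj"),
     FakeToken("was", "AUX", "aux"),
     FakeToken("getting", "VERB", "ROOT"),
     FakeToken("tired", "ADJ", "acomp")],
    [FakeToken("the", "DET", "det"),
     FakeToken("rabbit", "NOUN", "nsubj"),
     FakeToken("ran", "VERB", "ROOT")],
]

newalice = filter_sentences(alice)
print([[tok.text for tok in sent] for sent in newalice])
# [['getting', 'tired'], ['rabbit', 'ran']]
```

With real spaCy objects, the same function should work on a list of doc objects like list_inst above, or on the sentences of a single doc via doc.sents, since both yield tokens that have pos_ and dep_.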

Hope that helps!

Iterating over tokens within lists within a list using for loops in Python (spaCy)
