TypeError: sequence item 352: expected str instance, NoneType found

Time: 2017-12-11 18:11:49

Tags: python string list

I am trying to perform sentence chunking on my corpus. First I load my tagged data, then I try to chunk the tagged corpus. Here is my code.

import os
from collections import defaultdict

import nltk

def load_corpus():
    corpus_root = os.path.abspath('../nlp1/dumpfiles')
    mycorpus = nltk.corpus.reader.TaggedCorpusReader(corpus_root,'.*')
    return mycorpus.tagged_sents()

def sents_chunks(tagg_sents, pos_tag_pattern):
    chunk_freq_dict = defaultdict(int)
    chunker = nltk.RegexpParser(pos_tag_pattern)
    for sent in tagg_sents:
        if not all(sent):
            print("NoneType object in \"{}\": {}".format(sent.label(),sent))
            sent = cast_to_tree_function(filter(bool, sent))
        for chk in chunker.parse(sent).subtrees():
            if str(chk).startswith('(NP'):
                phrase = chk.__unicode__()[4:-1]
                #print(phrase)
                if '\n' in phrase:
                    phrase = ' '.join(phrase.split())
                    #print(phrase)
                chunk_freq_dict[phrase] += 1
    #print(chunk_freq_dict)
    return chunk_freq_dict 

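I am getting an error somewhere in my corpus, and I don't know where or why. Does anyone know what the problem is and how I can fix it? Here is the error: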

Traceback (most recent call last):
  File "multiwords1.py", line 184, in <module>
    candidates = main(domain_corpus, PATTERN,MIN_FREQ,MIN_CVAL)
  File "multiwords1.py", line 156, in main
    chunks_freqs = sents_chunks(domain_sents, pos_tag_pattern)
  File "multiwords1.py", line 23, in sents_chunks
    for chk in chunker.parse(sent).subtrees():
  File "/usr/local/lib/python3.5/dist-packages/nltk/chunk/regexp.py", line 1208, in parse
    chunk_struct = parser.parse(chunk_struct, trace=trace)
  File "/usr/local/lib/python3.5/dist-packages/nltk/chunk/regexp.py", line 1023, in parse
    chunkstr = ChunkString(chunk_struct)
  File "/usr/local/lib/python3.5/dist-packages/nltk/chunk/regexp.py", line 98, in __init__
    self._str = '<' + '><'.join(tags) + '>'
TypeError: sequence item 352: expected str instance, NoneType found

1 answer:

Answer 0 (score: 0)

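You are getting a TypeError at runtime. According to the message, item 352 of the tags sequence has no type (NoneType), which means there is a NoneType object in sent (nltk.tree.Tree class).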

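The line self._str = '<' + '><'.join(tags) + '>' in nltk/chunk/regexp.py is the reason for the exception, because str.join only accepts str items. You need to check that every item in the sent iterable yields a tag of type str.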
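
The failure can be reproduced without NLTK. This is a minimal sketch, assuming a hypothetical tag list in which one entry is None; the tag values and the helper name build_chunk_string are made up for illustration, and only the join expression is taken from the traceback.

```python
# Hypothetical tag list; the None stands in for a token that has no tag.
tags = ["DT", "NN", None, "VBZ"]


def build_chunk_string(tags):
    # Same expression as the line shown in the traceback.
    return "<" + "><".join(tags) + ">"


try:
    build_chunk_string(tags)
    error_message = None
except TypeError as exc:
    # str.join refuses any item that is not a str.
    error_message = str(exc)

print(error_message)
```

The number in the message is the position of the offending entry in the list passed to join, so in the question it appears to point at the tag at index 352 of the sentence being parsed.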

You can use the built-in filter function, but the result should be converted back to a Tree:

filter(bool, sent)  # returns an iterator over the valid (truthy) items
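
Two details are worth checking before relying on this. In Python 3, filter returns a lazy iterator, so it has to be materialized (for example with list) before it is reused. Also, if sent is a list of (word, tag) tuples, which is what TaggedCorpusReader.tagged_sents() normally yields, a pair such as ('word', None) is still truthy, so filter(bool, sent) would keep it. The sketch below works under that assumption; the sample data and the helper name drop_untagged are hypothetical.

```python
# Made-up sample sentence; the second token has no tag.
sent = [("the", "DT"), ("cat", None), ("sleeps", "VBZ")]

# A non-empty tuple is truthy even when its tag is None,
# so filtering on bool(item) removes nothing here.
kept_by_bool = list(filter(bool, sent))


def drop_untagged(tagged_sent):
    """Return a new list without the tokens whose tag is None."""
    return [(word, tag) for word, tag in tagged_sent if tag is not None]


cleaned = drop_untagged(sent)
print(kept_by_bool)
print(cleaned)
```

Whether dropping untagged tokens is acceptable depends on the corpus; replacing None with a placeholder tag would keep the sentence length unchanged instead.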

To check whether the iterable contains NoneType items, you can do the following:

if not all(sent):
    print("NoneType object in \"{}\": {}".format(sent.label(), sent))
    sent = cast_to_tree_function(filter(bool, sent))  # update sent so it keeps only the valid items
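
To find out where in the corpus the problem sits, it may help to report the position of every token with a missing tag before chunking. The sketch below assumes the sentences are lists of (word, tag) pairs; find_untagged and the sample data are hypothetical and not part of NLTK.

```python
def find_untagged(tagged_sents):
    """Yield (sentence_index, token_index, word) for tokens whose tag is None."""
    for sent_idx, sent in enumerate(tagged_sents):
        for tok_idx, (word, tag) in enumerate(sent):
            if tag is None:
                yield sent_idx, tok_idx, word


# Made-up corpus for illustration.
sample_sents = [
    [("a", "DT"), ("dog", "NN")],
    [("it", "PRP"), ("barks", None), ("loudly", "RB")],
]

problems = list(find_untagged(sample_sents))
for sent_idx, tok_idx, word in problems:
    print("sentence {}, token {}: {!r} has no tag".format(sent_idx, tok_idx, word))
```

With the real data, calling find_untagged(load_corpus()) would list the offending tokens. One likely cause, assuming the default word/tag file format, is a word in the dump files that lacks the tag separator.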