spaCy NLP - Chunking with regular expressions

Date: 2016-11-21 09:13:01

Tags: python nlp part-of-speech spacy

spaCy includes a noun_chunks feature that retrieves a set of noun phrases. The function english_noun_chunks (shown below) checks word.pos == NOUN:

def english_noun_chunks(doc):
    labels = ['nsubj', 'dobj', 'nsubjpass', 'pcomp', 'pobj',
              'attr', 'root']
    np_deps = [doc.vocab.strings[label] for label in labels]
    conj = doc.vocab.strings['conj']
    np_label = doc.vocab.strings['NP']
    for i in range(len(doc)):
        word = doc[i]
        if word.pos == NOUN and word.dep in np_deps:
            yield word.left_edge.i, word.i+1, np_label
        elif word.pos == NOUN and word.dep == conj:
            head = word.head
            while head.dep == conj and head.head.i < head.i:
                head = head.head
            # If the head is an NP, and we're coordinated to it, we're an NP
            if head.dep in np_deps:
                yield word.left_edge.i, word.i+1, np_label

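I would like to get chunks from a sentence that match a given regular expression. For example, a phrase of zero or more adjectives followed by one or more nouns: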

{(<JJ>)*(<NN | NNS | NNP>)+}

Is this possible without overriding the english_noun_chunks function?

1 Answer:

Answer 0: (score: 3)

You can rewrite this function without losing any performance, since it is implemented in pure Python, but why not just filter the chunks after you get them?
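A minimal sketch of that filtering idea, assuming each chunk is an iterable of tokens that expose spaCy's fine-grained tag as `tag_` (as the spans from `doc.noun_chunks` do). The names `NP_PATTERN`, `tag_signature`, and `filter_chunks` are hypothetical helpers introduced here for illustration, not part of spaCy, and the pattern is the question's expression rewritten as an ordinary Python regex over a string of `<TAG>` markers:

```python
import re

# The question's pattern as a plain regex: zero or more adjectives
# followed by one or more nouns, matched against "<TAG>" markers.
NP_PATTERN = re.compile(r"(<JJ>)*(<NN>|<NNS>|<NNP>)+")


def tag_signature(chunk):
    """Join the fine-grained tags of a chunk's tokens, e.g. '<JJ><NN>'."""
    return "".join("<%s>" % token.tag_ for token in chunk)


def filter_chunks(chunks, pattern=NP_PATTERN):
    """Yield only the chunks whose entire tag signature matches the pattern."""
    for chunk in chunks:
        if pattern.fullmatch(tag_signature(chunk)):
            yield chunk
```

With a parsed document this could be called as `list(filter_chunks(doc.noun_chunks))`. Because the whole signature has to match, a chunk that starts with a determiner (tag `DT`) is rejected; use `pattern.search` instead of `fullmatch` if a matching sub-sequence inside a longer chunk is enough.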
