Highlighting verb phrases using spaCy and HTML

Time: 2018-08-28 01:31:21

Tags: html beautifulsoup nltk spacy

I have written code that colors verb phrases in red and writes the result out as HTML.

from __future__ import unicode_literals
import spacy,en_core_web_sm
import textacy
import codecs
nlp = en_core_web_sm.load()
sentence = 'The author is writing a new book. The dog is barking.'
pattern = r'<VERB>?<ADV>*<VERB>+'
doc = textacy.Doc(sentence, lang='en_core_web_sm')
lists = textacy.extract.pos_regex_matches(doc, pattern)
with open("my.html","w") as fp:
    for list in lists:
        search_word = (list.text)
        fp.write(sentence.replace(search_word, '<span style="color: red">{}</span>'.format(search_word)))

Current output:

The author **is writing** a new book. The dog is barking.The author is writing a new book. The dog **is barking.**

The text is written twice: the first copy highlights "is writing" and the second highlights "is barking".

Expected output:

The author **is writing** a new book. The dog **is barking.**

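Do I have to split the text into sentences before looping over the list of matches? Any help would be appreciated.

1 answer:

Answer 0 (score: 1)

I found another, more logical approach. Instead of running the replacement on the whole text, run it only on the sentence that contains the matched pattern.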

with open("my.html","w") as fp:
    for match in lists:
        search_word = (match.text)
        # Pick the first sentence that contains the matched verb phrase
        containing_sentence = [i for i in sentence.split('.') if str(search_word) in str(i)][0]
        fp.write(containing_sentence.replace(search_word, '<span style="color: red">{}</span>'.format(search_word)))

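The code above writes each sentence out separately. If you want the output as continuous text, append each modified sentence to a list, join the list, and then write it to the file as shown below.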

mod_sentence = []
for match in lists:
    search_word = (match.text)
    # split('.') drops the period, so add it back to the containing sentence
    containing_sentence = [i for i in sentence.split('.') if str(search_word) in str(i)][0]+'.'
    mod_sentence.append(containing_sentence.replace(search_word, '<span style="color: red">{}</span>'.format(search_word)))
# Join the highlighted sentences and write them once
with open("my.html","w") as fp:
    fp.write(''.join(mod_sentence))
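The string-replacement approach can pick the wrong sentence when the same verb phrase occurs more than once. As a rough alternative sketch (not part of the code above), the text can be rebuilt in a single pass from character offsets. The helper name `highlight_spans` is hypothetical, and the offsets here are looked up with `str.index` purely for illustration; with spaCy they would presumably come from each match's `start_char` and `end_char` attributes.

```python
from html import escape


def highlight_spans(text, spans, color="red"):
    """Wrap each (start, end) character span of `text` in a colored <span>.

    Hypothetical helper for illustration; `spans` holds character offsets.
    """
    pieces = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor:
            # Skip a span that overlaps one already emitted
            continue
        # Copy the untouched text before the span, escaped for HTML
        pieces.append(escape(text[cursor:start]))
        pieces.append(
            '<span style="color: {}">{}</span>'.format(color, escape(text[start:end]))
        )
        cursor = end
    # Copy whatever follows the last span
    pieces.append(escape(text[cursor:]))
    return "".join(pieces)


sentence = "The author is writing a new book. The dog is barking."

# Assumption: offsets are derived from known phrases for this demo only.
phrases = ["is writing", "is barking"]
spans = [(sentence.index(p), sentence.index(p) + len(p)) for p in phrases]

html_out = highlight_spans(sentence, spans)
print(html_out)
```

Because the text is walked only once, each part of it is written exactly once, and `html_out` can then be written to `my.html` with a single `fp.write` call.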

Hope this helps! Cheers!