I'm working with drug label data. The text always uses the verb phrase 'indicated for'.
For example:
sentence = "Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis"
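I have already used spaCy to filter down to only the sentences that contain the phrase 'indicated for'.
I now need a function that takes a sentence and returns the phrase that is the object of 'indicated for'. So for this example, the function, which I'll call extract(), would behave as follows: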
extract(sentence)
>> 'relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis'
Is there functionality in spaCy to do this?
EDIT: Simply splitting after 'indicated for' won't work for complex examples.
Here are some examples:
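'''buprenorphine and naloxone sublingual tablets are indicated for the maintenance treatment of opioid dependence and should be used as part of a complete treatment plan to include counseling and psychosocial support buprenorphine and naloxone sublingual tablets contain buprenorphine a partial opioid agonist and naloxone an opioid antagonist and is indicated for the maintenance treatment of opioid dependence'''
'''ofloxacin ophthalmic solution is indicated for the treatment of infections caused by susceptible strains of the following bacteria in the conditions listed below conjunctivitis gram positive bacteria gram negative bacteria staphylococcus aureus staphylococcus epidermidis streptococcus pneumoniae enterobacter cloacae haemophilus influenzae proteus mirabilis pseudomonas aeruginosa corneal ulcers gram positive bacteria gram negative bacteria staphylococcus aureus staphylococcus epidermidis streptococcus pneumoniae pseudomonas aeruginosa serratia marcescens'''
I only want the bold parts.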
Answer 0 (score: 5)
#!/usr/bin/env python
import spacy

nlp = spacy.load('en_core_web_sm')
text = 'Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis.'
doc = nlp(text)

# Print the full subtree of every object of a preposition (pobj).
for word in doc:
    # Compare directly: ('pobj') is a plain string, so `in` would do a substring test.
    if word.dep_ == 'pobj':
        subtree_span = doc[word.left_edge.i : word.right_edge.i + 1]
        print(subtree_span.text)
Output:
relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis
the signs and symptoms of osteoarthritis and rheumatoid arthritis
osteoarthritis and rheumatoid arthritis
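There are multiple lines of output because the sentence contains multiple pobj tokens, and each one prints its own subtree.
EDIT 2: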
#!/usr/bin/env python
import spacy

nlp = spacy.load('en_core_web_sm')
para = '''Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis.
Ofloxacin ophthalmic solution is indicated for the treatment of infections caused by susceptible strains of the following bacteria in the conditions listed below.'''
doc = nlp(para)

# Keep only the sentences that contain the key phrase.
# sent.text replaces the older sent.string attribute.
indicated_for_sents = [sent for sent in doc.sents if 'indicated for' in sent.text]
print(indicated_for_sents)
print()

# Print the subtree of every object of a preposition (pobj) in the document.
for word in doc:
    if word.dep_ == 'pobj':
        subtree_span = doc[word.left_edge.i : word.right_edge.i + 1]
        print(subtree_span.text)
Output:
[Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis.
, Ofloxacin ophthalmic solution is indicated for the treatment of infections caused by susceptible strains of the following bacteria in the conditions listed below.]
relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis
the signs and symptoms of osteoarthritis and rheumatoid arthritis
osteoarthritis and rheumatoid arthritis
the treatment of infections caused by susceptible strains of the following bacteria in the conditions listed below
infections caused by susceptible strains of the following bacteria in the conditions listed below
susceptible strains of the following bacteria in the conditions listed below
the following bacteria in the conditions listed below
the conditions listed below
Check this link
Answer 1 (score: 0)
You don't need spaCy. You can use a regular expression or just split the string:
sentence = "Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis"
sentence.split('indicated for ')[1]
>>> relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis
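This relies on assumptions about the string, e.g. that 'indicated for' appears exactly once, that everything after it is what you want, and so on.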
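To make those assumptions visible, here is a minimal sketch of the split approach using only the standard `re` module. The function name `after_indicated_for` is made up for illustration. It returns a list rather than indexing `[1]`, so a sentence without the phrase gives an empty list instead of an IndexError, and a sentence with two occurrences shows how much extra text gets swept in.

```python
import re

# Matches the key phrase regardless of case or the amount of whitespace.
INDICATED_FOR = re.compile(r"\bindicated\s+for\b\s*", re.IGNORECASE)


def after_indicated_for(sentence):
    """Return the text following each 'indicated for' in the sentence.

    This is plain string splitting: it has no notion of where the
    object phrase ends, so each piece runs to the next occurrence
    of the phrase or to the end of the string.
    """
    parts = INDICATED_FOR.split(sentence)
    # parts[0] is whatever precedes the first occurrence; drop it.
    return [part.strip() for part in parts[1:]]


sentence = ("Meloxicam tablet is indicated for relief of the signs and "
            "symptoms of osteoarthritis and rheumatoid arthritis")
print(after_indicated_for(sentence))

# Two occurrences: the first piece keeps trailing words that are not
# part of the object phrase.
print(after_indicated_for("A is indicated for X and is indicated for Y"))
# -> ['X and is', 'Y']
```

For the longer label texts in the question, each piece would also include everything that follows the indication, which is exactly the case where splitting stops being enough.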
Grammar note: what you're looking for is actually the indirect object, not the subject. The subject is "Meloxicam tablet".
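Answer 2 (score: 0)
You need to use spaCy's dependency parsing. The selected sentences (those containing 'indicated for') should be run through spaCy's dependency parser to show the relationships between all the words. You can see a visualization of the dependency parse for the example sentence in your question using spaCy here.
Once spaCy returns the dependency parse, you need to search for the "indicated" token as a verb and find its children in the dependency tree. See the example here. In your case, you would match "indicated" as the verb and get its children, rather than the "xcomp" or "ccomp" used in the GitHub example.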
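The steps above can be sketched roughly as follows. This is an illustration, not a tested recipe: the helper name `extract_indications` is invented here, and the code assumes the parser attaches "for" to "indicated" with the label `prep` and the indication to "for" with the label `pobj`, which is consistent with the pobj output shown in Answer 0 but can vary with the model and the sentence. It takes an already parsed spaCy Doc or sentence Span and uses only the token attributes `text`, `dep_`, `children`, `left_edge`, `right_edge`, `i` and `doc`.

```python
def extract_indications(parsed):
    """Return the phrases governed by 'indicated for' in a parsed Doc or Span.

    Walks indicated -> 'for' (prep) -> object (pobj) and returns the
    full subtree of each object as text.
    """
    phrases = []
    for token in parsed:
        if token.text.lower() != "indicated":
            continue
        for prep in token.children:
            if prep.dep_ != "prep" or prep.text.lower() != "for":
                continue
            for obj in prep.children:
                if obj.dep_ == "pobj":
                    # The subtree spans the object's leftmost to rightmost descendant.
                    span = obj.doc[obj.left_edge.i : obj.right_edge.i + 1]
                    phrases.append(span.text)
    return phrases


def extract(parsed):
    """Return the first indication phrase, or None if nothing matched."""
    phrases = extract_indications(parsed)
    return phrases[0] if phrases else None


# Usage, assuming the en_core_web_sm model is installed:
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   print(extract(nlp(sentence)))
```

Unlike looping over every pobj in the document, this only follows the preposition that hangs off "indicated", so nested objects such as "the signs and symptoms ..." are not printed separately. How well it trims the long unpunctuated label texts in the question depends on how the parser attaches the trailing material.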
Answer 3 (score: -1)
Try looking at Noun phrases with spacy and https://spacy.io/usage/linguistic-features#noun-chunks. I'm not a spaCy expert, but this should help.