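I need to extract the text between </cons> and <cons in Notepad++ (that is, delete everything except the text between these two markers).
My sample data looks like this: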
<abstract>
<sentence>The <cons lex="CD4_coreceptor" sem="G#protein_molecule">CD4 coreceptor</cons> interacts with <cons lex="non-polymorphic_region" sem="G#protein_domain_or_region">non-polymorphic regions</cons> of <cons lex="major_histocompatibility_complex_class_II_molecule" sem="G#protein_family_or_group">major histocompatibility complex class II molecules</cons> on <cons lex="antigen-presenting_cell" sem="G#cell_type">antigen-presenting cells</cons> and contributes to <cons lex="T_cell_activation" sem="G#other_name">T cell activation</cons>.</sentence>
<sentence>We have investigated the effect of <cons lex="CD4_triggering" sem="G#other_name"><cons lex="CD4" sem="G#protein_molecule">CD4</cons> triggering</cons> on <cons lex="T_cell_activating_signal" sem="G#other_name">T cell activating signals</cons> in a <cons lex="lymphoma_model" sem="G#other_name">lymphoma model</cons> using <cons lex="monoclonal_antibody" sem="G#protein_family_or_group">monoclonal antibodies</cons> (<cons lex="mAb" sem="G#protein_domain_or_region">mAb</cons>) which recognize different <cons lex="CD4_epitope" sem="G#protein_family_or_group">CD4 epitopes</cons>.</sentence>
<sentence>We demonstrate that <cons lex="CD4_triggering" sem="G#other_name"><cons lex="CD4" sem="G#protein_molecule">CD4</cons> triggering</cons> delivers signals capable of activating the <cons lex="NF-AT_transcription_factor" sem="G#protein_molecule">NF-AT transcription factor</cons> which is required for <cons lex="interleukin-2_gene_expression" sem="G#other_name"><cons lex="interleukin-2" sem="G#protein_molecule">interleukin-2</cons> gene expression</cons>.</sentence>
<sentence>Whereas different <cons lex="anti-CD4_mAb" sem="G#protein_family_or_group">anti-CD4 mAb</cons> or <cons lex="HIV-1_gp120" sem="G#protein_molecule"><cons lex="HIV-1" sem="G#virus">HIV-1</cons> gp120</cons> could all trigger activation of the <cons lex="protein_tyrosine_kinase" sem="G#protein_family_or_group">protein tyrosine kinases</cons> <cons lex="p56lck" sem="G#protein_molecule">p56lck</cons> and <cons lex="p59fyn" sem="G#protein_molecule">p59fyn</cons> and phosphorylation of the <cons lex="Shc_adaptor_protein" sem="G#protein_molecule">Shc adaptor protein</cons>, which mediates signals to <cons lex="Ras" sem="G#protein_family_or_group">Ras</cons>, they differed significantly in their ability to activate <cons lex="NF-AT" sem="G#protein_molecule">NF-AT</cons>.</sentence>
<sentence>Lack of full activation of <cons lex="NF-AT" sem="G#protein_molecule">NF-AT</cons> could be correlated to a dramatically reduced capacity to induce <cons lex="calcium_flux" sem="G#other_name"><cons lex="calcium" sem="G#atom">calcium</cons> flux</cons> and could be complemented with a <cons lex="calcium_ionophore" sem="G#other_organic_compound">calcium ionophore</cons>.</sentence>
<sentence>The results identify functionally distinct <cons lex="epitope" sem="G#protein_family_or_group">epitopes</cons> on the <cons lex="CD4_coreceptor" sem="G#protein_molecule">CD4 coreceptor</cons> involved in activation of the <cons lex="Ras/protein_kinase_C_and_calcium_pathway" sem="G#other_name"><cons lex="Ras/protein_kinase_C" sem="G#protein_molecule"><cons lex="Ras/protein_kinase_C_pathway" sem="G#other_name"><cons lex="Ras" sem="G#protein_molecule">Ras</cons><cons lex="protein_kinase_C" sem="G#protein_molecule">/protein kinase C</cons></cons></cons> and <cons lex="calcium_pathway" sem="G#other_name">calcium pathways</cons></cons>.</sentence>
</abstract>
The output I want is:
interacts with
of
on
and contributes to
on
in
using
which recognize different
triggering
delivers signals capable of activating the
which is required for
or
could all trigger activation of the
and
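I tried the regular expression
.*<\/cons>(.*?)<cons.*
with the replacement
$1
but this only returns the text between the last occurrence of </cons> and <cons on each line, while my sentences contain several of these tags.
Can anyone help me?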
Answer 0 (score: 3)
Use the expression
.*Word1(.*?)Word2.*
If the delimiters contain any special symbols, don't forget to escape them with a backslash (\).
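To see how this pattern behaves when the delimiters occur more than once on a line, here is a minimal sketch in Python rather than Notepad++. The sample line is hypothetical, and Python's `re` module is only assumed to behave like the Notepad++ engine for this simple pattern.

```python
import re

# Hypothetical one-line sample; Word1/Word2 stand in for the real delimiters.
line = "a Word1 first Word2 b Word1 second Word2 c"

# The leading .* is greedy: it consumes as much of the line as it can,
# so the capture group ends up after the LAST Word1 that is followed by Word2.
pattern = re.compile(r".*Word1(.*?)Word2.*")

# Replace the whole line with the captured group.
result = pattern.sub(r"\1", line)
print(repr(result))
```

In this sketch only the text between the last delimiter pair survives, which matches the behaviour described in the question.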
Answer 1 (score: 2)
Use a regular-expression search and replace.
Set the search mode to "Regular expression" and replace
<[^>]*>
with nothing.
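As a rough illustration of what the tag-stripping pattern does, the sketch below applies it in Python to a shortened excerpt of the sample sentence. The excerpt and the variable names are chosen here for illustration only.

```python
import re

# Shortened excerpt modeled on the first sample sentence (illustrative only).
sample = (
    '<sentence>The <cons lex="CD4_coreceptor" sem="G#protein_molecule">'
    'CD4 coreceptor</cons> interacts with '
    '<cons lex="non-polymorphic_region" sem="G#protein_domain_or_region">'
    'non-polymorphic regions</cons>.</sentence>'
)

# <[^>]*> matches one tag: "<", then any characters except ">", then ">".
stripped = re.sub(r"<[^>]*>", "", sample)
print(stripped)
```

Note that this removes the tags themselves but keeps the text inside each cons element, so on its own it does not isolate only the text between </cons> and <cons.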
Edit:
Set the search mode to "Regular expression" and replace
.*<\/cons>(.*?)<cons.*
with
$1
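Because the pattern above is anchored by a greedy .* on both sides, it yields at most one capture per line. One possible alternative, outside Notepad++, is to drop the surrounding .* and collect every match. The following is a sketch under that assumption, using Python and a shortened, illustrative excerpt modeled on the sample data.

```python
import re

# Shortened excerpt modeled on the first sample sentence (illustrative only).
sample = (
    '<sentence>The <cons lex="CD4_coreceptor" sem="G#protein_molecule">'
    'CD4 coreceptor</cons> interacts with '
    '<cons lex="non-polymorphic_region" sem="G#protein_domain_or_region">'
    'non-polymorphic regions</cons> of '
    '<cons lex="antigen-presenting_cell" sem="G#cell_type">'
    'antigen-presenting cells</cons>.</sentence>'
)

# Without the surrounding .*, each match covers just one </cons> ... <cons gap,
# and findall returns the captured text of every gap in order.
gaps = re.findall(r"</cons>(.*?)<cons", sample)

# Trim surrounding whitespace and drop gaps that are empty after trimming.
gaps = [g.strip() for g in gaps if g.strip()]
print(gaps)
```

This sketch does not treat nested cons elements specially, so sentences like the later ones in the sample would need additional handling.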