I am using the following regular expression to match the paragraph that starts with the word "Summary":
([^\']*(?=Summary)[^\']*)
but it matches all of the text: regex101a
I also tried
(?<=Summary).*?(?=]\.)
which does not match anything: regex101b
I think this has something to do with how the text file is formatted.
Here is an example:
COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff. The
reference sequence was derived from AC105339.9 and FJ695193.1.
This sequence is a reference standard in the RefSeqGene project.
Summary: Adaptor protein complex 3 (AP-3 complex) is a
heterotrimeric protein complex involved in the formation of
clathrin-coated synaptic vesicles. The protein encoded by this gene
represents the beta subunit of the neuron-specific AP-3 complex and
was first identified as the target antigen in human paraneoplastic
neurologic disorders. The encoded subunit binds clathrin and is
phosphorylated by a casein kinase-like protein, which mediates
synaptic vesicle coat assembly. Defects in this gene are a cause of
early-onset epileptic encephalopathy. [provided by RefSeq, Feb
2017].
PRIMARY REFSEQ_SPAN PRIMARY_IDENTIFIER PRIMARY_SPAN COMP
1-35060 AC105339.9 88079-123138
35061-35259 FJ695193.1 1-199 c
35260-57628 AC105339.9 123337-145705
This is what I am trying to extract:
Summary: Adaptor protein complex 3 (AP-3 complex) is a
heterotrimeric protein complex involved in the formation of
clathrin-coated synaptic vesicles. The protein encoded by this gene
represents the beta subunit of the neuron-specific AP-3 complex and
was first identified as the target antigen in human paraneoplastic
neurologic disorders. The encoded subunit binds clathrin and is
phosphorylated by a casein kinase-like protein, which mediates
synaptic vesicle coat assembly. Defects in this gene are a cause of
early-onset epileptic encephalopathy. [provided by RefSeq, Feb
2017].
Answer 0 (score: 2)
I think this is a robust pattern that matches your paragraph (use the Multiline flag):
^\s+$\n^([ \t]+)Summary.*(?:\n\1[ \t]*\S.*)+
Working example: https://regex101.com/r/P6KlBa/2
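([ \t]+) - captures the indentation (spaces or tabs) at the start of the Summary line, so that the following lines can be required to start with the same indentation. Some flavors have \h for horizontal spaces.
Summary.* - the first line, which starts with "Summary".
(?:\n\1[ \t]*\S.*)+ - matches the following non-empty lines that begin with the captured indentation.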
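For anyone who wants to try the pattern outside regex101, here is a minimal Python sketch. It is only an illustration under a few assumptions: the indentation in the sample was lost when the question was posted, so the 12-space indent and the whitespace-only line before "Summary" are guesses at what the real file looks like, and `extract_summary`, `SUMMARY_RE`, `INDENT` and `sample` are names made up for this example. Python's `re` has no `\h`, so the `[ \t]` form of the pattern is used.

```python
import re

# Pattern from the answer. re.MULTILINE makes ^ and $ match at line boundaries.
SUMMARY_RE = re.compile(
    r"^\s+$\n^([ \t]+)Summary.*(?:\n\1[ \t]*\S.*)+",
    re.MULTILINE,
)


def extract_summary(text):
    """Return the Summary paragraph without indentation, or None if absent."""
    match = SUMMARY_RE.search(text)
    if match is None:
        return None
    # The match starts with the whitespace-only separator line, so strip it,
    # then remove the shared indentation from each remaining line.
    lines = match.group(0).strip().splitlines()
    return "\n".join(line.strip() for line in lines)


# Assumption: continuation lines are indented by 12 spaces, and the line
# before "Summary" contains only that indentation.
INDENT = " " * 12

sample = "\n".join([
    "COMMENT     REVIEWED REFSEQ: This record has been curated by NCBI staff. The",
    INDENT + "reference sequence was derived from AC105339.9 and FJ695193.1.",
    INDENT + "This sequence is a reference standard in the RefSeqGene project.",
    INDENT,
    INDENT + "Summary: Adaptor protein complex 3 (AP-3 complex) is a",
    INDENT + "heterotrimeric protein complex involved in the formation of",
    INDENT + "clathrin-coated synaptic vesicles. The protein encoded by this gene",
    INDENT + "represents the beta subunit of the neuron-specific AP-3 complex and",
    INDENT + "was first identified as the target antigen in human paraneoplastic",
    INDENT + "neurologic disorders. The encoded subunit binds clathrin and is",
    INDENT + "phosphorylated by a casein kinase-like protein, which mediates",
    INDENT + "synaptic vesicle coat assembly. Defects in this gene are a cause of",
    INDENT + "early-onset epileptic encephalopathy. [provided by RefSeq, Feb",
    INDENT + "2017].",
    "PRIMARY     REFSEQ_SPAN         PRIMARY_IDENTIFIER PRIMARY_SPAN        COMP",
    INDENT + "1-35060             AC105339.9         88079-123138",
])

print(extract_summary(sample))
```

The match stops at the PRIMARY line because that line does not begin with the captured indentation. Note that the pattern's leading `^\s+$` needs at least one whitespace character on the separator line, so if your file uses a completely empty line there, that part of the pattern would have to be adjusted.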