I want to extract a specific paragraph from a long text. For example:
txt1 <- "What is claimed is:
1. A hybridized CMP conditioner, comprising: a base;
a first abrasive unit, provided on said base and comprising a first
bonding layer fixed on said base, a substrate for abrasive unit provided
on said first bonding layer and an abrasive layer provided on said
substrate for abrasive unit, said abrasive layer being a diamond coating
formed through a chemical vapor deposition process, and said diamond
coating being provided on the surface thereof with a plurality of abrasive
tips.
2. The hybridized CMP conditioner according to claim 1, wherein said base
is provided on the surface thereof with a central region and an annular
outer region around the outside of said central region.
3. The hybridized CMP conditioner according to claim 2, wherein said
central region is provided with a recessed portion for said first abrasive
unit to be provided therein, and said annular outer region is provided
with a plurality of first accommodating portions spaced apart from each
other for said second abrasive units to be provided therein. "
I only want to extract the first paragraph, like this:
1. A hybridized CMP conditioner, comprising: a base;
a first abrasive unit, provided on said base and comprising a first
bonding layer fixed on said base, a substrate for abrasive unit provided
on said first bonding layer and an abrasive layer provided on said
substrate for abrasive unit, said abrasive layer being a diamond coating
formed through a chemical vapor deposition process, and said diamond
coating being provided on the surface thereof with a plurality of abrasive
tips.
I tried to do this with the strsplit function:
strsplit(txt1, "\n1.", perl = TRUE)
But the result is not what I want:
[1] "What is claimed is:"
[2] " A hybridized CMP conditioner, comprising: a base; \na first abrasive
unit, provided on said base and comprising a first bonding layer fixed on
said base, a substrate for abrasive unit provided on said first bonding
layer and an abrasive layer provided on said substrate for abrasive unit,
said abrasive layer being a diamond coating formed through a chemical
vapor deposition process, and said diamond coating being provided on the
surface thereof with a plurality of abrasive tips; and \na plurality of
second abrasive units, provided on said base and comprising a second
bonding layer fixed on said base, a carrying post provided on said second
bonding layer, an abrasive particle provided on said carrying post and an
abrasive material-bonding layer provided between said carrying post and
said abrasive particle. \n2. The hybridized CMP conditioner according to
claim 1, wherein said base is provided on the surface thereof with a
central region and an annular outer region around the outside of said
central region. "
Answer 0 (score: 0)
Use strsplit, splitting at every newline that starts a numbered paragraph rather than only at "\n1.":
# split at newline followed by number and '.'
paragraphs <- unlist(strsplit(txt1, "\\n(?=(\\d+\\. ))", perl = TRUE))
# get rid of newlines and select 1st paragraph
gsub(" *\\n", " ", paragraphs)[2]
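The attempt in the question splits only at "\n1." and consumes those characters, so the "1." prefix is lost and every later claim stays attached to the second piece. The lookahead version above avoids both problems. If you need claims other than the first, the same idea can be wrapped in a small function. This is only a sketch: `extract_claim` and `sample_txt` are hypothetical names that are not part of the original answer, and the function assumes every claim starts on its own line as a number, a dot, and a space.

```r
# Hypothetical helper (not from the original answer): return claim number `n`
# from a text in which every claim starts on its own line as "<n>. ".
extract_claim <- function(txt, n) {
  # Split at each newline that is followed by digits, a literal dot and a space.
  # The lookahead is not consumed, so the "1. " prefix stays with its paragraph.
  parts <- unlist(strsplit(txt, "\\n(?=\\d+\\. )", perl = TRUE))
  # Keep the piece whose leading number equals n.
  hit <- parts[grepl(paste0("^", n, "\\. "), parts)]
  if (length(hit) == 0) return(NA_character_)
  # Collapse line breaks (and surrounding whitespace) into single spaces, then trim.
  trimws(gsub("\\s*\\n\\s*", " ", hit[1], perl = TRUE))
}

# Shortened sample text, assumed to follow the same layout as txt1.
sample_txt <- "What is claimed is:\n1. A conditioner, comprising: a base;\na first abrasive unit.\n2. The conditioner according to claim 1, wherein said base\nhas a central region. "

extract_claim(sample_txt, 1)
extract_claim(sample_txt, 2)
```

Selecting by the leading number instead of by position (`[2]`) still works when the text has no "What is claimed is:" header. Note that a wrapped line inside a claim that happens to begin with a number, a dot, and a space would be treated as the start of a new claim.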