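I am trying to use a regular expression to extract paragraphs from a file. Each paragraph is preceded and followed by a '-' on its own line, and each paragraph starts with a number.
For example:
-
This is a paragraph
It may span multiple lines
-
Ideally I would prefer not to include the '-', but it does not really matter, because I will put the match in a string and run another regular expression on it (one that I know works).
The code I am trying to use is basically as follows: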
Dim matchPara As String
Dim regex As Object
Dim theMatch As Object
Dim matches As Object
Dim fileName As String
Dim fileNo As Integer
Dim document As String
matchPara = "-?(\d.*?)?-"
Set regex = CreateObject("VBScript.RegExp")
regex.Pattern = matchPara
regex.Global = True
regex.Multiline = True
fileName = "C:\file.txt"
fileNo = FreeFile
Open fileName For Input As #fileNo
document = Input$(LOF(fileNo), fileNo)
Set matches = regex.Execute(document)
For Each theMatch In matches
    MsgBox theMatch.Value
Next theMatch
Close #fileNo
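I tested this regular expression on regex101 and it appeared to do what I want. I also tested it without the group:
-?\d*-
However, when I run the code, theMatch.Value contains only a single '-'. After some fiddling with the regular expression I got it to show the first line of text, but never anything beyond the first line.
I checked the length of theMatch.Value with
MsgBox Len(theMatch.Value)
and put the contents of theMatch.Value into a worksheet cell to see whether the text was merely being cut off in the message box, but both theories proved wrong.
I am completely lost now, and I am starting to suspect that the problem is VBA rather than the regular expression. There is no requirement to use a regular expression; I just assumed it would be the simplest approach.
The paragraphs contain the data I want to extract. The idea is to put each paragraph matched by the regular expression into a string and then run other regular expressions on it to get the information I need. Some paragraphs do not contain the data I need, so the idea is to loop over each paragraph individually; if the data is missing from a paragraph, the error handling is better (that is, get what I can and report an error message for the rest).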
Answer 0 (score: 1)
This simple approach does not use Regex. It assumes the data is in column A and writes the paragraphs to column B:
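Sub paragraph_no_regex()
    Dim s As String
    Dim ary As Variant
    Dim a As Variant
    Dim i As Long
    With Application.WorksheetFunction
        ' Join all constant cells in column A into one string (2 = xlCellTypeConstants)
        s = .TextJoin(" ", False, Columns(1).SpecialCells(2))
    End With
    ' Each "-" marks a paragraph boundary
    ary = Split(s, "-")
    i = 1
    For Each a In ary
        Cells(i, 2) = a
        i = i + 1
    Next a
End Sub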
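Since the question reads the text from a file rather than from a worksheet, the same regex-free idea can be sketched on a string. This is only a sketch under assumptions: the function name SplitParagraphs and its parameter are hypothetical, a line counts as a separator only if it contains nothing but "-", and any text before the first dash line is treated as a paragraph too.

```vba
' Hypothetical helper: splits file text into paragraphs without a regex.
Function SplitParagraphs(ByVal content As String) As Collection
    Dim result As New Collection
    Dim lines() As String
    Dim current As String
    Dim i As Long

    ' Assumption: normalise CRLF and lone CR line endings to LF first.
    content = Replace(Replace(content, vbCrLf, vbLf), vbCr, vbLf)
    lines = Split(content, vbLf)

    For i = LBound(lines) To UBound(lines)
        If Trim$(lines(i)) = "-" Then
            ' A dash line closes the paragraph collected so far.
            If Len(current) > 0 Then result.Add current
            current = vbNullString
        ElseIf Len(current) = 0 Then
            current = lines(i)
        Else
            current = current & vbLf & lines(i)
        End If
    Next i
    ' Text after the last dash line is not closed, so it is discarded.

    Set SplitParagraphs = result
End Function
```

Each item of the returned Collection is one paragraph, so the follow-up regular expressions can be run on the items one at a time.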
Answer 1 (score: 0)
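Sub F()
    ' Early binding: requires a reference to "Microsoft VBScript Regular Expressions 5.5"
    Dim re As New RegExp
    Dim sMatch As String
    Dim document As String
    re.Pattern = "-\n((.|\n)+?)\n-"
    ' Getting document
    document = ...
    ' First match, first capture group: the text between the dashes
    sMatch = re.Execute(document)(0).SubMatches(0)
End Sub
If you need the dashes, just include them in the capture group (the outer parentheses).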
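The code in this answer returns only the first match. A hedged sketch of how all paragraphs might be collected follows. The procedure name PrintAllParagraphs is hypothetical, and the lookahead (?=-) is my variation, not part of the answer: it leaves the closing dash unconsumed, on the assumption that consecutive paragraphs share a single dash line. Like the pattern above, it assumes plain LF line endings.

```vba
' Hypothetical helper: prints every paragraph found between dash lines.
Sub PrintAllParagraphs(ByVal document As String)
    Dim re As Object
    Dim matches As Object
    Dim m As Object

    ' Late binding, so no library reference is needed.
    Set re = CreateObject("VBScript.RegExp")
    ' The closing dash sits in a lookahead so that it can also
    ' serve as the opening dash of the next paragraph.
    re.Pattern = "-\n((.|\n)+?)\n(?=-)"
    re.Global = True   ' return all matches, not just the first

    Set matches = re.Execute(document)
    If matches.Count = 0 Then
        Debug.Print "No paragraph found"
        Exit Sub
    End If

    For Each m In matches
        Debug.Print m.SubMatches(0)   ' text between the dashes
    Next m
End Sub
```

Checking matches.Count first avoids the runtime error that indexing item 0 of an empty match collection would raise.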
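Answer 2 (score: 0)
This regex matches your description and successfully extracts the paragraphs (tested on regex101.com):
matchPara = "-\n\d+\.\s*((?:.|\n)+?)\s*\n-"
It needs the 'global' flag but not the 'multiline' flag. Instead, the end-of-line markers are matched explicitly in the regex. The key point is that the innermost group matches any character, including an end of line (as an alternative), but does so in a non-greedy way ("+?"). It does not care about word boundaries, because that is not necessary. Also, "-" is not a special character in this position in a regex, so it does not have to be escaped.
As an added bonus, leading and trailing whitespace is trimmed (the "\s*" outside the group).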
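A minimal sketch of how this pattern might be plugged into the file-reading code from the question. The procedure name ShowNumberedParagraphs is hypothetical, and the line-ending normalisation is my assumption: a file saved on Windows usually has a carriage return before each line feed, in which case "-\n" would not match as written.

```vba
' Hypothetical helper: reads a file and shows each numbered paragraph.
Sub ShowNumberedParagraphs(ByVal fileName As String)
    Dim regex As Object
    Dim matches As Object
    Dim theMatch As Object
    Dim document As String
    Dim fileNo As Integer

    ' Read the whole file into a string, as in the question.
    fileNo = FreeFile
    Open fileName For Input As #fileNo
    document = Input$(LOF(fileNo), fileNo)
    Close #fileNo

    ' Assumption: convert CRLF to LF so that "\n" directly follows each dash.
    document = Replace(document, vbCrLf, vbLf)

    Set regex = CreateObject("VBScript.RegExp")
    regex.Pattern = "-\n\d+\.\s*((?:.|\n)+?)\s*\n-"
    regex.Global = True       ' find every paragraph
    regex.Multiline = False   ' not needed: line ends are matched explicitly

    Set matches = regex.Execute(document)
    For Each theMatch In matches
        ' SubMatches(0) is the paragraph text without the dashes and the number.
        MsgBox theMatch.SubMatches(0)
    Next theMatch
End Sub
```

Note that the pattern consumes the closing dash. If two consecutive paragraphs share one dash line, the second one has no opening dash left to match and would be skipped.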