Question

I need to get the content of the first p tag in a string (but without the actual tags).

Example

<h1>I don't want the title</h1>
<p>This is the text I want</p>
<p>I don't want this</p>
<p>I also don't want this</p>

I guess I need to find everything else and replace it with nothing? But how do I create the regular expression?

Answer 1

Try something like this:

Set fso  = CreateObject("Scripting.FileSystemObject")
Set html = CreateObject("HTMLFile")
html.write fso.OpenTextFile("C:\path\to\your.html").ReadAll
Set p = html.getElementsByTagName("p")
WScript.Echo p(0).innerText
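
Since the question mentions a string rather than a file, here is a minimal sketch of the same idea that writes the markup straight into the HTMLFile object. The variable s and its sample content are assumptions taken from the question's example, and the length check is only an illustrative guard for input that has no p element.

```vbscript
Option Explicit

Dim s, html, p

' Sample input copied from the question (assumed to already be in a string)
s = "<h1>I don't want the title</h1>" & vbCrLf & _
    "<p>This is the text I want</p>" & vbCrLf & _
    "<p>I don't want this</p>" & vbCrLf & _
    "<p>I also don't want this</p>"

Set html = CreateObject("HTMLFile")
html.write s      ' parse the string instead of reading a file
html.close        ' finish the document before querying it

Set p = html.getElementsByTagName("p")
If p.length > 0 Then
    WScript.Echo p(0).innerText   ' text of the first <p>, without the tags
Else
    WScript.Echo "No <p> element found"
End If
```

With the sample input this should echo the text of the first paragraph only.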

Answer 2

Use this pattern to capture what you want:

^[\s\S]*?<p>([^<>]*?)<\/p>

^               # Start of string/line
[\s\S]          # Character Class [\s\S]
*?              # (zero or more)(lazy)
<p>             # "<p>"
(               # Capturing Group (1)
  [^<>]         # Any character except "<" or ">"
  *?            # (zero or more)(lazy)
)               # End of Capturing Group (1)
<\/p>           # "</p>"
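
As a rough sketch of how the capture pattern could be applied from VBScript, the snippet below uses the built-in RegExp object and reads capturing group 1 through SubMatches. The variable names and the sample string are assumptions based on the question. Note that the pattern only recognises a bare <p> with no attributes, and it skips any paragraph whose content contains nested tags.

```vbscript
Option Explicit

Dim s, re, matches

' Sample input copied from the question
s = "<h1>I don't want the title</h1>" & vbCrLf & _
    "<p>This is the text I want</p>" & vbCrLf & _
    "<p>I don't want this</p>" & vbCrLf & _
    "<p>I also don't want this</p>"

Set re = New RegExp
re.Pattern = "^[\s\S]*?<p>([^<>]*?)<\/p>"
re.Global = False      ' only the first match is needed
re.MultiLine = False   ' keep ^ anchored to the start of the whole string

Set matches = re.Execute(s)
If matches.Count > 0 Then
    ' SubMatches(0) holds capturing group 1, the text between <p> and </p>
    WScript.Echo matches(0).SubMatches(0)
Else
    WScript.Echo "No <p> element found"
End If
```

For the sample input this echoes "This is the text I want".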

Or use this pattern to match everything else and replace it with nothing:

^[\s\S]*?<p>|<\/p>[\s\S]*$

^               # Start of string/line
[\s\S]          # Character Class [\s\S]
*?              # (zero or more)(lazy)
<p>             # "<p>"
|               # OR
<               # "<"
\/              # "/"
p>              # "p>"
[\s\S]          # Character Class [\s\S]
*               # (zero or more)(greedy)
$               # End of string/line
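
A minimal sketch of the replace approach in VBScript follows, again with an assumed variable s holding the question's sample. Global has to be True here because the two alternatives match at different positions (one before the paragraph, one after it), and MultiLine is left False so that ^ and $ refer to the whole string rather than to individual lines.

```vbscript
Option Explicit

Dim s, re, result

' Sample input copied from the question
s = "<h1>I don't want the title</h1>" & vbCrLf & _
    "<p>This is the text I want</p>" & vbCrLf & _
    "<p>I don't want this</p>" & vbCrLf & _
    "<p>I also don't want this</p>"

Set re = New RegExp
re.Pattern = "^[\s\S]*?<p>|<\/p>[\s\S]*$"
re.Global = True       ' remove both the leading and the trailing part
re.MultiLine = False   ' ^ and $ anchor to the whole string

' Replace everything outside the first <p>...</p> with an empty string
result = re.Replace(s, "")
WScript.Echo result
```

For the sample input this echoes "This is the text I want". If the string contains no <p> at all, the first alternative never matches and only the part from the first </p> onward (if any) is removed, so a real script would probably want to test for a match first.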

Answer 3

You can do this properly with an XPath expression:

//p[1]/text()

Adapted from "Navigating XML nodes in VBScript, for a Dummy":

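Set objDoc = CreateObject("MSXML2.DOMDocument")
objDoc.async = False
' MSXML 3.0 defaults to XSLPattern, where [1] is zero-based; switch to XPath
objDoc.setProperty "SelectionLanguage", "XPath"
objDoc.Load "C:\Temp\Test.xml"

' Find a particular element using XPath:

Set objNode = objDoc.selectSingleNode("//p[1]/text()")
' A text node has no attributes, so read its value directly
MsgBox objNode.nodeValue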
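
One caveat is that an XML parser needs a single root element, and the snippet in the question has several top-level elements. The sketch below works around that by wrapping the string in a made-up <root> element before parsing. The wrapper name, the variable s, and the error handling are assumptions for illustration, and the approach only works when the markup happens to be well-formed XML.

```vbscript
Option Explicit

Dim s, objDoc, objNode

' Sample input copied from the question
s = "<h1>I don't want the title</h1>" & vbCrLf & _
    "<p>This is the text I want</p>" & vbCrLf & _
    "<p>I don't want this</p>" & vbCrLf & _
    "<p>I also don't want this</p>"

Set objDoc = CreateObject("MSXML2.DOMDocument")
objDoc.async = False
objDoc.setProperty "SelectionLanguage", "XPath"

' Wrap the fragment in a hypothetical <root> element so it parses as XML
If objDoc.loadXML("<root>" & s & "</root>") Then
    ' (//p)[1] selects the first <p> in document order
    Set objNode = objDoc.selectSingleNode("(//p)[1]/text()")
    If Not objNode Is Nothing Then
        WScript.Echo objNode.nodeValue
    End If
Else
    WScript.Echo "Parse error: " & objDoc.parseError.reason
End If
```

The parenthesised form (//p)[1] is used instead of //p[1] because the latter selects every p that is the first p child of its parent, which only coincides with "the first p in the document" for flat input like this.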

Regex to remove everything before and after the first <p>tag</p>