Question

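I am trying to write a regular expression that finds a particular sequence of HTML tags in a code editor (Khan Live Editor), so that the editor can give the following error: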

"You can't put <h1.. 2.. 3..> inside <p> elements."

This is the string I am trying to match:

<p> ... <h1>

This is the string I do not want to match:

<p> ... </p><h1>

Instead, the expected behavior is that a different error message appears in this case.

So, in plain English, I want a string that:
- starts with <p>,
- ends with <h1>, and
- does not contain </p>.

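If I did not care about the presence of </p>, the job would be easy. My expression would look like /<p>.*<h[1-6]>/, and that works fine. But I need to make sure that </p> does not appear between the <p> and the <h1> tag (or any <h#> tag, hence <h[1-6]>).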
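To make the problem concrete, here is a minimal sketch that runs the simple pattern above against both sample strings. The variable names are illustrative only and do not come from the Khan Live Editor code.

```javascript
// The simple pattern from the question: it ignores any </p> in between.
const naive = /<p>.*<h[1-6]>/;

const results = {
  // The string that should trigger the error.
  wanted: naive.test('<p> ... <h1>'),
  // The string that should NOT trigger it, because the <p> is already closed.
  unwanted: naive.test('<p> ... </p><h1>'),
};

console.log(results);
```

Both tests succeed, which is exactly the issue: `.*` happily consumes the `</p>`.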

I have tried many different expressions from other posts here:

Regular expression to match a line that doesn't contain a word?

I tried: <p>^((?!<\/p>).)*$</h1>

regex string does not contain substring

I tried: /^<p>(?!<\/p>)<h1>$/

Regular expression that doesn't contain certain string

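This link suggests: aa([^a] | a[^a])aa

That does not work in my case, because I need to exclude the specific string "</p>", not just its individual characters, since there may be other tags between <p> ... <h1>.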

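I am really stumped. The regexes I have tried look like they should work... Any ideas on how to make this work? Maybe I am implementing the suggestions from the other posts incorrectly?

Thanks in advance for your help.

Edit

To answer why I need to do this:

The problem is that <p><h1></h1></p> is a syntax error, because the h1 closes the first <p> and leaves an unmatched </p>. The original syntax error message is not informative, but in most cases it is correct; my example is the exception. If the regex finds this exception, I want to pass the syntax parser a new message that overrides the original one.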

Answer 1

Sometimes it is better to break a problem down.

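var str = "YOUR INPUT HERE";
// Keep everything from the first <p> onward.
str = str.substr(str.indexOf("<p>"));
// Cut the string off just before the last <h1>.
str = str.substr(0, str.lastIndexOf("<h1>"));
if (str.indexOf("</p>") > -1) {
    // there is a <p>...</p>...<h1>
} else {
    // there isn't
}

This code does not handle the "what if there is no <p> to begin with" case, but it does give the basic idea of how to break the problem down into simpler parts without using a regex.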
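One possible way to cover the missing case is sketched below. The function name `paragraphClosedBeforeHeading` and the choice to return `null` when the input has no `<p> ... <h1>` shape are assumptions made for illustration, not part of the original answer. Like the snippet above, it only looks for lowercase `<p>` and `<h1>`.

```javascript
// Hypothetical helper: same idea as the snippet above, with guards added.
// Returns true  when a </p> sits between the first <p> and the last <h1>,
//         false when no </p> sits between them,
//         null  when there is no <p> followed by an <h1> at all.
function paragraphClosedBeforeHeading(html) {
  const start = html.indexOf('<p>');
  const end = html.lastIndexOf('<h1>');
  // Covers three cases: no <p>, no <h1>, or the <h1> comes before the <p>.
  if (start === -1 || end < start) return null;
  return html.slice(start, end).includes('</p>');
}

console.log(paragraphClosedBeforeHeading('<p> ... <h1>'));
console.log(paragraphClosedBeforeHeading('<p> ... </p><h1>'));
console.log(paragraphClosedBeforeHeading('<h1>no paragraph</h1>'));
```

A result of `false` is the case the question wants to report as an error.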

Answer 2

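Search for <p>, followed by any number of characters that do not begin a </p> ([^] means "any character at all", so line breaks are matched too), and finally <h[1-6]>.

/<p>(?:(?!<\/p>)[^])*<h[1-6]>/gi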

RegEx101 Test Case

const strings = [ '<p> ... <h1>', '<p> ... </p><h1>', '<P> Hello <h1>', '<p></p><h1>',
                  '<p><h1>' ];

const regex = /<p>(?:(?!<\/p>)[^])*<h[1-6]>/gi;

const test = input => ({ input, test: regex.test(input), matches: input.match(regex) });

for(let input of strings) console.log(JSON.stringify(test(input)));

// { "input": "<p> ... <h1>",     "test": true,  "matches": ["<p> ... <h1>"]   }
// { "input": "<p> ... </p><h1>", "test": false, "matches": null               }
// { "input": "<P> Hello <h1>",   "test": true,  "matches": ["<P> Hello <h1>"] }
// { "input": "<p></p><h1>",      "test": false, "matches": null               }
// { "input": "<p><h1>",          "test": true,  "matches": ["<p><h1>"]        }

Answer 3

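Your first regex was close, but you need to remove the ^ and $ characters. If you need to match line breaks, use [\s\S] instead of the dot (.).

Here is the final regex: <p>(?:(?!<\/p>)[\s\S])*<h[1-6]>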
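As a quick check of the line-break point, the sketch below runs that regex on input spanning several lines and compares it with a variant that uses the dot. The sample strings and variable names are made up for illustration.

```javascript
// The regex from this answer: [\s\S] matches any character, including \n.
const pattern = /<p>(?:(?!<\/p>)[\s\S])*<h[1-6]>/;
// Same pattern with `.` instead, which stops at line breaks (no `s` flag).
const dotPattern = /<p>(?:(?!<\/p>).)*<h[1-6]>/;

const openParagraph = '<p>first line\nsecond line\n<h2>';
const closedParagraph = '<p>first line\n</p>\n<h2>';

const outcome = {
  openWithSS: pattern.test(openParagraph),      // heading inside an open <p>
  closedWithSS: pattern.test(closedParagraph),  // <p> was closed first
  openWithDot: dotPattern.test(openParagraph),  // dot cannot cross the \n
};

console.log(outcome);
```

Only the first test matches: the second is rejected by the lookahead, and the third fails because the dot cannot consume the line break.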

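However, using heading tags (<h1> - <h6>) within paragraph elements is perfectly legal. They are simply treated as sibling elements, with the paragraph element ending where the heading element begins.

A p element's end tag may be omitted if the p element is immediately followed by an address, article, aside, blockquote, dir, div, dl, fieldset, footer, form, h1, h2, h3, h4, h5, h6, header, hr, menu, nav, ol, p, pre, section, table, or ul element, or if there is no more content in the parent element and the parent element is not an a element.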

http://www.w3.org/TR/html-markup/p.html

Answer 4

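I have come to the conclusion that using a regex to find this error turns your one problem into two problems.

So I think a better approach is to do a very simple form of tree parsing. A "poor man's HTML parser", if you will.

Use a simple regex just to find all the tags in the HTML, and put them in a list in the order you find them. Ignore the text nodes between the tags.

Then walk through the list in order, keeping a running tally of the tags. Increment a P counter when you get a <p> tag, and decrement it when you get a </p> tag. When you reach an <h1> (etc.) tag, increment an H counter, and decrement it on the closing tag.

If the H counter is > 0 while the P counter is > 0, that is your error.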
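The steps above could be sketched roughly as follows. The function name `headingWhileParagraphOpen`, the tag-matching regex, and the clamping of counters at zero are all assumptions made for this illustration; the answer itself only describes the counting idea.

```javascript
// Hypothetical helper: scans tags in document order and reports whether the
// H counter and the P counter are ever both above zero.
function headingWhileParagraphOpen(html) {
  // Matches opening and closing tags; group 1 is "/" for a closing tag,
  // group 2 is the tag name. Text between tags is simply skipped.
  const tagPattern = /<(\/?)([a-z][a-z0-9]*)\b[^>]*>/gi;
  let pCount = 0;
  let hCount = 0;

  for (const match of html.matchAll(tagPattern)) {
    const delta = match[1] === '/' ? -1 : 1;
    const name = match[2].toLowerCase();

    if (name === 'p') {
      // Clamp at zero so a stray </p> cannot drive the counter negative
      // (an assumption, not stated in the answer).
      pCount = Math.max(0, pCount + delta);
    } else if (/^h[1-6]$/.test(name)) {
      hCount = Math.max(0, hCount + delta);
    }

    if (hCount > 0 && pCount > 0) return true;
  }
  return false;
}

console.log(headingWhileParagraphOpen('<p> ... <h1>'));
console.log(headingWhileParagraphOpen('<p> ... </p><h1>'));
```

Taken literally, the rule also fires when a <p> opens inside a heading that is still open; if only the heading-inside-paragraph order matters, the check would need to run only when a heading tag opens.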

Answer 5

I know I have not formatted it correctly, but I think the logic would work

(just replace AND and NOT with the correct symbols):

/(<p>.*<h[1-6]>)AND !(<p>.*</p><h[1-6]>)/

Let me know how it goes :)
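JavaScript regex syntax has no AND or NOT operator between two patterns, so one way to express this logic is to run two separate tests and combine them in code. This is only a sketch of the idea as written in this answer; the names below are illustrative.

```javascript
// First condition: a <p> followed somewhere later by a heading tag.
const headingAfterP = /<p>.*<h[1-6]>/;
// Second condition: a </p> sitting directly in front of the heading tag.
const closedRightBeforeHeading = /<p>.*<\/p><h[1-6]>/;

// Hypothetical helper combining the two tests with && and !.
function matchesCombinedLogic(html) {
  return headingAfterP.test(html) && !closedRightBeforeHeading.test(html);
}

console.log(matchesCombinedLogic('<p> ... <h1>'));
console.log(matchesCombinedLogic('<p> ... </p><h1>'));
```

As written, the second pattern only excludes a </p> that comes immediately before the heading, so input such as `<p>a</p> <h1>` (with a space in between) would still be reported.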

JavaScript regex: find a string that does not contain </p>

5 answers: