Question

How can I detect a missing space between attributes? For example:

 <div style="margin:37px;"/></div>
 <span title=''style="margin:37px;" /></span>
 <span title="" style="margin:37px;" /></span>
 <a title="u" hghghgh  title="j" >

 <a title=""gg  ff>

Correct: lines 1, 3, 4. Incorrect: lines 2, 5. How can I detect the incorrect ones?

I have already tried this:

<(.*?=(['"]).*?\2)([\S].*)|(^/)>

But it does not work.

Answer 1

You should not use regex to parse HTML, unless it is for learning purposes.

http://regexr.com/3cge1

<\w+(\s+[\w-]+(=(['"]?)[^"']*\3)?)*\s*/?>

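This regex matches even when the tag has no attributes at all. It also handles self-closing tags and attributes that have no value.

<\w+ matches the opening < followed by \w characters (the tag name).
(\s+[\w-]+(=(['"]?)[^"']*\3)?)* matches zero or more attributes, each of which must be preceded by whitespace. It has two parts:
- \s+[\w-]+ is the attribute name after the mandatory whitespace
- (=(['"]?)[^"']*\3)? is the optional attribute value
\s*/?> matches optional whitespace and an optional /, followed by the closing >.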

Here is a test against the strings from the question:

var re = /<\w+(\s+[\w-]+(=(['"]?)[^"']*\3)?)*\s*\/?>/g;

! '<div style="margin:37px;"/></div>'.match(re);
false

! '<span title=\'\'style="margin:37px;" /></span>'.match(re);
true

! '<span title="" style="margin:37px;" /></span>'.match(re);
false

! '<a title="u" hghghgh  title="j" >'.match(re);
false

! '<a title=""gg  ff>'.match(re);
true

To show all incorrect tags:

var html = '<div style="margin:37px;"></div> <span title=\'\'style="margin:37px;"/><a title=""gg ff/> <span title="" style="margin:37px;" /></span> <a title="u" hghghgh title="j"example> <a title=""gg ff>';
var tagRegex = /<\w+[^>]*\/?>/g;
var validRegex = /<\w+(\s+[\w-]+(=(['"]?)[^"']*\3)?)*\s*\/?>/g;

html.match(tagRegex).forEach(function(m) {
  if(!m.match(validRegex)) {
    console.log('Incorrect', m);
  }
});

This will output:

Incorrect <span title=''style="margin:37px;"/>
Incorrect <a title=""gg ff/>
Incorrect <a title="u" hghghgh title="j"example>
Incorrect <a title=""gg ff>

Update (from the comments):

<\w+(\s+[\w-]+(="[^"]*"|='[^']*'|=[\w-]+)?)*\s*/?>
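
As a rough illustration of what the updated pattern changes, the sketch below compares the two patterns on whole tags. The names originalTag, updatedTag, and check are made up for this example, and the ^ and $ anchors are an addition so that each string is tested as a single tag.

```javascript
// Both patterns are anchored with ^ and $ (an addition for this sketch)
// so that each input string must be exactly one tag.
const originalTag = /^<\w+(\s+[\w-]+(=(['"]?)[^"']*\3)?)*\s*\/?>$/;
const updatedTag = /^<\w+(\s+[\w-]+(="[^"]*"|='[^']*'|=[\w-]+)?)*\s*\/?>$/;

// Hypothetical helper: reports how each pattern classifies a tag.
function check(tag) {
  return { original: originalTag.test(tag), updated: updatedTag.test(tag) };
}

const results = {
  // A double-quoted value that contains a single quote, plus an unquoted value.
  mixedQuotes: check('<a title="it\'s" href=x>'),
  // Missing space between two attributes.
  missingSpace: check('<a title=""gg ff>'),
  // Properly spaced attributes on a self-closing tag.
  wellSpaced: check('<span title="" style="margin:37px;" />'),
};

console.log(results);
```

With these inputs, both patterns reject the tag with the missing space and accept the well-spaced one. Only the updated pattern accepts the tag whose double-quoted value contains a single quote, because it matches each quoting style separately instead of excluding both quote characters from the value.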

Answer 2

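Try this regex; I think it will work:

<\w*[^=]*=["'][\w;:]*["'][\s/]+[^>]*>

< - opening bracket

\w* - zero or more word characters (the tag name)

[^=]*= - matches every character up to and including the first '='

["'][\w;:]*["'] - matches the attribute value in two cases: 1. single quotes around an optional string 2. double quotes around an optional string

[\s/]+ - matches whitespace or '/' at least once

[^>]* - matches all characters up to the closing bracket '>'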
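
To see how this pattern behaves on the lines from the question, here is a minimal sketch that tests each line separately. The names firstAttrSpaced, lines, incorrect, and laterGap are illustrative only, and treating "no match" as "incorrect" is my reading of how the pattern is meant to be used.

```javascript
// Pattern from the answer, unchanged.
const firstAttrSpaced = /<\w*[^=]*=["'][\w;:]*["'][\s/]+[^>]*>/;

// The five example lines from the question, tested one at a time.
const lines = [
  '<div style="margin:37px;"/></div>',
  '<span title=\'\'style="margin:37px;" /></span>',
  '<span title="" style="margin:37px;" /></span>',
  '<a title="u" hghghgh  title="j" >',
  '<a title=""gg  ff>',
];

// Assumption: a line counts as incorrect when the pattern does NOT match it.
const incorrect = lines.filter((line) => !firstAttrSpaced.test(line));
console.log(incorrect);

// Only the first attribute value is inspected, so a missing space
// later in the tag still produces a match.
const laterGap = '<a title="u" href="x"id="y">';
console.log(firstAttrSpaced.test(laterGap));
```

On the five lines this flags lines 2 and 5. The last check shows a limitation worth keeping in mind: the pattern requires whitespace or '/' only after the first attribute value, so a missing space between later attributes is not detected.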

Answer 3

I came up with this pattern, which finds the incorrect lines 2 and 5 as you asked:

>>> import re
>>> p = r'<[a-z]+\s[a-z]+=[\'\"][\w;:]*[\"\'][\w]+.*'

>>> html = """
 <div style="margin:37px;"/></div>
 <span title=''style="margin:37px;" /></span>
 <span title="" style="margin:37px;" /></span>
 <a title="u" hghghgh  title="j" >

 <a title=""gg  ff>
"""

>>> bad = re.findall(p, html)
>>> print('\n'.join(bad))
<span title=''style="margin:37px;" /></span>
<a title=""gg  ff>

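Regex breakdown:

p = r'<[a-z]+\s[a-z]+=[\'\"][\w;:]*[\"\'][\w]+.*'

< - opening bracket

[a-z]+\s - one or more lowercase letters followed by a whitespace character

[a-z]+= - one or more lowercase letters followed by an equals sign

[\'\"] - matches a single or double quote once

[\w;:]* - matches a word character (a-zA-Z0-9_), a colon, or a semicolon zero or more times

[\"\'] - matches a single or double quote again

[\w]+ - matches a word character one or more times (this catches the missing space you want to detect) ***

.* - matches anything zero or more times (picks up the rest of the line)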
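
For comparison with the JavaScript used in the other answers, the same pattern can be sketched as a JavaScript literal. The names missingSpace, html, bad, and laterGap are made up for this example; the g flag stands in for re.findall.

```javascript
// The answer's pattern as a JavaScript literal. The g flag collects every
// match, and "." stops at line breaks, so each match ends with its line.
const missingSpace = /<[a-z]+\s[a-z]+=['"][\w;:]*['"]\w+.*/g;

// The five example lines from the question, joined into one string.
const html = [
  ' <div style="margin:37px;"/></div>',
  ' <span title=\'\'style="margin:37px;" /></span>',
  ' <span title="" style="margin:37px;" /></span>',
  ' <a title="u" hghghgh  title="j" >',
  ' <a title=""gg  ff>',
].join('\n');

// String.prototype.match returns null when nothing matches, hence the fallback.
const bad = html.match(missingSpace) || [];
console.log(bad.join('\n'));

// The pattern only looks at the first attribute of a tag, so a gap
// between later attributes is not reported.
const laterGap = '<a title="u" href="x"id="y">';
console.log(laterGap.match(missingSpace));
```

This reports the same two lines as the Python session. As the last check suggests, the pattern is tied to the first attribute after the tag name, so it is best read as a quick check for the question's examples rather than a general validator.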

Answer 4

I'm not sure about this, since I'm not that experienced with regex, but it seems to work well.

JS Fiddle

<([a-z]+)(\s+[a-z\-]+(="[^"]*")?)*\s*\/?>([^<]+(<\/\1>))?

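At the moment <([a-z]+) works in most cases, but with web components and <ng-* elements it would be better as \w+ (or [\w-]+, since \w alone does not match the hyphen).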

---------------

Output:

<div style="margin:37px;">div</div> correct

<span title=" style="margin:37px;" />span1</span> incorrect

<span title="" style="margin:37px;" />span2</span> correct

<a title="u" title="j">link</a> correct

<a title=""href="" alt="" required>test</a> incorrect

<img src="" data-abc="" required> correct

<input type=""style="" /> incorrect
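
A minimal sketch of how classifications like these could be reproduced. The names tagPattern, classify, and samples are hypothetical, and the \1 backreference for the closing tag name is an assumption of this sketch (in a JavaScript regex literal, $1 is not a backreference). The original fiddle code is not shown here, so this is not necessarily how it was written.

```javascript
// The answer's pattern. \1 (an assumption of this sketch) refers back to
// the tag name captured by the first group. Only double-quoted values
// are recognised by this pattern.
const tagPattern = /<([a-z]+)(\s+[a-z\-]+(="[^"]*")?)*\s*\/?>([^<]+(<\/\1>))?/;

// Hypothetical helper: labels a snippet the same way as the output list.
function classify(html) {
  return tagPattern.test(html) ? 'correct' : 'incorrect';
}

const samples = [
  '<div style="margin:37px;">div</div>',
  '<a title=""href="" alt="" required>test</a>',
  '<img src="" data-abc="" required>',
  '<input type=""style="" />',
];

const labels = samples.map(classify);
samples.forEach((sample, i) => console.log(sample, labels[i]));
```

Because the part after the opening tag is optional, the verdict depends only on whether the opening tag's attributes are each preceded by whitespace.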
