Given the DOM below, I am trying to match all of its HTML tags with preg_match_all:
$html=<<<'EOD'
<div class='container clickable' data-param='{"footer":"<div>Bye</div>","info":"We win"}'>
<img src='a.jpg' />
</div>
<a href='a.html'>The A</a>
<span></span>
<span data-span-param='{"detailTag":"<span class=\"link\">Anything here</span>"}'>
<a></a>
</span>
EOD;
The output of my expression is:
Array
(
[0] => <div class='container clickable' data-param='{"footer":"<div>
[1] => </div>
[2] => <img src='a.jpg' />
[3] => </div>
[4] => <a href='a.html'>
[5] => </a>
[6] => <span>
[7] => </span>
[8] => <span data-span-param='{"detailTag":"<span class=\"link\">
[9] => </span>
[10] => <a>
[11] => </a>
[12] => </span>
)
My expected output is:
Array
(
[0] => <div class='container clickable' data-param='{"footer":"<div>Bye</div>","info":"We win"}'>
[1] => <img src='a.jpg' />
[2] => </div>
[3] => <a href='a.html'>
[4] => </a>
[5] => <span>
[6] => </span>
[7] => <span data-span-param='{"detailTag":"<span class=\"link\">Anything here</span>"}'>
[8] => <a>
[9] => </a>
[10] => </span>
)
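I need help with an expression that solves this.
Answer 0 (score: 1)
This matches all HTML tags and does not treat tags that appear inside single- or double-quoted attribute values as separate matches.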
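<?php
$html=<<<EOD
<div class='container clickable' data-param='{"footer":"<div>Bye</div>","info":"We win"}'>
<img src='a.jpg' />
</div>
<a href='a.html'>The A</a>
<span></span>
<span data-span-param='{"detailTag":"<span class=\"link\">Anything here</span>"}'>
<a></a>
</span>
EOD;
// Decode angle brackets that arrive entity-encoded, so all input looks alike.
$html = preg_replace('~\&lt\;~is','<',$html);
$html = preg_replace('~\&gt\;~is','>',$html);
//$html = preg_replace('~\&quot\;~is','"',$html);
// Put a placeholder in empty attribute values; the value pattern below needs at least one character.
$html = preg_replace('~=\s*\'\s*\'~is','=\'.\'',$html);
$html = preg_replace('~=\s*"\s*"~is','="."',$html);
// Find every quoted attribute value, including its closing quote.
if(preg_match_all('~((?<==\')(?:.(?!\'))*.)\'|((?<==")(?:.(?!"))*.)"~im',$html,$matchall,PREG_SET_ORDER)){
    foreach($matchall as $m){
        // Only values that contain an angle bracket need protecting.
        if(preg_match('~\<~is',$m[0],$mtch1)||preg_match('~\>~is',$m[0],$mtch2)){
            $end = $m[0][(strlen($m[0])-1)];               // the closing quote character
            $replace1 = substr($m[0],0,(strlen($m[0])-1)); // the value without that quote
            // Entity-encode quotes and angle brackets inside the value.
            $replace = preg_replace('~"~is','&quot;',$replace1);
            $replace = preg_replace('~<~is','&lt;',$replace);
            $replace = preg_replace('~>~is','&gt;',$replace);
            $html = preg_replace("~".preg_quote(($end.$replace1.$end),'~')."~is",$end.$replace.$end,$html);
        }
    }
}
// With the attribute values protected, a plain tag pattern no longer stops early.
$tags = array();
if(preg_match_all('~<\s*[\w]+[^>]*>|<\s*/\s*[\w]+\s*>~im',$html,$matchall,PREG_SET_ORDER)){
    foreach($matchall as $m){
        // Decode the protected values again so each tag reads as in the source
        // (a browser does this implicitly when it renders the print_r output).
        $tags[] = html_entity_decode($m[0]);
    }
}
print_r($tags);
?>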
Output:
Array
(
[0] => <div class='container clickable' data-param='{"footer":"<div>Bye</div>","info":"We win"}'>
[1] => <img src='a.jpg' />
[2] => </div>
[3] => <a href='a.html'>
[4] => </a>
[5] => <span>
[6] => </span>
[7] => <span data-span-param='{"detailTag":"<span class=\"link\">Anything here</span>"}'>
[8] => <a>
[9] => </a>
[10] => </span>
)
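For comparison, the same list can probably be produced in one pass, without rewriting the input first. The sketch below is not part of the answer above; the pattern in $pattern is a hypothetical alternative that consumes each quoted attribute value as a unit, so a > inside quotes cannot end the tag. It assumes attribute values never contain their own quote character and that the input has no comments, scripts, or CDATA sections.

```php
<?php
// Sample input copied from the question (nowdoc, so nothing is interpolated).
$html = <<<'EOD'
<div class='container clickable' data-param='{"footer":"<div>Bye</div>","info":"We win"}'>
<img src='a.jpg' />
</div>
<a href='a.html'>The A</a>
<span></span>
<span data-span-param='{"detailTag":"<span class=\"link\">Anything here</span>"}'>
<a></a>
</span>
EOD;

// Hypothetical pattern (an assumption, not taken from the thread):
// after the tag name, repeatedly consume either
//   - a complete double-quoted string,
//   - a complete single-quoted string, or
//   - one character that is neither a quote nor '>',
// and stop at the first '>' found outside any quoted string.
$pattern = '~<\s*/?\s*\w+(?:"[^"]*"|\'[^\']*\'|[^\'">])*>~';

preg_match_all($pattern, $html, $matches);
$tags = $matches[0]; // index 0 holds the full text of every match
print_r($tags);
```

Because the three alternatives start with different characters, the engine never has to choose between them, which keeps backtracking small. For arbitrary real-world HTML a parser such as DOMDocument is usually the safer tool; a pattern like this is only reasonable when the markup is known to be this regular.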
Answer 1 (score: 1)
Answer 2 (score: 0)