How to use a PHP regular expression to distinguish <div> from <?php ?>?

Time: 2017-02-21 12:56:10

Tags: php regex lookahead

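How can I use a PHP regular expression to distinguish <div> from <?php ?>? The div is just an example: in <smth> <?php code ?> </smth> I need to tell smth apart from ?php code ?, where smth and code can be any combination of characters.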

I want to get the contents of <div> but not the contents of <?php ?>. At the moment I use:
$regex1 = '#<(?<!\?)(.*?)>#' ;  // result : div , ?php ? , /div  . 

In this case I don't need the php part.

$regex1 = '#<(?<!\??)(.*?)>#' ;  //Compilation failed: lookbehind assertion is not fixed length at offset 8
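The first attempt still returns ?php ? because (?<!\?) placed right after < is a lookbehind: it checks the character before the current position, which is always the < that was just matched, so it never rejects anything. A negative lookahead, (?!\?), checks the character after < instead. A minimal sketch, assuming no > inside attribute values and no < or > inside the PHP code; the helper name tagsWithoutPhp is hypothetical:

```php
<?php
/**
 * Hypothetical helper: return the text between "<" and ">" for every tag,
 * skipping PHP blocks (a "<" immediately followed by "?").
 */
function tagsWithoutPhp(string $html): array
{
    // (?!\?) is a negative lookahead: the character after "<" must not be "?".
    // [^>]* stops at the first ">", so this assumes no ">" inside attributes.
    preg_match_all('#<(?!\?)([^>]*)>#', $html, $matches);
    return $matches[1];
}

$tags = tagsWithoutPhp(" before <div> inside <?php ?> </div> after ");
print_r($tags); // div and /div only
```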

The second question is how to retrieve inside <?php ?>, i.e. the full content of <div> and similar HTML tags, including the PHP block:

$htmlStr = " before <div> inside <?php ?> </div> after ";
$regex1 = '#(.*)'  // before
        . '<(?!\?)' // < not followed by ?
        . '(.*)' // div
        . '((?<!\?)>)' // > not preceded by ?
        . '(.*)' // retrieves only 'inside', instead of 'inside <?php ? >'
        . '</' //  </
        . '.*'  // div
        . '((?<!\?)>)'   // > not preceded by ?
        . '(.*)#'; // after
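For the second question, one option is to capture the tag name, require the same name in the closing tag with the backreference \1, and take the content with a lazy (.*?) and the s modifier. A sketch, assuming no > inside attribute values and no nested elements with the same tag name; innerContent is a hypothetical name:

```php
<?php
/**
 * Hypothetical helper: return [tagName, content] pairs for elements of the
 * form <tag ...>content</tag>. Nested tags with the same name are not handled.
 */
function innerContent(string $html): array
{
    // ([a-zA-Z][a-zA-Z0-9]*) captures the tag name; \1 requires the same
    // name in the closing tag. (.*?) is lazy, and the "s" modifier lets "."
    // match newlines, so the content may span several lines.
    $pattern = '#<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>(.*?)</\1\s*>#s';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $m) {
        $result[] = [$m[1], $m[2]];
    }
    return $result;
}

$html = " before <div> inside <?php ?> </div> after ";
var_dump(innerContent($html));
```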

I also tried a non-greedy expression:

$regex1 = '#(.*)'  // before
        . '<(?!\?)' // < not followed by ?
        . '(.*)' // div
        . '((?<!\?)>)' // > not preceded by ?
        . '(.*?)' // non-greedy expression fetches only 'inside', but I need 'inside <?php ? >'
        . '</' //  </
        . '.*'  // div
        . '((?<!\?)>)'   // > not preceded by ?
        . '(.*)#'; // after
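When the captured group is echoed into a web page, the browser treats <?php ?> as markup and does not show it (the second answer below says the same), so the result can look like just inside even when the match is correct. Escaping the value with htmlspecialchars() before echoing shows the whole capture. A minimal sketch using a lazy pattern reduced to a single div:

```php
<?php
$htmlStr = " before <div> inside <?php ?> </div> after ";

// Capture everything between <div> and </div>; (.*?) is lazy.
preg_match('#<div>(.*?)</div>#s', $htmlStr, $m);
$inside = $m[1];

// Raw output: a browser renders the PHP block as markup and does not show it.
echo $inside, "\n";

// Escaped output: "<" and ">" become &lt; and &gt;, so the browser shows them as text.
echo htmlspecialchars($inside), "\n"; // " inside &lt;?php ?&gt; "
```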

Final adjustment: DOM does not retrieve the value either. In the code below, the line foreach ($els as $el) { echo '<br><br>element value ='. $el->nodeValue; } echoes inside instead of inside <?php ?>.

$htmlStr = " before <div> inside <?php ?> </div> after ";
$regex1 = '#(.*)<([a-zA-Z]+)>' // before, then the opening tag <tagname>
        . '(.*)' // greedy expression fetches only 'inside', but I need 'inside <?php ? >'
        . '</' //  </
        . '[a-zA-Z]+'  // div
        . '(?<!\?)>'   // > not preceded by ?
        . '(.*)#'; // after
preg_match_all($regex1, $htmlStr, $attrArr1, 0); //input
$attrArr1 = array_filter($attrArr1);
print_r('<br><br> 619 htmlStr=' . $htmlStr. ',   attrArr1 = <pre>'); print_r($attrArr1); 

$dom = new \DOMDocument('1.0'); // name, value 
$dom->loadHTML($htmlStr); 
$ansArr['elType'] = $attrArr1[2][0];
//$els = $dom->getElementsByTagName('*'); // To be done 
$els = $dom->getElementsByTagName($ansArr['elType'] );
foreach ($els as $el) { echo '<br><br>element value ='. $el->nodeValue; } //gives value 'inside'
print_r('<br><br>620 elType='.$ansArr['elType'].',   els='); print_r($els); 
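nodeValue on an element returns only its text, and the HTML parser does not keep <?php ?> as a text node, so it is left out. One way to get the markup inside an element is to serialize each child node with DOMDocument::saveHTML(), which accepts an optional node argument. A sketch; the helper name innerHtml is hypothetical, and depending on the libxml version the PHP block may come back as a processing instruction or in another form, so the exact serialized text of that part is not guaranteed:

```php
<?php
/**
 * Hypothetical helper: serialize all child nodes of $node back to HTML.
 */
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes only that node.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument('1.0');
$dom->loadHTML('<div> inside <?php ?> </div>');
$div = $dom->getElementsByTagName('div')->item(0);

echo $div->nodeValue, "\n";   // text only: the PHP block is missing
echo innerHtml($div), "\n";   // markup: the PHP block is kept in some form
```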

2 answers:

Answer 0 (score: 1)

  

To answer your first question:

('get the contents of <smth> but not of <?php ?>') demo

input  >> <smth id="test">Hello World!</smth><?php echo "Hello World!"?>
regex  >> (?<=<smth\s)(.*?)(?=>)
output >> id="test"
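The demos use an online regex tester. In PHP, the same pattern can be passed to preg_match(). A minimal sketch; the # delimiter, the helper name tagAttributes, and the use of preg_quote() to insert an arbitrary tag name are choices made here, not part of the answer:

```php
<?php
/**
 * Hypothetical helper: return the attribute text of the first <$tag ...>
 * element, using the lookbehind/lookahead pattern from the answer.
 */
function tagAttributes(string $html, string $tag): ?string
{
    // The lookbehind (?<=<tag\s) has a fixed length, so PCRE accepts it.
    $pattern = '#(?<=<' . preg_quote($tag, '#') . '\s)(.*?)(?=>)#';
    return preg_match($pattern, $html, $m) ? $m[1] : null;
}

$input = '<smth id="test">Hello World!</smth><?php echo "Hello World!"?>';
echo tagAttributes($input, 'smth'), "\n"; // id="test"
```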
  

To answer your second question:

('retrieve what is inside <?php ?>') demo

input  >> <smth id="test">Hello World!</smth><?php echo "Hello World!"?>
regex  >> (?<=<\?php\s)(.*?)(?=\?>)
output >> echo "Hello World!"
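The second pattern can be used with preg_match_all() to collect the code of every PHP block. A sketch; the s modifier is an addition here so that a block spanning several lines is also matched, and phpBlocks is a hypothetical name:

```php
<?php
// Hypothetical helper: return the code inside every PHP block in $html.
function phpBlocks(string $html): array
{
    // Lookbehind: "<?php" plus one whitespace character (fixed length).
    // Lookahead: the closing question mark and angle bracket.
    // The "s" modifier lets "." match newlines, so blocks may span lines.
    preg_match_all('#(?<=<\?php\s)(.*?)(?=\?>)#s', $html, $m);
    return $m[1];
}

$input = '<smth id="test">Hello World!</smth><?php echo "Hello World!"?>';
print_r(phpBlocks($input)); // one element: echo "Hello World!"
```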

('retrieve what is inside <smth> and similar HTML tags') demo

input  >> <smth id="test">Hello World!</smth><?php echo "Hello World!"?>
regex  >> (?<=<smth[\s]id="test">).*?(?=<\/smth>)  // not efficient
output >> Hello World!
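The third pattern only matches an element whose opening tag is exactly <smth id="test">. A more general sketch takes the tag name as a parameter, skips any attributes with [^>]*, and uses a capture group instead of lookarounds. It assumes no > inside attribute values and no nested elements with the same name; elementContent is a hypothetical name:

```php
<?php
// Hypothetical helper: content of the first <$tag ...>...</$tag> element.
// Assumes no ">" inside attribute values and no nested <$tag> elements.
function elementContent(string $html, string $tag): ?string
{
    $t = preg_quote($tag, '#');
    // \b stops "smth" from also matching "<smthelse>"; [^>]* skips any attributes.
    $pattern = '#<' . $t . '\b[^>]*>(.*?)</' . $t . '\s*>#s';
    return preg_match($pattern, $html, $m) ? $m[1] : null;
}

$input = '<smth id="test">Hello World!</smth><?php echo "Hello World!"?>';
echo elementContent($input, 'smth'), "\n"; // Hello World!
```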

Hope these help!

Answer 1 (score: 1)

It seems the regex works; it just does not print inside <?php ?>, because a web browser does not display <?php ?> as part of the page content. However, if you add some text after it, you will get everything. Also, if you write the result to a file, you will find not only the word inside but also <?php ?> smth:

// $regex2: textarea, and also div/span/p/a and other elements that have closing tags
// $regex3: input (elements without a closing tag)

// Textarea
$htmlStr2 = " before <textarea name='<?php ?>' > inside '<?php ? >' smth. </textarea> after ";
$regex2 = '#(.*)<([a-zA-Z]+)' // before, then <tagname> (here: textarea)
        . '(.*?)' // attributes
        . '(?<!\?)>' // > not preceded by ?
        . '(.*)' // fetch inside, including <?php ? >
        . '</' // </
        . '[a-zA-Z]+' // closing tag name
        . '(?<!\?)>' // > not preceded by ?
        . '(.*)#'; // after
preg_match_all($regex2, $htmlStr2, $attrArr, 0);
$attrArr = array_filter($attrArr); // array_filter removes empty values
print_r('<br><br> 619 htmlStr=' . $htmlStr2 . ', attrArr = <pre>'); print_r($attrArr);
$resStr = print_r($attrArr, true);
print_r('<br><br> resStr=' . $resStr);
file_put_contents('C:\\Users\\gintare\\Documents\\reg2.txt', $resStr);
$before = $attrArr[1][0]; // before
$tagType = $attrArr[2][0]; // textarea
$tagAttrStr = $attrArr[3][0]; // name='<?php ? >'
$inside = $attrArr[4][0]; // inside '<?php ? >' smth.
$after = $attrArr[5][0]; // after

// Input
$htmlStr3 = " before <input class='inp' > after '<?php ? >' smth. ";
$regex3 = '#(.*)<([a-zA-Z]+)' // before, then <tagname> (here: input)
        . '(.*?)' // attributes
        . '/?>' // > or />
        . '(.*)#'; // after
preg_match_all($regex3, $htmlStr3, $attrArr, 0);
$attrArr = array_filter($attrArr); // array_filter removes empty values
print_r('<br><br> 619 htmlStr=' . $htmlStr3 . ', attrArr = <pre>'); print_r($attrArr);
$resStr = print_r($attrArr, true);
print_r('<br><br> resStr=' . $resStr);
file_put_contents('C:\\Users\\gintare\\Documents\\reg3.txt', $resStr);
$before = $attrArr[1][0]; // before
$tagType = $attrArr[2][0]; // input
$tagAttrStr = $attrArr[3][0]; // class='inp'
$after = $attrArr[4][0]; // after
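The textarea pattern in this answer uses (?<!\?)> so that the ?> closing a PHP block inside an attribute does not end the opening tag. A sketch of the same idea with named capture groups, so the parts can be read by name instead of by index. The helper splitElement and the group names are hypothetical, and the s modifier and the (?P=tag) backreference, which requires the closing tag to match the opening one, are additions here:

```php
<?php
// Hypothetical helper: split " before <tag attrs> inside </tag> after " into
// named parts, using the answer's (?<!\?)> trick so that a PHP block inside an
// attribute value does not end the opening tag early.
function splitElement(string $html): ?array
{
    $pattern = '#(?<before>.*)<(?<tag>[a-zA-Z]+)(?<attrs>.*?)(?<!\?)>'
             . '(?<inside>.*)</(?P=tag)\s*>(?<after>.*)#s';
    if (!preg_match($pattern, $html, $m)) {
        return null;
    }
    // Keep only the named groups, dropping the numeric duplicates.
    return [
        'before' => $m['before'],
        'tag'    => $m['tag'],
        'attrs'  => $m['attrs'],
        'inside' => $m['inside'],
        'after'  => $m['after'],
    ];
}

$parts = splitElement(" before <textarea name='<?php ?>' > inside '<?php ? >' smth. </textarea> after ");
print_r($parts);
```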
