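How can I use a PHP regular expression to distinguish <div> from <?php ?>?
div is only an example. In <smth> <?php code ?> </smth> I need to tell smth apart from ?php code ?, where smth and code can be any combination of characters.
I want to get the content of <div>, not the content of <?php ?>. So far I have tried: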
$regex1 = '#<(?<!\?)(.*?)>#' ; // result : div , ?php ? , /div .
I do not need the ?php ? match in this case.
$regex1 = '#<(?<!\??)(.*?)>#' ; //Compilation failed: lookbehind assertion is not fixed length at offset 8
My second question is how to retrieve inside <?php ?> from within <div> and similar HTML tags:
$htmlStr = " before <div> inside <?php ?> </div> after ";
$regex1 = '#(.*)' // before
. '<(?!\?)' // < not followed by ?
. '(.*)' // div
. '((?<!\?)>)' // > not preceded by ?
. '(.*)' // retrieves only 'inside', instead of 'inside <?php ? >'
. '</' // </
. '.*' // div
. '((?<!\?)>)' // > not preceded by ?
. '(.*)#'; // after
I also tried a non-greedy expression:
$regex1 = '#(.*)' // before
. '<(?!\?)' // < not followed by ?
. '(.*)' // div
. '((?<!\?)>)' // > not preceded by ?
. '(.*?)' // non-greedy expression fetches only 'inside', but I need 'inside <?php ? >'
. '</' // </
. '.*' // div
. '((?<!\?)>)' // > not preceded by ?
. '(.*)#'; // after
Final edit. Using DOM does not retrieve the value either. In the code below, the line foreach ($els as $el) { echo '<br><br>element value ='. $el->nodeValue; } echoes inside rather than inside <?php ?>.
$htmlStr = " before <div> inside <?php ?> </div> after ";
$regex1 = '#(.*)<([a-zA-Z]+)>' // before, then the tag name [a-zA-Z]+
. '(.*)' // greedy expression fetches only 'inside', but I need 'inside <?php ? >' (?=<\/)'
. '</' // </
. '[a-zA-Z]+' // div
. '(?<!\?)>' // > not preceded by ?
. '(.*)#'; // after
preg_match_all($regex1, $htmlStr, $attrArr1, 0); //input
$attrArr1 = array_filter($attrArr1);
print_r('<br><br> 619 htmlStr=' . $htmlStr. ', attrArr1 = <pre>'); print_r($attrArr1);
$dom = new \DOMDocument('1.0'); // name, value
$dom->loadHTML($htmlStr);
$ansArr['elType'] = $attrArr1[2][0];
//$els = $dom->getElementsByTagName('*'); // To be done
$els = $dom->getElementsByTagName($ansArr['elType'] );
foreach ($els as $el) { echo '<br><br>element value ='. $el->nodeValue; } //gives value 'inside'
print_r('<br><br>620 elType='.$ansArr['elType'].', els='); print_r($els);
Answer 0 (score: 1)
To answer your first question:
('obtain the content of <smth>, but not the content of <?php ?>') demo
input >> <smth id="test">Hello World!</smth><?php echo "Hello World!"?>
regex >> (?<=<smth\s)(.*?)(?=>)
output >> id="test"
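The pattern above hard-codes the tag name smth. For arbitrary tag names, a negative lookahead right after < can skip <?php ?> blocks without the variable-length lookbehind that failed to compile in the question. A minimal sketch, assuming that closing tags should be skipped as well; the variable names are illustrative and not part of the original answer:

```php
<?php
// Sample input taken from the question.
$htmlStr = " before <div> inside <?php ?> </div> after ";

// '<' must not be followed by '?' (PHP block) or '/' (closing tag);
// [^>]* then captures everything up to the first '>'.
$pattern = '#<(?![?/])([^>]*)>#';

preg_match_all($pattern, $htmlStr, $matches);
print_r($matches[1]); // contents of the opening tags only
```

Because [^>]* stops at the first >, an attribute value that itself contains ?> would cut the match short, so this only suits input where PHP blocks do not appear inside attributes.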
To answer your second question:
('retrieve the inside of <?php ?>') demo
input >> <smth id="test">Hello World!</smth><?php echo "Hello World!"?>
regex >> (?<=<\?php\s)(.*?)(?=\?>)
output >> echo "Hello World!"
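In PHP the same extraction can be written with a capture group instead of lookarounds, which also makes it easy to collect several blocks at once. A minimal sketch, assuming that blocks may span multiple lines (hence the s modifier); $input and $pattern are illustrative names:

```php
<?php
// Same sample input as in the demo above.
$input = '<smth id="test">Hello World!</smth><?php echo "Hello World!"?>';

// <\?php  literal opening tag
// \s+     whitespace after "php"
// (.*?)   the code, matched lazily
// \s*\?>  optional whitespace and the closing tag
// s       lets . match newlines so multi-line blocks are captured
$pattern = '#<\?php\s+(.*?)\s*\?>#s';

preg_match_all($pattern, $input, $matches);
print_r($matches[1]); // one entry per PHP block found
```

A ?> that appears inside a PHP string literal would end the match early; a regex cannot tell that case apart.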
('retrieve the inside of <smth> and similar HTML tags') demo
input >> <smth id="test">Hello World!</smth><?php echo "Hello World!"?>
regex >> (?<=<smth[\s]id="test">).*?(?=<\/smth>) // not efficient
output >> Hello World!
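The lookbehind above is tied to one exact opening tag. For any tag name, a backreference can require that the closing tag matches the opening one, and the captured content then keeps an embedded <?php ?> block. A minimal sketch under the assumption that tags of the same name are not nested; the variable names are illustrative:

```php
<?php
// Sample input taken from the question.
$htmlStr = " before <div> inside <?php ?> </div> after ";

// ([a-zA-Z][a-zA-Z0-9]*)  tag name, captured as group 1
// \b[^>]*>                optional attributes up to the end of the opening tag
// (.*?)                   inner content, matched lazily (group 2)
// </\1>                   closing tag with the same name as group 1
$pattern = '#<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>(.*?)</\1>#s';

$found = preg_match($pattern, $htmlStr, $m);
$tagName = $found ? $m[1] : null;
$inner = $found ? $m[2] : null;

var_dump($tagName, $inner);
```

The lazy (.*?) stops at the first matching closing tag, which is why nested elements of the same name are out of scope here.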
Hope this helps!
Answer 1 (score: 1)
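It seems the regex works; the <?php ?> part just is not displayed, because the browser does not treat it as text in the page content. However, if you add some text after it, you will get everything. Also, if you write the result to a file, you will find not only the word inside but the full inside <?php ?> smth.
//$regex2 ='#(.*)<([a-zA-Z]+)(.*?)(?<!\?)>(.*)</[a-zA-Z]+(?<!\?)>(.*)#'; // for textarea, also div/span/p/a and other elements that have closing tags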
//$regex3 ='#(.*)<([a-zA-Z]+)(.*?)/>(.*)#'; //for input
// Textarea
$htmlStr2 = " before <textarea name='<?php ?>' > inside '<?php ? >' smth. </textarea> after ";
//$regex2 ='#(.*)<([a-zA-Z]+)(.*?)(?<!\?)>(.*)</[a-zA-Z]+(?<!\?)>(.*)#'; for textarea
$regex2 = '#(.*)<([a-zA-Z]+)' // before, then the tag name, e.g. textarea: [a-zA-Z]+
. '(.*?)' // attributes (.*?)
. '(?<!\?)>' // > not preceded by ?
. '(.*)' // fetch <?php ? >
. '</' // </
. '[a-zA-Z]+' // closing tag name
. '(?<!\?)>' // > not preceded by ?
. '(.*)#'; // after
preg_match_all($regex2, $htmlStr2, $attrArr, 0); //input
$attrArr = array_filter($attrArr); //array_filter removes empty values
print_r('<br><br> 619 htmlStr2=' . $htmlStr2. ', attrArr = <pre>'); print_r($attrArr);
$resStr = print_r($attrArr, true);
print_r('<br><br> resStr='.$resStr);
file_put_contents('C:\\Users\\gintare\\Documents\\reg2.txt', $resStr);
$before = $attrArr[1][0]; // before
$tagType = $attrArr[2][0]; // textarea
$tagAttrStr = $attrArr[3][0]; // name='<?php ?>'
$inside = $attrArr[4][0]; // inside '<?php ? >' smth.
$after = $attrArr[5][0]; // after
//Input
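$htmlStr3 = " before <input class='inp' /> after '<?php ? >' smth. "; // self-closing form, since the pattern below requires '/>'
//$regex3 ='#(.*)<([a-zA-Z]+)(.*?)/>(.*)#'; //for input
$regex3 = '#(.*)<([a-zA-Z]+)' // before, then the tag name, e.g. input: [a-zA-Z]+
. '(.*?)' // attributes (.*?)
. '/>' // end of the self-closing tag
. '(.*)#'; // after, followed by the closing delimiter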
preg_match_all($regex3, $htmlStr3, $attrArr, 0); //input
$attrArr = array_filter($attrArr); //array_filter removes empty values
print_r('<br><br> 619 htmlStr3=' . $htmlStr3. ', attrArr = <pre>'); print_r($attrArr);
$resStr = print_r($attrArr, true);
print_r('<br><br> resStr='.$resStr);
file_put_contents('C:\\Users\\gintare\\Documents\\reg3.txt', $resStr);
$before = $attrArr[1][0]; // before
$tagType = $attrArr[2][0]; // input
$tagAttrStr = $attrArr[3][0]; // class='inp'
$after = $attrArr[4][0]; // after '<?php ? >' smth.
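The rendering point made at the start of this answer can be checked without writing to a file: escaping a captured string before echoing it makes the browser show <?php ? > literally instead of parsing it as markup. A minimal sketch; $inside here is a hypothetical stand-in for one captured group:

```php
<?php
// Hypothetical captured value, as in the textarea example above.
$inside = " inside '<?php ? >' smth. ";

// htmlspecialchars() converts < and > to &lt; and &gt;,
// so the browser displays them as text instead of hiding them as markup.
$escaped = htmlspecialchars($inside, ENT_QUOTES, 'UTF-8');

echo '<pre>' . $escaped . '</pre>';
```

The same escaping can be applied to the print_r() output, for example htmlspecialchars(print_r($attrArr, true)).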