I need to split content into a (JSON) array in PHP. That is, I want to get from this:
<p>Text Level 0</p>
<section class="box_1">
<header class="trigger"><h2>Title</h2></header>
<div class="content">
<div class="box_2">
<div class="class"></div>
<div class="content">
<p>Text Level 2</p>
<p>More Text Level 2</p>
</div>
</div>
<div class="box_2">
<div class="class"></div>
<div class="content">
<p>Text Level 2</p>
<div class="box_3">
<div class="content">
<p>Text Level 3</p>
</div>
</div>
</div>
</div>
</div>
</section>
<p>Another Text</p>
to this result:
0: "Text Level 0"; 2: "Text Level 2\nMore Text Level 2"; 2: "Text Level 2"; 3: "Text Level 3"; 0: "Another Text";
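This means I need the "Level" of each text and the text itself. But I don't know how to do that. Should I use a RegExp, or should I parse the content (for example with simple_html_dom.php)?
Something like this:
But how can I do this in PHP?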
Answer 0 (score: 1)
Regular expression:
[\w\s\d]+(?=\<\/p)
$re = "/[\w\s\d]+(?=\<\/p)/";
$str = "<p>Text Level 0</p>"; //Sample from Your large string
preg_match_all($re, $str, $matches);
The OP does not need this in JS, but I hope someone can help by converting it to PHP. I am not very proficient in PHP.
var domString = '<p>Text Level 0</p><section class="box_1"><div class="content"><div class="box_2"><div class="class"></div><div class="content"><p>Text Level 2</p><p>More Text Level 2</p></div></div><div class="box_2"><div class="class"></div><div class="content"><p>Text Level 2</p><div class="box_3"><div class="content"><p>Text Level 3</p></div></div></div></div></div></section><p>Another Text</p>'
var result = domString.match(/[\w\s\d]+(?=\<\/p)/g)
var parentTagSubString = function(str,startTagStr,endTagStr,refSearchStr) {
var posRefSearchStr = str.indexOf(refSearchStr); // declared with var to avoid leaking a global
var posStartParentTag = str.lastIndexOf(startTagStr, posRefSearchStr)
var posEndParentTag = str.indexOf(endTagStr, posRefSearchStr)
return str.substring(posStartParentTag,posEndParentTag + endTagStr.length)
}
//explanation parentTagSubString function
// given a string - "refSearchStr"
// Search towards its left for "startTagStr"
// and
// search towards right for "endTagStr"
// within the string - "str"
for(var i=0;i<result.length;i++) {
var found = parentTagSubString(domString, "box_", "<p>", result[i])
//If p-element is not in "content" -> Level 0
//as mentioned by OP
if((found.indexOf(result[i]) == 3) || (found.indexOf(result[i]) == -1)) {
console.log('level is 0 : ', result[i])
} else {
//we searched backward till Box and if box found
//it must be at starting point
if(found.indexOf("box_") == 0) {
//search for immediate number after "box_"
console.log("Level is: ", found.match(/[\d]+/).join(''), " ", result[i])
}
}
}
//Sample Output
//level is 0 : Text Level 0
//Level is: 2 Text Level 2
//Level is: 2 More Text Level 2
//Level is: 2 Text Level 2
//Level is: 3 Text Level 3
//level is 0 : Another Text
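Answer 1 (score: 1)
Many people distrust parsing HTML with regular expressions, and in most cases with good reason. The preferred solution is a DOM parser. That said, if you want to handle this particular input with a regex, it is entirely possible. Here is one of several ways to do it: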
(?s)<p>\K.*?(?=</p>)
Sample PHP code (see the output at the bottom of the online demo):
$regex = '~(?s)<p>\K.*?(?=</p>)~';
preg_match_all($regex, $yourstring, $matches);
print_r($matches[0]);
$matches[0] is the array of matches (see the output). You can then convert it to whatever other format you like.
Output:
[0] => Text Level 0
[1] => Text Level 2
[2] => More Text Level 2
[3] => Text Level 2
[4] => Text Level 3
[5] => Another Text
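Explanation:
<p> matches the opening tag.
\K tells the engine to drop what was matched so far from the final match.
.*? lazily matches any characters (this is the match) up to...
(?=</p>) a lookahead asserting that what follows is the closing tag.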
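The regex above returns the texts but not their levels. For the DOM-parser route mentioned at the start of this answer, here is a minimal sketch using PHP's built-in DOMDocument. It rests on a few assumptions that are not confirmed by the question: the level is taken from the nearest ancestor whose class contains box_N (0 if there is none), directly adjacent <p> siblings are merged with a newline, and the helper names boxLevel, previousElement and extractLevels are hypothetical names chosen for illustration.

```php
<?php
// Hypothetical helper: returns N from the nearest ancestor whose class
// attribute contains "box_N", or 0 if no such ancestor exists.
function boxLevel(DOMNode $node): int
{
    for ($n = $node->parentNode; $n !== null; $n = $n->parentNode) {
        if ($n instanceof DOMElement
            && preg_match('/\bbox_(\d+)\b/', $n->getAttribute('class'), $m)) {
            return (int) $m[1];
        }
    }
    return 0;
}

// Hypothetical helper: previous sibling that is an element,
// skipping whitespace text nodes and comments.
function previousElement(DOMNode $node): ?DOMElement
{
    for ($n = $node->previousSibling; $n !== null; $n = $n->previousSibling) {
        if ($n instanceof DOMElement) {
            return $n;
        }
    }
    return null;
}

function extractLevels(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings (e.g. about <section>) instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $result = [];
    // getElementsByTagName yields the <p> elements in document order.
    foreach ($doc->getElementsByTagName('p') as $p) {
        $text = trim($p->textContent);
        $prev = previousElement($p);
        if ($prev !== null && $prev->tagName === 'p' && $result !== []) {
            // Assumption: a <p> directly following another <p> belongs
            // to the same entry, joined by a newline.
            $result[count($result) - 1]['text'] .= "\n" . $text;
        } else {
            $result[] = ['level' => boxLevel($p), 'text' => $text];
        }
    }
    return $result;
}

// Usage with the markup from the question.
$html = <<<'HTML'
<p>Text Level 0</p>
<section class="box_1">
<header class="trigger"><h2>Title</h2></header>
<div class="content">
<div class="box_2"><div class="class"></div><div class="content">
<p>Text Level 2</p>
<p>More Text Level 2</p>
</div></div>
<div class="box_2"><div class="class"></div><div class="content">
<p>Text Level 2</p>
<div class="box_3"><div class="content"><p>Text Level 3</p></div></div>
</div></div>
</div>
</section>
<p>Another Text</p>
HTML;

$items = extractLevels($html);
echo json_encode($items, JSON_PRETTY_PRINT), "\n";
```

The result is a list of level/text pairs rather than a map keyed by level, because the desired output in the question repeats level 2 and a JSON object cannot hold the same key twice.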