Question

I have an HTML file, and I want to use PHP to collect all of the classes in this file into an array. For example, this is my HTML file:

<div class="main menu">element</div>
<div class="content"></div>

I want to get an array containing three elements (in this particular example): "main", "menu", "content".

In bash, this can be done with grep:

classes=($(grep -oP '(?<=class=").*?(?=")' "./index.html"))

How can I do the same thing in PHP?

At the moment I have this basic code:

//read the entire string
$str = implode("", file('./index.html'));
$fp = fopen('./index.html', 'w');
//Here I guess should be the function to get all of the strings
//now, save the file
fwrite($fp, $str, strlen($str));

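Edit: How can my question be a duplicate of the linked one when I am asking how to find the strings with PHP? This is not about bash, and I have already provided the grep alternative.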

Answer 1

I would use PHP's DOMDocument class like this:

$classes = array();
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTMLFile('./index.html');
$elements = $dom->getElementsByTagName('*');
foreach($elements as $element) {
    $classes = array_merge($classes,array_filter(explode(' ',$element->getAttribute('class'))));
}
print_r($classes);

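Explanation

Declare an empty array $classes
Suppress the errors DOMDocument may raise if the HTML is incomplete or invalid
Instantiate a new DOMDocument object
Load the file index.html into the DOMDocument
Get all elements using the wildcard tag name *
Iterate over the elements
Get the class attribute of each element
Split the class attribute on spaces
Filter the exploded array to remove empty values
Add the result to the $classes array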
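
The steps above can be sketched as a self-contained variant. This is only a sketch under a few assumptions that are not in the answer: the markup is inlined and loaded with loadHTML() so it runs without index.html, the class attribute is split with preg_split() so tabs and newlines also count as separators, and duplicates are removed at the end. One reason to avoid array_filter() without a callback is that it also drops a class literally named "0".

```php
<?php
// Sample markup from the question, inlined so the sketch runs without index.html.
$html = '<div class="main menu">element</div><div class="content"></div>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);

$classes = [];
foreach ($dom->getElementsByTagName('*') as $element) {
    // Split on any run of whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces,
    // so elements without a class attribute contribute nothing.
    $names = preg_split('/\s+/', $element->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($names as $name) {
        $classes[] = $name;
    }
}
libxml_clear_errors();

// Assumption: duplicates are not wanted, so drop them and reindex.
$classes = array_values(array_unique($classes));
print_r($classes);
```

If repeated class names should be kept, the array_unique() line can simply be removed.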

Answer 2

To get the three elements, try a regular expression like this one with the preg_match_all function:

(?:class="|\G(?!^))\s*\K[^\s"]+

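\G continues at the end of the previous match (or matches at the start of the string)
\K resets the start of the reported match

See the test at eval.in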

if(preg_match_all('/(?:class="|\G(?!^))\s*\K[^\s"]+/', $str, $out) > 0)
  print_r($out[0]);

Array ( [0] => main [1] => menu [2] => content )

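Note that regular expressions are generally not a good tool for parsing HTML. Whether one is acceptable depends on whether you are parsing your own HTML or arbitrary HTML, and on what you are trying to achieve.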
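
To make the caveat concrete, here is a small sketch that runs the pattern on the question's markup and then on a second, made-up snippet. The $tricky string and the variable names are assumptions added for illustration. Because the pattern works on raw text, it also picks up any attribute whose name merely ends in class.

```php
<?php
// Sample markup from the question, inlined so the sketch runs on its own.
$str = '<div class="main menu">element</div>' . "\n" . '<div class="content"></div>';

$pattern = '/(?:class="|\G(?!^))\s*\K[^\s"]+/';

preg_match_all($pattern, $str, $out);
print_r($out[0]);         // main, menu, content

// Hypothetical snippet: data-class="x" also contains the text class=",
// so its value is reported alongside the real class names.
$tricky = '<p data-class="x" class="a b"></p>';
preg_match_all($pattern, $tricky, $trickyOut);
print_r($trickyOut[0]);   // x, a, b
```

Attributes written with single quotes would be missed by this pattern for the same reason, which is why a DOM parser is the safer choice for markup you do not control.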

Answer 3

Depending on what you are trying to do, you can either use a regular expression with the preg_grep function, or traverse the DOM with the DOMDocument class.
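
A minimal sketch of the preg_grep route, assuming the file has been read into an array of lines (the $lines array below stands in for file('./index.html'), and the middle line is added for contrast). Note that preg_grep only filters array entries and returns whole lines, so extracting the class names still needs a capturing match afterwards.

```php
<?php
// Stand-in for file('./index.html'); the second line is a made-up contrast case.
$lines = [
    '<div class="main menu">element</div>',
    '<p>no class here</p>',
    '<div class="content"></div>',
];

// preg_grep keeps the entries that match, preserving their original keys.
$withClass = preg_grep('/class="[^"]*"/', $lines);
print_r(array_keys($withClass));   // 0, 2

// Pull the attribute values out of the kept lines and split them into names.
$classes = [];
foreach ($withClass as $line) {
    preg_match_all('/class="([^"]*)"/', $line, $m);
    foreach ($m[1] as $value) {
        $classes = array_merge($classes, preg_split('/\s+/', $value, -1, PREG_SPLIT_NO_EMPTY));
    }
}
print_r($classes);
```

This shares the limitations of any regex approach to HTML, so it is best suited to markup whose shape you already know.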

Get a string in a pattern with PHP
