Question

I have an HTML string in a variable that looks like this:

<h1>Title 1</h1>
 Introduction
 <h2>Chapter 1</h2>
  <p>Always just one line</p>
  <p class="description">Some more text.</p>
  <p class="description">Maybe with multiple lines.</p>
 <h2>Chapter 2</h2>
  <p>Always just one line</p>
  <p class="description">Some more text.</p>
  <p class="description">Maybe with multiple lines.</p>

<h1>Title 2</h1>
Introduction
 <h2>Chapter 1</h2>
  <p>Always just one line</p>
  <p class="description">Some more text.</p>
  <p class="description">Maybe with multiple lines.</p>
 <h2>Chapter 2</h2>
  <p>Always just one line</p>
  <p class="description">Some more text.</p>
  <p class="description">Maybe with multiple lines.</p>

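For further processing, I need these "blocks" in a variable (an array). As a first step, the main sections, each starting at an <h1> and running up to the next <h1>, should be separated.

I tried using explode() with the delimiter <h1, but that removes part of the markup itself.

As a second step, I also need to separate the chapters within each "block". As a final step, I need to get the descriptions from each chapter's content.

I think the key is the first step: splitting the whole thing into main sections. After that I can (I guess) apply the same technique to the "sub-blocks" in a foreach loop or something similar.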

Answer 1

OK, no problem. Use the explode() function. It removes the <h1, but you can easily add it back like this:

<?php
$html = '<h1>Title 1</h1>
     Introduction
     <h2>Chapter 1</h2>
      <p>Always just one line</p>
      <p class="description">Some more text.</p>
      <p class="description">Maybe with multiple lines.</p>
     <h2>Chapter 2</h2>
      <p>Always just one line</p>
      <p class="description">Some more text.</p>
      <p class="description">Maybe with multiple lines.</p>

    <h1>Title 2</h1>
    Introduction
     <h2>Chapter 1</h2>
      <p>Always just one line</p>
      <p class="description">Some more text.</p>
      <p class="description">Maybe with multiple lines.</p>
     <h2>Chapter 2</h2>
      <p>Always just one line</p>
      <p class="description">Some more text.</p>
      <p class="description">Maybe with multiple lines.</p>
    ';

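// explode() drops the delimiter, so split first and then put '<h1' back.
$html = explode('<h1', $html);
// $html[0] is the text before the first <h1 (empty when the string starts with <h1); drop it.
unset($html[0]);
foreach ($html as $i => $part) {
    $html[$i] = '<h1' . $part;
}
var_dump($html); // var_dump() prints by itself; wrapping it in print_r() adds nothing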

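By the way, the first element only needs to be removed when it is empty, which happens when <h1 is at the very beginning of the string. You can even do that check inside the loop, before the '<h1' prefix is added (in a for loop, store count($html) in a variable first, because unset() shrinks the array):

if ($html[$i] === '') { unset($html[$i]); continue; }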
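
As an alternative to re-adding the prefix by hand, here is a minimal sketch that is not part of the original answer: preg_split() with a lookahead splits in front of each <h1 without consuming it. The shortened sample string and the variable names $sections and $chapters are assumptions made for illustration.

```php
<?php
// Shortened, hypothetical sample in the same shape as the question's HTML.
$html = '<h1>Title 1</h1>
 Introduction
 <h2>Chapter 1</h2>
  <p>Always just one line</p>
  <p class="description">Some more text.</p>
<h1>Title 2</h1>
 Introduction
 <h2>Chapter 1</h2>
  <p>Always just one line</p>
  <p class="description">Some more text.</p>
';

// The lookahead (?=<h1) matches the empty position right before each "<h1",
// so nothing is consumed and every piece keeps its own tag.
// PREG_SPLIT_NO_EMPTY drops the empty piece in front of a leading <h1>.
$sections = preg_split('/(?=<h1)/', $html, -1, PREG_SPLIT_NO_EMPTY);

// The same idea one level down: split each section before every "<h2".
// Element 0 of each result is the <h1> heading plus the introduction.
$chapters = [];
foreach ($sections as $section) {
    $chapters[] = preg_split('/(?=<h2)/', $section, -1, PREG_SPLIT_NO_EMPTY);
}

var_dump($chapters);
```

Like explode(), this is plain string matching, so it assumes the tags are written exactly as <h1 and <h2 in lowercase.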

In reply to your comment:

If you also want to split on <h2, you can do the same thing again with the h2 delimiter:

<?php
$html = '<h1>Title 1</h1>
     Introduction
     <h2>Chapter 1</h2>
      <p>Always just one line</p>
      <p class="description">Some more text.</p>
      <p class="description">Maybe with multiple lines.</p>
     <h2>Chapter 2</h2>
      <p>Always just one line</p>
      <p class="description">Some more text.</p>
      <p class="description">Maybe with multiple lines.</p>

    <h1>Title 2</h1>
    Introduction
     <h2>Chapter 1</h2>
      <p>Always just one line</p>
      <p class="description">Some more text.</p>
      <p class="description">Maybe with multiple lines.</p>
     <h2>Chapter 2</h2>
      <p>Always just one line</p>
      <p class="description">Some more text.</p>
      <p class="description">Maybe with multiple lines.</p>
    ';

$html = explode('<h1', $html);
unset($html[0]); // text before the first <h1 (empty here)

// h2:
foreach ($html as $i => $section) {
    // Split each <h1> section on '<h2'; element 0 is the <h1> heading plus the introduction.
    $parts = explode('<h2', '<h1' . $section);
    foreach ($parts as $j => $part) {
        // explode() removed '<h2' from every piece except the first; put it back.
        if ($j > 0) {
            $parts[$j] = '<h2' . $part;
        }
    }
    $html[$i] = $parts;
}
var_dump($html);

Answer 2

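As mentioned in the comments, you can also explode("\n", $string) (with double quotes, so that \n is a real newline), then iterate over all the lines and switch to the next section whenever strpos($line, '<h1>') !== false.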
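
A minimal sketch of that line-by-line approach, assuming every <h1> starts on its own line as in the question. The sample string and the names $sections and $current are made up for illustration.

```php
<?php
// Shortened, hypothetical sample; each <h1> begins a new line.
$string = "<h1>Title 1</h1>\n Introduction\n <h2>Chapter 1</h2>\n  <p>Always just one line</p>\n"
        . "<h1>Title 2</h1>\n Introduction\n <h2>Chapter 1</h2>\n  <p>Always just one line</p>\n";

$sections = [];  // one string per <h1> section
$current  = -1;  // index of the section being filled; -1 means no <h1> seen yet

foreach (explode("\n", $string) as $line) {
    if (strpos($line, '<h1>') !== false) {
        // A new main section starts on this line.
        $current++;
        $sections[$current] = '';
    }
    if ($current >= 0) {
        // Lines before the first <h1> are skipped; all others are kept verbatim.
        $sections[$current] .= $line . "\n";
    }
}

var_dump($sections);
```

Because the lines are copied unchanged, the <h1> tag stays in each section and nothing has to be added back afterwards.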

However, in general you cannot reliably extract HTML elements from a string with simple string tools. Try DOMDocument::loadHTML() instead.
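
A minimal sketch of the DOM route, covering all three steps from the question (sections, chapters, descriptions). It assumes the dom extension is available, that every <h2> follows an <h1>, that every description follows an <h2>, and that titles are unique. The nested $result layout is an illustrative choice, not something prescribed by the question.

```php
<?php
// Shortened, hypothetical sample in the same shape as the question's HTML.
$html = '<h1>Title 1</h1>
 Introduction
 <h2>Chapter 1</h2>
  <p>Always just one line</p>
  <p class="description">Some more text.</p>
  <p class="description">Maybe with multiple lines.</p>
<h1>Title 2</h1>
 Introduction
 <h2>Chapter 1</h2>
  <p>Always just one line</p>
  <p class="description">Some more text.</p>
';

// Collect parser warnings internally instead of printing them,
// which is useful once the input is real-world markup.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html); // the missing <html>/<body> wrappers are added by the parser

$xpath  = new DOMXPath($doc);
$result = [];   // [section title => [chapter title => [descriptions...]]]
$title   = null;
$chapter = null;

// A union query returns the matching nodes in document order,
// so headings and descriptions arrive in the order they were written.
foreach ($xpath->query('//h1 | //h2 | //p[@class="description"]') as $node) {
    $text = trim($node->textContent);
    if ($node->nodeName === 'h1') {
        $title = $text;
        $result[$title] = [];
    } elseif ($node->nodeName === 'h2') {
        $chapter = $text;
        $result[$title][$chapter] = [];
    } else {
        $result[$title][$chapter][] = $text;
    }
}

print_r($result);
```

Unlike the string-based splits, this keeps working when the tags carry attributes or different whitespace, because the parser rather than a fixed delimiter decides where an element starts.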

Split HTML content by chapter
