PHP: how to detect whether a paragraph contains multiple lines?

Time: 2018-11-24 01:47:00

Tags: php xpath

For example, I have the following markup:

<p>
"Lorem Ipsum is simply dummy text of the printing and typesetting industry.

Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.


It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of 

Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."

</p>

The correct form would be:

<p>Lorem Ipsum is simply dummy text of the printing and typesetting industry.</p>
<p>Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.</p>

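Is it possible to do this with XPath? How can I check each p tag and whether it is formatted with the correct markup?

3 answers:

Answer 0: (score: 1)

This code should do what you want. It uses DOMXPath to find all the <p> elements, then uses preg_split to split each one's content into lines. It replaces the content of the original <p> element with the first line, then adds a new <p> element for each subsequent line.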

// $html is assumed to hold the markup from the question
$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($doc);
$paras = $xpath->query('//p');
foreach ($paras as $p) {
    // split the text on line breaks, swallowing surrounding whitespace and blank lines
    $lines = preg_split('/(\s*[\r\n]\s*)+/', $p->textContent, -1, PREG_SPLIT_NO_EMPTY);
    if (!$lines) {
        continue; // empty paragraph, nothing to split
    }
    // the first line stays in the original <p>
    $p->textContent = array_shift($lines);
    // insert before the original next sibling so the new paragraphs keep their order
    $next = $p->nextSibling;
    foreach ($lines as $line) {
        // create a new <p> element
        $new = $doc->createElement('p');
        $new->textContent = $line;
        $p->parentNode->insertBefore($new, $next);
    }
}
echo $doc->saveHTML();

Output for the sample data:

<p>Lorem Ipsum is simply dummy text of the printing and typesetting industry.</p>
<p>Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.</p>
<p>It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of</p>
<p>Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>

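Note that this code only works if the <p> elements do not contain any child HTML elements (e.g. <a>). If they do, the problem becomes considerably more complicated...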

Demo on 3v4l.org
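
If you only need to detect which paragraphs span several lines, without rewriting them, the same split can be used as a check. The following is a minimal sketch under a few assumptions: the function name findMultilineParagraphs is hypothetical, the input is loaded without the LIBXML_HTML_* flags (so libxml wraps it in <html><body>), and, as above, each <p> is assumed to contain plain text only.

```php
<?php
// Hypothetical helper (not part of the answer's code): for every <p> whose
// text spans more than one non-empty line, return that paragraph's lines,
// keyed by the paragraph's position among all <p> elements.
function findMultilineParagraphs(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings quietly instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $found = [];
    $index = 0;
    foreach ($xpath->query('//p') as $p) {
        // Same pattern as above: split on line breaks, ignoring blank lines.
        $lines = preg_split('/(\s*[\r\n]\s*)+/', $p->textContent, -1, PREG_SPLIT_NO_EMPTY);
        if (count($lines) > 1) {
            $found[$index] = $lines;
        }
        $index++;
    }
    return $found;
}
```

An empty result means every paragraph already holds a single line. Otherwise the keys tell you which <p> elements need splitting, and the values are the lines they contain.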

Answer 1: (score: 1)

I don't think you can do this with XPath, but here is a PHP example:

<?php
$paragraph = <<<EOF
<p>
"Lorem Ipsum is simply dummy text.

Letraset sheets containing ."

</p>
EOF;

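foreach (explode("\n", $paragraph) as $line) {
    $line = trim($line); // drop stray whitespace, including "\r" from Windows line endings
    // skip blank lines and the lines holding the <p> and </p> tags themselves
    if ($line !== '' && strpos($line, 'p>') === false) {
        echo "<p>" . trim($line, '"') . "</p>\n";
    }
}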

Answer 2: (score: 0)

A simple approach is to use preg_split(), then wrap each resulting line in <p> tags.

Here is an example:

PHP

<?php
// example code

$status = "Lorem Ipsum is simply dummy text of the printing and typesetting industry.

Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.


It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of 

Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.";


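// split on runs of line breaks (\R also matches \r\n) and drop empty pieces
$tagsp = preg_split('/\R+/', $status, -1, PREG_SPLIT_NO_EMPTY);

foreach ($tagsp as $p) {
    $p = trim($p); // remove spaces left at the ends of lines
    if (strlen($p) > 0) {
        echo "<p>$p</p>";
    }
}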

DEMO
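
The example echoes each line as it is, which is fine for the sample text. If the input might contain characters such as < or &, it is probably safer to escape each line before wrapping it. A minimal sketch, assuming UTF-8 input; the function name textToParagraphs is hypothetical:

```php
<?php
// Hypothetical helper: split plain text on line breaks and wrap every
// non-empty line in its own <p>, escaping HTML special characters.
function textToParagraphs(string $text): string
{
    // \R matches any line-break sequence, including \r\n
    $lines = preg_split('/\R+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $out = [];
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line !== '') {
            $out[] = '<p>' . htmlspecialchars($line, ENT_QUOTES, 'UTF-8') . '</p>';
        }
    }
    return implode("\n", $out);
}
```

Calling echo textToParagraphs($status); with the $status string from the example would print one <p> per non-empty line.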