PHP: convert HTML links to text while keeping the same HTML structure

Date: 2017-07-10 11:01:53

Tags: php html dom

I'm struggling to convert HTML links to text while keeping the same HTML structure.

I need to convert this part of an HTML page

<div>
    <p>text text bla blah</p>
    <p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
    <p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
</div>

into this:

<div>
    <p>text text bla blah</p>
    <p>Cool website https://google.com</p>
    <p>Cool website https://google.com</p>
</div>

I found a useful script, PHP regex: How to convert HTML string with links into plain text that shows URL after text in brackets. It collects the HTML links and converts them to plain text, which covers part of the job.

Here is what I have so far:

<?php
$htmlString = '
<div>
    <p>text text bla blah</p>
    <p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
    <p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
</div>
';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($htmlString);
$xpath = new DOMXPath($dom);

$links = [];
$linksAsString = '';

foreach ($xpath->query('//a') as $linkElement)
{
    $link = [
        'href' => $linkElement->getAttribute('href'),
        'text' => $linkElement->textContent
    ];
    $links[] = $link;

    $linksAsString .= $link['text'] . " {$link['href']}<br/>";
}
libxml_clear_errors();

echo $linksAsString;

The current code outputs only the converted links, without the surrounding markup:

Cool website https://google.com
Cool website https://google.com

Any help would be appreciated.

2 answers:

Answer 0 (score: 0)

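You can use str_replace() with the full element: serialize each <a> element with saveHTML() and replace that string in the original HTML with the link text followed by the URL.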

<?php
$htmlString = '
<div>
    <p>text text bla blah</p>
    <p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
    <p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
</div>
';
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($htmlString);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//a') as $linkElement)
{
    $htmlString = str_replace($dom->saveHTML($linkElement), $linkElement->textContent . ' ' . $linkElement->getAttribute('href'), $htmlString);
}
libxml_clear_errors();

echo $htmlString;

Output:

<div>
    <p>text text bla blah</p>
    <p>Cool website https://google.com</p>
    <p>Cool website https://google.com</p>
</div>

Demo: https://eval.in/830127

Answer 1 (score: 0)
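Answer 0 only works when $dom->saveHTML($linkElement) reproduces the link's original markup character for character. libxml re-serializes attributes with double quotes, so, for example, a link written with single-quoted attributes is not found and stays unchanged. The sketch below wraps the same loop in a function (the name replaceLinksByString is hypothetical, not from the answer) to show both cases:

```php
<?php
// Answer 0's technique wrapped in a hypothetical helper: find each <a> with DOM,
// re-serialize it with saveHTML(), and str_replace() that string in the original HTML.
function replaceLinksByString(string $html): string
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//a') as $link) {
        // Only matches if libxml re-serializes the tag exactly as it appears in $html.
        $html = str_replace(
            $dom->saveHTML($link),
            $link->textContent . ' ' . $link->getAttribute('href'),
            $html
        );
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $html;
}

// Double-quoted attributes round-trip unchanged, so the link is replaced.
echo replaceLinksByString('<p><a href="https://google.com">Cool website</a></p>'), "\n";
// Single-quoted attributes are re-serialized with double quotes, so nothing matches.
echo replaceLinksByString("<p><a href='https://google.com'>Cool website</a></p>"), "\n";
```

The first call prints <p>Cool website https://google.com</p>; the second returns its input unchanged. Note also that textContent and getAttribute() return decoded values, which this approach inserts back into the HTML string without re-escaping.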

It's a bit of a pain, but you can reach your goal with DOM; you just need to shuffle nodes around a little to get the right text in the right place...

<?php
$htmlString = '
<div>
    <p>text text bla blah</p>
    <p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
    <p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
</div>
';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($htmlString);
$xpath = new DOMXPath($dom);

$links = [];
$linksAsString = '';

foreach ($xpath->query('//a') as $linkElement)
{
    $linksAsString = $linkElement->textContent . " ".$linkElement->getAttribute('href');
    $parentNode = $linkElement->parentNode;
    $parentNode->removeChild($linkElement);
    $newText = $dom->createTextNode($linksAsString);
    $parentNode->appendChild($newText);
}
libxml_clear_errors();

echo $dom->saveXML();

...which gives:

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><div>
    <p>text text bla blah</p>
    <p>Cool website https://google.com</p>
    <p>Cool website https://google.com</p>
</div></body></html>
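Answer 1's output is no longer just the original fragment: saveXML() adds an XML declaration, a DOCTYPE and the implied <html><body> wrappers. Its removeChild()/appendChild() pair also moves the new text to the end of the parent element, which matches the link's original position only when the link is the parent's last child. A minimal sketch, assuming the input is a fragment like the one in the question (linksToText is a hypothetical helper name), uses replaceChild() to keep the text in place and serializes only the children of <body>:

```php
<?php
// Hypothetical helper: replace each <a href="..."> with the text "link text URL"
// at the same position, and return only the markup of the original fragment.
function linksToText(string $html): string
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html); // libxml wraps the fragment in implied <html><body>
    $xpath = new DOMXPath($dom);

    // Snapshot the matches, then swap each link for a text node in the same spot.
    foreach (iterator_to_array($xpath->query('//a[@href]')) as $link) {
        $text = $dom->createTextNode($link->textContent . ' ' . $link->getAttribute('href'));
        $link->parentNode->replaceChild($text, $link);
    }

    // Serialize only the children of <body>, skipping the DOCTYPE and wrappers.
    $out = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $dom->saveHTML($child);
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $out;
}

echo linksToText('<div><p>text text bla blah</p><p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p></div>'), "\n";
```

Because the replacement is a DOM text node, any & in the link text or URL is escaped again on output. Links without an href attribute are left untouched because the query is //a[@href]; that filter is a choice made for this sketch, not something either answer does.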