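I am trying to convert HTML links into plain text while keeping the rest of the HTML structure intact.
I need to convert this part of an HTML page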
<div>
<p>text text bla blah</p>
<p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
<p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
</div>
into this:
<div>
<p>text text bla blah</p>
<p>Cool website https://google.com</p>
<p>Cool website https://google.com</p>
</div>
I found a nice script, "PHP regex: How to convert HTML string with links into plain text that shows URL after text in brackets". It collects the HTML links and converts them to plain text, which is part of the job.
This is what I have so far:
$htmlString = '
<div>
<p>text text bla blah</p>
<p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
<p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
</div>
';
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($htmlString);
$xpath = new DOMXPath($dom);
$links = [];
$linksAsString = '';
foreach ($xpath->query('//a') as $linkElement)
{
    $link = [
        'href' => $linkElement->getAttribute('href'),
        'text' => $linkElement->textContent
    ];
    $links[] = $link;
    $linksAsString .= $link['text'] . " {$link['href']}<br/>";
}
libxml_clear_errors();
echo $linksAsString;
The current code outputs only the converted links:
Cool website https://google.com
Cool website https://google.com
Any help would be appreciated.
Answer 0 (score: 0)
You can use str_replace with the full serialized element.
<?php
$htmlString = '
<div>
<p>text text bla blah</p>
<p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
<p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
</div>
';
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($htmlString);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//a') as $linkElement)
{
    // Replace the serialized <a> element in the source string with "text URL".
    $htmlString = str_replace($dom->saveHTML($linkElement), $linkElement->textContent . ' ' . $linkElement->getAttribute('href'), $htmlString);
}
libxml_clear_errors();
echo $htmlString;
Output:
<div>
<p>text text bla blah</p>
<p>Cool website https://google.com</p>
<p>Cool website https://google.com</p>
</div>
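One caveat with this approach, sketched below: str_replace only finds the link if the markup produced by saveHTML is character-for-character identical to the source string. libxml writes attributes with double quotes, so a source that uses single quotes is left unchanged without any error. The function name replaceLinksWithStrReplace is a hypothetical wrapper used only for this illustration.

```php
<?php
// Hypothetical helper wrapping the str_replace approach from the answer above.
function replaceLinksWithStrReplace(string $htmlString): string
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($htmlString);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//a') as $linkElement) {
        // The search string is the element as libxml serializes it,
        // which is not necessarily how it was written in the source.
        $htmlString = str_replace(
            $dom->saveHTML($linkElement),
            $linkElement->textContent . ' ' . $linkElement->getAttribute('href'),
            $htmlString
        );
    }
    return $htmlString;
}

// Double-quoted attribute: the serialization matches the source, so the link is replaced.
echo replaceLinksWithStrReplace('<p><a href="https://google.com">Cool website</a></p>'), "\n";

// Single-quoted attribute: libxml serializes it with double quotes, so nothing matches.
echo replaceLinksWithStrReplace("<p><a href='https://google.com'>Cool website</a></p>"), "\n";
```

If the source markup is not under your control, editing the DOM directly is probably the safer route.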
Answer 1 (score: 0)
It is a bit of a pain, but you can achieve this with DOM; you just need to fiddle around a little to get the right text into the right place...
<?php
$htmlString = '
<div>
<p>text text bla blah</p>
<p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
<p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
</div>
';
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($htmlString);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//a') as $linkElement)
{
    // Build the replacement text: the link text followed by its URL.
    $linkAsString = $linkElement->textContent . ' ' . $linkElement->getAttribute('href');
    $newText = $dom->createTextNode($linkAsString);
    // Swap the <a> for the text node in place, so the order of sibling nodes is preserved.
    $linkElement->parentNode->replaceChild($newText, $linkElement);
}
libxml_clear_errors();
echo $dom->saveXML();
...which gives:
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><div>
<p>text text bla blah</p>
<p>Cool website https://google.com</p>
<p>Cool website https://google.com</p>
</div></body></html>
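The output above includes the XML declaration, the doctype and the html/body wrappers, which the question did not ask for. A minimal sketch of one way to get only the original fragment back, assuming the fragment ends up inside the body element that loadHTML creates: serialize each child of body with saveHTML. The function name linksToText is hypothetical, and the sketch only handles a elements that have an href attribute.

```php
<?php
// Hypothetical helper: replaces each <a href="..."> with "text URL"
// and returns only the fragment, without the <html>/<body> wrappers.
function linksToText(string $html): string
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//a[@href]') as $a) {
        $text = $dom->createTextNode($a->textContent . ' ' . $a->getAttribute('href'));
        // Replace in place so surrounding siblings keep their order.
        $a->parentNode->replaceChild($text, $a);
    }

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize the children of <body> one by one to skip the wrappers.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$htmlString = '
<div>
<p>text text bla blah</p>
<p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
<p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
</div>
';

echo linksToText($htmlString);
```

Serializing node by node avoids having to strip the wrapper tags from the string afterwards.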