Remove HTML from display in PHP

Date: 2014-03-20 14:58:42

Tags: php

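I have text like this: http://pastebin.com/2Zgbs7hi

I want to be able to remove the HTML code from it and display only the plain text, but wherever there are currently several line breaks I want to keep at least one.

I have tried: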

$ticket["summary"] = 'pastebin example';

$TicketSummaryDisplay = nl2br($ticket["summary"]);
$TicketSummaryDisplay = stripslashes($TicketSummaryDisplay);
$TicketSummaryDisplay = trim(strip_tags($TicketSummaryDisplay));
$TicketSummaryDisplay = preg_replace('/\n\s+$/m', '', $TicketSummaryDisplay);
echo $TicketSummaryDisplay;

This displays plain text, but as one big block of text with no line breaks at all.

9 answers:

Answer 0 (score: 1)

Maybe this will save you some time.

<?php
libxml_use_internal_errors(true); // suppress libxml warnings caused by the non-standard "o" tags
$html = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
$dom = new DOMDocument;
$dom->loadHTML($html);

$result='';
foreach ($dom->getElementsByTagName('p') as $node) {
    if (strstr($node->nodeValue, 'Legal Disclaimer:')){
        break;
    }
    $result .= $node->nodeValue;

}
echo $result;
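
The loop above concatenates the paragraphs without any separator, so the result is still one block. A minimal sketch of a variation that keeps one line break per non-empty paragraph follows; the `$html` sample is a hypothetical stand-in for the pastebin content, and the `$lines` array is an assumption of this sketch, not part of the answer.

```php
<?php
// Hypothetical sample standing in for the pastebin content.
$html = '<html><body><p>Hi,</p><p>&nbsp;</p><p>Please arrange this.</p>'
      . '<p>Legal Disclaimer:</p><p>Confidential.</p></body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);

$lines = [];
foreach ($dom->getElementsByTagName('p') as $node) {
    // Stop at the disclaimer, as in the answer above.
    if (strpos($node->nodeValue, 'Legal Disclaimer:') !== false) {
        break;
    }
    // DOM returns UTF-8, so a decoded &nbsp; arrives as the bytes C2 A0.
    $line = trim(str_replace("\xC2\xA0", ' ', $node->nodeValue));
    if ($line !== '') {
        $lines[] = $line;
    }
}

// One newline between paragraphs; nl2br() makes it visible in a browser.
$result = implode("\n", $lines);
echo nl2br($result);
```

Empty paragraphs are dropped, so several consecutive breaks in the source collapse into one.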

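Answer 1 (score: 0)

This example should successfully store the text from the HTML in an array of strings.

After stripping all the tags, you can turn the string into an array with preg_split and the \R escape sequence, which matches any newline sequence. The array will now contain several blank values, as well as some HTML non-breaking space entities, so we use array_filter() to check for empty values (it removes every item that does not satisfy the filter condition, which in our case means an empty value). The &nbsp; entity is a problem here: it is not the same thing as a space character (it decodes to U+00A0, not the ASCII space 0x20), so trim() will not remove it. There are two possible solutions: the first, uncommented one replaces only &nbsp; and then checks for whitespace, while the second, commented-out one decodes all HTML entities and then checks for whitespace.

PHP: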

$text = file_get_contents( 'http://pastebin.com/raw.php?i=2Zgbs7hi' );
$text = strip_tags( $text );

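$lines = array_map(
    function( $item ) {

        // Replace the entity so that trim() sees an ordinary space.
        return str_replace( '&nbsp;', ' ', $item );

        // $item = html_entity_decode( $item );
        // return str_replace( "\xC2\xA0", ' ', $item );

    },
    preg_split( '/\R/', $text )
);

// array_filter() passes items by value, so the replacement happens in
// array_map() above; the filter only drops lines that are empty once trimmed.
$array = array_filter(
    $lines,
    function( $item ) {
        return trim( $item ) !== '';
    }
);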

foreach( $array as $value ) {
    echo $value . '<br />';
}

Array output:

Array
(
    [8] => Hi,
    [11] => Ashley has explained that I need to ask for another line and broadband for the wifi to work, please can you arrange this.
    [13] => Regards
    [23] => Legal Disclaimer:
    [24] => This email and its attachments are confidential. If you received it by mistake, please don’t share it. Let us know and then delete it. Its content does not necessarily represent the views of The Dragon Enterprise
    [25] =>  Centre and we cannot guarantee the information it contains is complete. All emails are monitored and may be seen by another member of The Dragon Enterprise Centre's staff for internal use
)

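Now you should have a clean array containing only the items that have a value. By the way, line breaks in HTML are represented by <br />, not by \n. Your example does still contain \n characters in the response sent to the web browser, but they are only visible in the page source. I hope I didn't miss the point of the question.

Answer 2 (score: 0)

Try this to get the text output with line breaks: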
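
To tie this back to the question (keep one line break wherever there are several), here is a minimal sketch. The helper name `htmlToPlainText` and the `$sample` string are hypothetical, and the sketch assumes the input is UTF-8.

```php
<?php
/**
 * Hypothetical helper: strips tags, decodes entities and collapses
 * every run of blank lines into a single newline.
 */
function htmlToPlainText(string $html): string
{
    $text = strip_tags($html);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Turn no-break spaces (UTF-8 bytes C2 A0) into ordinary spaces.
    $text = str_replace("\xC2\xA0", ' ', $text);

    // Split on newline sequences, trim each line, drop the empty ones.
    $lines = array_filter(
        array_map('trim', preg_split('/\r\n|\r|\n/', $text)),
        function ($line) {
            return $line !== '';
        }
    );

    return implode("\n", $lines);
}

$sample = "<p>Hi,</p>\r\n<p>&nbsp;</p>\r\n\r\n<p>Please arrange this.</p>\n<p>Regards</p>";
$plain  = htmlToPlainText($sample);

// Escape the text, then turn each remaining "\n" into a visible <br />.
echo nl2br(htmlspecialchars($plain));
```

The newline characters are only converted to <br /> at the very end, after the tags have been stripped, so strip_tags() cannot remove them again.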

<?php
$ticket["summary"]  = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');

$TicketSummaryDisplay = nl2br($ticket["summary"]);

echo strip_tags($TicketSummaryDisplay,'<br>');


?>
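
The order of the calls and the second argument matter here. A small sketch with a hypothetical two-paragraph input shows why the code in the question lost its line breaks, while the version above keeps them:

```php
<?php
// Hypothetical input with a real newline character between the paragraphs.
$html = "<p>Hi,</p>\n<p>Regards</p>";

// nl2br() inserts "<br />" before each newline...
$withBreaks = nl2br($html);

// ...but a plain strip_tags() call removes those <br /> tags again.
$lost = strip_tags($withBreaks);

// Allowing <br> keeps them while every other tag is still stripped.
$kept = strip_tags($withBreaks, '<br>');

echo $kept;
```

In `$lost` only the "\n" characters survive, and a browser does not render those as line breaks.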

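Answer 3 (score: 0)

You are asking how to add line breaks to "one big block of text with no line breaks at all".

Short answer

  • After stripping the HTML tags, apply wordwrap with the desired line length
  • $text = wordwrap($text, 90, "<br />\n");
  • I really wonder why nobody suggested this function before.
  • There is also chunk_split, which ignores words and simply splits after a fixed number of characters, breaking words in the process. I don't think that is what you want, though.

PHP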
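
The difference between the two functions can be seen on a short string. This is only an illustrative sketch; the sample sentence and the width of 15 are arbitrary choices, not values from the answer.

```php
<?php
$text = 'The quick brown fox jumps over the lazy dog';

// wordwrap() breaks at a space, so no line exceeds 15 characters
// and no word is cut in half.
$wrapped = wordwrap($text, 15, "\n");

// chunk_split() cuts after every 15 characters, even mid-phrase,
// and also appends the separator after the final chunk.
$chunked = chunk_split($text, 15, "\n");

echo $wrapped, "\n---\n", $chunked;
```

Note that chunk_split() leaves the spaces at the start of the second and third chunks, because it counts characters rather than words.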

<?php
$text = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');

/**
 * Returns the string without HTML tags; also
 * takes control chars, spaces and "&nbsp;" into account.
 */
function dropHtmlTags($string) {

    // remove html tags
    //$string = preg_replace ('/<[^>]*>/', ' ', $string);
    $string = strip_tags($string);

    // control characters and "&nbsp;"
    $string = str_replace("\r", '', $string);    // remove
    $string = str_replace("\n", ' ', $string);   // replace with space
    $string = str_replace("\t", ' ', $string);   // replace with space
    $string = str_replace("&nbsp;", ' ', $string);

    // remove multiple spaces
    $string = preg_replace('/ {2,}/', ' ', $string);
    $string = trim($string);

    return $string;

}

$text = dropHtmlTags($text);

// The Answer: insert line breaks after 95 chars,
// to get rid of the "one big block of text with no line breaks at all"
$text = wordwrap($text, 95, "<br />\n");

// if you want to insert line-breaks before the legal disclaimer, 
// uncomment the next line
//$text = str_replace("Regards Legal Disclaimer", "<br /><br />Regards Legal Disclaimer", $text);

echo $text;
?>

Result

  • The first part shows your block of text
  • The second part shows the text with wordwrap applied (the code above)

[image: screenshot of both outputs]

Answer 4 (score: 0)

Hi, it can be done like this:

$abc = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');

// strip_tags() expects the input string as its first argument
$abc = strip_tags($abc);

echo $abc;

Please let me know whether it works.

Answer 5 (score: 0)

You can use:

<?php
$a= file_get_contents('a.txt');
echo nl2br(htmlspecialchars($a));
?>

Answer 6 (score: -1)

<?php

$handle = @fopen("pastebin.html", "r");
if ($handle) {
    while (!feof($handle)) {
        // note: fgetss() was deprecated in PHP 7.3 and removed in PHP 8.0
        $buffer = fgetss($handle, 4096);
        echo $buffer;
    }
    fclose($handle);
}
?>

Output

Hi,

&nbsp;
Ashley has explained that I need to ask for another line and broadband for the wifi to work, please can you arrange this.
&nbsp;
Regards
&nbsp;


&nbsp;

&nbsp;

&nbsp;
&nbsp;
Legal Disclaimer:
This email and its attachments are confidential. If you received it by mistake, please don&#8217;t share it. Let us know and then delete it. Its content does not necessarily represent the views of&nbsp;The Dragon Enterprise
 Centre&nbsp;and we cannot guarantee the information it contains is complete. All emails are monitored and may be seen by another member of The Dragon Enterprise Centre's staff for internal use
&nbsp;

&nbsp;
&nbsp;
&nbsp;

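You can write additional code to convert the &nbsp; entities into spaces, and so on.

Answer 7 (score: -1)

I'm not sure I understood everything, but this seems to be the result you expect: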
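
As a rough sketch of that additional step, the entities left in the output can be decoded with html_entity_decode(). The `$line` string is a hypothetical sample modelled on the output above, and UTF-8 is assumed as the target encoding.

```php
<?php
// Hypothetical sample modelled on the output above.
$line = "the views of&nbsp;The Dragon Enterprise, please don&#8217;t share it.";

// Decode named and numeric entities into UTF-8 characters.
$decoded = html_entity_decode($line, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// &nbsp; decodes to U+00A0 (bytes C2 A0), which trim() does not treat
// as whitespace, so replace it with an ordinary space.
$clean = str_replace("\xC2\xA0", ' ', $decoded);

echo $clean;
```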

$txt  = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');

var_dump(preg_replace("/(\&nbsp\;(\s{1,})?)+/", "\n", trim(strip_tags(preg_replace("/(\s){1,}/", " ", $txt)))));


//more readable

$txt = preg_replace("/(\s){1,}/", " ", $txt);
$txt = trim(strip_tags($txt));
$txt = preg_replace("/(\&nbsp\;(\s{1,})?)+/", "\n", $txt);

Answer 8 (score: -1)

The strip_tags() function strips HTML and PHP tags from a string, if that is what you are trying to accomplish.

Example from the documentation:

<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
echo "\n";

// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>

The above example will output:

Test paragraph. Other text
<p>Test paragraph.</p> <a href="#fragment">Other text</a>