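Removing unwanted HTML code from text (articles)

Time: 2015-01-31 14:25:55

Tags: html joomla

I have a Joomla site that I just migrated from plain HTML. There are 1000 articles, and each one contains unwanted HTML code like the sample below. How can I get rid of the HTML in these articles without opening each one?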

<div id="mainDIV">
<div id="topDIV">
<div id="topnav">
<div>
<div id="topnavdiv0"> </div>
<div id="topnavdiv"><a href="../store/">SHOP NOW</a> <img title="" src="images/shop-basket.gif" />  |  1-800-336-1630</div>
</div>
</div>
</div>
<div style="clear: both;"> </div>

<table id="mainBody" >
<tbody>
<tr>
<td id="left"> </td>
<td id="mid"><!-- top -->
<div id="top1">
<div id="bbb-logo"><a href="http://app.southeasttexas.bbb.org/report/10014674/"><img src="images/logo-bbb.gif" alt="metal-market-report-02-27-12" /></a></div>
</div>
<!--div id="top2"></div-->
<div id="flashnav"> </div>
<div id="topsep"> </div>
<!-- top --> <!-- content -->
<table id="contentBody">
<tbody>
<tr>
<td id="contentSep"> </td>
<td id="contentLeft">
<div id="titleBGlong">Metals Market Reports</div>
<br />

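I was really hoping I would not have to come back and ask the same question again, but even after removing all of it I still get an error. Please see the error below: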

There seems to be an error in your SQL query. The MySQL server error output below, if there is any, may also help you in diagnosing the problem

ERROR: Unknown Punctuation String @ 1
STR: <?
SQL: <?php
$query = mysqli_query($con, 'SELECT * FROM th18k_content WHERE id BETWEEN 0 AND 50');

SQL query: Documentation

MySQL said: Documentation

#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '<?php
$query = mysqli_query($con, 'SELECT * FROM th18k_content WHERE id BETWEEN' at line 1 

1 answer:

Answer 0 (score: 0)

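Do you want to remove the HTML tags from the articles? First find the database table where those articles are stored, then fetch the rows and go through them with: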
<?php
// $con is assumed to be an open mysqli connection (not shown here).
// Fetch one batch of articles; adjust the table and column names to your schema.
$query = mysqli_query($con, 'SELECT * FROM th18k_content WHERE id BETWEEN 0 AND 50');
// Prepare the UPDATE once, so quotes inside an article cannot break the SQL.
$update = mysqli_prepare($con, 'UPDATE th18k_content SET article = ? WHERE id = ?');
while ($row = mysqli_fetch_array($query, MYSQLI_ASSOC)) { // for each article
  $lines = explode("\n", $row['article']); // split into lines ("\n" needs double quotes)
  $start = null;
  foreach ($lines as $i => $line) {        // look for the first line with the marker
    if (strpos($line, 'titleBGlong') !== false) {
      $start = $i;
      break;
    }
  }
  if ($start === null) {
    continue;                              // marker not found: leave this article alone
  }
  $newarticle = implode("\n", array_slice($lines, $start)); // drop everything before the marker
  $strippedarticle = strip_tags($newarticle);               // remove HTML tags
  $id = (int) $row['id'];
  mysqli_stmt_bind_param($update, 'si', $strippedarticle, $id);
  mysqli_stmt_execute($update);            // replace the article in the db
}
?>

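I don't know the exact names of your database table and columns, so you will need to change them. I also limited the query to ids 0 to 50, because otherwise you could flood the database with queries: each article costs its own UPDATE on top of the SELECT. Just run the code, change the range to the next 50, run it again, and so on.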
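
The "run, change the range, run again" routine could also be generated rather than edited by hand. The following is only a sketch and not part of the original answer: `idBatches` is a hypothetical helper, the upper bound of 120 is made up for the example, and the array destructuring in the `foreach` assumes PHP 7.1 or newer.

```php
<?php
/**
 * Hypothetical helper (not from the original answer): split the id range
 * [$first, $last] into consecutive batches of at most $size ids each.
 * Returns a list of [from, to] pairs, both ends inclusive.
 */
function idBatches(int $first, int $last, int $size = 50): array
{
    if ($size < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1');
    }
    $batches = [];
    for ($from = $first; $from <= $last; $from += $size) {
        $batches[] = [$from, min($from + $size - 1, $last)];
    }
    return $batches;
}

// Example: build one SELECT per batch for ids 0..120 (assumed upper bound).
foreach (idBatches(0, 120) as [$from, $to]) {
    $sql = sprintf('SELECT * FROM th18k_content WHERE id BETWEEN %d AND %d', $from, $to);
    echo $sql, "\n"; // in the real script, pass $sql to mysqli_query($con, $sql)
}
```

Each generated statement covers a disjoint id range, so no article is processed twice.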

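@EDIT The script can be run by saving it in a .php file on the server and opening it like a normal web page. It is PHP, not SQL, so it cannot be pasted into a SQL query box. (In this example I did not connect to the database.)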
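
Because the example leaves the connection out, here is a rough sketch of how `$con` might be created at the top of that .php file. `openConnection` is a hypothetical wrapper, and the host, user, password, database name and charset are placeholders. On a Joomla site the real values are normally in its configuration.php.

```php
<?php
/**
 * Hypothetical wrapper (not from the original answer): open the mysqli
 * connection that the script expects in $con, or stop with a message.
 */
function openConnection(string $host, string $user, string $password, string $database)
{
    // Note: on PHP 8.1+ mysqli throws an exception by default on failure
    // instead of returning false, which also stops the script.
    $con = mysqli_connect($host, $user, $password, $database);
    if (!$con) {
        // mysqli_connect_error() describes why the last attempt failed
        die('Connection failed: ' . mysqli_connect_error());
    }
    mysqli_set_charset($con, 'utf8'); // assumed charset; match your database
    return $con;
}

// Placeholder credentials: replace them with the values for your own database.
// $con = openConnection('localhost', 'db_user', 'db_password', 'joomla_db');
```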

Every line before the one containing "titleBGlong" is removed, and then strip_tags is used to remove the remaining tags.
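
To try those two steps on a single string before touching the database, they can be isolated in a small function. This is a sketch under assumptions, not the answer's own code: `cleanArticle` is a hypothetical name, the sample markup is shortened from the question, and its last line is invented for illustration.

```php
<?php
/**
 * Hypothetical helper (not from the original answer): drop every line
 * before the first one containing $marker, then strip the HTML tags.
 * Returns null when the marker is missing so the caller can skip the row.
 */
function cleanArticle(string $html, string $marker = 'titleBGlong'): ?string
{
    $lines = explode("\n", $html);
    foreach ($lines as $i => $line) {
        if (strpos($line, $marker) !== false) {
            $kept = implode("\n", array_slice($lines, $i));
            return trim(strip_tags($kept)); // trim() is an extra tidy-up step
        }
    }
    return null;
}

// Shortened sample based on the markup in the question; the last line is made up.
$sample = "<div id=\"topnav\">\n"
        . "<a href=\"../store/\">SHOP NOW</a>\n"
        . "<div id=\"titleBGlong\">Metals Market Reports</div>\n"
        . "<br />\n"
        . "<p>Article text.</p>";

echo cleanArticle($sample), "\n";
```

Returning null for articles without the marker avoids overwriting them with an empty string. Note that strip_tags leaves HTML entities such as `&nbsp;` in place, so those would need separate handling.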