How to insert scraped data into a database using simple_php_dom

Time: 2014-04-04 06:36:55

Tags: php web screen-scraping

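I am writing code that inserts values scraped from a site into a database, but each record ends up in the database twice. After a lot of analysis, I still cannot figure out why the row is inserted twice.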

My code is as follows:

<?php
include('simple_html_dom.php');

$aflink3 = "http://aliveforfootball.com/blog/david-moyes-confident-manchester-united-future/";
$linkurl = $aflink3;

// Loading the url
$html = file_get_html($linkurl);

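// Lookup table of CSS selectors used to locate the article body.
// Row 0 holds descendant selectors; column 0 of later rows holds container
// selectors. A 1 at [row][col] means "try $States[row][0] $States[0][col]".
$States = array
(
array("state","div.entry-content",""),
array("article.post",1,1)
);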

// Find the article title from the og:title meta tag (empty string if the tag is missing)
$metatitle = "";
$titletag = $html->find("meta[property='og:title']", 0);
if ($titletag != null) { $metatitle = $titletag->content; }
$title = $metatitle;

// Read the og:image meta tags; only the last one found is kept,
// stored as array('image' => url), which the insert code below reads.
$metaimages = array();
foreach ($html->find("meta[property='og:image']") as $metaimage) {
    $metaimages = array('image' => $metaimage->content);
}

// Collect the text of each <p> in the article into the global $content,
// blanking any paragraph that contains the first 5 characters of a <span> nested in a <p>
function findParagraphs($article){
    global $subtitle1;
    global $articlecontent;
    global $content;
    global $spancontent;

    $spancontent = array();
    $content = array();
    $articlecontent = array();

    foreach ($article->find('p') as $p) {
        $articlecontent[] = $p->plaintext;
    }

    foreach ($article->find('p span') as $spandiv) {
        $spancontent[] = $spandiv->plaintext;
    }

    $articlelength = count($articlecontent);
    $spanlength = count($spancontent);

    for ($i = 0; $i < $articlelength; $i++) {
        for ($j = 0; $j < $spanlength; $j++) {
            $prefix = substr($spancontent[$j], 0, 5);
            // Skip empty prefixes: strpos() rejects or trivially matches an empty needle
            if ($prefix !== false && $prefix !== "" && strpos($articlecontent[$i], $prefix) !== false) {
                $articlecontent[$i] = "";
            }
        }
    }
    $content = $articlecontent;
}


$flag = 0;
$article = null;
$state = 0;
$content = array(); // stays empty if no selector matches
// Walk the $States table and try each marked selector combination
// until one matches an element in the page
$rows = count($States);
for ($row = 0; $row < $rows; $row++) {
    for ($col = 0; $col < count($States[$row]); $col++) {
        echo "[".$row."][".$col."]<BR>"; // debug output: cell being checked
        if ($States[$row][$col] === 1) {
            // Combine the row's container selector with the column's descendant selector
            $statefound = $States[$row][0]." ".$States[0][$col];
            $article = $html->find($statefound, 0);
            if (isset($article) && ($state == 0)) {
                $state = 1;
                findParagraphs($article);
                break 2;
            }
        }
    }
}

// Build the array of scraped data (it is JSON-encoded further down)
$stuff = array(
    'title' => $title,
    'image' => $metaimages,
    'content' => $content
);


// Insert the scraped data into the database
if ($stuff != null) {

$jsencode = json_encode($stuff);

$obj = json_decode($jsencode, TRUE);

// Wrap each paragraph in <p> tags
$dbcontent = "";
for ($i = 0; $i < count($obj['content']); $i++) {
    $dbcontent .= "<p>".$obj['content'][$i]."</p>";
}

// The title is a single string, so no loop is needed
$dbtitle = "<p>".$obj['title']."</p>";

// 'image' is either empty or array('image' => url)
$dbimage = "";
if (isset($obj['image']['image'])) {
    $dbimage = "<p>".$obj['image']['image']."</p>";
}

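// Initialize the MySQL connection
mysql_connect("localhost", "root", "password") or die(mysql_error());
mysql_select_db("Parsing") or die(mysql_error());

// Escape every value so quotes in the scraped text cannot break the query
$sqlurl = mysql_real_escape_string($linkurl);
$sqltitle = mysql_real_escape_string($dbtitle);
$sqlimage = mysql_real_escape_string($dbimage);
$sqlcontent = mysql_real_escape_string($dbcontent);

mysql_query("INSERT INTO Sportparse
(linkurl,linktitle,linkimage,linkcontent) VALUES('$sqlurl','$sqltitle','$sqlimage','$sqlcontent')")
or die(mysql_error());
echo "Data Inserted Successfully";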

// Clean up the parsed document to prevent a memory leak
$html->clear(); 
unset($html);
} 
?>

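With the insert code at the end, the data is inserted twice instead of only once. I have tried everything I can think of but cannot fix it. I think the problem is worth investigating, because several of my colleagues could not figure it out either.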
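
Whatever triggers the second insert (the post does not establish the cause), one defensive option is to make the insert idempotent, so that running the script again for the same URL adds nothing. The sketch below is only an illustration under stated assumptions: it uses PDO with an in-memory SQLite database so that it runs standalone, whereas the code above talks to MySQL through the mysql_* functions; the helper name insertOnce and the placeholder URL are hypothetical; the table and column names are copied from the question.

```php
<?php
// Hypothetical helper (not part of the original post): insert a row only if
// no row with the same linkurl exists yet. Returns true when a row was added.
function insertOnce(PDO $pdo, $linkurl, $title, $image, $content)
{
    // Check for an existing row with this URL.
    $check = $pdo->prepare("SELECT COUNT(*) FROM Sportparse WHERE linkurl = ?");
    $check->execute(array($linkurl));
    if ((int) $check->fetchColumn() > 0) {
        return false;
    }

    // Prepared statement: values are bound, not interpolated into the SQL.
    $insert = $pdo->prepare(
        "INSERT INTO Sportparse (linkurl, linktitle, linkimage, linkcontent)
         VALUES (?, ?, ?, ?)"
    );
    $insert->execute(array($linkurl, $title, $image, $content));
    return true;
}

// Assumption: in-memory SQLite stands in for the MySQL database "Parsing".
$pdo = new PDO("sqlite::memory:");
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec(
    "CREATE TABLE Sportparse (
        linkurl TEXT, linktitle TEXT, linkimage TEXT, linkcontent TEXT
    )"
);

$url = "http://example.com/article"; // placeholder URL, not from the post

// Simulate the script running twice for the same article.
$first  = insertOnce($pdo, $url, "<p>Title</p>", "", "<p>Body</p>");
$second = insertOnce($pdo, $url, "<p>Title</p>", "", "<p>Body</p>");
$rows   = (int) $pdo->query("SELECT COUNT(*) FROM Sportparse")->fetchColumn();

var_dump($first, $second, $rows); // bool(true), bool(false), int(1)
```

The check followed by the insert is not atomic, so two truly simultaneous requests could still both pass the check. A UNIQUE index on linkurl would be a stricter guard, at the cost of having to handle the resulting duplicate-key error. Logging each run of the script (for example, a timestamp and the requested URL) would also show whether the page itself is being requested twice.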

0 answers:

No answers