刮书价格

时间:2011-02-24 19:15:25

标签: php curl web-scraping

我正在尝试编写一个scrape应用程序,而且我正在遇到问题。我的PHP卷曲代码没有提高书籍的价格。它将我带回域的Web根目录。

我正在尝试按ISBN搜索网站。

我一直在靠墙砸墙好几天。任何帮助将非常感谢!

代码:

<form method="post" for="new-search" name="SearchTerm" class='form-validate' id="SearchTerm" action="index.php">
    <textarea rows="3" name="SearchTerm" id="SearchTerm" cols="40" class="validate-required error"></textarea><div class="error" id="SearchTerm-error">
    <br>                        
    <button class="search primary" type="submit">continue</button>

</form>


<?php

/*
echo("<pre>");print_r($_GET);echo("</pre>");
echo("<pre>");print_r($_POST);echo("</pre>");
*/

$isbn = $_POST['SearchTerm'];


$userAgent = 'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;rv:1.8.1.16) Gecko/20080702 Firefox/2.0.0.16';

$fields = array(
    'url' => ("http://www.bookleberry.com/Search/SearchKeyword"),
    'qurl' => ("http://www.bookleberry.com/Search/SearchKeyword/" . $_POST['SearchTerm']),
    'SearchTerm' => ($_POST['SearchTerm']),
    'Page' => ('1'),
    'class' => ('textfield validate-required'),
    'for' => ('new-search'),
    'result-count' => ('1'),
    'status' => 'success',
);

$SearchTerm = ($fields['SearchTerm']);
$url = ($fields['url']);
$Page = ($fields['Page']);


echo("<pre>");
print_r($fields);
echo("</pre>");

if ($isbn != NULL){

    //open connection
    $ch = curl_init($url);
    //set the url, number of POST vars, POST data
    curl_setopt($ch, CURLOPT_HEADER, $userAgent);
    curl_setopt($ch, CURLOPT_URL, $url);
        echo "before curl_exec:<br>";
        echo "curl_errno=". curl_errno($ch) ."<br>";
        echo "curl_error=". curl_error($ch) ."<br>";
    curl_setopt($ch,CURLOPT_POST,count($fields));
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, "?SearchTerm=$SearchTerm");
    curl_setopt($ch, CURLOPT_HTTPGET, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 9999999);
     curl_setopt($ch,CURLOPT_HTTPHEADER,array (
        "Accept: application/json"
    ));




    $info = curl_getinfo($ch);

    //execute post
    $result = curl_exec($ch);
    print $result;


print "<pre>\n";
print_r(curl_getinfo($ch));  // get error info

?>

1 个答案:

答案 0 :(得分:4)

不要伤到你的头,使用它!

  • 安装fiddler
  • 使用浏览器做一个请求,在fiddler中查看确切发布的内容。这包括所有标题,Cookie和表单变量。
  • 使用您的代码发帖,再次检查fiddler
  • 比较两者之间的差异并调整脚本。
  • 重复。

安装firebug也很有帮助。使用副本Xpath,并将其放入php DOM xpath查询,使得抓取变得轻松有趣!