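I'm trying to build a scraper in PHP that tracks the best price for certain products on a price comparison site. I have a txt file of links; I crawl each link and pull the information I need from the page.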
<!DOCTYPE html>
<html>
<head>
<link rel='stylesheet' type='text/css' href='crawlerStyle.css'>
</head>
<body>
<div class='div-table-row'>
<div class='div-table-col-title'><span class='span-title'>Name</span></div>
<div class='div-table-col-title'><span class='span-title'>Best Pricerunner price</span></div>
</div>
<?php
$myfile = fopen("urls.txt", "r") or die("Unable to open file!");
if ($myfile) {
while (($line = fgets($myfile)) !== false) {
@follow_links($line);
}
fclose($myfile);
}
function getPRPrice($priceTag){
return substr($priceTag, 2).",00 DKK";
}
function follow_links($line) {
libxml_use_internal_errors(true);
$doc = new DOMDocument();
@$doc->loadHTML(file_get_contents($line));
$xpath = new DOMXpath($doc);
$name = $xpath->query( '////span[@class="fn" and @itemprop="name"]')->item(0);
$price = $xpath->query( '//ul[@class="itemlist" and li[@class="shoppingcol" and p[@class="button" and a[@class="button-a google-analytic-retailer-data"]]]]/*/*/*/*/*/strong[@class="validated-shipping"]')->item(0);
$company = $xpath->query( '//ul[@class="itemlist" and li[@class="shoppingcol" and p[@class="button" and a[@class="button-a google-analytic-retailer-data"]]]]/*/*/a[@class="google-analytic-retailer-data"]//img/@src')->item(0);
echo "<div class='div-table-row'>\n";
echo "<div class='div-table-col'><span>".substr($name->textContent, 0, -18)."</span></div>\n";
echo "<div class='div-table-col'><img style='display: inline-block; vertical-align:middle' src='".$company->textContent."'><a href='".$line."' target='_blank'><span>".getPRPrice($price->textContent)."</span></a></div>\n";
echo "</div>\n";
}
?>
</body>
</html>
Here is some CSS, so you can see what I see:
.div-table-row{
display:table;
clear:both;
}
.div-table-col{
float: none;
border-style: solid;
width: 250px;
display: table-cell;
text-align:center;
vertical-align: middle;
height: 100%;
}
.div-table-col-title{
float: none;
border-style: solid;
width: 250px;
display: table-cell;
text-align:center;
vertical-align: middle;
font-size: 30px;
height: 100%;
background: rgb(30, 139, 45) !important;
}
.productImg{
display:none;
position: absolute;
width: 200px;
}
span{
height: 100%;
width: 100%;
padding-left:10px;
padding-right:10px;
vertical-align: middle;
text-align:center;
font-size: 16px;
font-weight: 600;
font-family: "Helvetica Neue",Helvetica,Arial,sans-serif;
}
.span-title{
height: 100%;
width: 100%;
padding-left:10px;
padding-right:10px;
vertical-align: middle;
text-align:center;
font-size: 20px;
color: white;
font-weight: 900;
font-family: "Helvetica Neue",Helvetica,Arial,sans-serif;
}
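This is how some of the products look on the page I'm trying to scrape.
But the span I grab for the name doesn't seem to be returned in full.
Does anyone have any insight into this problem?
Thanks!
EDIT: I'm testing with the following links: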
http://www.pricerunner.dk/pl/1-3140663/Mobiltelefoner/Microsoft-Lumia-650-Sammenlign-Priser
http://www.pricerunner.dk/pl/1-3098807/Mobiltelefoner/Apple-iPhone-6S-64GB-Sammenlign-Priser
http://www.pricerunner.dk/pl/1-3141579/Mobiltelefoner/Samsung-Galaxy-S7-Edge-32GB-Sammenlign-Priser
http://www.pricerunner.dk/pl/1-3154462/Mobiltelefoner/HTC-10-32GB-Sammenlign-Priser
Answer 0 (score: 1)
This works fine for me:
// Please notice the use of only two slashes and not four like you did
$name = $xpath->query('//span[@class="fn"]')->item(0)->textContent;
The problem comes from the substr you apply afterwards.
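To see the effect in isolation, here is a minimal sketch run against an inline HTML fragment. The markup and the product name are made up for illustration and are not copied from Pricerunner. The XPath query returns the full text of the span; the fixed-length substr is what shortens it.

```php
<?php
// Hypothetical markup: a stand-in for the product page, not the real Pricerunner HTML.
$html = '<html><body><span class="fn" itemprop="name">HTC 10 32GB</span></body></html>';

libxml_use_internal_errors(true);   // collect HTML parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Same predicate as in the question, with two leading slashes.
$node = $xpath->query('//span[@class="fn" and @itemprop="name"]')->item(0);
$name = $node !== null ? trim($node->textContent) : '';

echo $name, "\n";                    // the full name as it appears in the span
echo substr($name, 0, -18), "\n";    // fixed 18-character cut: nothing is left of an 11-character name
```

Any name shorter than 18 characters disappears entirely with that cut, and longer names lose their last 18 characters whether or not those characters are the part you meant to drop.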
Answer 1 (score: 0)
The substring call was cutting off the $name variable. I had added it a while ago to extract the name.
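If the substr was there to drop a trailing label from the name, one option is to remove that label only when it is actually present, so names without it are left intact. This is a minimal sketch: the helper stripSuffix and the suffix text are hypothetical, chosen for illustration, and the real page text should be checked before relying on them.

```php
<?php
// Hypothetical helper: removes $suffix from the end of $text only when it is there.
function stripSuffix(string $text, string $suffix): string
{
    $text = trim($text);
    if ($suffix !== '' && substr($text, -strlen($suffix)) === $suffix) {
        return rtrim(substr($text, 0, -strlen($suffix)));
    }
    return $text;
}

// Assumed suffix, for illustration only.
$suffix = ' Sammenlign Priser';

echo stripSuffix('HTC 10 32GB Sammenlign Priser', $suffix), "\n"; // HTC 10 32GB
echo stripSuffix('HTC 10 32GB', $suffix), "\n";                    // HTC 10 32GB
```

Unlike a fixed negative length, this leaves the name unchanged if the site stops appending the label.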