I copied this code from a book:
<?php
# Initialization
include("LIB_http.php");
include("LIB_parse.php");
$product_array = array();
$product_count = 0;

# Download the target (practice store) web page
$target = "http://www.WebbotsSpidersScreenScrapers.com/example_store";
$web_page = http_get($target, "");

# Parse all the tables on the web page into an array
$table_array = parse_array($web_page['FILE'], "<table", "</table>");

# Look for the table that contains the product information
for($xx=0; $xx<count($table_array); $xx++)
{
    $table_landmark = "Products For Sale";
    if(stristr($table_array[$xx], $table_landmark)) // Process this table
    {
        echo "FOUND: Product table\n";

        # Parse table into an array of table rows
        $product_row_array = parse_array($table_array[$xx], "<tr", "</tr>");
        for($table_row=0; $table_row<count($product_row_array); $table_row++)
        {
            # Detect the beginning of the desired data (heading row)
            $heading_landmark = "Condition";
            if(stristr($product_row_array[$table_row], $heading_landmark))
            {
                echo "FOUND: Table heading row\n";

                # Get the position of the desired headings
                $table_cell_array = parse_array($product_row_array[$table_row], "<td", "</td>");
                for($heading_cell=0; $heading_cell<count($table_cell_array); $heading_cell++)
                {
                    if(stristr(strip_tags(trim($table_cell_array[$heading_cell])), "ID#"))
                        $id_column=$heading_cell;
                    if(stristr(strip_tags(trim($table_cell_array[$heading_cell])), "Product name"))
                        $name_column=$heading_cell;
                    if(stristr(strip_tags(trim($table_cell_array[$heading_cell])), "Price"))
                        $price_column=$heading_cell;
                }
                echo "FOUND: id_column=$id_column\n";
                echo "FOUND: price_column=$price_column\n";
                echo "FOUND: name_column=$name_column\n";

                # Save the heading row for later use
                $heading_row = $table_row;
            }

            # Detect the end of the desired data table
            $ending_landmark = "Calculate";
            if(stristr($product_row_array[$table_row], $ending_landmark))
            {
                echo "PARSING COMPLETE!\n";
                break;
            }

            # Parse product and price data
            if(isset($heading_row) && $heading_row<$table_row)
            {
                $table_cell_array = parse_array($product_row_array[$table_row], "<td", "</td>");
                $product_array[$product_count]['ID'] = strip_tags(trim($table_cell_array[$id_column]));
                $product_array[$product_count]['NAME'] = strip_tags(trim($table_cell_array[$name_column]));
                $product_array[$product_count]['PRICE'] = strip_tags(trim($table_cell_array[$price_column]));
                $product_count++;
                echo "PROCESSED: Item #$product_count\n";
            }
        }
    }
}

# Display the collected data (after all tables have been processed)
for($xx=0; $xx<count($product_array); $xx++)
{
    echo "$xx. ";
    echo "ID: ".$product_array[$xx]['ID'].", ";
    echo "NAME: ".$product_array[$xx]['NAME'].", ";
    echo "PRICE: ".$product_array[$xx]['PRICE']."\n";
}
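The script runs without any errors, but it also produces no output. I'm not sure whether I need to add ?> at the end or not. This is only the second PHP script I've run, so I'm not sure.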
Answer 0: (score: 0)
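This doesn't answer your main question, which I think Marc B has already covered well, but since you mentioned it, I'll add that the closing ?> is not required. In fact, when you have many files and one of them has blank lines after the closing tag, it can cause the "headers already sent" problem.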
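To make the point about stray blank lines concrete, here is a minimal sketch. It assumes a hypothetical helper, captured_output_of(), that writes a snippet to a temporary file, includes it, and returns whatever the include printed. The two snippets are made up for illustration and are not from the book.

```php
<?php
// Hypothetical helper for this illustration only: writes $source to a
// temporary file, includes it, and returns any output the include produced.
function captured_output_of(string $source): string
{
    $path = tempnam(sys_get_temp_dir(), "inc");
    file_put_contents($path, $source);
    ob_start();               // buffer output instead of sending it
    include $path;
    $output = ob_get_clean(); // fetch the buffered output and stop buffering
    unlink($path);
    return $output;
}

// Same code twice: once with a closing tag followed by blank lines,
// once with no closing tag at all.
$with_closing_tag    = "<?php\n\$setting = 1;\n?>\n\n\n";
$without_closing_tag = "<?php\n\$setting = 1;\n\n\n";

$leaked = captured_output_of($with_closing_tag);
$clean  = captured_output_of($without_closing_tag);

echo "With closing tag:    ", strlen($leaked), " byte(s) of stray output\n";
echo "Without closing tag: ", strlen($clean), " byte(s) of stray output\n";
```

Any bytes an included file prints this way go out before a later header() call, which is what triggers the warning. Leaving the closing tag off means blank lines at the end of the file are just whitespace inside PHP code.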
Answer 1: (score: 0)
This code is from the book Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL by Michael Schrenk.
The code didn't work for me at first either. After checking it, I found that the target address has changed.
Replace
$target = "http://www.WebbotsSpidersScreenScrapers.com/example_store";
with
$target = "http://www.webbotsspidersscreenscrapers.com/buyair";
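Because the script only echoes something once it finds its landmark text, a moved or empty page fails silently. Below is a rough sketch of a check you could run right after the download. diagnose_page() is a hypothetical helper I made up, not part of LIB_http or LIB_parse, and the sample pages are stand-ins. In the real script you would pass $web_page['FILE'] instead.

```php
<?php
// Hypothetical helper (not from the book's libraries): reports whether a
// downloaded page is empty, lacks the landmark text, or looks usable.
function diagnose_page(string $html, string $landmark): string
{
    if (trim($html) === "") {
        return "EMPTY: nothing was downloaded, check the target URL";
    }
    if (stristr($html, $landmark) === false) {
        return "MISSING: page downloaded, but landmark \"$landmark\" was not found";
    }
    return "OK: landmark \"$landmark\" found";
}

// Stand-in pages for illustration only.
$moved_page = "<html><body><h1>Page not found</h1></body></html>";
$store_page = "<html><body><table><tr><td>Products For Sale</td></tr></table></body></html>";

echo diagnose_page("", "Products For Sale"), "\n";
echo diagnose_page($moved_page, "Products For Sale"), "\n";
echo diagnose_page($store_page, "Products For Sale"), "\n";
```

If the check reports EMPTY or MISSING, the parsing loops never find a table to process, which would explain a run with no errors and no output.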