Question

I'm trying to build a World of Warcraft gem database. If I go to this page:

http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=purple&searchType=items

and open View Source in Firefox, I see a large amount of XML data, which is exactly what I want. I wrote this quick script to try to parse some of it:

<?php

$gemUrls = array(
                 'Blue' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=blue&searchType=items',
                 'Red' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=red&searchType=items',
                 'Yellow' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=yellow&searchType=items',
                 'Meta' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=meta&searchType=items',
                 'Green' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=green&searchType=items',
                 'Orange' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=orange&searchType=items',
                 'Purple' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=purple&searchType=items',
                 'Prismatic' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=purple&searchType=items' // NOTE: identical to the 'Purple' URL above
                 );


// Get blue gems

$blueGems = file_get_contents($gemUrls['Blue']);

$xml = new SimpleXMLElement($blueGems);

echo $xml->items[0]->item;

?>

But I get a lot of errors like these:

Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: Entity: line 20: parser error : xmlParseEntityRef: no name in C:\xampp\htdocs\WoW\index.php on line 19

Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: if(Browser.iphone && Number(getcookie2("mobIntPageVisits")) < 3 && getcookie2( in C:\xampp\htdocs\WoW\index.php on line 19

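I'm not sure what the problem is. Judging by the iPhone part of the error, I suspect the data file_get_contents() brings back isn't XML at all, but maybe some JavaScript.

Is there a way to get just the XML back from that page, without any HTML or anything else?

Thanks :)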

Answer 1

What's being returned is XHTML, which is XML-ish, but not good enough for an XML parser. To use SimpleXMLElement you need well-formed XML. From the documentation of the constructor:

Method signature:

__construct ( string $data [, int $options [, bool $data_is_url 
             [, string $ns [, bool $is_prefix ]]]] )

$data is described as:

A well-formed XML string or the path or URL to an XML document if data_is_url is TRUE.
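To see the difference concretely, here is a minimal PHP sketch (not part of the original answer) that asks libxml whether a string is well-formed before using the result, collecting errors instead of printing warnings. The helper name `parse_xml_or_report()` is hypothetical. A fragment with a raw `&&` inside a `<script>` element, like the one in the question's second warning, is rejected, while a small well-formed document parses fine.

```php
<?php
// Minimal sketch: check whether a string is well-formed XML before relying on
// SimpleXML, buffering libxml's errors instead of emitting PHP warnings.
// parse_xml_or_report() is a hypothetical helper name, for illustration only.

/**
 * Returns [SimpleXMLElement|null, string[]]: the parsed document (or null if
 * the input is not well-formed) and the libxml error messages encountered.
 */
function parse_xml_or_report(string $data): array
{
    // Collect libxml errors internally rather than raising warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    // simplexml_load_string() returns false when the input is not well-formed.
    $xml = simplexml_load_string($data);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }

    // Restore the caller's error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$xml === false ? null : $xml, $messages];
}

// A well-formed document parses without errors.
[$ok, $okErrors] = parse_xml_or_report('<items><item name="Example"/></items>');

// A raw "&&" is not allowed in XML text, so this fragment is rejected; libxml
// reports an entity-reference error (the question shows it as
// "xmlParseEntityRef: no name").
[$bad, $badErrors] = parse_xml_or_report(
    '<html><script>if (a && b) { }</script></html>'
);

var_dump($ok !== null, $bad === null);
print_r($badErrors);
```

Checking `$bad === null` before touching the result avoids the cascade of warnings seen in the question.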

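So an arbitrary web page won't satisfy this parser. You asked:

"Is there a way to get just the XML back from that page, without any HTML or anything else?"

You could contact the site's webmaster to find out whether they offer an XML view of the data. Failing that, you could use a proper HTML parser to try to extract the data. I like PHP Simple HTML DOM Parser. Check out How to implement a web scraper in PHP?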
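If you go the HTML-parser route, PHP's bundled DOM extension can also do the job. The sketch below swaps in `DOMDocument::loadHTML()` and `DOMXPath` in place of PHP Simple HTML DOM Parser. The helper name `extract_texts()`, the sample markup, and the `gem-name` class are hypothetical placeholders, since the real page's structure isn't shown here.

```php
<?php
// Sketch: extract text from HTML with PHP's built-in DOM extension
// (DOMDocument::loadHTML + DOMXPath) instead of SimpleXMLElement.
// extract_texts(), the markup, and the "gem-name" class are hypothetical.

function extract_texts(string $html, string $xpathQuery): array
{
    $doc = new DOMDocument();

    // The HTML parser tolerates things XML forbids (raw "&&" in scripts,
    // unclosed tags); silence its warnings about non-conforming markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Run the XPath query and collect each matching node's trimmed text.
    $xpath = new DOMXPath($doc);
    $texts = [];
    foreach ($xpath->query($xpathQuery) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

$html = <<<'HTML'
<html><head><script>if (a && b) { }</script></head>
<body><ul>
  <li class="gem-name">Example Gem One</li>
  <li class="gem-name">Example Gem Two</li>
</ul></body></html>
HTML;

$names = extract_texts($html, '//li[@class="gem-name"]');
print_r($names);
```

The XPath expression is the part you would adapt after inspecting the markup that file_get_contents() actually returns.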

Get XML data from an external page and parse it with PHP
