Question

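I'm trying to fetch Wikipedia pages (from a particular category) using MediaWiki. For this, I'm following this tutorial, Listing 3: Listing pages in a category. My question is: how can I fetch Wikipedia pages without using Zend Framework? Is there a PHP-based REST client that doesn't need to be installed? Zend requires installing its package and doing some configuration first, and I don't want to do all that.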

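After some googling and investigation, I found a tool called cURL; using cURL with PHP, it is also possible to talk to a REST service. I'm quite new to implementing REST services, but I have already tried to implement something in PHP: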

<?php
    header('Content-type: application/xml; charset=utf-8');

    function curl($url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $data = curl_exec($ch);
        curl_close($ch);
        return $data;
    }
    $wiki = "http://de.wikipedia.org/w/api.php?action=query&list=allcategories&acprop=size&acprefix=haut&format=xml";
    $result = curl($wiki);
    var_dump($result);
?>

But I get an error in the result. Can anyone help?

Update

This page contains the following errors:
error on line 1 at column 1: Document is empty
Below is a rendering of the page up to the first error.

Answer 1

Sorry for taking so long to reply, but better late than never...

When I run your code on the command line, the output I get is:

string(120) "Scripts should use an informative User-Agent string with contact information, or they may be IP-blocked without notice.
"

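So it looks like the problem is that you're running into the Wikimedia bot User-Agent policy, because you haven't told cURL to send a custom User-Agent header. To fix this, follow the advice given at the bottom of that page and add lines like the following to your script (alongside the other curl_setopt() calls):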

$agent = 'ProgramName/1.0 (http://example.com/program; your_email@example.com)';
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
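
Putting the pieces together, here is a minimal sketch of the question's helper with the User-Agent option added. The function names fetch_url and build_api_url are hypothetical (they are not part of cURL or MediaWiki), the agent string is the same placeholder as above, and the curl_error() check is an addition for illustration rather than something the policy requires.

```php
<?php
// Hypothetical helper: builds an api.php URL from an array of parameters.
function build_api_url(string $base, array $params): string
{
    return $base . '?' . http_build_query($params);
}

// Hypothetical helper: the question's curl() function plus a User-Agent.
function fetch_url(string $url, string $agent): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    $data = curl_exec($ch);
    if ($data === false) {
        // Surface transport errors instead of returning false silently.
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    // The handle is released when $ch goes out of scope.
    return $data;
}

// Usage (performs a network request, so it is left commented out):
// $agent = 'ProgramName/1.0 (http://example.com/program; your_email@example.com)';
// $url = build_api_url('http://de.wikipedia.org/w/api.php', [
//     'action' => 'query', 'list' => 'allcategories',
//     'acprop' => 'size', 'acprefix' => 'haut', 'format' => 'xml',
// ]);
// echo fetch_url($url, $agent);
```

Building the query string with http_build_query() is optional; it just keeps the parameters readable and URL-encodes their values.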

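P.S. You probably also don't want to set the application/xml content type unless you're sure the content actually is valid XML. In particular, the output of var_dump() is not valid XML, even if the input is.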

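For testing and development, I'd suggest either running PHP from the command line or using the text/plain content type. Or, if you prefer, use text/html and encode the output with htmlspecialchars().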
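
As a rough sketch of the text/html option: the helper name render_as_html is hypothetical, and the `<pre>` wrapper is just one way to keep the line breaks visible in a browser.

```php
<?php
// Hypothetical helper: escapes a raw API response so a browser shows it
// as literal text instead of trying to parse it as markup.
function render_as_html(string $raw): string
{
    return '<pre>' . htmlspecialchars($raw, ENT_QUOTES, 'UTF-8') . '</pre>';
}

// Usage, assuming $result holds the string returned by cURL:
// header('Content-Type: text/html; charset=utf-8');
// echo render_as_html($result);
//
// Or, for the plain-text option, no escaping is needed:
// header('Content-Type: text/plain; charset=utf-8');
// echo $result;
```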

P.S. This is a Community Wiki answer, since I realized that this question has already been asked and answered before.
