Question

URL:

http://en.wikipedia.org/w/api.php?action=parse&prop=text&page=Lost_(TV_series)&format=xml

It outputs the following:

<api><parse><text xml:space="preserve">text...</text></parse></api>

How do I get the content between <text xml:space="preserve"> and </text>?

I am using curl to fetch the full contents of this URL, which gives me:

$html = curl_exec($curl_handle);

What is the next step?

Answer 1

Parse it with PHP's DOM extension, like this:

// $html already holds the response body from curl_exec(); a sample is hard-coded here
$html = '<api><parse><text xml:space="preserve">text...</text></parse></api>';

// Parsing begins here. The response is XML, so load it with loadXML().
$doc = new DOMDocument();
$doc->loadXML($html);
$nodes = $doc->getElementsByTagName('text');

// Print the content of the first <text> element
echo $nodes->item(0)->nodeValue;

Output:

text...
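
To tie the cURL step from the question to the DOM step above, here is a minimal sketch. The function names fetchUrl and extractParseText and the user agent string are made up for illustration and are not part of any library. It assumes the curl and dom extensions are enabled and that the response has the <api><parse><text> shape shown in the question.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and return the body, or null on failure.
function fetchUrl(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, if any
    curl_setopt($ch, CURLOPT_USERAGENT, 'ExampleFetcher/0.1'); // placeholder value (assumption)
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Hypothetical helper: return the content of the first <text> element,
// or null if the input is empty, is not well-formed XML, or has no <text> element.
function extractParseText(string $xml): ?string
{
    if ($xml === '') {
        return null; // loadXML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Collect libxml errors internally instead of emitting PHP warnings
    $previous = libxml_use_internal_errors(true);
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$ok) {
        return null;
    }
    $node = $doc->getElementsByTagName('text')->item(0);
    return $node === null ? null : $node->textContent;
}
```

With these in place, the flow from the question would look roughly like $html = fetchUrl($url); followed by echo extractParseText($html ?? '');. The DOM parser decodes XML entities, so the returned string should be the page's rendered HTML markup, which can then be processed further if needed.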

Extracting content from a MediaWiki API call (XML, cURL)
