How to use the MediaWiki parser to get HTML from wikitext

Time: 2013-08-22 12:07:53

Tags: php parsing mediawiki wikipedia

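I am trying to parse Wikipedia markup text into HTML using Wikipedia's MediaWiki parser. I have read the manual here: https://www.mediawiki.org/wiki/Manual:Parser.php. However, since I am new to PHP, I have not been able to write a test script.

Here is a sample of the input I want to parse and convert to HTML: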

Shakespeare's sonnets
==Characters==
When analysed as characters, the subjects of the sonnets are usually referred
to as the Fair Youth, the Rival Poet, and the Dark Lady. The speaker expresses
admiration for the Fair Youth's beauty, and later has an affair with the Dark
Lady. It is not known whether the poems and their characters are fiction or
autobiographical; scholars who find the sonnets to be autobiographical, notably
[[A. L. Rowse]], have attempted to identify the characters with historical
individuals.

4 answers:

Answer 0 (score: 2)

You don't even have to use PHP. You can use the Wikipedia API (or the API on your own MediaWiki installation). For details, see Parsing wikitext.
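Since the question asks for PHP, here is a minimal sketch of calling the API's action=parse module from a plain PHP script, with no MediaWiki installation required. The function names (buildParseParams, extractHtml, wikitextToHtml) and the User-Agent string are made up for illustration. The sketch assumes allow_url_fopen and the openssl extension are enabled, and that the endpoint is the English Wikipedia api.php.

```php
<?php
// Hypothetical helper names; only the api.php parameters belong to the MediaWiki API.

/** Build the POST fields for an action=parse request. */
function buildParseParams(string $wikitext): array {
    return [
        'action'        => 'parse',
        'format'        => 'json',
        'formatversion' => '2',        // with version 2, parse.text is a plain string
        'contentmodel'  => 'wikitext',
        'prop'          => 'text',
        'text'          => $wikitext,
    ];
}

/** Pull the rendered HTML out of the raw JSON response body. */
function extractHtml(string $json): string {
    $data = json_decode($json, true);
    if (!is_array($data) || isset($data['error'])) {
        throw new RuntimeException('Unexpected or error response from the API');
    }
    $text = $data['parse']['text'] ?? null;
    // formatversion=1 wraps the HTML as {"*": "..."}; accept both shapes.
    if (is_array($text)) {
        $text = $text['*'] ?? null;
    }
    if (!is_string($text)) {
        throw new RuntimeException('No parse.text in the API response');
    }
    return $text;
}

/** Send wikitext to the API and return the HTML (performs a network request). */
function wikitextToHtml(
    string $wikitext,
    string $endpoint = 'https://en.wikipedia.org/w/api.php'
): string {
    $context = stream_context_create([
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n" .
                         "User-Agent: WikitextToHtmlExample/0.1 (placeholder contact)\r\n",
            'content' => http_build_query(buildParseParams($wikitext)),
            'timeout' => 10,
        ],
    ]);
    $response = file_get_contents($endpoint, false, $context);
    if ($response === false) {
        throw new RuntimeException('Request failed');
    }
    return extractHtml($response);
}
```

Calling echo wikitextToHtml("==Characters==\n[[A. L. Rowse]] has attempted ..."); should then print the rendered HTML. To target your own wiki, pass its api.php URL as the second argument.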

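Answer 1 (score: 0)

You can use JWPL (http://code.google.com/p/jwpl/), which works with a local copy of the wiki. Load the dump, convert it with DataMachine, import it into a database, and do what you want with it.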

Answer 2 (score: 0)

Here is the minimal code for parsing wikitext (tested on MediaWiki 1.32):

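// Runs inside MediaWiki (for example in an extension), where the
// Parser and ParserOptions classes are already loaded.
$text = "Your [[wikitext]]";
$title = $skin->getTitle(); // Get the Title object from somewhere, or use $wgTitle
$parser = new Parser;
$parserOptions = new ParserOptions;
$parserOutput = $parser->parse( $text, $title, $parserOptions );
$html = $parserOutput->getText(); // Rendered HTML as a string
echo $html;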

Bye!

Answer 3 (score: 0)

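    // Tag extension: renders <myname></myname>
    class MyClass {
        // Register the <myname> tag when the parser is first initialised.
        public static function onParserFirstCallInit( Parser $parser ) {
            $parser->setHook( 'myname', 'MyClass::getOutputHtml' );
        }

        // Tag callback: parse wikitext with a separate Parser instance and return the HTML.
        public static function getOutputHtml() {
            $localParser = new Parser();
            $input = OtherClass::myOutput(); // presumably returns the wikitext to render
            $context = new RequestContext();
            $title = $context->getTitle();
            $parserOptions = new ParserOptions;
            $output = $localParser->parse( $input, $title, $parserOptions );
            return $output->getText();
        }
    }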