Question

I'm looking for a simple parser that converts a String containing wiki markup into readable plain text, for example:

A lot of these sources can also be used to add to other parts of the article, like the plot section. <font color="silver">[[User:Silver seren|Silver]]</font><font color="blue">[[User talk:Silver seren|seren]]</font><sup>[[Special:Contributions/Silver seren|C]]</sup> 05:34, 22 March 2012 (UTC)

to

A lot of these sources can also be used to add to other parts of the article, like the plot section. SilverserenC 05:34, 22 March 2012 (UTC)

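I tried DKPro JWPL (which is also where the example above comes from), but the framework's plain-text output does not parse wiki talk (discussion) pages correctly. It simply deletes lines that start with multiple ":" characters, and those lines are essential on talk pages.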

Answer 1

OK, I found that JWPL's older Wikipedia parser, in the package "de.tudarmstadt.ukp.wikipedia.parser", does work.

You can use it like this:

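import de.tudarmstadt.ukp.wikipedia.api.WikiConstants.Language;
import de.tudarmstadt.ukp.wikipedia.parser.ParsedPage;
import de.tudarmstadt.ukp.wikipedia.parser.mediawiki.MediaWikiParser;
import de.tudarmstadt.ukp.wikipedia.parser.mediawiki.MediaWikiParserFactory;
MediaWikiParserFactory pf = new MediaWikiParserFactory(Language.english);
MediaWikiParser parser = pf.createParser();
ParsedPage pp = parser.parse("some wiki code with markups");
System.out.println(pp.getText());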
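
If pulling in JWPL is not an option, something like the following might be enough for simple signatures and talk-page lines such as the one in the question. To be clear, this is not what JWPL does internally; it is only a rough regex-based sketch using the standard library, and the class name WikiMarkupStripper and the method toPlainText are names made up for this illustration. It assumes the input only contains HTML-like tags, [[target|label]] or [[target]] links, and leading ":" indentation; templates, tables, references and nested markup are not handled.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of JWPL: a rough regex-based markup stripper.
final class WikiMarkupStripper {
    // HTML-like tags such as <font color="silver">, </font>, <sup>
    private static final Pattern HTML_TAG = Pattern.compile("</?[a-zA-Z][^>]*>");
    // Piped links: [[target|label]] is reduced to label
    private static final Pattern PIPED_LINK =
            Pattern.compile("\\[\\[[^\\]|]*\\|([^\\]]*)\\]\\]");
    // Plain links: [[target]] is reduced to target
    private static final Pattern PLAIN_LINK = Pattern.compile("\\[\\[([^\\]|]*)\\]\\]");
    // Talk-page indentation: leading ':' characters are dropped, the line text is kept
    private static final Pattern TALK_INDENT = Pattern.compile("(?m)^:+[ \\t]*");

    static String toPlainText(String wikiText) {
        String text = HTML_TAG.matcher(wikiText).replaceAll("");
        text = PIPED_LINK.matcher(text).replaceAll("$1");
        text = PLAIN_LINK.matcher(text).replaceAll("$1");
        text = TALK_INDENT.matcher(text).replaceAll("");
        return text;
    }

    public static void main(String[] args) {
        String line = ":: Thanks, [[User:Example|Example]] <sup>[[User talk:Example|talk]]</sup>";
        System.out.println(toPlainText(line)); // prints "Thanks, Example talk"
    }
}
```

Unlike the JWPL plain-text output described in the question, this keeps the text of ":"-indented replies and only removes the colons. For anything beyond simple signatures and links, a real parser is the safer choice.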

Simple Wikipedia text to plain-text parser?
