I am using the Wikipedia JSON API and want to retrieve page content without the link markup. For example, with this request:
https://en.wikipedia.org/w/api.php?action=query&format=json&titles=May_21&prop=revisions&rvprop=content&rvsection=1
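A line of the returned content looks like this:
[[293]] &ndash; Roman Emperors [[Diocletian]] and [[Maximian]] appoint [[Galerius]] as [[Caesar (title)|''Caesar'']] to Diocletian, beginning the period of four rulers known as the [[Tetrarchy]].
Here &ndash;
should be replaced with -
and [[Caesar (title)|''Caesar'']]
should become Caesar
I am using Objective-C.
How can I retrieve the same page content, but without the link characters?
Thanks!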
Answer 0 (score: 2)
Answer 1 (score: 1)
It should be like this :-)
NSString *stringToParse = @"{\"query\":{\"normalized\":[{\"from\":\"May_21\",\"to\":\"May 21\"}],\"pages\":{\"19684\":{\"pageid\":19684,\"ns\":0,\"title\":\"May 21\",\"revisions\":[{\"*\":\"==Events==\\n* [[293]] &ndash; Roman Emperors [[Diocletian]] and [[Maximian]] appoint [[Galerius]] as [[Caesar (title)|''Caesar'']] to Diocletian, beginning the period of four rulers known as the [[Tetrarchy]].\\n* [[878]] &ndash; [[Syracuse, Italy]], is [[Muslim conquest of Sicily|captured]] by the ...";
// Replace the &ndash; entity (including its trailing semicolon) with -
stringToParse = [stringToParse stringByReplacingOccurrencesOfString:@"&ndash;" withString:@"-"];
// [[Caesar (title)|''Caesar'']] should become Caesar,
// [[Maximian]] should become Maximian,
// and likewise [[1972]] -> 1972
NSString *regexToReplaceWikiLinks = @"\\[\\[([A-Za-z0-9_ ()]+?\\|)?(\\'\\')?(.+?)(\\'\\')?\\]\\]";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:regexToReplaceWikiLinks
                                                                       options:NSRegularExpressionCaseInsensitive
                                                                         error:&error];
// Note: each match is replaced with the contents of the third capture group
NSString *modifiedString = [regex stringByReplacingMatchesInString:stringToParse
                                                           options:0
                                                             range:NSMakeRange(0, [stringToParse length])
                                                      withTemplate:@"$3"];
NSLog(@"%@", modifiedString);
Result:
{"query":{"normalized":[{"from":"May_21","to":"May 21"}],"pages":{"19684":{"pageid":19684,"ns":0,"title":"May 21","revisions":[{"*":"==Events==\n* 293 - Roman Emperors Diocletian and Maximian appoint Galerius as Caesar to Diocletian, beginning the period of four rulers known as the Tetrarchy.\n* 878 - Syracuse, Italy, is captured by the ...
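The code above runs the replacement over the whole JSON string. It may be cleaner to pull the wikitext out of the response first and clean only that. A minimal sketch in JavaScript (not Objective-C), assuming the response has the shape shown in the result above; the function name extractWikitext is hypothetical and not part of any API:

```javascript
// Hypothetical helper: pulls the wikitext of each returned page out of the JSON response.
// Assumes the response shape shown above: query.pages.<pageid>.revisions[0]["*"].
function extractWikitext(responseText) {
  const data = JSON.parse(responseText);
  const pages = (data.query && data.query.pages) || {};
  return Object.values(pages)
    // Skip pages that carry no revisions
    .filter((page) => Array.isArray(page.revisions) && page.revisions.length > 0)
    // The revision content is stored under the "*" key
    .map((page) => page.revisions[0]['*']);
}
```

The same idea should carry over to Objective-C by parsing the response with NSJSONSerialization and applying the regex only to the extracted string.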
Answer 2 (score: 0)
Regular expressions are the way to solve this. Here is an example in JavaScript (but you can apply the same solution in any language that supports regular expressions):
<dl>
<script type="text/javascript">
var source = "[[293]] – Roman Emperors [[Diocletian]] and [[Maximian]] appoint [[Galerius]] as [[Caesar (title)|''Caesar'']] to Diocletian, beginning the period of four rulers known as the [[Tetrarchy]].";
document.writeln('<dt> Original </dt>');
document.writeln('<dd>' + source + '</dd>');
// Replace links with any found titles
var matchTitles = /\[\[([^\]]+?)\|\'\'(.+?)\'\']\]/ig; /* <- Answer */
source = source.replace(matchTitles, '$2');
document.writeln('<dt> First Pass </dt>');
document.writeln('<dd style="color: green;">' + source + '</dd>');
// Replace links with contents
var matchLinks = /\[\[(.+?)\]\]/ig;
source = source.replace(matchLinks, '$1');
document.writeln('<dt> Second Pass </dt>');
document.writeln('<dd>' + source + '</dd>');
</script>
</dl>
You can also see this in action here: http://jsfiddle.net/NujmB/
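The two passes above cover [[Target]] and [[Target|''Label'']], but a piped link without italics, such as [[Muslim conquest of Sicily|captured]], would keep its target and pipe. A minimal single-pass sketch under the same regex approach; the function name stripWikiLinks is hypothetical, and the patterns are an assumption about which markup needs handling, not a full wikitext parser:

```javascript
// Hypothetical helper: removes simple wiki link markup from a wikitext string.
function stripWikiLinks(wikitext) {
  return wikitext
    // [[Target|Label]] -> Label, [[Target]] -> Target
    .replace(/\[\[(?:[^\]|]*\|)?([^\]]+)\]\]/g, '$1')
    // Runs of two or more apostrophes (''italic'', '''bold''') -> removed
    .replace(/'{2,}/g, '')
    // Literal &ndash; entity -> hyphen
    .replace(/&ndash;/g, '-');
}
```

Links with more than one pipe (for example file links with several options) and nested markup are not handled by this sketch.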
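Answer 3 (score: 0)
I don't know Objective-C, but here is the JavaScript code I use for the same purpose
(it can serve as pseudocode for you and may help other users coming from JavaScript):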
var url = 'http://en.wikipedia.org/w/api.php?callback=?&action=parse&page=facebook&prop=text&format=json&section=0';
// Section = 0 for taking first section of wiki page i.e. introduction only
$.getJSON(url,function(response){
// Taking only the first paragraph from introduction
var intro = $(response.parse.text['*']).filter('p:eq(0)').html();
var wikiBox = $('#wikipediaBox .wikipedia div.overview');
wikiBox.empty().html(intro);
// Converting relative links into absolute ones and links into outer links
wikiBox.find("a:not(.references a)").attr("href", function(){ return "http://www.wikipedia.org" + $(this).attr("href");});
wikiBox.find("a").attr("target", "_blank");
// Removing reference (citation) markers
wikiBox.find('sup.reference').remove();
});
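The request URL above is written out by hand, which makes it easy to mistype a parameter. A minimal sketch that builds the same action=parse query from named parameters using the standard URLSearchParams class; the function name buildParseUrl is hypothetical, and the sketch leaves out the callback=? part that jQuery uses for JSONP:

```javascript
// Hypothetical helper: builds an action=parse request URL from named parameters.
function buildParseUrl(page, section) {
  const params = new URLSearchParams({
    action: 'parse',
    page: page,
    prop: 'text',
    format: 'json',
    // Section 0 is the introduction of the page
    section: String(section),
  });
  return 'https://en.wikipedia.org/w/api.php?' + params.toString();
}

// Example: the introduction of the "facebook" page
console.log(buildParseUrl('facebook', 0));
```

URLSearchParams also takes care of encoding page titles that contain spaces or other special characters.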