I'd like to retrieve all the articles that link to a given article, but only if the link appears in a certain section of the linking article. Usually, this "certain section" is the first paragraph of text. Taking the first paragraph as an example, for the article https://en.wikipedia.org/wiki/Directed_graph I should retrieve the article:
https://en.wikipedia.org/wiki/Directed_acyclic_graph
Which has this text as its first paragraph:
In mathematics, particularly graph theory, and computer science, a directed acyclic graph (DAG /ˈdæɡ/), is a finite directed graph with no directed cycles. That is, it consists of finitely many vertices and edges (also called arcs), with each edge directed from one vertex to another, such that there is no way to start at any vertex v and follow a consistently-directed sequence of edges that eventually loops back to v again. Equivalently, a DAG is a directed graph that has a topological ordering, a sequence of the vertices such that every edge is directed from earlier to later in the sequence.
But not https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)
which links to https://en.wikipedia.org/wiki/Directed_graph in later parts of the article (e.g. see https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)#Directed_graph) but not in the first paragraph.
How can I do this? I do not mind using any method, though I would prefer PHP as the programming language. I am more concerned with what platforms, APIs, or tools Wikipedia provides that can assist me in this endeavour, e.g. which Wikipedia API entry point or methods would help me retrieve links that exist only in some part of an article, such as the first paragraph.
Answer 0 (score: 0)
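You say "articles that link to an article", but your question refers to articles linked from an article. Can you clarify whether you want the links to or from Directed graph? If you are interested in the links to that article, you need the API version of https://en.wikipedia.org/wiki/Special:WhatLinksHere/Directed_graph
To get the links in the first paragraph, you can use https://en.wikipedia.org/w/api.php?action=query&prop=links&titles=Directed_graph&format=json&section=1
If you need a different paragraph, change section=1.
If you want to find all the links to a specific article, it gets more complicated (you will probably need to make a separate call for each article).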
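As a rough illustration of a per-section link query, here is a minimal Python sketch (Python rather than PHP purely for illustration; the same URLs can be requested from PHP). It swaps in `action=parse` for the `action=query` URL above, on the assumption that `parse` is the module that honours a `section` parameter, and that section 0 is the lead text before the first heading (the whole lead, not only its first paragraph). The helper names `section_links_url` and `link_titles` are hypothetical.

```python
import json
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def section_links_url(page, section):
    """Build an action=parse URL listing the links found in one section."""
    params = {
        "action": "parse",
        "page": page,
        "prop": "links",
        "section": section,  # assumption: 0 means the lead section
        "format": "json",
    }
    return API + "?" + urlencode(params)


def link_titles(response_text):
    """Extract the linked page titles from an action=parse JSON response."""
    data = json.loads(response_text)
    links = data.get("parse", {}).get("links", [])
    # The default JSON format keys each title as "*";
    # formatversion=2 uses "title" instead, so accept either.
    return [link.get("title", link.get("*")) for link in links]
```

Fetching `section_links_url("Directed_acyclic_graph", 0)` and passing the body to `link_titles` should give the titles linked from that article's lead, which can then be checked for "Directed graph".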
Answer 1 (score: 0)
The MediaWiki API provides an option to find all pages that link to another page:
https://www.mediawiki.org/w/api.php?action=help&modules=query%2Blinkshere
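Unfortunately, I don't think there is a parameter for specifying a section number. Even if there were such a parameter, page sections are numbered, and there is no section "zero" for the first paragraph of a page.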
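Since `linkshere` has no section filter, one possible workaround is two steps: list every backlink, then check each linking page's lead separately. The Python sketch below illustrates that idea under a few assumptions: `action=parse` with `section=0` is taken to return the lead section (which is wider than the first paragraph), links that pass through a redirect are not matched, and the function names and User-Agent string are hypothetical. It makes one request per backlink, so it will be slow for heavily linked pages.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """GET the API with the given parameters and decode the JSON reply."""
    url = API + "?" + urlencode(dict(params, format="json"))
    request = Request(url, headers={"User-Agent": "lead-link-sketch/0.1 (example)"})
    with urlopen(request) as reply:
        return json.load(reply)


def backlinks(target, fetch=fetch_json):
    """Yield titles of article-namespace pages that link to target."""
    params = {"action": "query", "prop": "linkshere", "titles": target,
              "lhnamespace": 0, "lhlimit": "max"}
    while True:
        data = fetch(params)
        for page in data.get("query", {}).get("pages", {}).values():
            for entry in page.get("linkshere", []):
                yield entry["title"]
        if "continue" not in data:
            break
        # Merge the continuation values into the next request.
        params = {**params, **data["continue"]}


def links_in_section(title, section, fetch=fetch_json):
    """Return the set of titles linked from one section of a page."""
    data = fetch({"action": "parse", "page": title,
                  "prop": "links", "section": section})
    links = data.get("parse", {}).get("links", [])
    return {link.get("title", link.get("*")) for link in links}


def backlinks_from_lead(target, fetch=fetch_json):
    """Keep only the backlinks whose lead section (section 0) links to target."""
    wanted = target.replace("_", " ")
    return [title for title in backlinks(target, fetch)
            if wanted in links_in_section(title, 0, fetch)]
```

Calling `backlinks_from_lead("Directed_graph")` would be expected to include "Directed acyclic graph" and leave out "Graph (discrete mathematics)", provided the live articles still match the description in the question. The `fetch` argument is there so the HTTP layer can be replaced, for example with a cached or rate-limited client.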