我正在努力学习使用RDF,并试图从dbpedia中提取一组事实作为我的学习练习。以下代码示例有点工作,但对于配偶这样的主题,它总是把自己的人拉出来。
这是代码块的输出,显示了我得到的一些奇怪的结果(参见属性中的混合输出,他与自己结婚的事实以及约瑟芬的错误名称?
Accessing facts for Napoleon held at http://dbpedia.org/resource/Napoleon
There are 800 facts about Napoleon stored at the URI
http://dbpedia.org/resource/Napoleon
Here are a few:-
Ontology:deathdate
Napoleon died on 1821-05-05
Ontology:birthdate
Napoleon was born on 1769-08-15
Property:spouse retruns the person themslves twice !
Napoleon was married to Marie Louise, Duchess of Parma
Napoleon was married to Napoleon
Napoleon was married to Jos%C3%A9phine de Beauharnais
Napoleon was married to Napoleon
Property:title retruns text and uri's
Napoleon Held the title: "The Death of Napoleon"
Napoleon Held the title: http://dbpedia.org/resource/Emperor_of_the_French
Napoleon Held the title: http://dbpedia.org/resource/King_of_Italy
Napoleon Held the title: First Consul of France
Napoleon Held the title: Provisional Consul of France
Napoleon Held the title: http://dbpedia.org/resource/Napoleon
Napoleon Held the title: Emperor of the French
Napoleon Held the title: http://dbpedia.org/resource/Co-Princes_of_Andorra
Napoleon Held the title: from the Memoirs of Bourrienne, 1831
Napoleon Held the title: Protector of the Confederation of the Rhine
Ontology birth place returns three records
Napoleon was born in Ajaccio
Napoleon was born in Corsica
Napoleon was born in Early modern France
这是产生上面输出的python,它需要rdflib并且正在进行中。
import rdflib
from rdflib import Graph, URIRef, RDF
######################################
# A quick test of a python library reflib to get data from an rdf graph
# D Moore 15/3/2014
# needs rdflib > version 3.0
# CHANGE THE URI BELOW TO A DIFFERENT PERSON AND SEE WHAT HAPPENS
# COULD DO WITH A WEB FORM
# NOTES:
#
#URI_ref = 'http://dbpedia.org/resource/Richard_Nixon'
#URI_ref = 'http://dbpedia.org/resource/Margaret_Thatcher'
#URI_ref = 'http://dbpedia.org/resource/Isaac_Newton'
#URI_ref = 'http://dbpedia.org/resource/Richard_Nixon'
URI_ref = 'http://dbpedia.org/resource/Napoleon'
#URI_ref = 'http://dbpedia.org/resource/apple'
##########################################################
def get_name_from_uri(dbpedia_uri):
# pulls the last part of a uri out and removes underscores
# got to be an easier way but it works
output_string = ""
s = dbpedia_uri
# chop the url into bits devided by the /
tokens = s.split("/")
# because the name of our person is in the last section itterate through each token
# and replace the underscore with a space
for i in tokens :
str = ''.join([i])
output_string = str.replace('_',' ')
# returns the name of the person without underscores
return(output_string)
def is_person(uri):
##### SPARQL way to do this
uri = URIRef(uri)
person = URIRef('http://dbpedia.org/ontology/Person')
g= Graph()
g.parse(uri)
resp = g.query(
"ASK {?uri a ?person}",
initBindings={'uri': uri, 'person': person}
)
print uri, "is a person?", resp.askAnswer
return resp.askAnswer
URI_NAME = get_name_from_uri(URI_ref)
NAME_LABEL = ''
if is_person(URI_ref):
print "Accessing facts for", URI_NAME, " held at ", URI_ref
g = Graph()
g.parse(URI_ref)
print "Person Extract for", URI_NAME
print "There are ",len(g)," facts about", URI_NAME, "stored at the URI ",URI_ref
print "Here are a few:-"
# Ok so lets get some facts for our person
for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/birthName")):
print URI_NAME, "was born " + str(stmt[1])
for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/deathDate")):
print URI_NAME, "died on", str(stmt[1])
for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/birthDate")):
print URI_NAME, "was born on", str(stmt[1])
for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/eyeColor")):
print URI_NAME, "had eyes coloured", str(stmt[1])
for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/spouse")):
print URI_NAME, "was married to ", get_name_from_uri(str(stmt[1]))
for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/reigned")):
print URI_NAME, "reigned ", get_name_from_uri(str(stmt[1]))
for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/children")):
print URI_NAME, "had a child called ", get_name_from_uri(str(stmt[1]))
for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/profession")):
print URI_NAME, "(PROPERTY profession) was trained as a ", get_name_fro m_uri(str(stmt[1]))
for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/child")):
print URI_NAME, "PROPERTY child ", get_name_from_uri(str(stmt[1]))
for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/deathplace")):
print URI_NAME, "(PROPERTY death place) died at: ", str(stmt[1])
for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/title")):
print URI_NAME, "(PROPERTY title) Held the title: ", str(stmt[1])
for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/sex")):
print URI_NAME, "was a ", str(stmt[1])
for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/knownfor")):
print URI_NAME, "was known for ", str(stmt[1])
for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/birthPlace")):
print URI_NAME, "was born in ", get_name_from_uri(str(stmt[1]))
else:
print "ERROR - "
print "Resource", URI_ref, 'does not look to be a person or there is no record in dbpedia'
答案 0 :(得分:2)
* get_name_from_uri *正在使用URI。由于DBpedia数据几乎对所有内容都有rdfs:labels
,因此最好还是要求rdfs:label
并将其用作值。例如,查看此SPARQL查询运行the DBpedia SPARQL endpoint的结果:
select ?spouse ?spouseName where {
dbpedia:Napoleon dbpedia-owl:spouse ?spouse .
?spouse rdfs:label ?spouseName .
filter( langMatches(lang(?spouseName),"en") )
}
spouse spouseName
http://dbpedia.org/resource/Jos%C3%A9phine_de_Beauharnais "Joséphine de Beauharnais"@en
http://dbpedia.org/resource/Marie_Louise,_Duchess_of_Parma "Marie Louise, Duchess of Parma"@en
subject_objects的文档说明了
subject_objects(self,predicate = None)
给定谓词的(主题,对象)元组的生成器
您正确地看到,DBpedia中有四个三元组具有谓词{{1}}(顺便说一下,您有没有使用dbpprop:spouse
的原因?)并且{ {1}}作为主题或对象:
dbpedia-owl:spouse
对于其中每一个,你都要打印
Napoleon
其中X是三元组的对象。也许您应该使用objects
代替:
对象(self,subject = None,谓词=无)
具有给定主题和谓词的对象生成器
DBpedia本体属性描述的数据(URI以Napoleon spouse Marie Louise, Duchess of Parma
Marie Louise, Duchess of Parma spouse Napoleon
Napoleon spouse Jos%C3%A9phine de Beauharnais
Jos%C3%A9phine de Beauharnais spouse Napoleon
开头,通常缩写为"Napoleon was married to X"
的数据)比DBpedia原始数据属性(URI开头的数据)描述的数据“更清晰”使用http://dbpedia.org/ontology/
,通常缩写为dbpedia-owl:
)。例如,当您查看标题时,您使用的是属性http://dbpedia.org/property/
,并且URI和文字都是值。它看起来不像dbpprop:
,所以在这种情况下你只需处理它。尽管如此,过滤掉其中一个很容易:
dbpprop:title
dbpedia-owl:title
select ?title where {
dbpedia:Napoleon dbpprop:title ?title
filter isLiteral(?title)
}
title
================================================
"Emperor of the French"@en
"Protector of the Confederation of the Rhine"@en
"First Consul of France"@en
"Provisional Consul of France"@en
""The Death of Napoleon""@en
"from the Memoirs of Bourrienne, 1831"@en