Getting facts out of an RDF graph in a usable form with RDFlib

Date: 2014-03-20 11:44:40

Tags: python rdf dbpedia rdflib

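I am trying to learn how to use RDF, and as a learning exercise I am trying to extract a set of facts from DBpedia. The code sample below sort of works, but for properties such as spouse it always pulls out the person themselves as well.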

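Problems:

  1. get_name_from_uri() pulls out the last part of the URI and removes the underscores - there must be a better way to get a person's name
  2. spouse pulls back the spouses, but it also pulls back the subject of the data - not sure what is going on there
  3. Some results come back both as URIs and as text items -
  4. pulling back data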

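Here is the output of the code block, showing some of the odd results I get (note the mixed output for the properties, the fact that he was married to himself, and the mangled name for Joséphine):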

    Accessing facts for Napoleon  held at  http://dbpedia.org/resource/Napoleon
    
    There are  800  facts about Napoleon stored at the URI
    http://dbpedia.org/resource/Napoleon
    
    Here are a few:-
    Ontology:deathdate
    
    Napoleon died on 1821-05-05
    
    Ontology:birthdate
    Napoleon was born on 1769-08-15
    
    Property:spouse returns the person themselves twice!
    Napoleon was married to  Marie Louise, Duchess of Parma
    Napoleon was married to  Napoleon
    Napoleon was married to  Jos%C3%A9phine de Beauharnais
    Napoleon was married to  Napoleon
    
    Property:title returns text and URIs
    Napoleon  Held the title:  "The Death of Napoleon"
    Napoleon  Held the title: http://dbpedia.org/resource/Emperor_of_the_French
    Napoleon  Held the title: http://dbpedia.org/resource/King_of_Italy
    Napoleon  Held the title:  First Consul of France
    Napoleon  Held the title:  Provisional Consul of France
    Napoleon  Held the title:  http://dbpedia.org/resource/Napoleon
    Napoleon  Held the title:  Emperor of the French
    Napoleon  Held the title: http://dbpedia.org/resource/Co-Princes_of_Andorra
    Napoleon  Held the title:  from the Memoirs of Bourrienne, 1831
    Napoleon  Held the title:  Protector of the Confederation of the Rhine
    
    Ontology birth place returns three records
    Napoleon was born in  Ajaccio
    Napoleon was born in  Corsica
    Napoleon was born in  Early modern France
    

Here is the Python that produces the output above. It needs rdflib and is a work in progress.

    import rdflib
    from rdflib import Graph, URIRef, RDF
    
    ######################################
    #  A quick test of the Python library rdflib to get data from an RDF graph
    # D Moore 15/3/2014
    # needs rdflib > version 3.0
    
    # CHANGE THE URI BELOW TO A DIFFERENT PERSON AND SEE WHAT HAPPENS
    # COULD DO WITH A WEB FORM 
    # NOTES:
    #
    #URI_ref = 'http://dbpedia.org/resource/Richard_Nixon'
    #URI_ref = 'http://dbpedia.org/resource/Margaret_Thatcher'
    #URI_ref = 'http://dbpedia.org/resource/Isaac_Newton'
    #URI_ref = 'http://dbpedia.org/resource/Richard_Nixon'
    URI_ref = 'http://dbpedia.org/resource/Napoleon'
    #URI_ref = 'http://dbpedia.org/resource/apple'
    ##########################################################
    
    
    def get_name_from_uri(dbpedia_uri):  
        # pulls the last part of a uri out and removes underscores
        # got to be an easier way but it works
        output_string = ""
        s = dbpedia_uri
        # chop the URL into bits divided by the /
        tokens = s.split("/")
        # because the name of our person is in the last section, iterate through each token
        # and replace the underscore with a space
        for i in tokens :
            str = ''.join([i])
            output_string = str.replace('_',' ')
        # returns the name of the person without underscores 
        return(output_string)
    
    def is_person(uri):
    #####  SPARQL way to do this
        uri = URIRef(uri)
        person = URIRef('http://dbpedia.org/ontology/Person')
        g= Graph()
        g.parse(uri)
        resp = g.query(
            "ASK {?uri a ?person}",
            initBindings={'uri': uri, 'person': person}
        )
        print uri, "is a person?", resp.askAnswer
        return resp.askAnswer
    
    URI_NAME = get_name_from_uri(URI_ref)
    NAME_LABEL = ''
    
    if is_person(URI_ref):
        print "Accessing facts for", URI_NAME, " held at ", URI_ref
    
        g = Graph()
        g.parse(URI_ref)
        print "Person Extract for", URI_NAME
        print "There are ",len(g)," facts about", URI_NAME, "stored at the URI ",URI_ref
        print "Here are a few:-"
    
    
        # Ok so lets get some facts for our person
        for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/birthName")):
            print URI_NAME, "was born " + str(stmt[1])
    
        for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/deathDate")):
            print URI_NAME, "died on", str(stmt[1])
    
        for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/birthDate")):
            print URI_NAME, "was born on", str(stmt[1])
    
        for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/eyeColor")):
            print URI_NAME, "had eyes coloured", str(stmt[1])
    
        for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/spouse")):
            print URI_NAME, "was married to ", get_name_from_uri(str(stmt[1]))
    
        for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/reigned")):
            print URI_NAME, "reigned ", get_name_from_uri(str(stmt[1]))
    
        for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/children")):
            print URI_NAME, "had a child called ", get_name_from_uri(str(stmt[1]))
    
        for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/profession")):
            print URI_NAME, "(PROPERTY profession) was trained as a  ", get_name_from_uri(str(stmt[1]))
    
        for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/child")):
            print URI_NAME, "PROPERTY child ", get_name_from_uri(str(stmt[1]))
    
        for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/deathplace")):
            print URI_NAME, "(PROPERTY death place) died at: ", str(stmt[1])
    
        for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/title")):
            print URI_NAME, "(PROPERTY title) Held the title: ", str(stmt[1])
    
    
        for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/sex")):
            print URI_NAME, "was a ", str(stmt[1])
    
        for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/knownfor")):
            print URI_NAME, "was known for ", str(stmt[1])
    
        for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/birthPlace")):
            print URI_NAME, "was born in ", get_name_from_uri(str(stmt[1]))
    
    else:
        print "ERROR - "
        print "Resource", URI_ref, 'does not look to be a person or there is no record in dbpedia'
    

1 answer:

Answer 0 (score: 2)

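Getting names

*get_name_from_uri* works on the URI itself. Since DBpedia data has rdfs:labels for almost everything, it is better to ask for the rdfs:label and use that as the value. For example, look at the results of running this SPARQL query against the DBpedia SPARQL endpoint: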

    select ?spouse ?spouseName where {
      dbpedia:Napoleon dbpedia-owl:spouse ?spouse .
      ?spouse rdfs:label ?spouseName .
      filter( langMatches(lang(?spouseName),"en") )
    }

    spouse                                                      spouseName
    http://dbpedia.org/resource/Jos%C3%A9phine_de_Beauharnais   "Joséphine de Beauharnais"@en
    http://dbpedia.org/resource/Marie_Louise,_Duchess_of_Parma  "Marie Louise, Duchess of Parma"@en

Unexpected spouses

The documentation for subject_objects says:

    subject_objects(self, predicate=None)

    A generator of (subject, object) tuples for the given predicate

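You are correctly seeing that there are four triples in DBpedia that have the predicate dbpprop:spouse (by the way, is there a reason you are not using dbpedia-owl:spouse?) and have Napoleon as either subject or object:

    Napoleon spouse Marie Louise, Duchess of Parma
    Marie Louise, Duchess of Parma spouse Napoleon
    Napoleon spouse Jos%C3%A9phine de Beauharnais
    Jos%C3%A9phine de Beauharnais spouse Napoleon

For each of these, you are printing

    "Napoleon was married to X"

where X is the object of the triple. Perhaps you should use objects instead: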

  

    objects(self, subject=None, predicate=None)

    A generator of objects with the given subject and predicate

URIs versus text (literal) results

The data described by DBpedia ontology properties (those whose URIs begin with http://dbpedia.org/ontology/, typically abbreviated dbpedia-owl:) are much "cleaner" than the data described by DBpedia raw data properties (those whose URIs begin with http://dbpedia.org/property/, typically abbreviated dbpprop:). For example, when you look at titles you are using the property dbpprop:title, which has both URIs and literals as values. There does not appear to be a dbpedia-owl:title, so in this case you will just have to deal with it. It is easy enough to filter one kind or the other out, though:
    select ?title where {
      dbpedia:Napoleon dbpprop:title ?title
      filter isLiteral(?title)
    }

    title
    ================================================
    "Emperor of the French"@en
    "Protector of the Confederation of the Rhine"@en
    "First Consul of France"@en
    "Provisional Consul of France"@en
    ""The Death of Napoleon""@en
    "from the Memoirs of Bourrienne, 1831"@en