Extracting data from a BNode

Time: 2014-01-02 18:12:33

Tags: python rdf rdflib

I am using SPARQL to extract nodes from an RDF file. The node in the RDF file looks like this:

<dc:description>Birds are a class of vertebrates. They are bipedal, warm-blooded, have a covering of feathers, and their front limbs are modified into wings. Some birds, such as penguins and ostriches, have lost the power of flight. All birds lay eggs. Because birds are warm-blooded, their eggs have to be incubated to keep the embryos inside warm, or they will perish.
    <br />
    <br />
    <a href="/nature/19700707">All you need to know about British birds.</a>
</dc:description>

I am using Python RDFLib to fetch this node. It returns

rdflib.term.BNode('Nfc3f01b2567a4b3ea23dbd01394929df')

How can I extract the text of dc:description from rdflib.term.BNode('Nfc3f01b2567a4b3ea23dbd01394929df')?

What I tried, based on the answers:

from rdflib import *
import rdfextras
import json

#load the ontology
rdfextras.registerplugins()
g=Graph()

g.parse("http://www.bbc.co.uk/nature/life/Bird.rdf")


#define the prefixes
PREFIX = ''' PREFIX dc:<http://purl.org/dc/terms/>
             .......
             PREFIX po:<http://purl.org/ontology/po/>
             PREFIX owl:<http://www.w3.org/2002/07/owl#>
         '''

def exe(query):
    query = PREFIX + query
    return g.query(query)

def getEntity(entity_type, entity):
    #getting the description
    entity_url = "<http://www.bbc.co.uk/nature/life/" + entity.capitalize() + "#" + entity_type.lower() + ">"
    query = ''' SELECT ?description
                WHERE { ''' + entity_url + ''' dc:description ?description . }'''
    result_set = exe(query)
    dc = Namespace("http://purl.org/dc/terms/")
    for row in result_set:
        description = row[0]
        print description.value(dc.description)

getEntity("class","bird")

I get the following error:

Traceback (most recent call last):
  File "test_bird1.py", line 40, in <module>
    getEntity("class","bird")
  File "test_bird1.py", line 38, in getEntity
    print description.value(dc.description)
AttributeError: 'BNode' object has no attribute 'value'

2 answers:

Answer 0 (score: 1)

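BNodes (as well as URIRefs) are resources too, so the resource module documentation is probably the most relevant documentation for you. Based on that documentation, something like the following should handle this for you. Where x is the blank node and g is the graph, it would look like this: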

>>> from rdflib import Namespace
>>> from rdflib.resource import Resource
>>> DC = Namespace("http://purl.org/dc/terms/")
>>> r = Resource(g, x)
>>> r.value(DC.description)

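As pointed out in this answer to another question of yours, SPARQL not returning correct result, it is not actually legal to have those <br /> elements inside the dc:description content (perhaps you would need to go with another serialization, e.g., N-Triples, N3, or Turtle), so it is hard to predict what different libraries will do with the malformed input. You might want to let the content producer know that they are publishing malformed data.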

Answer 1 (score: 0)

from rdflib import Graph, BNode
g = Graph()
g.parse("http://www.bbc.co.uk/nature/life/Bird.rdf")

# Replace the placeholder string with the identifier of your BNode.
for objects in g.objects(subject=BNode("add the BNode code here")):
    print(objects)