RDFlib and Fuseki: queries run very slowly on a local server

Time: 2015-01-14 09:31:58

Tags: python sparql rdflib fuseki

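I have a small WSGI application running on a local CherryPy server. I use RDFlib to turn a natural-language query into a SPARQL query against a ttl file loaded into Fuseki. It works, but it is very slow! An earlier version of the script accepted SPARQL queries directly, so I was not using RDFlib, and it ran very fast. Is there something wrong with the way I am using RDFlib that makes it so slow?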

from abc import ABCMeta, abstractmethod, abstractproperty
from collections import OrderedDict
from threading import Thread
from time import sleep
from cherrypy import engine
from cherrypy.wsgiserver import CherryPyWSGIServer
from werkzeug.wsgi import DispatcherMiddleware
from werkzeug.debug import DebuggedApplication
from werkzeug.wsgi import SharedDataMiddleware
from werkzeug.wrappers import Response, Request
from requests import get, post, RequestException
from jinja2 import Environment, FileSystemLoader
import os
from SPARQLWrapper import SPARQLWrapper, JSON
import rdflib


__author__ = 'authorname'

templates_dir = os.path.abspath('templates')
static_dir = os.path.abspath('static')

class RdfDemoApp(object):
    def __init__(self, sparql_endpoint_address):
        self._sparql_endpoint_address = sparql_endpoint_address
        self._jinja_env = Environment(loader=FileSystemLoader(templates_dir), autoescape=True)

    def render_template(self, template, **params):
        t = self._jinja_env.get_template(template)
        return t.render(params)

    def _app(self, environ, start_response):
        request = Request(environ)
        if 'query' in request.args.keys():
            query_string = "'"+request.args['query']+"'"
            print query_string

            results = g.query("""PREFIX pers: <http://blabla.com/Register/schemas/persons/>
            SELECT ?person ?sibling ?sibforname
            WHERE { 
               ?person pers:name ?name .
                 ?name  pers:forename """+query_string+""" .
                ?person pers:siblingOf ?sibling .
                ?sibling pers:name ?sibname .
                ?sibname pers:forename ?sibforname .
                 ?sibname pers:type "std"  } """)#, format = "JSON")
            for row in results:
                print row
            header = []
            i=0
            for item in results:
                while i in range(len(item)):
                    for x in item:
                        header.append(x)
                        i+=1
            quer = query_string
            response = Response(self.render_template('results_rdf.html', results=results, header = header, query = quer, static_dir = static_dir ), mimetype='text/html')

        else:
            response = Response(self.render_template('form_rdf.html'), mimetype='text/html')

        return response(environ, start_response)

    def __call__(self, environ, start_response):
        app = SharedDataMiddleware(DebuggedApplication(self._app, evalex=True), {
            '/static': static_dir
        })
        return app(environ, start_response)

if __name__ == '__main__':
    g=rdflib.Graph()
    g.parse("/Users/username/Documents/pers_file_new.ttl", format='n3')
    wsgi_app = RdfDemoApp("http://localhost:3030/ds/query")
    try:
        server = CherryPyWSGIServer(('127.0.0.1',10001), wsgi_app)
        server.start()

    except KeyboardInterrupt:
        server.stop()
        print "Logged out"

1 Answer:

Answer 0 (score: 1)

I am not sure how large the dataset in pers_file_new.ttl is, but one reason it is slow is that you are reading all of it into memory with RDFLib.

g.parse("/Users/username/Documents/pers_file_new.ttl", format='n3')
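
To check where the time actually goes, one option is to time the parse step and a single query separately. The helper below is only a generic sketch, not part of the original code: `timed` and `timings` are hypothetical names, and it assumes Python 3 (the code in the question is Python 2). Wrapping `g.parse(...)` and `g.query(...)` the same way as the dummy work here would show which step dominates.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, sink=None):
    """Measure the wall-clock time of the enclosed block.

    `label` names the measurement; `sink` is an optional dict that
    receives the elapsed seconds under that label.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if sink is not None:
            sink[label] = elapsed
        print("%s: %.3f s" % (label, elapsed))


# Stand-in workload; in the application this would be g.parse(...) or g.query(...).
timings = {}
with timed("dummy work", timings):
    sum(range(100000))
```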

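With your current code, you are not querying Fuseki at all; you are querying the in-memory RDFLib graph. You can follow the example on the SPARQLWrapper home page. It is very close to what you are trying to do.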

http://rdflib.github.io/sparqlwrapper/
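
As a rough sketch of what that could look like for the query in the question: the code below sends the SPARQL query to the Fuseki endpoint through SPARQLWrapper instead of running it against an in-memory graph. The helper names (`sparql_string_literal`, `build_sibling_query`, `rows_from_json`, `query_fuseki`) are hypothetical, the endpoint is the address the question already passes to `RdfDemoApp`, and the code assumes Python 3. SPARQLWrapper is imported inside `query_fuseki` so that the pure helpers can be used without it installed.

```python
def sparql_string_literal(text):
    """Quote text as a SPARQL string literal, escaping backslashes, quotes and newlines."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return '"%s"' % escaped


def build_sibling_query(forename):
    """Build the sibling query from the question for a given forename."""
    return """PREFIX pers: <http://blabla.com/Register/schemas/persons/>
SELECT ?person ?sibling ?sibforname
WHERE {
    ?person pers:name ?name .
    ?name pers:forename %s .
    ?person pers:siblingOf ?sibling .
    ?sibling pers:name ?sibname .
    ?sibname pers:forename ?sibforname .
    ?sibname pers:type "std"
}""" % sparql_string_literal(forename)


def rows_from_json(results):
    """Turn a SPARQL JSON result document into (header, rows)."""
    header = results["head"]["vars"]
    rows = [[binding.get(var, {}).get("value") for var in header]
            for binding in results["results"]["bindings"]]
    return header, rows


def query_fuseki(endpoint, forename):
    """Run the sibling query against a SPARQL endpoint such as Fuseki."""
    # Third-party dependency, imported here so the helpers above work without it.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(build_sibling_query(forename))
    sparql.setReturnFormat(JSON)
    return rows_from_json(sparql.query().convert())
```

Inside `_app`, a call along the lines of `header, rows = query_fuseki(self._sparql_endpoint_address, request.args['query'])` could replace `g.query(...)`. The `g.parse(...)` call could then be dropped, assuming the data is already loaded into the Fuseki dataset behind `http://localhost:3030/ds/query`.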