Jena TDB: seeing how many triples are stored while a TDB is being created

Time: 2014-07-27 19:45:37

Tags: jena tdb

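Is it possible to see the number of triples stored while a TDB is being created with the Java API? I am running TDBFactory on a rar file containing Turtle, but while the files are being created in my directory I cannot see how many triples it has stored. How can I solve this?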

1 answer:

Answer 0: (score: 0)

You can access the bulk loader from Java code (to see the triples being brought in) as follows:

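// Jena classes: Dataset (org.apache.jena.query); TDBFactory, TDBLoader (org.apache.jena.tdb);
// DatasetGraphTransaction (org.apache.jena.tdb.transaction). Jena 2.x uses com.hp.hpl.jena instead.
final Dataset tdbDataset = TDBFactory.createDataset( /*location*/ );
try( final InputStream in = /*get input stream for your large file*/) {
    // The last argument (showProgress) asks the bulk loader to report progress while loading
    TDBLoader.load( ((DatasetGraphTransaction)tdbDataset.asDatasetGraph()).getBaseDatasetGraph() , in, true);
}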
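
The loader's progress output is one way to watch triples arrive. As an independent cross-check, and only if the input happens to be N-Triples (one triple per line), a rough count can be taken by counting the lines that are neither blank nor comments. The following is a minimal sketch under that assumption; `NTriplesCounter` and `countTriples` are made-up names for illustration and are not part of Jena. It does not apply to Turtle, where a triple can span several lines or share a line with others.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class NTriplesCounter {

    /**
     * Counts lines that are neither blank nor comments.
     * Assumes N-Triples input, where each such line holds exactly one triple.
     */
    static long countTriples(final Reader source) {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                final String trimmed = line.trim();
                if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                    count++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample data: one comment, one blank line, two triples
        final String sample =
              "# sample data\n"
            + "<http://example.org/s> <http://example.org/p> \"o1\" .\n"
            + "\n"
            + "<http://example.org/s> <http://example.org/p> \"o2\" .\n";
        System.out.println(countTriples(new StringReader(sample))); // prints 2
    }
}
```

The count reflects lines in the input, not what TDB ends up storing, so duplicate triples in the file would make it higher than the size of the resulting graph.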

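If your archive contains multiple files (for simplicity, I'm not going to do rar, but rather zip), then, as per an answer to this question, you can get better performance by concatenating the files into a single stream before passing them to the bulk loader. The improvement comes from deferring index creation until all triples have been brought in. I'm sure other formats are supported, but I have only tested N-TRIPLES.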

The following example uses IOUtils from commons-io to copy the streams:

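// Needs java.io.*, java.util.Enumeration, java.util.concurrent.*, java.util.zip.*,
// and org.apache.commons.io.IOUtils, plus the Jena classes used above
final Dataset tdbDataset = TDBFactory.createDataset( /*location*/ );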
final PipedOutputStream concatOut = new PipedOutputStream();
final PipedInputStream concatIn = new PipedInputStream(concatOut);

final ExecutorService workers = Executors.newFixedThreadPool(2);
final Future<Long> submitter = workers.submit(new Callable<Long>(){
    @Override
    public Long call() throws Exception {
        long filesLoaded = 0;
        try( final ZipFile zipFile = new ZipFile( /* Archive Location */ ) ) {
            final Enumeration< ? extends ZipEntry> zipEntries = zipFile.entries();
            while( zipEntries.hasMoreElements() ) {
                final ZipEntry entry = zipEntries.nextElement();
                try( final InputStream singleIn = zipFile.getInputStream(entry) ) {
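                    // If your file is already in a supported format, copy it through unchanged
                    IOUtils.copy(singleIn, concatOut);
                    /* Otherwise, convert it first ("lang" is a placeholder for the source format):
                    final Model m = ModelFactory.createDefaultModel();
                    m.read(singleIn, null, "lang");
                    m.write(concatOut, "N-TRIPLES");*/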
                }
                filesLoaded++;
            }
        } finally {
            // Close the pipe even on failure so the loader thread sees end-of-stream
            concatOut.close();
        }
        return filesLoaded;
    }});

final Future<Void> committer = workers.submit(new Callable<Void>(){
    @Override
    public Void call() throws Exception {
        TDBLoader.load( ((DatasetGraphTransaction)tdbDataset.asDatasetGraph()).getBaseDatasetGraph() , concatIn, true);
        return null;
    }});

workers.shutdown();
System.out.println("submitted "+submitter.get()+" input files for processing");
committer.get();
System.out.println("completed processing");
workers.awaitTermination(1, TimeUnit.SECONDS); // NOTE this wait is redundant
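
The piped streams and two worker threads above are one way to feed several archive entries to the loader as a single stream. A single-threaded alternative, sketched here using only the JDK, is `java.io.SequenceInputStream`, which presents a series of streams as one. `ZipConcat` and `concatenate` are hypothetical names for illustration. The sketch also emits a newline after each entry, on the assumption that the data is a line-based format such as N-Triples, so that a file lacking a trailing newline does not run into the first line of the next one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

final class ZipConcat {

    /** Returns one stream yielding every non-directory entry in archive order, each followed by '\n'. */
    static InputStream concatenate(final ZipFile zipFile) {
        final List<ZipEntry> files = new ArrayList<>();
        for (ZipEntry entry : Collections.list(zipFile.entries())) {
            if (!entry.isDirectory()) {
                files.add(entry);
            }
        }
        final Iterator<ZipEntry> pending = files.iterator();
        // Entry streams are opened lazily, one at a time, as SequenceInputStream asks for them
        return new SequenceInputStream(new Enumeration<InputStream>() {
            private boolean newlineDue = false;

            @Override
            public boolean hasMoreElements() {
                return newlineDue || pending.hasNext();
            }

            @Override
            public InputStream nextElement() {
                if (newlineDue) {
                    newlineDue = false;
                    return new ByteArrayInputStream(new byte[] {'\n'});
                }
                try {
                    final InputStream in = zipFile.getInputStream(pending.next());
                    newlineDue = true;
                    return in;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        });
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway archive holding two tiny N-Triples files (made-up data)
        final Path archive = Files.createTempFile("demo", ".zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(archive))) {
            out.putNextEntry(new ZipEntry("a.nt"));
            out.write("<http://example.org/a> <http://example.org/p> \"1\" .".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("b.nt"));
            out.write("<http://example.org/b> <http://example.org/p> \"2\" .\n".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        try (ZipFile zip = new ZipFile(archive.toFile()); InputStream in = concatenate(zip)) {
            final byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                System.out.write(buffer, 0, n);
            }
            System.out.flush();
        } finally {
            Files.delete(archive);
        }
    }
}
```

The returned stream could then be handed to `TDBLoader.load(...)` in place of `concatIn`, with no executor needed. Whether this performs any differently from the piped version is untested.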