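Can you see the number of triples stored while creating a TDB store with the Java API? I am running TDBFactory on a rar file containing Turtle data, but although files are created in my directory, I cannot see how many triples were stored. How can I solve this?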
Answer 0: (score: 0)
You can access the bulk loader from Java code (to see the triples as they are ingested) like this:
final Dataset tdbDataset = TDBFactory.createDataset( /*location*/ );
try( final InputStream in = /*get input stream for your large file*/) {
    // The final argument (showProgress = true) asks the loader to report progress while it ingests triples
    TDBLoader.load( ((DatasetGraphTransaction)tdbDataset.asDatasetGraph()).getBaseDatasetGraph() , in, true);
}
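If your archive contains multiple files (for simplicity I won't handle rar here, only zip), then, as per an answer to this question, you can get better performance by concatenating the files into a single stream before passing them to the bulk loader. The improvement comes from deferring index creation until all triples have been ingested. I'm sure other formats are supported as well, but I have only tested N-TRIPLES.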
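As a rough, Jena-free sanity check on how many statements an archive holds before loading it, the sketch below walks the entries of a zip stream and counts N-TRIPLES lines. It relies on N-TRIPLES being line-based (one triple per line) and assumes every entry is already N-TRIPLES. The class name `NTriplesZipCounter`, its methods, and the sample data are made up for illustration. The result is a count of statement lines, so duplicate triples, which a store keeps only once, are counted each time they appear.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Jena: counts N-TRIPLES statement lines in a zip archive.
class NTriplesZipCounter {

    /** Counts non-blank, non-comment lines across all file entries of a zip stream. */
    static long countStatements(InputStream zipped) throws IOException {
        long count = 0;
        try (ZipInputStream zin = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // Deliberately not closed: closing the reader would close the whole zip stream.
                BufferedReader reader =
                        new BufferedReader(new InputStreamReader(zin, StandardCharsets.UTF_8));
                String line;
                while ((line = reader.readLine()) != null) {
                    String trimmed = line.trim();
                    if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    /** Builds a tiny in-memory archive with two N-TRIPLES entries (sample data only). */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            zout.putNextEntry(new ZipEntry("a.nt"));
            zout.write(("<http://example.org/s1> <http://example.org/p> \"one\" .\n"
                    + "# a comment line\n").getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("b.nt"));
            zout.write(("<http://example.org/s2> <http://example.org/p> \"two\" .\n"
                    + "<http://example.org/s3> <http://example.org/p> \"three\" .\n")
                    .getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        long n = countStatements(new ByteArrayInputStream(sampleZip()));
        System.out.println("statement lines in archive: " + n);
    }
}
```

For a real archive you would pass a `FileInputStream` for the zip file instead of the in-memory sample. The count only approximates what ends up in the store, but it gives a figure to compare against the loader's progress output.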
The following example uses IOUtils from commons-io to copy the streams:
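// Imports assume Jena 3.x package names (org.apache.jena.*); Jena 2.x used com.hp.hpl.jena.* instead.
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.Enumeration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import org.apache.commons.io.IOUtils;
import org.apache.jena.query.Dataset;
import org.apache.jena.tdb.TDBFactory;
import org.apache.jena.tdb.TDBLoader;
import org.apache.jena.tdb.transaction.DatasetGraphTransaction;

final Dataset tdbDataset = TDBFactory.createDataset( /*location*/ );
// Everything written to concatOut can be read from concatIn by another thread
final PipedOutputStream concatOut = new PipedOutputStream();
final PipedInputStream concatIn = new PipedInputStream(concatOut);
final ExecutorService workers = Executors.newFixedThreadPool(2);
// Producer: copies every archive entry into the pipe, one after another
final Future<Long> submitter = workers.submit(new Callable<Long>(){
    @Override
    public Long call() throws Exception {
        long filesLoaded = 0;
        try( final ZipFile zipFile = new ZipFile( /* Archive Location */ ) ) {
            final Enumeration< ? extends ZipEntry> zipEntries = zipFile.entries();
            while( zipEntries.hasMoreElements() ) {
                final ZipEntry entry = zipEntries.nextElement();
                try( final InputStream singleIn = zipFile.getInputStream(entry) ) {
                    // If your file is in a supported format already
                    IOUtils.copy(singleIn, concatOut);
                    /* Otherwise, parse it and re-serialize it as N-TRIPLES
                       ("lang" is a placeholder for the source syntax; needs Model and ModelFactory):
                    final Model m = ModelFactory.createDefaultModel();
                    m.read(singleIn, null, "lang");
                    m.write(concatOut, "N-TRIPLES");*/
                }
                filesLoaded++;
            }
        }
        // Closing the pipe signals end-of-input to the loader
        concatOut.close();
        return filesLoaded;
    }});
// Consumer: the bulk loader reads the concatenated stream from the other end of the pipe
final Future<Void> committer = workers.submit(new Callable<Void>(){
    @Override
    public Void call() throws Exception {
        TDBLoader.load( ((DatasetGraphTransaction)tdbDataset.asDatasetGraph()).getBaseDatasetGraph() , concatIn, true);
        return null;
    }});
workers.shutdown();
System.out.println("submitted "+submitter.get()+" input files for processing");
committer.get();
System.out.println("completed processing");
workers.awaitTermination(1, TimeUnit.SECONDS); // NOTE this wait is redundant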
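The two-thread piped-stream pattern used above can be tried without Jena. In the minimal sketch below, a line counter stands in for the bulk loader on the reading side. The class name `PipedCountSketch`, the method `pipeAndCount`, and the sample input are assumptions made for illustration. The writer and the reader run on separate threads because a pipe has a small fixed buffer, and the writer closes its end so that the reader sees end-of-input.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the example above: a line counter replaces TDBLoader.
class PipedCountSketch {

    /** Writes the chunks into a pipe on one thread and counts lines on another. */
    static long pipeAndCount(List<String> chunks) throws Exception {
        final PipedOutputStream out = new PipedOutputStream();
        final PipedInputStream in = new PipedInputStream(out);
        final ExecutorService workers = Executors.newFixedThreadPool(2);
        try {
            // Producer: try-with-resources closes the pipe even if a write fails.
            Future<Integer> producer = workers.submit(() -> {
                try (PipedOutputStream o = out) {
                    for (String chunk : chunks) {
                        o.write(chunk.getBytes(StandardCharsets.UTF_8));
                    }
                }
                return chunks.size();
            });
            // Consumer: reads until the producer closes its end of the pipe.
            Future<Long> consumer = workers.submit(() -> {
                long lines = 0;
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(in, StandardCharsets.UTF_8))) {
                    while (reader.readLine() != null) {
                        lines++;
                    }
                }
                return lines;
            });
            producer.get();        // rethrows any producer failure
            return consumer.get(); // number of lines that passed through the pipe
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> chunks = Arrays.asList(
                "<http://example.org/s1> <http://example.org/p> \"one\" .\n",
                "<http://example.org/s2> <http://example.org/p> \"two\" .\n"
                        + "<http://example.org/s3> <http://example.org/p> \"three\" .\n");
        System.out.println("lines seen by consumer: " + pipeAndCount(chunks));
    }
}
```

Closing the writing end inside try-with-resources is a small variation on the example above, where `concatOut.close()` is only reached when every entry has been copied successfully.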