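I wrote a Java application using the Sesame (RDF4J) API to test the availability of more than 700 SPARQL endpoints, but it takes hours to complete, so I am trying to distribute the application with the Hadoop/MapReduce framework.
The problem now is that, in the mapper class, the method that should test availability does not work; I think it cannot connect to the endpoints.
Here is the code I am using: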
public class DMap extends Mapper<LongWritable, Text, Text, Text> {

    protected boolean isActive(String sourceURL)
            throws RepositoryException, MalformedQueryException, QueryEvaluationException {
        boolean t = true;
        SPARQLRepository repo = new SPARQLRepository(sourceURL);
        repo.initialize();
        RepositoryConnection con = repo.getConnection();
        TupleQuery tupleQuery = con.prepareTupleQuery(QueryLanguage.SPARQL, "SELECT * WHERE{ ?s ?p ?o . } LIMIT 1");
        tupleQuery.setMaxExecutionTime(120);
        TupleQueryResult result = tupleQuery.evaluate();
        if (!result.hasNext()) {
            t = false;
        }
        con.close();
        result.close();
        repo.shutDown();
        return t;
    }

    public void map(LongWritable key, Text value, Context context) throws InterruptedException, IOException {
        String src = value.toString();
        String val = "null";
        try {
            boolean b = isActive(src);
            if (b) {
                val = "active";
            } else {
                val = "inactive";
            }
        } catch (MalformedQueryException e) {
            e.printStackTrace();
        } catch (RepositoryException e) {
            e.printStackTrace();
        } catch (QueryEvaluationException e) {
            e.printStackTrace();
        }
        context.write(new Text(src), new Text(val));
    }
}
The input is a TextInputFormat, and it looks like this:
http://visualdataweb.infor.uva.es/sparql
......
The output is a TextOutputFormat, and this is what I get:
http://visualdataweb.infor.uva.es/sparql null
......
Edit 1: Following the suggestions of @james-leigh and @ChristophE, I used try-with-resources statements, but still no result:
public class DMap extends Mapper<LongWritable, Text, Text, Text> {

    public void map(LongWritable key, Text value, Context context) throws InterruptedException, IOException {
        String src = value.toString(), val = "";
        SPARQLRepository repo = new SPARQLRepository(src);
        repo.initialize();
        try (RepositoryConnection con = repo.getConnection()) {
            TupleQuery tupleQuery = con.prepareTupleQuery(QueryLanguage.SPARQL, "SELECT * WHERE { ?s ?p ?o . } LIMIT 1");
            tupleQuery.setMaxExecutionTime(120);
            try (TupleQueryResult result = tupleQuery.evaluate()) {
                if (!result.hasNext()) {
                    val = "inactive";
                } else {
                    val = "active";
                }
            }
        }
        repo.shutDown();
        context.write(new Text(src), new Text(val));
    }
}
Thanks
Answer 0 (score: 1)
Use try-with-resources statements. SPARQLRepository uses background threads that must be cleaned up properly.
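To illustrate the cleanup order this relies on, here is a minimal sketch that uses only the standard library. The Resource class, the probe method and the fail flag are hypothetical stand-ins for the connection, the query result and an unreachable endpoint; they are not the RDF4J API. It shows that try-with-resources closes resources in reverse order of declaration (the result before the connection, unlike the first listing, which calls con.close() before result.close()), that a finally block can still shut the repository down when the probe fails, and that the failure reason can be kept in the returned value instead of only being printed as a stack trace.

```java
import java.util.ArrayList;
import java.util.List;

final class CleanupOrderSketch {
    // Records the order in which the stand-in resources are released.
    static final List<String> LOG = new ArrayList<>();

    // Hypothetical stand-in for an AutoCloseable such as a connection or a query result.
    static final class Resource implements AutoCloseable {
        private final String name;

        Resource(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            LOG.add("close " + name);
        }
    }

    // Probes one endpoint; 'fail' simulates an endpoint that cannot be reached.
    static String probe(String url, boolean fail) {
        LOG.add("initialize repository");
        try (Resource con = new Resource("connection");
             Resource result = new Resource("result")) {
            if (fail) {
                throw new IllegalStateException("cannot connect to " + url);
            }
            return "active";
        } catch (RuntimeException e) {
            // Keep the reason in the output rather than swallowing the exception.
            return "error: " + e.getClass().getSimpleName();
        } finally {
            // Runs on success and on failure, after both resources are closed.
            LOG.add("shut down repository");
        }
    }

    public static void main(String[] args) {
        System.out.println(probe("http://example.org/sparql", false));
        System.out.println(LOG);
    }
}
```

In a mapper, the string returned by a method shaped like probe would presumably be the value passed to context.write, so a failing endpoint would show an exception name in the output file instead of null or an empty string.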