When I execute a JcrExportCommand from a custom action on a node, I want to filter out the "mgnl:page" nodes from the JcrExportCommand in Magnolia.
The filter I wrote in the code below does not work; the export file still contains the mgnl:page child nodes.
//set filter to only export mgnl:area subnodes
JcrExportCommand.DefaultFilter filter = new JcrExportCommand.DefaultFilter();
NodeFilteringPredicate nodePredicate = new NodeFilteringPredicate();
nodePredicate.setNodeTypes(Lists.newArrayList("mgnl:area"));
filter.setNodePredicate(nodePredicate);
How do I set up the correct filter to export everything except the "mgnl:page" child nodes? I believe setting the NodeFilteringPredicate to "mgnl:area" would only give me nodes of that one type.
Answer 0 (score: 1)
You have to set the filter on the JcrExportCommand itself for it to take effect:
// configure the filter's node-type predicate, then register the
// filter on the command, keyed by workspace name
JcrExportCommand.DefaultFilter filter = new JcrExportCommand.DefaultFilter();
filter.getNodePredicate().getNodeTypes().add("mgnl:page");
jcrExport.setFilter(Collections.singletonMap("website", filter));
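In case it helps, here is a minimal sketch of how this could be wired into the command before executing it. Only setFilter() and the DefaultFilter configuration come from the answer above; the setRepository()/setPath()/execute(Context) calls are assumptions based on Magnolia's generic command API, so adjust them to your setup:

import java.util.Collections;

import info.magnolia.commands.impl.JcrExportCommand;
import info.magnolia.context.Context;

public class ExportWithPageFilter {

    // Sketch under the assumptions named above; not a verified implementation.
    public void export(Context context) throws Exception {
        JcrExportCommand jcrExport = new JcrExportCommand();
        jcrExport.setRepository("website");      // workspace to export from (assumed setter)
        jcrExport.setPath("/my-site/my-page");   // root node of the export (assumed setter)

        JcrExportCommand.DefaultFilter filter = new JcrExportCommand.DefaultFilter();
        filter.getNodePredicate().getNodeTypes().add("mgnl:page");

        // the filter map is keyed by workspace name
        jcrExport.setFilter(Collections.singletonMap("website", filter));

        jcrExport.execute(context);
    }
}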
Answer 1 (score: 0)
*This is not an answer to my question but a reply to a comment, posted here because code cannot be formatted properly in comments.*
Following @michid's suggestion, I created a custom predicate and applied it with JcrExportCommand.DefaultFilter#setNodePredicate().
I expected to get the exported YAML with the nodes filtered according to the predicate, but I still get all nodes (including the child nodes of type mgnl:page).
My custom predicate class is:
import javax.jcr.AccessDeniedException;
import javax.jcr.ItemNotFoundException;
import javax.jcr.Node;
import javax.jcr.RepositoryException;

import info.magnolia.jcr.predicate.NodeFilteringPredicate;

public class MyPredicate extends NodeFilteringPredicate {

    public boolean evaluate(Node node) throws AccessDeniedException, ItemNotFoundException, RepositoryException {
        // intent: keep only nodes that are not of type mgnl:page
        // (note: as written, this rejects a node only when both the node
        // and its parent are of type mgnl:page)
        if ((node.getParent().getPrimaryNodeType().getName().contains("mgnl:page"))
                && (node.getPrimaryNodeType().getName().contains("mgnl:page"))) {
            return false;
        } else {
            return true;
        }
    }
}
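For completeness, a sketch of how the predicate gets applied, following the setNodePredicate() route described above (the workspace key and command wiring mirror the assumptions from the earlier sketch):

import java.util.Collections;

import info.magnolia.commands.impl.JcrExportCommand;

public class ApplyMyPredicate {

    // Sketch: plug the custom predicate into the export filter via
    // DefaultFilter#setNodePredicate() and register the filter on the
    // command, keyed by workspace name.
    public void configure(JcrExportCommand jcrExport) {
        JcrExportCommand.DefaultFilter filter = new JcrExportCommand.DefaultFilter();
        filter.setNodePredicate(new MyPredicate());
        jcrExport.setFilter(Collections.singletonMap("website", filter));
    }
}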