在Apache Beam中,如何在Pipeline-IO级别处理异常/错误

时间:2018-11-01 09:41:54

标签: apache-spark jdbc apache-beam apache-beam-io

我正在将运行中的火花流道作为Apache梁中的管道流道,发现一个错误。 通过得到错误,我的问题引起了注意。我知道错误是由于sql查询中的Column_name不正确,但我的问题是如何在IO级别处理错误/异常

org.apache.beam.sdk.util.UserCodeException: java.sql.SQLSyntaxErrorException: Unknown column 'FIRST_NAME' in 'field list'
at org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:36)
at org.apache.beam.sdk.io.jdbc.JdbcIO$ReadFn$DoFnInvoker.invokeProcessElement(Unknown Source)
at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:185)
at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:149)
at org.apache.beam.runners.spark.translation.DoFnRunnerWithMetrics.processElement(DoFnRunnerWithMetrics.java:70)
at org.apache.beam.runners.spark.translation.SparkProcessContext$ProcCtxtIterator.computeNext(SparkProcessContext.java:145)
at org.apache.beam.repackaged.beam_runners_spark.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
at org.apache.beam.repackaged.beam_runners_spark.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:216)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1092)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1083)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1018)
18/11/01 13:13:16 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3, localhost, executor driver): org.apache.beam.sdk.util.UserCodeException: java.sql.SQLSyntaxErrorException: Unknown column 'FIRST_NAME' in 'field list'
    at org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:36)
    at org.apache.beam.sdk.io.jdbc.JdbcIO$ReadFn$DoFnInvoker.invokeProcessElement(Unknown Source)
    at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:185)
    at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:149)
    at org.apache.beam.runners.spark.translation.DoFnRunnerWithMetrics.processElement(DoFnRunnerWithMetrics.java:70)
    at org.apache.beam.runners.spark.translation.SparkProcessContext$ProcCtxtIterator.computeNext(SparkProcessContext.java:145)
    at org.apache.beam.repackaged.beam_runners_spark.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
    at org.apache.beam.repackaged.beam_runners_spark.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    ..............
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLSyntaxErrorException: Unknown column 'FIRST_NAME' in 'field list'
    at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:536)
    at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:513)
    at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:115)
    at com.mysql.cj.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:1983)
    at com.mysql.cj.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1826)
    at com.mysql.cj.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1923)
    at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83)
    at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83)
    at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83)
    at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83)
    at org.apache.beam.sdk.io.jdbc.JdbcIO$ReadFn.processElement(JdbcIO.java:601)

1 个答案:

答案 0 :(得分:0)

您必须创建一个 自定义例外 处理程序类以捕获该异常,例如;

需要实现这样的自定义方法

public Mycust_Exception(String string) {
    super("Error Obtained by "+string);
}

在这里,我刚刚返回了字符串,但是也可以使用super()进行抛出,现在您需要在希望有异常的地方声明try-catch块,并遵循PTranformation_level_exceptionHandler_implementation

并在catch块中这样调用throw语句

throw new Ezflow_Exception("Invalid statement");

此实现肯定可以满足您的大部分查询要求。 对于Java编程,这是最常见的实现方式之一