Handling an exception thrown when Spark reads a MySQL table

Time: 2019-08-14 14:34:00

Tags: mysql scala apache-spark jdbc amazon-rds-aurora

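I have a Spark job that reads from an AWS Aurora MySQL database. Unfortunately, the job keeps failing with an exception because one of the records has an invalid datetime.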

Sample code:

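    // zeroDateTimeBehavior=convertToNull asks Connector/J to return NULL for zero dates
    val jdbcUrl =
      s"jdbc:mysql://$dbHostname:$dbPort/$dbName?zeroDateTimeBehavior=convertToNull&serverTimezone=UTC"

    val props = reportConf.connectionProps(db)

    // Read through a subquery that adds a synthetic partition_key (values 0 to 9),
    // so Spark can split the read into numPartitions parallel JDBC queries.
    val df = spark.read
      .format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable",
              s"(SELECT *, MOD($partitionColumn,10) AS partition_key FROM $table ORDER BY $partitionColumn) as $table") // DESC LIMIT 50000
      .option("user", user)
      .option("password", password)
      .option("driver", driver)
      .option("numPartitions", numPartitions)
      .option("partitionColumn", "partition_key")
      .option("lowerBound", 0)
      .option("upperBound", 9)
      // "mode" is a parse-mode option of file sources such as CSV and JSON;
      // it is not among the documented JDBC data source options.
      .option("mode", "DROPMALFORMED")
      .load()
      .drop('partition_key)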

I have tried converting zero date values to null by setting zeroDateTimeBehavior=convertToNull in my jdbcUrl, but it does not work.
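A possible explanation, offered as a guess from the stack trace in this question rather than a confirmed diagnosis: the `IllegalArgumentException: DAY_OF_MONTH` is raised by `GregorianCalendar.computeTime` after `ZeroDateTimeToNullValueFactory` has already passed the value on, which suggests the stored date is not an all-zero date but one with only an invalid day (for example a day of 0). The sketch below reproduces that message with a strict (non-lenient) calendar in plain Scala. `StrictCalendarDemo` and `validationError` are hypothetical names, and the sample dates are made up.

```scala
import java.util.{GregorianCalendar, TimeZone}
import scala.util.Try

object StrictCalendarDemo {
  // Returns the exception message a strict calendar raises for the given
  // date parts, or None when the date is accepted.
  def validationError(year: Int, month: Int, day: Int): Option[String] = {
    val cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
    cal.clear()
    cal.setLenient(false)         // strict mode: out-of-range fields are rejected
    cal.set(year, month - 1, day) // Calendar months are zero-based
    // Field validation happens lazily, when the time value is computed.
    Try(cal.getTimeInMillis).failed.toOption.map(_.getMessage)
  }
}
```

Here `StrictCalendarDemo.validationError(2019, 8, 0)` yields the same `DAY_OF_MONTH` message as the trace, while a valid date such as `(2019, 8, 14)` yields `None`.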

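Ideally, I would like to skip the record or replace the value with some default that I can filter on later, rather than manually identifying the bad records in the database table.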

Any ideas on how to solve this?
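One way to approach the skip-or-default behavior described above, sketched under assumptions: have MySQL return the date column as text so that the JDBC driver never calls `getDate` on it, then parse the text strictly and treat failures as missing values. `SafeDateRead`, `pushdownQuery`, and `parseDate` are hypothetical names, the column is assumed to be a `DATE` column, and the block is plain Scala without Spark so that it can run on its own.

```scala
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, ResolverStyle}
import scala.util.Try

object SafeDateRead {
  // Builds a pushdown subquery in which every date column is cast to CHAR,
  // so the driver hands back strings instead of decoding dates itself.
  // Columns are listed explicitly because SELECT * would still include
  // the raw date columns.
  def pushdownQuery(table: String,
                    plainColumns: Seq[String],
                    dateColumns: Seq[String]): String = {
    val plain = plainColumns.map(c => s"`$c`")
    val dates = dateColumns.map(c => s"CAST(`$c` AS CHAR) AS `$c`")
    s"(SELECT ${(plain ++ dates).mkString(", ")} FROM `$table`) AS src"
  }

  // STRICT resolution rejects dates such as 2019-08-00 or 0000-00-00.
  private val strictIso =
    DateTimeFormatter.ofPattern("uuuu-MM-dd").withResolverStyle(ResolverStyle.STRICT)

  // Returns None for null or invalid input, which can be filtered out later.
  def parseDate(raw: String): Option[LocalDate] =
    Option(raw).flatMap(s => Try(LocalDate.parse(s, strictIso)).toOption)
}
```

The string returned by `pushdownQuery` would take the place of the `SELECT *` subquery in the `dbtable` option. On the Spark side, applying `to_date` to the resulting string column should give a similar null-on-invalid result, though that is worth verifying on the Spark version in use.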

Exception:

    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:678)
Caused by: java.lang.IllegalArgumentException: DAY_OF_MONTH
    at java.util.GregorianCalendar.computeTime(GregorianCalendar.java:2648)
    at java.util.Calendar.updateTime(Calendar.java:3393)
    at java.util.Calendar.getTimeInMillis(Calendar.java:1782)
    at com.mysql.cj.jdbc.io.JdbcDateValueFactory.createFromDate(JdbcDateValueFactory.java:67)
    at com.mysql.cj.jdbc.io.JdbcDateValueFactory.createFromDate(JdbcDateValueFactory.java:39)
    at com.mysql.cj.core.io.ZeroDateTimeToNullValueFactory.createFromDate(ZeroDateTimeToNullValueFactory.java:41)
    at com.mysql.cj.core.io.BaseDecoratingValueFactory.createFromDate(BaseDecoratingValueFactory.java:46)
    at com.mysql.cj.core.io.BaseDecoratingValueFactory.createFromDate(BaseDecoratingValueFactory.java:46)
    at com.mysql.cj.core.io.MysqlTextValueDecoder.decodeDate(MysqlTextValueDecoder.java:66)
    at com.mysql.cj.mysqla.result.AbstractResultsetRow.decodeAndCreateReturnValue(AbstractResultsetRow.java:70)
    at com.mysql.cj.mysqla.result.AbstractResultsetRow.getValueFromBytes(AbstractResultsetRow.java:225)
    at com.mysql.cj.mysqla.result.TextBufferRow.getValue(TextBufferRow.java:122)
    at com.mysql.cj.jdbc.result.ResultSetImpl.getNonStringValueFromRow(ResultSetImpl.java:630)
    at com.mysql.cj.jdbc.result.ResultSetImpl.getDateOrTimestampValueFromRow(ResultSetImpl.java:643)
    at com.mysql.cj.jdbc.result.ResultSetImpl.getDate(ResultSetImpl.java:788)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$2.apply(JdbcUtils.scala:389)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$2.apply(JdbcUtils.scala:387)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:356)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:338)
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at org.apache.spark.util.CompletionIterator.hasNex

0 answers:

No answers