Incompatible Dataset[CaseClass] when loading from Oracle

Date: 2017-05-30 11:48:25

Tags: oracle scala apache-spark

We load data from Oracle into a Dataset like this:

val dataset = sqlContext.read.format("jdbc").options(Map(
  "driver" -> applicationConfig.getString("oracle.driver"),
  "url" -> applicationConfig.getString("oracle.url"),
  "user" -> applicationConfig.getString("oracle.user"),
  "password" -> applicationConfig.getString("oracle.password"),
  "dbtable" -> query
)).load().as[CaseClass]

CaseClass looks like this:

case class CaseClass (
  RELNR: Long = null,
  INS_CONTACTHIST_DATE: Timestamp = null,
  CONTACTDATETIME: Timestamp = null,
  CONTACTSTATUSID: Long = null,
  ...

I want to create a new, empty Dataset[CaseClass]:

import sqlContext.implicits._
val acc = sqlContext.createDataset[CaseClass](Seq())

and then, over several iterations, add filtered data from the loaded dataset to it:

 val possibilities = dataset.filter(c => predicate(c))
 acc.union(possibilities)

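This fails with the error "unresolved operator 'Union". From SO I learned that this is caused by incompatible datasets, and running printSchema() on both datasets confirms that some column types are incompatible: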

Oracle:
|-- RELNR: decimal(10,0) (nullable = true)
|-- INS_CONTACTHIST_DATE: date (nullable = true)
|-- CONTACTDATETIME: timestamp (nullable = true)
|-- CONTACTSTATUSID: decimal(19,0) (nullable = true)

empty dataset:
|-- RELNR: long (nullable = true)
|-- INS_CONTACTHIST_DATE: timestamp (nullable = true)
|-- CONTACTDATETIME: timestamp (nullable = true)
|-- CONTACTSTATUSID: long (nullable = true)

How can I make the union work? Or how can I force sqlContext.read(..) to use the property types of CaseClass?

2 answers:

Answer 0 (score: 1)

I tried to create an empty dataset using the case class you provided in the question:

case class CaseClass (
                       RELNR: Long = null,
                       INS_CONTACTHIST_DATE: Timestamp = null,
                       CONTACTDATETIME: Timestamp = null,
                       CONTACTSTATUSID: Long = null
                     )

I tried:

import sqlContext.implicits._
val acc = sqlContext.createDataset[CaseClass](Seq())
acc.printSchema()

Unfortunately, I got the following compile errors:

Error:(246, 38) an expression of type Null is ineligible for implicit conversion
                       RELNR: Long = null,
Error:(246, 38) type mismatch;
 found   : Null(null)
 required: Long
                       RELNR: Long = null,
Error:(249, 48) an expression of type Null is ineligible for implicit conversion
                       CONTACTSTATUSID: Long = null
Error:(249, 48) type mismatch;
 found   : Null(null)
 required: Long
                       CONTACTSTATUSID: Long = null
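
These errors arise because Scala's Long is a value type and cannot hold null. As a minimal sketch of one common workaround, assuming optional fields are acceptable, the nullable numeric fields can be declared as Option[Long] (which Spark encodes as a nullable long column). NullableCaseClass is a hypothetical name used only for illustration; it is not the class from the question.

```scala
import java.sql.Timestamp

// Hypothetical variant of the question's case class.
// Option[Long] replaces `Long = null`; Timestamp is a reference type,
// so a null default compiles for it as written in the question.
final case class NullableCaseClass(
  RELNR: Option[Long] = None,
  INS_CONTACTHIST_DATE: Timestamp = null,
  CONTACTDATETIME: Timestamp = null,
  CONTACTSTATUSID: Option[Long] = None
)

object NullableCaseClassDemo {
  def main(args: Array[String]): Unit = {
    val empty  = NullableCaseClass()                    // all fields unset
    val filled = NullableCaseClass(RELNR = Some(42L))   // one field set
    println(empty.RELNR.isDefined)   // false
    println(filled.RELNR.getOrElse(-1L)) // 42
  }
}
```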

Then I tried:

case class CaseClass (
                       RELNR: Decimal = null,
                       INS_CONTACTHIST_DATE: java.sql.Date = null,
                       CONTACTDATETIME: Timestamp = null,
                       CONTACTSTATUSID: Decimal = null
                     )

This compiled and produced the following schema:

root
 |-- RELNR: decimal(38,18) (nullable = true)
 |-- INS_CONTACTHIST_DATE: date (nullable = true)
 |-- CONTACTDATETIME: timestamp (nullable = true)
 |-- CONTACTSTATUSID: decimal(38,18) (nullable = true)

This lines up with your Oracle schema (the column types are the same; only the decimal precision and scale differ):

Oracle:
|-- RELNR: decimal(10,0) (nullable = true)
|-- INS_CONTACTHIST_DATE: date (nullable = true)
|-- CONTACTDATETIME: timestamp (nullable = true)
|-- CONTACTSTATUSID: decimal(19,0) (nullable = true)

so the union should be possible.
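
The opposite direction is also possible: keep long and timestamp fields in the case class and cast the loaded columns to those types before the union. The following is only a sketch under assumptions: Contact, AlignSchemaSketch, alignToContact and oracleLike are hypothetical names, and a small local DataFrame stands in for the JDBC read. Note that casting decimal(19,0) to long could overflow for values outside the Long range.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.{DataFrame, Dataset, Encoders, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical target type; Option[Long] stands in for a nullable long column.
final case class Contact(
  RELNR: Option[Long],
  INS_CONTACTHIST_DATE: Timestamp,
  CONTACTDATETIME: Timestamp,
  CONTACTSTATUSID: Option[Long]
)

object AlignSchemaSketch {
  // Cast Oracle-shaped columns to the types the Contact encoder produces.
  def alignToContact(df: DataFrame): Dataset[Contact] =
    df.select(
      col("RELNR").cast("long").as("RELNR"),
      col("INS_CONTACTHIST_DATE").cast("timestamp").as("INS_CONTACTHIST_DATE"),
      col("CONTACTDATETIME"),
      col("CONTACTSTATUSID").cast("long").as("CONTACTSTATUSID")
    ).as[Contact](Encoders.product[Contact])

  // Local stand-in for the JDBC read, shaped like the Oracle schema in the question.
  def oracleLike(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((1L, "2017-05-30", "2017-05-30 11:48:25", 7L))
      .toDF("RELNR", "INS_CONTACTHIST_DATE", "CONTACTDATETIME", "CONTACTSTATUSID")
      .select(
        col("RELNR").cast("decimal(10,0)").as("RELNR"),
        col("INS_CONTACTHIST_DATE").cast("date").as("INS_CONTACTHIST_DATE"),
        col("CONTACTDATETIME").cast("timestamp").as("CONTACTDATETIME"),
        col("CONTACTSTATUSID").cast("decimal(19,0)").as("CONTACTSTATUSID")
      )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("align-schema-sketch").getOrCreate()

    val acc      = spark.emptyDataset[Contact](Encoders.product[Contact])
    val combined = acc.union(alignToContact(oracleLike(spark)))
    combined.printSchema() // both sides now share long/timestamp column types
    spark.stop()
  }
}
```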

Answer 1 (score: 0)

The SO answer basically nailed it. I changed the creation of the empty dataset to:

sqlContext.createDataset[CaseClass](Seq()).selectExpr(
  "ROW_ID",
  "cast (RELNR as Decimal(10,0)) RELNR",
  ...
).as[CaseClass]
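
Once the schemas agree, the accumulation itself needs some care: union returns a new Dataset rather than modifying acc, so a bare acc.union(possibilities) statement, as written in the question, discards its result. A minimal sketch of one way to accumulate over several predicates follows. It swaps in a different technique from the answer above: the accumulator starts from source.limit(0), so it shares the source's schema by construction. UnionAccumulator and unionMatching are hypothetical names.

```scala
import org.apache.spark.sql.Dataset

object UnionAccumulator {
  // Start from an empty slice of `source` (limit(0)) so the accumulator
  // has exactly the same schema, then union in the rows matching each
  // predicate. Every union call returns a new Dataset, which the fold
  // threads through as the next accumulator.
  def unionMatching[T](source: Dataset[T], predicates: Seq[T => Boolean]): Dataset[T] =
    predicates.foldLeft(source.limit(0)) { (acc, predicate) =>
      acc.union(source.filter(predicate))
    }
}
```

union keeps duplicates, so a row that satisfies more than one predicate appears once per matching predicate; append distinct() if that is not wanted.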