We load data from Oracle into a Dataset like this:
val dataset = sqlContext.read.format("jdbc").options(Map(
  "driver" -> applicationConfig.getString("oracle.driver"),
  "url" -> applicationConfig.getString("oracle.url"),
  "user" -> applicationConfig.getString("oracle.user"),
  "password" -> applicationConfig.getString("oracle.password"),
  "dbtable" -> query
)).load().as[CaseClass]
CaseClass looks like this:
case class CaseClass (
RELNR: Long = null,
INS_CONTACTHIST_DATE: Timestamp = null,
CONTACTDATETIME: Timestamp = null,
CONTACTSTATUSID: Long = null,
...
I want to create a new, empty Dataset[CaseClass]:
import sqlContext.implicits._
val acc = sqlContext.createDataset[CaseClass](Seq())
and then, over several iterations, union filtered data from the Oracle dataset into it:
val possibilities = dataset.filter(c => predicate(c))
acc.union(possibilities)
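This fails with the error: unresolved operator 'Union;
From other SO posts I understand this is caused by incompatible datasets, and calling printSchema() on both datasets
confirms that some column types are incompatible: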
Oracle:
|-- RELNR: decimal(10,0) (nullable = true)
|-- INS_CONTACTHIST_DATE: date (nullable = true)
|-- CONTACTDATETIME: timestamp (nullable = true)
|-- CONTACTSTATUSID: decimal(19,0) (nullable = true)
empty dataset:
|-- RELNR: long (nullable = true)
|-- INS_CONTACTHIST_DATE: timestamp (nullable = true)
|-- CONTACTDATETIME: timestamp (nullable = true)
|-- CONTACTSTATUSID: long (nullable = true)
How can I make the union work? Or how can I force sqlContext.read(..)
to use the property types of CaseClass?
Answer 0: (score: 1)
I tried to create an empty dataset using the case class you provided in the question:
case class CaseClass (
RELNR: Long = null,
INS_CONTACTHIST_DATE: Timestamp = null,
CONTACTDATETIME: Timestamp = null,
CONTACTSTATUSID: Long = null
)
I tried
import sqlContext.implicits._
val acc = sqlContext.createDataset[CaseClass](Seq())
acc.printSchema()
but unfortunately I got the following errors:
Error:(246, 38) an expression of type Null is ineligible for implicit conversion
RELNR: Long = null,
Error:(246, 38) type mismatch;
found : Null(null)
required: Long
RELNR: Long = null,
Error:(249, 48) an expression of type Null is ineligible for implicit conversion
CONTACTSTATUSID: Long = null
Error:(249, 48) type mismatch;
found : Null(null)
required: Long
CONTACTSTATUSID: Long = null
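These compile errors come from Scala itself rather than from Spark: `Long` is a value type and cannot hold `null`. A minimal sketch of two ways to declare a nullable numeric field is shown here. The `Contact` class and its field names are hypothetical and stand in for the real case class. `Option[Long]` is the form Spark's encoders map to a nullable long column.

```scala
// Hypothetical, reduced stand-in for the case class in the question.
final case class Contact(
  relnr: Option[Long] = None,             // absent value modeled as None
  contactStatusId: java.lang.Long = null  // boxed type, so null compiles
)

object NullableFieldsDemo {
  def main(args: Array[String]): Unit = {
    val empty = Contact()
    // 7L is boxed to java.lang.Long by the standard implicit conversion.
    val filled = Contact(relnr = Some(42L), contactStatusId = 7L)
    println(empty)
    println(filled)
  }
}
```

Either form gets past the compiler. It does not by itself make the schemas match, since the Oracle columns are still read as decimals.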
Then I tried
case class CaseClass (
RELNR: Decimal = null,
INS_CONTACTHIST_DATE: java.sql.Date = null,
CONTACTDATETIME: Timestamp = null,
CONTACTSTATUSID: Decimal = null
)
which worked and produced the following schema:
root
|-- RELNR: decimal(38,18) (nullable = true)
|-- INS_CONTACTHIST_DATE: date (nullable = true)
|-- CONTACTDATETIME: timestamp (nullable = true)
|-- CONTACTSTATUSID: decimal(38,18) (nullable = true)
This is similar to your Oracle schema (the column types match, although the decimal precision and scale differ):
Oracle:
|-- RELNR: decimal(10,0) (nullable = true)
|-- INS_CONTACTHIST_DATE: date (nullable = true)
|-- CONTACTDATETIME: timestamp (nullable = true)
|-- CONTACTSTATUSID: decimal(19,0) (nullable = true)
so the union should be possible.
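One more thing to watch for in the question's loop: `union` returns a new Dataset and leaves `acc` unchanged, so the result has to be kept. A minimal sketch, assuming the filtered parts are already collected in a sequence and share a compatible schema (`unionAllParts` is a hypothetical helper name):

```scala
import org.apache.spark.sql.Dataset

// Hypothetical helper: folds a sequence of datasets into one,
// starting from an empty dataset of the same type.
def unionAllParts[T](empty: Dataset[T], parts: Seq[Dataset[T]]): Dataset[T] =
  parts.foldLeft(empty)((acc, part) => acc.union(part))
```

With a mutable accumulator the equivalent would be `acc = acc.union(possibilities)` on a `var`.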
Answer 1: (score: 0)
This SO answer basically nailed it. I changed the creation of the empty dataset to:
sqlContext.createDataset[CaseClass](Seq()).selectExpr(
  "ROW_ID",
  "cast (RELNR as Decimal(10,0)) RELNR",
  ...
).as[CaseClass]
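The question also asks about the opposite direction: making the data read from Oracle take on the case class's types. A minimal sketch of that idea, assuming the cast is applied to the DataFrame before `.as[CaseClass]`. `castColumns` is a hypothetical helper, not part of Spark, and the target types are taken from the empty dataset's schema in the question.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DataType, LongType, TimestampType}

// Hypothetical helper: casts the listed columns to the given types
// and passes every other column through unchanged.
def castColumns(df: DataFrame, target: Map[String, DataType]): DataFrame =
  df.select(df.columns.map { name =>
    target.get(name) match {
      case Some(dataType) => col(name).cast(dataType).as(name)
      case None           => col(name)
    }
  }: _*)

// Target types assumed from the empty dataset's printSchema() output.
val targetTypes: Map[String, DataType] = Map(
  "RELNR" -> LongType,
  "INS_CONTACTHIST_DATE" -> TimestampType,
  "CONTACTSTATUSID" -> LongType
)
```

Usage would look like `castColumns(sqlContext.read.format("jdbc").options(...).load(), targetTypes).as[CaseClass]`, which presumes a CaseClass that compiles, for example with `Option[Long]` fields.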