I'm trying to read JSON into a Dataset (Spark 2.1.1). Unfortunately it doesn't work, and it fails with:
DECLARE @table1 AS TABLE (EmpID int, ProcessID varchar(1), ID int);
INSERT INTO @table1
VALUES (2, 'B', 1),
       (1, 'A', 2),
       (3, 'C', 3);

DECLARE @table2 AS TABLE (EmpID int, ProcessID varchar(1), ID int);
INSERT INTO @table2
VALUES (1, 'F', 1),
       (2, 'E', 2);

WITH united AS
(
    SELECT EmpID, ProcessID, ID, 1 AS tableNum
    FROM @table1
    UNION
    SELECT EmpID, ProcessID,
           (SELECT t1.ID FROM @table1 t1 WHERE t1.EmpID = t2.EmpID) AS ID,
           2 AS tableNum
    FROM @table2 t2
)
SELECT EmpID, ProcessID
FROM united
WHERE ID IS NOT NULL
ORDER BY ID, tableNum;
Any idea what I'm doing wrong?
EmpID       ProcessID
----------- ---------
2           B
2           E
1           A
1           F
3           C
Answer 0 (score: 2)
Normally, if a field can be missing, use an Option:
case class Owner(id: String, pets: Seq[Pet])
case class Pet(name: String, age: Option[Long])
or a nullable type:
case class Owner(id: String, pets: Seq[Pet])
case class Pet(name: String, age: java.lang.Long)
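Either variant tolerates a missing age. A minimal plain-Scala sketch of the Option variant (no Spark needed; the pet names here are made up):

```scala
// Plain-Scala sketch of the Option variant: a field missing from
// the JSON record becomes None instead of null.
case class Owner(id: String, pets: Seq[Pet])
case class Pet(name: String, age: Option[Long])

val withAge    = Pet("rex", Some(3L))   // record that has "age"
val withoutAge = Pet("mittens", None)   // record whose JSON lacked "age"

// getOrElse supplies a default instead of risking a null.
println(withAge.age.getOrElse(-1L))     // 3
println(withoutAge.age.getOrElse(-1L))  // -1
```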
But this really does look like a bug. I tested it in Spark 2.2 and it is fixed there. I think a quick workaround is to keep the fields sorted by name:
case class Owner(id: String, pets: Seq[Pet])
case class Pet(age: java.lang.Long, name: String)
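Note that this variant keeps java.lang.Long rather than Scala's Long: being a boxed reference type, it can hold null for a record with no age, which a Scala primitive Long cannot. A plain-Scala sketch of that distinction (pet names are made up):

```scala
// java.lang.Long is a boxed reference type, so it can be null
// when "age" is absent; Scala's primitive Long cannot. Fields
// are declared in alphabetical order (age before name), per
// the workaround above.
case class Pet(age: java.lang.Long, name: String)

val missing = Pet(null, "rex")     // record whose JSON lacked "age"
val present = Pet(3L, "mittens")   // 3L is auto-boxed to java.lang.Long

// Option(...) converts the nullable value into a safe Option.
println(Option(missing.age))  // None
println(Option(present.age))  // Some(3)
```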