I'm trying to learn Spark Datasets (Spark 2.0.1). The left outer join below throws a NullPointerException.
case class Employee(name: String, age: Int, departmentId: Int, salary: Double)
case class Department(id: Int, depname: String)
case class Record(name: String, age: Int, salary: Double, departmentId: Int, departmentName: String)
val employeeDataSet = sc.parallelize(Seq(Employee("Jax", 22, 5, 100000.0),Employee("Max", 22, 1, 100000.0))).toDS()
val departmentDataSet = sc.parallelize(Seq(Department(1, "Engineering"), Department(2, "Marketing"))).toDS()
val averageSalaryDataset = employeeDataSet.joinWith(departmentDataSet, $"departmentId" === $"id", "left_outer")
  .map(record => Record(record._1.name, record._1.age, record._1.salary, record._1.departmentId, record._2.depname))
averageSalaryDataset.show()
16/12/14 16:48:26 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 12)
java.lang.NullPointerException
This happens because the left outer join yields null for record._2 when an employee has no matching department, so dereferencing record._2.depname fails.
How should this be handled? Thanks.
Answer 0 (score: 1)
Solved the problem using:
val averageSalaryDataset1 = employeeDataSet
  .joinWith(departmentDataSet, $"departmentId" === $"id", "left_outer")
  .selectExpr(
    "nvl(_1.name, ' ') as name",
    "nvl(_1.age, 0) as age",
    "nvl(_1.salary, 0.0D) as salary",
    "nvl(_1.departmentId, 0) as departmentId",
    "nvl(_2.depname, ' ') as departmentName")
  .as[Record]
averageSalaryDataset1.show()
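With the sample data above, only Jax (departmentId 5) has no matching department, so only his departmentName falls back to an nvl default; the defaults on the _1 fields never fire in a left outer join. The output of show() should look roughly like this (row order may vary):
+----+---+--------+------------+--------------+
|name|age|  salary|departmentId|departmentName|
+----+---+--------+------------+--------------+
| Max| 22|100000.0|           1|   Engineering|
| Jax| 22|100000.0|           5|              |
+----+---+--------+------------+--------------+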
Answer 1 (score: 0)
Check record._2 for null before dereferencing it:
val averageSalaryDataset = employeeDataSet
  .joinWith(departmentDataSet, $"departmentId" === $"id", "left_outer")
  .map(record => Record(record._1.name, record._1.age, record._1.salary, record._1.departmentId,
    if (record._2 == null) null else record._2.depname))
After joinWith, the resulting Dataset holds pairs (_1, _2). With a left outer join, a left row that matches nothing gets null for _2, so calling record._2.depname on such a row is what raises the exception, as in the original join below:
val averageSalaryDataset = employeeDataSet.joinWith(departmentDataSet, $"departmentId" === $"id", "left_outer")
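As a sketch of a slightly more idiomatic variant of the same fix (the name averageSalaryDatasetOpt and the empty-string default are choices made here, not taken from the answers), the nullable _2 can be wrapped in Option so the fallback is explicit rather than a raw null:
// Same left outer join; Option(dept) turns the possibly-null right side
// into None, and getOrElse supplies a default department name.
val averageSalaryDatasetOpt = employeeDataSet
  .joinWith(departmentDataSet, $"departmentId" === $"id", "left_outer")
  .map { case (emp, dept) =>
    Record(emp.name, emp.age, emp.salary, emp.departmentId,
           Option(dept).map(_.depname).getOrElse(""))
  }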