When I read data with Spark 2.0 Datasets and DataFrames, my code is:
def func(docs: DataFrame): RDD[(String, String)] = {
  docs.select("id", "title").map {
    case Row(id: String, title: String) => (id, title)
  }
}
but the compile error is:
missing parameter type for expanded function The argument types of an anonymous function must be fully known. (SLS 8.5) Expected type was: ?
I also tried:
def func(docs: DataFrame): RDD[(String, String)] = {
  docs.select("id", "title").map { val => val match {
    case Row(id: String, title: String) => (id, title)
  }}
}
But that does not work either! The error on 'val' is:
missing parameter type
How can I solve this?
Answer 0 (score: 4)
You can solve this with DataFrame.rdd:

// before map
def func(docs: DataFrame): RDD[(String, String)] = {
  docs.select("id", "title").rdd.map {
    case Row(id: String, title: String) => (id, title)
  }
}
// or after map (the parameter type must be annotated so the overloaded
// Dataset.map resolves, and an Encoder[(String, String)] must be in
// scope, e.g. from import spark.implicits._)
def func(docs: DataFrame): RDD[(String, String)] = {
  docs.select("id", "title").map { row: Row => row match {
    case Row(id: String, title: String) => (id, title)
  }}.rdd
}
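For reference, here is a minimal self-contained sketch of the first variant (the SparkSession setup and the in-memory sample data are our assumptions, not part of the original question):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object FuncExample {
  // .rdd first: RDD.map has a single signature, so the pattern-matching
  // function literal type-checks without an explicit parameter type
  def func(docs: DataFrame): RDD[(String, String)] = {
    docs.select("id", "title").rdd.map {
      case Row(id: String, title: String) => (id, title)
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("func-example").getOrCreate()
    import spark.implicits._
    // hypothetical stand-in for the asker's real data
    val docs = Seq(("1", "first doc"), ("2", "second doc")).toDF("id", "title")
    func(docs).collect().foreach(println)
    spark.stop()
  }
}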
Note that in Spark 1.x, DataFrame.map returned an RDD[R] and took a function of type (Row) => R. In Spark 2.x, Dataset.map returns a Dataset[U] and takes a function of type (T) => U, and it is overloaded (there is also a Java-friendly variant taking a MapFunction[T, U]), which is why the compiler cannot infer the parameter type of a bare pattern-matching function literal. RDD.map has no such overload, so calling .rdd before map makes the error go away.
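To make the Spark 2.x signature concrete, the Encoder for the result type can also be passed explicitly instead of coming from import spark.implicits._; a sketch (funcExplicit is our name, not from the answer):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Encoders, Row}

def funcExplicit(docs: DataFrame): RDD[(String, String)] = {
  // Dataset.map[U](f) takes an implicit Encoder[U]; here the tuple
  // encoder is supplied explicitly in the second argument list
  docs.select("id", "title")
    .map((row: Row) => (row.getString(0), row.getString(1)))(
      Encoders.tuple(Encoders.STRING, Encoders.STRING))
    .rdd
}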