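I'm trying to figure out how to map a SchemaRDD object that I retrieved from a SQL HiveContext to a PairRDDFunctions[String, Vector] object, where the String value is the name column of the SchemaRDD and the remaining columns (BytesIn, BytesOut, etc.) form the Vector.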
Answer:
Assuming you have the columns "name", "bytesIn", and "bytesOut":
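import org.apache.spark.SparkContext._ // PairRDDFunctions implicits; only needed before Spark 1.3
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row}

// SchemaRDD was renamed DataFrame in Spark 1.3; select(String*) and .rdd are DataFrame methods
val schemaRDD: DataFrame = ...
val pairs: RDD[(String, (Long, Long))] =
  schemaRDD.select("name", "bytesIn", "bytesOut").rdd.map {
    // Typed patterns assume name is a string column and bytesIn/bytesOut are bigint
    case Row(name: String, bytesIn: Long, bytesOut: Long) =>
      (name, (bytesIn, bytesOut))
  }

pairs.groupByKey() // ... etc.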
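The question asks for a Vector of all the remaining columns rather than a fixed-size tuple. Below is a minimal sketch of that step, assuming the name is the first value of each row and every other column is numeric. `NamedVectors.toNamedVector` is a hypothetical helper, not a Spark API, and mapping nulls to `NaN` is an assumption you may want to change. It is plain Scala so it runs without Spark.

```scala
object NamedVectors {
  // Hypothetical helper: the first value is the key ("name"); every remaining
  // value (bytesIn, bytesOut, ...) is converted to a Double feature.
  def toNamedVector(values: Seq[Any]): (String, Array[Double]) = {
    val name = values.head.asInstanceOf[String]
    val features = values.tail.map {
      case null                => Double.NaN // assumption: missing values become NaN
      case n: java.lang.Number => n.doubleValue // covers boxed Int, Long, Double, ...
      case other =>
        throw new IllegalArgumentException(s"non-numeric column value: $other")
    }.toArray
    (name, features)
  }

  def main(args: Array[String]): Unit = {
    // Stand-ins for rows of (name, bytesIn, bytesOut)
    val rows = Seq(
      Seq[Any]("hostA", 100L, 250L),
      Seq[Any]("hostB", 7L, 3L)
    )
    rows.map(toNamedVector).foreach { case (name, v) =>
      println(s"$name -> ${v.mkString("[", ", ", "]")}")
    }
  }
}
```

With Spark 1.3+ on the classpath, the same helper could be applied to each row via `Row.toSeq` and wrapped in an MLlib vector, e.g. `df.rdd.map { r => val (k, a) = NamedVectors.toNamedVector(r.toSeq); (k, Vectors.dense(a)) }` (with `import org.apache.spark.mllib.linalg.Vectors`). That yields an `RDD[(String, Vector)]`, which the pair-RDD implicits expose as `PairRDDFunctions[String, Vector]`.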