How to map a SchemaRDD to a PairRDD

Date: 2015-10-01 19:57:54

Tags: scala apache-spark apache-spark-sql

I'm trying to figure out how to map a SchemaRDD object that I retrieved from a HiveContext SQL query to a PairRDDFunctions[String, Vector] object, where the String key is the name column in the SchemaRDD and the remaining columns (BytesIn, BytesOut, etc.) form the vector.

1 Answer:

Answer 0 (score: 2)

Assuming your columns are "name", "bytesIn", "bytesOut":

val schemaRDD: SchemaRDD = ...

// Select the key column plus the value columns, then pattern-match each Row.
// Typed patterns (name: String, bytesIn: Long, ...) are needed because Row
// fields are untyped (Any); note the type is RDD[(String, (Long, Long))].
val pairs: RDD[(String, (Long, Long))] =
  schemaRDD.select("name", "bytesIn", "bytesOut").rdd.map {
    case Row(name: String, bytesIn: Long, bytesOut: Long) =>
      name -> (bytesIn, bytesOut)
  }

// To import PairRDDFunctions via implicits
import SparkContext._

pairs.groupByKey ... etc
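The map-then-group pattern above can be sketched with plain Scala collections (hypothetical sample data, no Spark dependency): each row is modeled as a (name, bytesIn, bytesOut) tuple, the `map` step keys each row by name, and `groupBy` plays the role of `groupByKey`, collecting all traffic tuples per name.

```scala
// Hypothetical rows: (name, bytesIn, bytesOut)
val rows = Seq(
  ("alice", 10L, 20L),
  ("bob",    5L, 15L),
  ("alice",  7L,  3L)
)

// Equivalent of the RDD map step: name -> (bytesIn, bytesOut)
val pairs: Seq[(String, (Long, Long))] =
  rows.map { case (name, bytesIn, bytesOut) => name -> (bytesIn, bytesOut) }

// Equivalent of groupByKey: all (bytesIn, bytesOut) tuples per name
val grouped: Map[String, Seq[(Long, Long)]] =
  pairs.groupBy(_._1).view.mapValues(_.map(_._2)).toMap
```

The same shape carries over to the Spark version: after `groupByKey` you would typically fold or map the grouped values into whatever Vector type you need.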