Formatting (K,(v,w)) pairs in a Spark RDD

Time: 2015-08-19 12:37:58

Tags: scala apache-spark

I have an RDD like this:

val custFile = sc.textFile("custInfo.txt").map(line => line.split('|'))

val custPrd = custFile.map(a => (a(0), ((a(1)), (a(2), a(3), a(4), a(5), a(6), a(7), a(8)))))

val custGrp = custPrd.groupByKey

custGrp.saveAsTextFile("custinfo2")

which produces this:

(1104,CompactBuffer((S_SAVG,(1,1,1,1,1,1,1)), (CN_SAVG,(4,4,1,1,4,1,1))))

How can I use something like this:

custPrdGrp.map{case (k, vals) => {val valsString = vals.mkString(", "); s"{$k:, {$valsString}}" }}

to format the (k,(v,w)) pairs? I tried the following, but it fails with errors:

val custPrdRep = custPrdGrp.map({case (k, (v, w)) => {val valsString = v.mkString(", "); val valsPrvcy = w.mkString(", "); s"'${k}'| [$valsString]" }})
<console>:27: error: constructor cannot be instantiated to expected type;
 found   : (T1, T2)
 required: Iterable[(String, (String, String, String, String, String, String, String))]
       val custPrdRep = custPrdGrp.map({case (k, (v, w)) => {val valsString = v.mkString(", "); val valsPrvcy = w.mkString(", "); s"'${k}'| [$valsString]" }})
                                                 ^ 


<console>:27: error: not found: value v
           val custPrdRep = custPrdGrp.map({case (k, (v, w)) => {val valsString = v.mkString(", "); val valsPrvcy = w.mkString(", "); s"'${k}'| [$valsString]" }})
                                                                                  ^
<console>:27: error: not found: value w
           val custPrdRep = custPrdGrp.map({case (k, (v, w)) => {val valsString = v.mkString(", "); val valsPrvcy = w.mkString(", "); s"'${k}'| [$valsString]" }})

I would like the output to look like this:

('1104'|{'S_SAVG': {a: '1', b: '1', c: '1', d: '1', e: '1', f: '1', g: '1'}, 'CN_SAVG': {a: '4', b: '4', c: '1', d: '1', e: '4', f: '1', g: '1'}})

1 answer:

Answer 0 (score: 3)

Well, there are a lot of details here, but something like this should work:

val keys = List("a", "b", "c", "d", "e", "f", "g")

custGrp.map{case (k, vals) => {
    val valsString = vals map {
        case (val1, val2) => {
            val pairs = keys
                // Create someLetter: 'someNumber' pairs
                .zip(val2.productIterator.map{case (x: String)  => x}.toSeq)
                // Named letter/value here to avoid shadowing the outer k
                .map{case (letter, value) => s"$letter: '$value'"}
                // Join into a single string
                .mkString(", ")
            // Add "key"
            s"'$val1': {$pairs}"
        }
    }
    // Combine above
    val valsComb = valsString.mkString(", ")
    // Create final string
    s"('$k'|{$valsComb})"
}}
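
For reference, the attempt in the question fails because, after groupByKey, the value for each key is an Iterable of (name, fields) tuples, so the tuple pattern (v, w) cannot match it; the group has to be traversed with map instead. Below is a minimal sketch of the same formatting on plain Scala collections, without Spark, so it can be tried in isolation. The names FormatSketch, Fields and formatRecord are hypothetical and only used for illustration; the sample data is copied from the question.

```scala
object FormatSketch {
  // The seven product fields, as in the question (hypothetical alias).
  type Fields = (String, String, String, String, String, String, String)

  // Labels for the seven fields; same list as in the answer.
  val keys = List("a", "b", "c", "d", "e", "f", "g")

  // Formats one grouped record. `vals` is the whole group for key `k`,
  // so it is traversed with map rather than destructured as a tuple.
  def formatRecord(k: String, vals: Iterable[(String, Fields)]): String = {
    val entries = vals.map { case (name, fields) =>
      val pairs = keys
        .zip(fields.productIterator.map(_.toString).toList)
        .map { case (label, value) => s"$label: '$value'" }
        .mkString(", ")
      s"'$name': {$pairs}"
    }
    val joined = entries.mkString(", ")
    s"('$k'|{$joined})"
  }

  def main(args: Array[String]): Unit = {
    // Sample group copied from the question's output.
    val sample = List(
      ("S_SAVG", ("1", "1", "1", "1", "1", "1", "1")),
      ("CN_SAVG", ("4", "4", "1", "1", "4", "1", "1"))
    )
    // prints ('1104'|{'S_SAVG': {a: '1', b: '1', c: '1', d: '1', e: '1', f: '1', g: '1'}, 'CN_SAVG': {a: '4', b: '4', c: '1', d: '1', e: '4', f: '1', g: '1'}})
    println(formatRecord("1104", sample))
  }
}
```

On the grouped RDD from the question, the same helper could presumably be applied as custGrp.map { case (k, vals) => FormatSketch.formatRecord(k, vals) }.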

You could simplify things by building the right data structure in the first place. For example, use Maps instead of tuples:

 Map("S_SAVG" -> Map("a" -> "1", "b" -> "1", ...), ...)
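
A rough sketch of how the Map-based variant might look, again on plain Scala collections. It assumes ListMap rather than a plain Map, because a default immutable Map with more than four entries does not keep insertion order, and the a to g labels would come out shuffled. MapSketch, toFieldMap and formatRecord are hypothetical names, not part of the original answer.

```scala
import scala.collection.immutable.ListMap

object MapSketch {
  // Labels for the seven fields; same list as in the answer.
  val keys = List("a", "b", "c", "d", "e", "f", "g")

  // Hypothetical helper: pairs the raw fields with their labels,
  // keeping the label order by using a ListMap.
  def toFieldMap(fields: Seq[String]): ListMap[String, String] =
    ListMap(keys.zip(fields): _*)

  // Formats one record whose products are already label -> value maps.
  def formatRecord(k: String, products: Map[String, Map[String, String]]): String = {
    val entries = products.map { case (name, fields) =>
      val pairs = fields
        .map { case (label, value) => s"$label: '$value'" }
        .mkString(", ")
      s"'$name': {$pairs}"
    }
    val joined = entries.mkString(", ")
    s"('$k'|{$joined})"
  }
}
```

With the fields stored as maps, the formatting step no longer needs productIterator or a separate zip with the label list; the labels travel with the data.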