Using reduceByKey on a nested structure in Spark

Date: 2015-05-06 23:55:40

Tags: scala nested apache-spark rdd

Currently I have a structure like this: Array[(Int, Array[(String, Int)])], and I want to use reduceByKey on the Array[(String, Int)] that sits inside the outer tuple array. I tried code like this:

```scala
// data is in Array[(Int, Array[(String, Int)])] structure
val result = data.map(l => (l._1, l._2.reduceByKey(_ + _)))
```

The error tells me that Array[(String, Int)] has no method named reduceByKey, and I know this method is only available on RDDs. So my question is: is there a way to get the "reduceByKey" behavior without calling that method on a nested structure?

Thank you.

1 Answer:

Answer 0 (score: 2)

You just need to use Array's foldLeft method here, since you are now working with an Array and not an RDD (assuming you really meant the outer wrapper to be an RDD):

```scala
val data = sc.parallelize(List((1, List(("foo", 1), ("foo", 1)))))

data.map(l => (l._1, l._2.foldLeft(List[(String, Int)]())((accum, curr) => {
  // Treat the accumulator as a map so counts for keys we have already seen can be merged
  val accumAsMap = accum.toMap
  accumAsMap.get(curr._1) match {
    case Some(value: Int) => (accumAsMap + (curr._1 -> (value + curr._2))).toList
    case None             => curr :: accum
  }
}))).collect
```

Ultimately, it seems that you do not understand what an RDD is, so you may want to read some of the documentation on them.
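
For comparison, here is a minimal sketch of two alternative ways to get the same per-key sums (assuming `data` is the RDD built with `sc.parallelize` in the answer above): aggregating each inner list locally with the standard Scala collection method `groupBy`, or flattening the nesting so the RDD's own `reduceByKey` applies.

```scala
// Option 1: aggregate each inner list locally; no RDD-only methods are needed
val summed = data.map { case (id, pairs) =>
  (id, pairs.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2).sum) }.toList)
}
summed.collect()  // e.g. Array((1, List((foo, 2))))

// Option 2: flatten to ((id, word), count) pairs so reduceByKey itself can be used,
// which avoids building per-record maps and lets Spark do the aggregation
val flat = data
  .flatMap { case (id, pairs) => pairs.map { case (k, v) => ((id, k), v) } }
  .reduceByKey(_ + _)
flat.collect()  // e.g. Array(((1, foo), 2))
```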