Question

I have an RDD[(Int, Array[Double])], and I then call a class function on it:

val rdd = spark.sparkContext.parallelize(Seq(
        (1, Array(2.0,5.0,6.3)),
        (5, Array(1.0,3.3,9.5)),
        (1, Array(5.0,4.2,3.1)),
        (2, Array(9.6,6.3,2.3)),
        (1, Array(8.5,2.5,1.2)),
        (5, Array(6.0,2.4,7.8)),
        (2, Array(7.8,9.1,4.2))
      )
    )
val new_class = new ABC
// Pass the RDD defined above (the original snippet passed an undefined `data`)
new_class.demo(rdd)

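Inside the class, a "global" variable value = 0 is declared as a field. In demo(), a new local variable new_value = 0 is declared. Inside the map operation, new_value is updated and the updated value is printed from within the map.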

class ABC extends Serializable {
        var value  = 0
        def demo(data_new : RDD[(Int ,Array[Double])]): Unit ={
            var new_value = 0
            data_new.coalesce(1).map(x => {
                if(x._1 == 1)
                    new_value = new_value + 1
                println(new_value)
                value = new_value
            }).count()
            println("Outside-->" +value)
        }
    }

Output:

1
1
2
2
3
3
3
Outside-->0

How can I update the global variable value after the map operation?

Answer 1

No, you cannot change a global variable from inside a map.
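The reason is that Spark serializes the closure passed to map, together with whatever it captures (here the ABC instance and new_value), and each task then works on its own deserialized copy. The sketch below imitates that with plain Java serialization and no Spark at all; Counter and roundTrip are hypothetical names made up for this illustration, not Spark APIs.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for the ABC class from the question.
class Counter extends Serializable {
  var value = 0
}

object ClosureCopyDemo {
  // Round-trips an object through Java serialization, roughly what happens
  // to a closure's captured state when a task is shipped to an executor.
  def roundTrip[T <: java.io.Serializable](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val driverSide = new Counter
    val taskSide = roundTrip(driverSide) // the copy a task would see
    taskSide.value += 3                  // the task mutates only its copy
    println("Inside-->" + taskSide.value)
    println("Outside-->" + driverSide.value) // the original is untouched
  }
}
```

The copy ends up at 3 while the original stays at 0, which matches the "Outside-->0" in the question.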

If you are just trying to count the records whose key is 1, you can use filter instead:

val value = data_new.filter(x => (x._1 == 1)).count 
println("Outside-->" +value)

Output:

Outside-->3

Also, using a mutable var is not recommended. You should always try to use an immutable val.
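As a small illustration of that last point, here is the same count written both ways on a plain Scala collection (no Spark needed). The object and method names are made up for the example, and keys holds just the keys from the question's data.

```scala
object ImmutableCountDemo {
  // Keys taken from the RDD in the question.
  val keys: Seq[Int] = Seq(1, 5, 1, 2, 1, 5, 2)

  // Mutable style: a var that is updated step by step inside a loop.
  def countWithVar(xs: Seq[Int]): Int = {
    var n = 0
    for (x <- xs) if (x == 1) n += 1
    n
  }

  // Immutable style: the result is a single expression bound to a val.
  def countWithVal(xs: Seq[Int]): Int = xs.count(_ == 1)

  def main(args: Array[String]): Unit = {
    val total = countWithVal(keys)
    println("Outside-->" + total)
  }
}
```

Both versions return 3 here, but only the second carries over directly to an RDD, where filter(...).count plays the role of count(...).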

I hope this helps!

Answer 2

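I'm not sure exactly what you are trying to do, but for this type of operation, where tasks need to add to a shared value, you need to use Accumulators.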
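A minimal sketch of what that could look like, assuming Spark 2.x or later and a local SparkSession. The object name, the accumulator label "ones" and the app name are arbitrary choices for the example. It uses foreach rather than map, since foreach is an action and the accumulator is meant to be updated as a side effect.

```scala
import org.apache.spark.sql.SparkSession

object AccumulatorDemo {
  // Counts the records whose key is 1 by adding to a driver-visible accumulator.
  def countOnes(spark: SparkSession): Long = {
    val rdd = spark.sparkContext.parallelize(Seq(
      (1, Array(2.0, 5.0, 6.3)),
      (5, Array(1.0, 3.3, 9.5)),
      (1, Array(5.0, 4.2, 3.1)),
      (2, Array(9.6, 6.3, 2.3)),
      (1, Array(8.5, 2.5, 1.2)),
      (5, Array(6.0, 2.4, 7.8)),
      (2, Array(7.8, 9.1, 4.2))
    ))
    // "ones" is only a display label for the accumulator.
    val ones = spark.sparkContext.longAccumulator("ones")
    // Tasks add to the accumulator; Spark merges the updates on the driver.
    rdd.foreach { case (key, _) => if (key == 1) ones.add(1) }
    val total: Long = ones.value // read on the driver, after the action finished
    total
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("accumulator-demo")
      .getOrCreate()
    try println("Outside-->" + countOnes(spark))
    finally spark.stop()
  }
}
```

With the data from the question this should print Outside-->3. Note that the accumulator value is only meaningful on the driver once the action has completed.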

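As @philantrovert mentioned, if you want to count the number of occurrences of each key, you can do the following:

scala> rdd.mapValues(_ => 1L).reduceByKey(_ + _).take(3)
res41: Array[(Int, Long)] = Array((1,3), (2,2), (5,2))

You can also use countByKey, but avoid it on large datasets, because it returns all the counts to the driver as a local Map.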
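For completeness, a short countByKey sketch on the same data, again assuming a local SparkSession; the object and method names are made up for the example.

```scala
import org.apache.spark.sql.SparkSession

object CountByKeyDemo {
  // Returns the number of records per key as a local Map on the driver.
  def keyCounts(spark: SparkSession): scala.collection.Map[Int, Long] = {
    val rdd = spark.sparkContext.parallelize(Seq(
      (1, Array(2.0, 5.0, 6.3)),
      (5, Array(1.0, 3.3, 9.5)),
      (1, Array(5.0, 4.2, 3.1)),
      (2, Array(9.6, 6.3, 2.3)),
      (1, Array(8.5, 2.5, 1.2)),
      (5, Array(6.0, 2.4, 7.8)),
      (2, Array(7.8, 9.1, 4.2))
    ))
    // countByKey is an action: the whole result is collected to the driver.
    rdd.countByKey()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("count-by-key-demo")
      .getOrCreate()
    try {
      val counts = keyCounts(spark)
      // Print in key order, since the Map itself has no guaranteed ordering.
      counts.keys.toSeq.sorted.foreach(k => println(s"$k -> ${counts(k)}"))
    } finally spark.stop()
  }
}
```

This is convenient when the number of distinct keys is small. With many distinct keys, the reduceByKey version keeps the result distributed instead.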

Answer 3

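Alternatively, you can solve the problem this way:

import org.apache.spark.rdd.RDD

class ABC extends Serializable {
        // Returns the number of records whose key is 1, instead of mutating a field.
        def demo(data_new : RDD[(Int ,Array[Double])]): Int = {
            data_new
                .filter(x => x._1 == 1)   // keep only the records with key 1
                .map(x => (x._1, 1))      // emit (key, 1) for each match
                .reduceByKey(_ + _)       // sum the ones per key
                .collect()                // at most one (key, count) pair reaches the driver
                .map(_._2)
                .sum                      // 0 if no record had key 1
        }
    }
println("Outside-->" + new ABC().demo(rdd))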

How to update a global variable in an RDD map operation
