Map within a filter in Spark

Asked: 2018-10-23 15:45:23

Tags: scala apache-spark

How can I filter within a map?

Example:

val test1 = sc.parallelize(Array(('a', (1, Some(4))), ('b', (2, Some(5))),
  ('c', (3, Some(6))), ('d', (0, None))))

What I want:

Array(('a', (1, Some(4))), ('b', (2, Some(5))),
  ('c', (3, Some(6))), ('d', (613, None)))

What I tried (changing the 0 to 613):

val test2 = test1.filter(value => value._2._1 == 0).mapValues(value =>
  (613, value._2))

But it only returns (filter discards every element that does not match the predicate):

Array((d,(613,None)))

2 Answers:

Answer 0 (score: 3):

Use map with pattern matching:

test1.map { 
    case (x, (0, y)) => (x, (613, y)) 
    case z => z 
}.collect
// res2: Array[(Char, (Int, Option[Int]))] = Array((a,(1,Some(4))), (b,(2,Some(5))), (c,(3,Some(6))), (d,(613,None)))
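If you prefer not to touch the keys at all, the same rewrite can be done with mapValues. This is a sketch under the assumption that test1 is the pair RDD defined in the question (mapValues comes from PairRDDFunctions on key/value RDDs):

```scala
// Sketch: same effect as the map above, but only the values are matched,
// so the keys are guaranteed untouched and Spark can preserve the
// parent RDD's partitioner.
test1.mapValues {
  case (0, y) => (613, y)  // rewrite pairs whose first element is 0
  case other  => other     // keep everything else unchanged
}.collect
```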

Answer 1 (score: 0):

test1.map{
  case (a, (0, b)) => (a, (613, b))
  case other => other
}
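Both answers follow the same idea: instead of filtering (which drops every non-matching element), map over all elements and rewrite only the ones that match. A minimal end-to-end sketch, assuming an active SparkContext named sc as in the question (e.g. inside spark-shell):

```scala
// Assumes `sc` is a live SparkContext.
val test1 = sc.parallelize(Array(
  ('a', (1, Some(4))), ('b', (2, Some(5))),
  ('c', (3, Some(6))), ('d', (0, None))))

val fixed = test1.map {
  case (k, (0, v)) => (k, (613, v))  // replace the 0 with 613
  case other       => other          // pass everything else through
}

fixed.collect.foreach(println)
// Expected lines: (a,(1,Some(4))), (b,(2,Some(5))), (c,(3,Some(6))), (d,(613,None))
```

Note that the array mixes Some(4) and None, so Scala infers the value type as (Int, Option[Int]), which is why the pattern (k, (0, v)) binds v as an Option[Int].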