Question

I have a dataset in the following format:

scala> rxClaimsUpdated.take(1)
res0: Array[(String, Array[String])] = Array((186037020,Array(
    22960551, 
    hfeu0ysji96afjdicbmqbheop0zsbfuvs4ongjb6yqg=,
    095aa9d791b7b0b0f7f312435b8e30f1, 
    2016-10-15, 
    2015-02-13, 
    00186037020, 
    10, 
    30,  
    "",  
    20)))

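In the inner array, I want to update the element at index 9 (the last one) if its value is 0. In the sample above, its value is 20.

The code I tried does not compile: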

val rxClaimsUpdatedtemp = rxClaimsUpdated.map(z => 
    if(z._2(9).toInt == 0) z._2.updated(9,1) 
    else z._2(9)
)

This is the error I get:

<console>:55: error: Unable to find encoder for type stored in a
Dataset.  Primitive types (Int, String, etc) and Product types 
(case classes) are supported by importing spark.implicits._  
Support for serializing other types will be added in future releases.

       val rxClaimsUpdatedtemp = rxClaimsUpdated.map(z => if(z._2(9).toInt == 0) z._2.updated(9,1) else z._2(9))
                                                    ^

Answer 1

You are trying to update an element of an Array[String] with an Int, which is why the error is thrown.

Here is what you can do:

val rxClaimsUpdatedtemp = rxClaimsUpdated.map(z => {
  if (z._2(9).toInt == 0) { // check for zero
    z._2.update(9, "1")     // replace the element in place with the String "1"
    z                       // return the tuple holding the updated array
  }
  else z                    // return the tuple unchanged
})
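
To make the type problem concrete, here is a minimal sketch in plain Scala (no Spark needed). The object name TypeMismatchDemo and the sample values are made up for illustration. It shows that calling updated with an Int widens the array to Array[Any], and that the two branches of the original if/else have different types, so the lambda as a whole no longer returns something Spark has an encoder for.

```scala
object TypeMismatchDemo {
  // Hypothetical sample standing in for the inner array of one row.
  val inner: Array[String] = Array("10", "30", "", "0")

  // Passing an Int to `updated` widens the element type from String to Any.
  val widened: Array[Any] = inner.updated(3, 1)

  // Passing a String keeps the element type, so the result is still Array[String].
  val kept: Array[String] = inner.updated(3, "1")

  // The original lambda returned an array in one branch and a single String
  // in the other; the only common type is a broad supertype such as Any.
  val mixed: Any = if (inner(3).toInt == 0) inner.updated(3, 1) else inner(3)

  def main(args: Array[String]): Unit = {
    println(kept.mkString(","))
    // `updated` returns a copy; the source array is left untouched.
    println(inner.mkString(","))
  }
}
```

Note that updated returns a new array, whereas update (used in the answer above) modifies the existing array in place and returns Unit.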

Hope this helps!

Answer 2

Shankar Koirala above correctly pointed out the error: you are trying to update an element of an Array[String] with an Int. Here is another way to reach the same result:

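val rxClaimsUpdatedtemp = rxClaimsUpdated.map { elem =>
    (
        elem._1,
        // keep the first n-1 elements, then append "1" or the original last element
        elem._2.take(elem._2.length - 1) ++
            (if (elem._2.last.toInt == 0) Array("1") else Array(elem._2.last))
    )
}

Here, rxClaimsUpdatedtemp has the same type as rxClaimsUpdated, because we keep the first element of the tuple and only update the second one.

The logic for updating the second element: take the first n-1 elements of the array of size n, check the last element, and append Array("1") if it is zero, or a one-element array holding the original last element otherwise. The last element is a String, so it has to be converted with toInt before being compared with 0.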
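
The same logic can be pulled out into a small pure function and tried on ordinary Scala collections before it is passed to Dataset.map. The following is only a sketch: the names UpdateLastElement and fixLast and the sample rows are hypothetical, and treating a non-numeric last element as "not zero" (instead of letting toInt throw) is an assumption that may not match your data.

```scala
import scala.util.Try

object UpdateLastElement {
  // Hypothetical helper: returns a copy of the array whose last element is
  // replaced by "1" when that element parses to the integer 0.
  // Empty arrays and non-numeric last elements are returned unchanged.
  def fixLast(values: Array[String]): Array[String] = {
    val lastIsZero = values.lastOption.exists(v => Try(v.trim.toInt).toOption.contains(0))
    if (lastIsZero) values.init :+ "1" else values
  }

  def main(args: Array[String]): Unit = {
    // Made-up rows shaped like (id, innerArray), shortened for readability.
    val rows = Seq(
      ("186037020", Array("10", "30", "", "20")),
      ("186037021", Array("10", "30", "", "0"))
    )
    // Keep the first tuple element and rewrite only the inner array.
    val updated = rows.map { case (id, values) => (id, fixLast(values)) }
    updated.foreach { case (id, values) => println(s"$id -> ${values.mkString(",")}") }
  }
}
```

With Spark, the equivalent call would presumably be rxClaimsUpdated.map(elem => (elem._1, UpdateLastElement.fixLast(elem._2))), with spark.implicits._ imported so that an encoder for (String, Array[String]) is available.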

Update an element of a Dataset's inner array based on a condition in Spark Scala
