Infinity values in a dataframe, Spark / Scala

Posted: 2020-03-29 19:23:26

Tags: scala apache-spark

I have a dataframe that contains Infinity values. How can I replace them with 0.0?

I tried the following, but it did not work:

val Nan = dataframe_final.withColumn("Vitesse", when(col("Vitesse").isin(Double.NaN, Double.PositiveInfinity, Double.NegativeInfinity), 0.0))

Sample dataframe:

+------------------+
|           Vitesse|
+------------------+
| 8.171069002316942|
|          Infinity|
| 4.290418664272539|
| 16.19811830014666|
+------------------+

How can I replace Infinity with 0.0?

Thanks.

2 answers:

Answer 0 (score: 1)

scala> df.withColumn("Vitesse", when(col("Vitesse").equalTo(Double.PositiveInfinity),0.0).otherwise(col("Vitesse")))
res1: org.apache.spark.sql.DataFrame = [Vitesse: double]

scala> res1.show
+-----------------+
|          Vitesse|
+-----------------+
|8.171069002316942|
|              0.0|
|4.290418664272539|
+-----------------+

You can try the above. Note that this condition only matches positive Infinity; add Double.NegativeInfinity and Double.NaN to the check if those values can also occur in your data.

Answer 1 (score: 0)

Use when() with .otherwise()

Your approach is correct.
  • Add the missing .otherwise clause so that the Vitesse value is kept as-is when it is not one of Infinity, -Infinity, or NaN. Without .otherwise, every non-matching row becomes null.

Example:

val df = Seq(8.171, 4.2904, 16.19, Double.PositiveInfinity, Double.NegativeInfinity, Double.NaN).toDF("Vitesse")
df.show()
//+---------+
//|  Vitesse|
//+---------+
//|    8.171|
//|   4.2904|
//|    16.19|
//| Infinity|
//|-Infinity|
//|      NaN|
//+---------+

df.withColumn("Vitesse", when(col("Vitesse").isin(Double.PositiveInfinity,Double.NegativeInfinity,Double.NaN),0.0).
otherwise(col("Vitesse"))).
show()
//+-------+
//|Vitesse|
//+-------+
//|  8.171|
//| 4.2904|
//|  16.19|
//|    0.0|
//|    0.0|
//|    0.0|
//+-------+
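As an alternative sketch (not part of the original answers): Spark's built-in DataFrameNaFunctions can do the same cleanup without a when/otherwise expression. na.replace handles the two infinities by value equality, and na.fill handles NaN, since NaN never compares equal to anything. This assumes a local SparkSession for demonstration:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(8.171, Double.PositiveInfinity, Double.NegativeInfinity, Double.NaN).toDF("Vitesse")

// Replace +/-Infinity by value with na.replace, then NaN with na.fill
// (na.fill on a double column replaces both null and NaN values).
val cleaned = df.na
  .replace("Vitesse", Map(Double.PositiveInfinity -> 0.0, Double.NegativeInfinity -> 0.0))
  .na.fill(0.0, Seq("Vitesse"))

cleaned.show()
```

Both approaches give the same result here; na.replace/na.fill reads a bit more declaratively when the replacement is a plain value-to-value mapping, while when().otherwise() is more flexible for conditional logic.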