Question

I have the following DataFrame df:

+-----------+-----------+-----------+
|CommunityId|nodes_count|edges_count|
+-----------+-----------+-----------+
|         26|          3|         11|
|        964|         16|         18|
|       1806|          9|         31|
|       2040|         13|         12|
|       2214|          8|          8|
|       2927|          7|          7|

Then I add a Rate column as follows:

df
  .withColumn("Rate",when(col("nodes_count") =!= 0, (lit("edges_count")/lit("nodes_count")).as[Double]).otherwise(0.0))

This is what I get:

+-----------+-----------+-----------+-----------------------+
|CommunityId|nodes_count|edges_count|                   Rate|
+-----------+-----------+-----------+-----------------------+
|         26|          3|         11|                   null|
|        964|         16|         18|                   null|
|       1806|          9|         31|                   null|
|       2040|         13|         12|                   null|
|       2214|          8|          8|                   null|
|       2927|          7|          7|                   null|

For some reason, Rate is always null.

Answer 1

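This happens because you use lit. lit("edges_count") creates a column that holds the constant string "edges_count" in every row; it does not refer to the edges_count column. Dividing one string literal by another makes Spark try to cast both strings to a number, the cast fails, and the result is null. You should use col instead: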

df.withColumn(
  "Rate",
  when(col("nodes_count") =!= 0, (col("edges_count") / col("nodes_count")).as[Double])
    .otherwise(0.0))
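To see the difference between lit and col directly, here is a minimal sketch. It assumes a local SparkSession with ANSI mode switched off (the question's output shows null rather than an error, which suggests non-ANSI behavior); the object name LitVsCol, the helper compare, and the two-row demo data are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object LitVsCol {
  // Puts the literal-based and column-based expressions side by side.
  def compare(df: DataFrame): DataFrame =
    df.select(
      lit("edges_count").as("as_literal"),                        // the constant string "edges_count"
      col("edges_count").as("as_column"),                         // the actual column values
      (lit("edges_count") / lit("nodes_count")).as("lit_rate"),   // string / string: cast fails, null
      (col("edges_count") / col("nodes_count")).as("col_rate")    // numeric division, returns a double
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("lit-vs-col")
      .config("spark.sql.ansi.enabled", "false") // assumption: non-ANSI mode, as in the question
      .getOrCreate()
    import spark.implicits._

    // Two rows borrowed from the question's table.
    val demo = Seq((26, 3, 11), (964, 16, 18))
      .toDF("CommunityId", "nodes_count", "edges_count")

    compare(demo).show()
    spark.stop()
  }
}
```

In the printed result, as_literal repeats the same string on every row and lit_rate is null, while col_rate contains the actual ratios.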

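That said, neither when nor as[Double] is really needed here. The / operator on columns already returns a double, and with ANSI mode off, dividing by zero yields null instead of failing, so a plain division is enough (rows where nodes_count is 0 then get null rather than 0.0):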

df.withColumn("Rate", col("edges_count") / col("nodes_count"))
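If the 0.0 fallback from the original when/otherwise is still wanted, one possible variant wraps the division in coalesce. This is a sketch, not the answer's own method: it assumes ANSI mode is off so that division by zero produces null, the names RateWithDefault and withRate are hypothetical, and the row with nodes_count = 0 is invented to exercise the fallback. Note that coalesce also turns a null caused by a null input into 0.0.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, lit}

object RateWithDefault {
  // Rate = edges_count / nodes_count, or 0.0 wherever the division yields null.
  def withRate(df: DataFrame): DataFrame =
    df.withColumn("Rate", coalesce(col("edges_count") / col("nodes_count"), lit(0.0)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("rate-with-default")
      .config("spark.sql.ansi.enabled", "false") // assumption: division by zero returns null
      .getOrCreate()
    import spark.implicits._

    // First row is from the question; the second is a made-up zero-node community.
    val demo = Seq((26, 3, 11), (9999, 0, 5))
      .toDF("CommunityId", "nodes_count", "edges_count")

    withRate(demo).show()
    spark.stop()
  }
}
```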
