Spark - remove intersecting elements between two array-type columns

Time: 2018-08-09 14:08:01

Tags: scala apache-spark dataframe

I have a dataframe like this:

+----------+--------------------+---------------------------+
|      Name|                rem1|                      quota|
+----------+--------------------+---------------------------+
|Customer_3|[258, 259, 260, 2...|[1, 2, 3, 4, 5, 6, 7,..500]|
|Customer_4|[18, 19, 20, 27, ...|[1, 2, 3, 4, 5, 6, 7,..500]|
|Customer_5|[16, 17, 51, 52, ...|[1, 2, 3, 4, 5, 6, 7,..500]|
|Customer_6|[6, 7, 8, 9, 10, ...|[1, 2, 3, 4, 5, 6, 7,..500]|
|Customer_7|[0, 30, 31, 32, 3...|[1, 2, 3, 4, 5, 6, 7,..500]|
+----------+--------------------+---------------------------+

I want to remove the values listed in rem1 from quota and store the result in a new column. This is what I have tried:

val dfleft = dfpci_remove2.withColumn("left",$"quota".filter($"rem1"))

<console>:123: error: value filter is not a member of org.apache.spark.sql.ColumnName

Please advise.

1 answer:

Answer 0: (score: 1)

You can't call filter on a column this way, but you can write a udf that does the filtering instead.
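A minimal sketch of such a udf, assuming `rem1` and `quota` are both non-null `array<int>` columns. The names `RemoveIntersecting`, `removeAll`, and `removeAllUdf`, and the small stand-in DataFrame, are hypothetical and only for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object RemoveIntersecting {
  // Plain Scala helper: keep the elements of `quota` that do not appear in `rem`.
  def removeAll(quota: Seq[Int], rem: Seq[Int]): Seq[Int] = {
    val toDrop = rem.toSet
    quota.filterNot(toDrop.contains)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("remove-intersecting")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for dfpci_remove2, with much shorter arrays.
    val df = Seq(
      ("Customer_3", Seq(2, 3), Seq(1, 2, 3, 4, 5)),
      ("Customer_4", Seq(1, 5), Seq(1, 2, 3, 4, 5))
    ).toDF("Name", "rem1", "quota")

    // Wrap the helper as a udf so it can be applied to two array columns.
    val removeAllUdf = udf((quota: Seq[Int], rem: Seq[Int]) => removeAll(quota, rem))

    // Customer_3 keeps [1, 4, 5]; Customer_4 keeps [2, 3, 4].
    val dfleft = df.withColumn("left", removeAllUdf($"quota", $"rem1"))
    dfleft.show(false)

    spark.stop()
  }
}
```

When the values in `quota` are unique, `quota diff rem` gives the same result as the set-based filter. On Spark 2.4 or later, the built-in `array_except` function may be an alternative to a udf, though it also drops duplicates from the result.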

This should give you the expected result.

Hope this helps!