Spark Scala: count how many strings in an array occur as keys of a Map

Time: 2017-07-19 22:06:08

Tags: scala apache-spark apache-spark-sql

Currently, my DataFrame has two fields: id1, an array of strings, and id2, a map keyed by strings.

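I want to create another column named rate: the fraction of the ids in id1 that appear as keys of the map in id2.

It seems I can't fit a for loop inside a udf. How should I do this?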

1 Answer:

Answer 0: (score: 1)

Use Seq.count together with Map.isDefinedAt to count how many of the ids exist as keys in the map, then wrap that in a udf:

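import org.apache.spark.sql.functions.udf
import spark.implicits._  // assumes a SparkSession named `spark`, as in spark-shell

val df = Seq((Seq("a", "b", "c"), Map("a" -> ("x", 1L, 2L), "x" -> ("y", 2L, 2L)))).toDF("id1", "id2")

type CustMap = Map[String, (String, Long, Long)]

// Count the ids in id1 that are keys of id2, then divide by the length of id1.
def percent_in = udf(
    (id1: Seq[String], id2: CustMap) => id1.count(id2.isDefinedAt)/id1.length.toDouble
)

df.withColumn("rate", percent_in($"id1", $"id2")).show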
+---------+--------------------+------------------+
|      id1|                 id2|              rate|
+---------+--------------------+------------------+
|[a, b, c]|Map(a -> [x,1,2],...|0.3333333333333333|
+---------+--------------------+------------------+
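
The function inside the udf is ordinary Scala, so it can be tried without Spark. Below is a minimal sketch of the same ratio; the names RateSketch and rate are hypothetical, and returning 0.0 for an empty id list is an assumed convention (the udf above would give NaN in that case, since 0 / 0.0 is NaN).

```scala
// Hypothetical helper, not part of the original answer: the same ratio in plain Scala.
object RateSketch {
  // Fraction of `ids` that occur as keys of `m`.
  // Assumption: an empty id list yields 0.0 instead of NaN.
  def rate[V](ids: Seq[String], m: Map[String, V]): Double =
    if (ids.isEmpty) 0.0
    else ids.count(m.contains) / ids.length.toDouble

  def main(args: Array[String]): Unit = {
    val m = Map("a" -> ("x", 1L, 2L), "x" -> ("y", 2L, 2L))
    println(rate(Seq("a", "b", "c"), m))   // only "a" is a key: 0.3333333333333333
    println(rate(Seq.empty[String], m))    // guarded case: 0.0
  }
}
```

No explicit for loop is needed because Seq.count already iterates over id1 and applies the key test to each element.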