How to perform mathematical operations on Long and BigInt in Scala in spark-sql

Date: 2017-01-26 15:00:44

Tags: scala apache-spark apache-spark-sql bigdata

I have two values of different types in spark-sql, as shown below:

scala> val ageSum = df.agg(sum("age"))
ageSum: org.apache.spark.sql.DataFrame = [sum(age): bigint]
scala> val totalEntries = df.count();
scala> totalEntries
res37: Long = 45211

The first value comes from an aggregate function on the data frame and the second from the data frame's count. The two have different types: ageSum is bigint and totalEntries is Long. I want to perform a mathematical operation on them: Mean = ageSum / totalEntries

scala> val mean = ageSum/totalEntries
<console>:31: error: value / is not a member of org.apache.spark.sql.DataFrame
       val mean = ageSum/totalEntries

I also tried converting ageSum to the Long type, but could not do so:

scala> val ageSum = ageSum.longValue
<console>:29: error: recursive value ageSum needs type
       val ageSum = ageSum.longValue

1 Answer:

Answer 0 (score: 1):

ageSum is a data frame, so you need to extract the value from it. One option is to use first() to get the value as a Row and then pull it out of the row with getAs[Long](0). Dividing two Longs truncates the result, so if you need a more accurate value, convert with toDouble before dividing:

ageSum.first().getAs[Long](0).toDouble/totalEntries
// res9: Double = 2.5
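
For completeness, a minimal sketch of the plain (truncating) variant of that first option, assuming the ageSum and totalEntries values from the question are still in scope in the spark-shell:

val sumValue: Long = ageSum.first().getAs[Long](0)        // pull the bigint out of the single-row result
val meanTruncated: Long = sumValue / totalEntries          // Long / Long: fractional part is dropped
val meanExact: Double = sumValue.toDouble / totalEntries   // floating-point mean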

Or you can add the result as another column of ageSum:

ageSum.withColumn("mean", $"sum(age)"/totalEntries).show
+--------+----+
|sum(age)|mean|
+--------+----+
|      10| 2.5|
+--------+----+

(The outputs above were produced with the toy data val df = Seq(1,2,3,4).toDF("age"), giving sum(age) = 10 and count = 4.)
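
Putting the pieces together, here is a self-contained sketch of both approaches, assuming an active SparkSession named spark (as you would have in the spark-shell); the data is the toy example above:

import org.apache.spark.sql.functions.sum
import spark.implicits._

val df = Seq(1, 2, 3, 4).toDF("age")      // toy data: sum = 10, count = 4
val totalEntries = df.count()             // Long = 4
val ageSum = df.agg(sum("age"))           // DataFrame: [sum(age): bigint]

// Option 1: extract the value from the single-row DataFrame and divide as Doubles
val mean = ageSum.first().getAs[Long](0).toDouble / totalEntries   // 2.5

// Option 2: stay inside the DataFrame API and add the mean as a column
ageSum.withColumn("mean", $"sum(age)" / totalEntries).show()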