I have two values of different types, as shown in the spark-shell session below:
scala> val ageSum = df.agg(sum("age"))
ageSum: org.apache.spark.sql.DataFrame = [sum(age): bigint]
scala> val totalEntries = df.count();
scala> totalEntries
res37: Long = 45211
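The first value comes from an aggregate function on the DataFrame, and the second comes from the DataFrame's count function. The two have different types: ageSum is bigint and totalEntries is Long. I want to do arithmetic on them: mean = ageSum / totalEntries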
scala> val mean = ageSum/totalEntries
<console>:31: error: value / is not a member of org.apache.spark.sql.DataFrame
val mean = ageSum/totalEntries
I also tried converting ageSum to a Long, but that did not work either:
scala> val ageSum = ageSum.longValue
<console>:29: error: recursive value ageSum needs type
val ageSum = ageSum.longValue
Answer 0 (score: 1)
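ageSum is a DataFrame, so you need to extract the value from it. One option is to use first() to get the result as a Row and then read the value out of that row:
val df = Seq(1,2,3,4).toDF("age")
val ageSum = df.agg(sum("age"))
val totalEntries = df.count
ageSum.first().getAs[Long](0)/totalEntries
// Long / Long is integer division, so the fractional part is dropped here
If you need a more accurate value, use toDouble to convert before dividing:
ageSum.first().getAs[Long](0).toDouble/totalEntries
// res9: Double = 2.5
Or you can add the result as another column of ageSum:
ageSum.withColumn("mean", $"sum(age)"/totalEntries).show
+--------+----+
|sum(age)|mean|
+--------+----+
|      10| 2.5|
+--------+----+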
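As a side note, the same mean can probably be obtained without a separate count() call. The following is only a minimal sketch, not part of the original answer: the object name MeanAgeSketch and its two helper methods are hypothetical, and it assumes a non-empty DataFrame with an integer column and a local SparkSession (in spark-shell the session already exists as spark). It uses the built-in avg aggregate, and alternatively computes the sum and the count in a single agg call.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, count, sum}

// Hypothetical helper object, for illustration only.
object MeanAgeSketch {

  // Mean via the built-in avg aggregate, which returns a double column.
  // Assumes the DataFrame is non-empty (avg of no rows is null).
  def meanWithAvg(df: DataFrame, col: String): Double =
    df.agg(avg(col)).first().getDouble(0)

  // Sum and count in one aggregation, then divide on the driver.
  // Assumes an integer-typed column, so sum(col) is a bigint (Long).
  // count(col) counts only non-null values, unlike df.count().
  def meanWithSumAndCount(df: DataFrame, col: String): Double = {
    val row = df.agg(sum(col), count(col)).first()
    row.getLong(0).toDouble / row.getLong(1)
  }

  def main(args: Array[String]): Unit = {
    // Local session for the sketch; not needed inside spark-shell.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("mean-age-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same toy data as in the answer above.
    val df = Seq(1, 2, 3, 4).toDF("age")

    println(s"avg: ${meanWithAvg(df, "age")}")
    println(s"sum/count: ${meanWithSumAndCount(df, "age")}")

    spark.stop()
  }
}
```

For the toy data (sum 10 over 4 rows) both helpers should give 2.5. Note that if the column contains nulls, count(col) and df.count() can differ, so the second helper is not a drop-in replacement for the sum / totalEntries division in that case.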