I have a dataset of type ExpenseEntry. ExpenseEntry is a basic data structure that tracks the amount spent per category:
case class ExpenseEntry(
  name: String,
  category: String,
  amount: BigDecimal
)
Sample values:
ExpenseEntry("John", "candy", 0.5)
ExpenseEntry("Tia", "game", 0.25)
ExpenseEntry("John", "candy", 0.15)
ExpenseEntry("Tia", "candy", 0.55)
The expected answer is:
category - name - amount
candy - John - 0.65
candy - Tia - 0.55
game - Tia - 0.25
What I want is the total amount spent per category, per name. So, I have the following dataset query:
dataset.groupBy("category", "name").agg(sum("amount"))
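The grouping logic that query is meant to express can be sketched in plain Python (an illustration of group-by-then-sum over the sample values above, not Spark code):

```python
from collections import defaultdict
from decimal import Decimal

# The sample entries from the question: (name, category, amount).
entries = [
    ("John", "candy", Decimal("0.5")),
    ("Tia", "game", Decimal("0.25")),
    ("John", "candy", Decimal("0.15")),
    ("Tia", "candy", Decimal("0.55")),
]

# Group by (category, name) and sum the amounts, like
# groupBy("category", "name").agg(sum("amount")).
totals = defaultdict(Decimal)
for name, category, amount in entries:
    totals[(category, name)] += amount

for (category, name), total in sorted(totals.items()):
    print(f"{category} - {name} - {total}")
# candy - John - 0.65
# candy - Tia - 0.55
# game - Tia - 0.25
```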
In theory, the query looks correct to me. However, the sum is displayed as 0E-18, which is 0. I am guessing the amount is being converted to an int somewhere inside the sum function. How do I cast it to BigInt? Is my understanding of the problem correct?
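One clarifying note on the notation: 0E-18 is decimal scientific notation for zero carried at scale 18 (Spark infers decimal(38,18) for a Scala BigDecimal column by default), not some tiny nonzero value. Python's decimal module can illustrate this:

```python
from decimal import Decimal

# "0E-18" is the digit 0 with exponent -18, i.e. zero at scale 18.
x = Decimal("0E-18")
print(x == 0)        # True: numerically equal to zero
print(x.as_tuple())  # sign=0, digits=(0,), exponent=-18
```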
Answer 0 (score: 1)
package spark

import org.apache.spark.sql.{DataFrame, SparkSession}

object SumBig extends App {

  val spark = SparkSession.builder()
    .master("local")
    .appName("DataFrame-example")
    .getOrCreate()

  import spark.implicits._

  case class ExpenseEntry(
    name: String,
    category: String,
    amount: BigDecimal
  )

  val df = Seq(
    ExpenseEntry("John", "candy", 0.5),
    ExpenseEntry("Tia", "game", 0.25),
    ExpenseEntry("John", "candy", 0.15),
    ExpenseEntry("Tia", "candy", 0.55)
  ).toDF()

  df.show(false)

  val r = df.groupBy("category", "name").sum("amount")
  r.show(false)

  // +--------+----+--------------------+
  // |category|name|sum(amount)         |
  // +--------+----+--------------------+
  // |game    |Tia |0.250000000000000000|
  // |candy   |John|0.650000000000000000|
  // |candy   |Tia |0.550000000000000000|
  // +--------+----+--------------------+
}
Answer 1 (score: 1)
df.groupBy("category", "name").agg(sum(bround(col("amount"), 2)).as("sum_amount")).show()
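For context on this answer: Spark SQL's bround applies HALF_EVEN ("banker's") rounding at the given scale, so each amount is rounded to 2 decimal places before the sum. The rounding mode can be illustrated with Python's decimal module (an analogy, not Spark itself):

```python
from decimal import Decimal, ROUND_HALF_EVEN

def bround2(value: str) -> Decimal:
    # HALF_EVEN rounds ties toward the nearest even last digit,
    # matching the tie-breaking behavior of Spark SQL's bround at scale 2.
    return Decimal(value).quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

print(bround2("0.125"))  # 0.12  (tie rounds to the even digit 2)
print(bround2("0.135"))  # 0.14  (tie rounds to the even digit 4)
```

Note that rounding each row before aggregating can change the total when many ties occur; rounding the summed column instead preserves precision.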