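My code is supposed to extract a Map from a DataFrame. The map will be used later for some calculations (matching credit notes to the best-matching original invoice). However, the very first step already fails: TransactionId is always retrieved as 0.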
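Simplified version of the code:
import org.apache.spark.sql.types.DoubleType
import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell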
case class SalesTransaction(
CustomerId : Int,
Score : Int,
Revenue : Double,
Type : String,
Credited : Double = 0.0,
LinkedTransactionId : Int = 0,
IsProcessed : Boolean = false
)
val df = Seq(
(1, 1, 123, "Sales", 100),
(1, 2, 122, "Credit", 100),
(1, 3, 99, "Sales", 70),
(1, 4, 101, "Sales", 77),
(1, 5, 102, "Credit", 75),
(1, 6, 98, "Sales", 71),
(2, 7, 200, "Sales", 55),
(2, 8, 220, "Sales", 55),
(2, 9, 200, "Credit", 50),
(2, 10, 205, "Sales", 50)
).toDF("CustomerId", "TransactionId", "TransactionAttributesScore", "TransactionType", "Revenue")
.withColumn("Revenue", $"Revenue".cast(DoubleType))
.repartition($"CustomerId")
//map generation:
val m2 : Map[Int, SalesTransaction] =
df.map(row => (
row.getAs("TransactionId")
, new SalesTransaction(row.getAs("CustomerId")
, row.getAs("TransactionAttributesScore")
, row.getAs("Revenue")
, row.getAs("TransactionType")
)
)
).collect.toMap
m2.foreach(m => println("key: " + m._1 +" Value: "+ m._2))
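The output contains only the last record. Every value captured by row.getAs("TransactionId") is null, and these nulls become 0 in the m2 Map. As a result, each iteration creates the tuple (null, [current row SalesTransaction]). Because every entry has the same key, each new entry overwrites the previous one when toMap builds the map.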
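The "only the last record" symptom follows from how toMap handles duplicate keys: when several pairs share a key, the later pair replaces the earlier one. Below is a minimal plain-Scala sketch with no Spark involved. DuplicateKeyDemo and collapse are hypothetical names used only for illustration.

```scala
object DuplicateKeyDemo {
  // Hypothetical helper: builds a Map from pairs; on a duplicate key the later pair wins
  def collapse[K, V](pairs: Seq[(K, V)]): Map[K, V] = pairs.toMap

  def main(args: Array[String]): Unit = {
    // Every key is 0, mimicking the null keys that end up as 0 in an Int-keyed map
    val broken = collapse(Seq(0 -> "transaction 1", 0 -> "transaction 2", 0 -> "transaction 3"))
    println(broken) // Map(0 -> transaction 3)

    // With distinct keys, every entry survives
    val fixed = collapse(Seq(1 -> "transaction 1", 2 -> "transaction 2", 3 -> "transaction 3"))
    println(fixed.size) // 3
  }
}
```

So the map does not really "lose" rows. They all collide on one key.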
Could you tell me what is wrong with my code? I'm still quite new to Scala and must be missing some syntactic nuance here.
Answer 0 (score: 1)
You can also use row.getAs[Int]("TransactionId"), with an explicit type parameter, as shown below:
val m2 : Map[Int, SalesTransaction] =
df.map(row => (
row.getAs[Int]("TransactionId"),
new SalesTransaction(row.getAs("CustomerId"),
row.getAs("TransactionAttributesScore"),
row.getAs("Revenue"),
row.getAs("TransactionType"))
)
).collect.toMap
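It is always better to use the typed form of getAs (with an explicit type parameter such as getAs[Int]) to avoid this kind of error.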
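The following is a Spark-free sketch of the same idea, so it can run without a cluster. FakeRow is a hypothetical stand-in that only mimics the shape of Spark's Row.getAs[T](fieldName). It does not reproduce Spark's actual Row implementation. Only three of the question's rows are used.

```scala
// Hypothetical stand-in for org.apache.spark.sql.Row, so this sketch runs without Spark
final case class FakeRow(fields: Map[String, Any]) {
  // Same shape as Row.getAs[T](fieldName): an unchecked cast to T
  def getAs[T](fieldName: String): T = fields(fieldName).asInstanceOf[T]
}

case class SalesTransaction(
    CustomerId: Int,
    Score: Int,
    Revenue: Double,
    Type: String,
    Credited: Double = 0.0,
    LinkedTransactionId: Int = 0,
    IsProcessed: Boolean = false
)

object TypedGetAsDemo {
  // Hypothetical helper that builds one row with the question's column names
  def toRow(customerId: Int, transactionId: Int, score: Int, txType: String, revenue: Double): FakeRow =
    FakeRow(Map(
      "CustomerId" -> customerId,
      "TransactionId" -> transactionId,
      "TransactionAttributesScore" -> score,
      "TransactionType" -> txType,
      "Revenue" -> revenue
    ))

  val rows: Seq[FakeRow] = Seq(
    toRow(1, 1, 123, "Sales", 100.0),
    toRow(1, 2, 122, "Credit", 100.0),
    toRow(2, 9, 200, "Credit", 50.0)
  )

  def buildMap(rows: Seq[FakeRow]): Map[Int, SalesTransaction] =
    rows.map { row =>
      // The explicit [Int] fixes T, so the key is the real transaction id
      val key = row.getAs[Int]("TransactionId")
      key -> SalesTransaction(
        row.getAs[Int]("CustomerId"),
        row.getAs[Int]("TransactionAttributesScore"),
        row.getAs[Double]("Revenue"),
        row.getAs[String]("TransactionType")
      )
    }.toMap

  def main(args: Array[String]): Unit =
    buildMap(rows).foreach { case (k, v) => println(s"key: $k Value: $v") }
}
```

Writing every getAs with its type parameter keeps the result types independent of where the call appears.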
Answer 1 (score: 0)
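The problem is the type that row.getAs("TransactionId") is inferred to return, even though the underlying $"TransactionId" column is an integer. getAs takes a type parameter (getAs[T]). In the tuple position there is no expected type for the compiler to infer T from, so T is inferred as Nothing rather than Int. The arguments passed to the SalesTransaction constructor are unaffected, because each constructor parameter type (Int, Double, String) supplies the expected type. Typing the input explicitly solves the problem: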
//… code above unchanged
val m2 : Map[Int, SalesTransaction] =
df.map(row => {
val mKey : Int = row.getAs("TransactionId") // the Int annotation makes getAs infer T = Int
val mValue : SalesTransaction = new SalesTransaction(row.getAs("CustomerId")
, row.getAs("TransactionAttributesScore")
, row.getAs("Revenue")
, row.getAs("TransactionType")
)
(mKey, mValue)
}
).collect.toMap
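The annotated val works because it gives getAs an expected type. An inline type ascription does the same thing inside the tuple. Below is a minimal Spark-free sketch of both variants. FakeRow is a hypothetical stand-in mimicking the shape of Row.getAs[T](fieldName), and ExpectedTypeDemo is an illustrative name.

```scala
// Hypothetical stand-in for Spark's Row (same shape as Row.getAs[T](fieldName))
final case class FakeRow(fields: Map[String, Any]) {
  def getAs[T](fieldName: String): T = fields(fieldName).asInstanceOf[T]
}

object ExpectedTypeDemo {
  val rows: Seq[FakeRow] = Seq(
    FakeRow(Map("TransactionId" -> 1, "TransactionType" -> "Sales")),
    FakeRow(Map("TransactionId" -> 2, "TransactionType" -> "Credit"))
  )

  // Variant A (the answer's approach): the annotated vals give getAs its expected types
  def viaAnnotatedVal(rows: Seq[FakeRow]): Map[Int, String] =
    rows.map { row =>
      val key: Int = row.getAs("TransactionId")
      val txType: String = row.getAs("TransactionType")
      (key, txType)
    }.toMap

  // Variant B: an inline type ascription supplies the same expected type without a val
  def viaAscription(rows: Seq[FakeRow]): Map[Int, String] =
    rows.map(row => (row.getAs("TransactionId"): Int) -> (row.getAs("TransactionType"): String)).toMap

  def main(args: Array[String]): Unit = {
    println(viaAnnotatedVal(rows))
    println(viaAscription(rows))
  }
}
```

Either form fixes T = Int at compile time. The explicit getAs[Int] from the other answer is the most direct of the three.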