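I am trying to set the transactionCount variable to 100, but I do not get 100 back. I have an RDD that always has exactly one partition, and I process it with code like this: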
    var transactionCount = -1
    payment_rdd.foreachPartition { partitionOfRecords =>
      // This line assigns 100 to transactionCount, since my RDD
      // (and therefore my single partition) holds 100 records
      transactionCount = partitionOfRecords.size
      partitionOfRecords.foreach { record =>
        // I process each record
      }
      try {
        // transactionCount is still 100 here
        // another process
      }
      catch {
        case _: Throwable => {
          // I never get here
          log.error("Cannot process correctly")
          transactionCount = 0
        }
      }
    }
    return transactionCount
I get -1 back instead of 100, and I cannot understand why. Do you have any idea, or a better solution? Thanks.
Answer 0: (score: 2)
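When this code is executed, the closure passed to foreachPartition is serialized and shipped to the executors, so each task updates its own copy of transactionCount and the driver's variable keeps its initial value of -1.
You also cannot do this: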
    transactionCount = partitionOfRecords.size
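An Iterator can be traversed only once, so it is empty after its size has been computed and the foreach that follows has nothing left to process.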
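The single-traversal behaviour can be checked without Spark. The following is a minimal sketch on a plain Scala Iterator; the object and method names are hypothetical and exist only for illustration. It shows that calling size first leaves nothing for the later foreach, and that counting during the one traversal avoids the problem.

```scala
object IteratorSinglePass {
  // Returns (size reported first, number of records visited afterwards).
  def sizeThenTraverse(records: Iterator[Int]): (Int, Int) = {
    val reported = records.size        // consumes the whole iterator
    var seen = 0
    records.foreach(_ => seen += 1)    // nothing is left to visit
    (reported, seen)
  }

  // Counts the records during the one and only traversal.
  def countWhileProcessing(records: Iterator[Int])(process: Int => Unit): Int = {
    var count = 0
    records.foreach { r =>
      process(r)
      count += 1
    }
    count
  }

  def main(args: Array[String]): Unit = {
    println(sizeThenTraverse(Iterator.range(0, 100)))
    println(countWhileProcessing(Iterator.range(0, 100))(_ => ()))
  }
}
```

If the partition has to be walked more than once, copying it first (for example with toList) is an option, at the cost of holding the whole partition in memory.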
I would use Try and an accumulator:
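    import scala.util.Try

    val transactionCount = spark.sparkContext.longAccumulator
    rdd.foreach { record =>
      // Count the record only if its processing does not throw
      if (Try {
        // your code goes here
      }.isSuccess) transactionCount.add(1L)
    }
    // On the driver, read the total with transactionCount.value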
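The Try-based counting itself can be tried locally before it is wired into a Spark job. This is a minimal sketch on a plain iterator, not the Spark version: the process function is a hypothetical stand-in for the per-record work, assumed here to fail on negative amounts, and the local count plays the role that the accumulator's add(1L) plays on the cluster.

```scala
import scala.util.Try

object TryCountSketch {
  // Hypothetical per-record step: throws IllegalArgumentException on negative amounts.
  def process(amount: Int): Unit =
    require(amount >= 0, s"negative amount: $amount")

  // Counts the records for which `process` completes without throwing.
  def countSuccesses(records: Iterator[Int]): Long =
    records.count(r => Try(process(r)).isSuccess).toLong

  def main(args: Array[String]): Unit =
    println(countSuccesses(Iterator(10, -1, 25, 40)))
}
```

Note that Try only captures non-fatal exceptions, so fatal errors such as OutOfMemoryError still propagate, unlike with the catch-all Throwable handler in the question.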