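I am working on an attribution report and I cache a DataFrame because it is used frequently in later stages of the code. Once I am done with it, should I call unpersist() or unpersist(true)? I understand that the basic difference is that the first is asynchronous (non-blocking) and the second is synchronous (blocking). But does one incur more latency than the other? Are there any other implications?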
val dfForWeb = loadData(aggregationType, readConfigForWeb).cache()
// some logical code blocks
// ..
// ..
// ..
dfForWeb.unpersist() // This works fine
// I also tried the call below and got the same result:
// dfForWeb.unpersist(true) // This also works fine
The actual code looks like this:
val dfForWeb = loadData(aggregationType, readConfigForWeb).cache()
val dfForMobile = loadData(aggregationType, readConfigForMobile).cache()
if (condition) {
  for (item <- GeoAggregationList) {
    processData(dfForWeb) // This dataframe is used for a lot of computations later
  }
} else {
  processData(dfForWeb) // This dataframe is used for a lot of computations later
}
dfForWeb.unpersist()
dfForMobile.unpersist()
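Since this application will be scaled up and will process real data, I am trying to be careful. I am wondering whether unpersist() and unpersist(true) differ significantly in terms of latency or risk of data loss. Please advise.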
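To compare the two calls myself, I could wrap each one in a small timing helper. This is only a minimal sketch: `timeMs` is a hypothetical helper I made up for illustration, not part of Spark, and it measures only how long the call itself takes to return on the driver, not when the cached blocks are actually freed on the executors.

```scala
// Hypothetical helper (not a Spark API): runs a block of code and returns
// its result together with the elapsed wall-clock time in milliseconds.
def timeMs[A](block: => A): (A, Long) = {
  val start = System.nanoTime()
  val result = block
  val elapsedMs = (System.nanoTime() - start) / 1000000L
  (result, elapsedMs)
}

// Intended use with the cached DataFrame from my code above.
// Only one variant should be timed per run (or the DataFrame re-cached and
// re-materialized in between), since the second call would find nothing to remove.
//   val (_, asyncMs) = timeMs { dfForWeb.unpersist() }                // non-blocking
//   val (_, syncMs)  = timeMs { dfForWeb.unpersist(blocking = true) } // waits until blocks are removed
```

I would expect the blocking variant to take longer to return simply because it waits for the removal to finish, but I have not measured this on real data yet.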