Question

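I am working on an attribution report and I cache the DataFrame because it is used frequently in later stages of the code. Once I am done with it, should I call unpersist() or unpersist(true)? I understand that the basic difference is that one is asynchronous and the other is synchronous. But does one have noticeably higher latency than the other, or are there other implications?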

val dfForWeb = loadData(aggregationType, readConfigForWeb).cache()
//some logical code blocks
..
..
..
dfForWeb.unpersist() //This works fine

//Tried using the below and got the same result:

//dfForWeb.unpersist(true) --This also works fine

The actual code is as follows:

val dfForWeb = loadData(aggregationType, readConfigForWeb).cache()
val dfForMobile = loadData(aggregationType, readConfigForMobile).cache()
if (condition) {
  for (item <- GeoAggregationList) {
    processData(dfForWeb) //This dataframe is used for a lot of computations later
  }
} else {
  processData(dfForWeb) //This dataframe is used for a lot of computations later
}
dfForWeb.unpersist()
dfForMobile.unpersist()

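This application will be scaled up and will process real data, so I am trying to be careful. I am wondering whether unpersist() and unpersist(true) make a significant difference in terms of latency and data loss. Please advise.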
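A minimal sketch of the two calls side by side, assuming Spark 2.1 or later (for `Dataset.storageLevel`) with `spark-sql` on the classpath and a local master. `loadSample`, `run` and `UnpersistSketch` are hypothetical names; `loadSample` stands in for the question's `loadData(...)`. It illustrates the API only and measures no latencies. In Spark's `Dataset` source, `unpersist()` delegates to `unpersist(blocking = false)`, which returns without waiting for executors to drop the cached blocks, while `unpersist(true)` waits until they are removed. Neither call touches the underlying source data: a DataFrame that is used again after unpersisting is recomputed from its lineage.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.storage.StorageLevel

object UnpersistSketch {
  // Hypothetical stand-in for the question's loadData(...).
  // Different row counts give the two DataFrames different query plans,
  // so Spark keeps two separate cache entries.
  def loadSample(spark: SparkSession, rows: Long): DataFrame =
    spark.range(0L, rows).toDF("id")

  // Returns (web entry unregistered, mobile entry unregistered, row count after unpersist).
  def run(spark: SparkSession): (Boolean, Boolean, Long) = {
    val dfForWeb = loadSample(spark, 1000000L).cache()
    val dfForMobile = loadSample(spark, 500000L).cache()

    // cache() is lazy: an action is needed to actually materialize the blocks.
    dfForWeb.count()
    dfForMobile.count()

    // Non-blocking: returns right away; executors free the blocks in the background.
    dfForWeb.unpersist()

    // Blocking: does not return until the cached blocks have been removed.
    dfForMobile.unpersist(blocking = true)

    // storageLevel reports NONE once the Dataset is no longer registered with the
    // cache manager; after the non-blocking call, executor blocks may still be
    // in the middle of being freed.
    val webGone = dfForWeb.storageLevel == StorageLevel.NONE
    val mobileGone = dfForMobile.storageLevel == StorageLevel.NONE

    // The DataFrame is still usable; Spark recomputes it from its lineage.
    val recount = dfForWeb.count()
    (webGone, mobileGone, recount)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("unpersist-sketch")
      .getOrCreate()
    try {
      println(run(spark))
    } finally {
      spark.stop()
    }
  }
}
```

In this sketch the practical difference is only whether the driver thread waits for the executors to free memory; the blocking variant can be useful when the next stage needs that memory immediately.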

Working difference between unpersist() and unpersist(blocking: Boolean)
