What is Spark's memory eviction policy? FIFO or LRU?

Time: 2016-11-16 02:12:03

Tags: apache-spark memory-management

I am trying to work out Spark's memory eviction policy, and several sources say it is LRU (here, here).

However, when I look at the source code of MemoryStore and BlockManager, I cannot find the LRU logic:

  1. There is a LinkedHashMap that keeps track of all the blocks in the MemoryStore:

    // Note: all changes to memory allocations, notably putting blocks, evicting blocks, and
    // acquiring or releasing unroll memory, must be synchronized on `memoryManager`!
    private val entries = new LinkedHashMap[BlockId, MemoryEntry[_]](32, 0.75f, true)
    
  2. When a block is accessed, it is not moved to the head of the LinkedHashMap:

    def getValues(blockId: BlockId): Option[Iterator[_]] = {
        val entry = entries.synchronized { entries.get(blockId) }
        entry match {
            case null => None
            case e: SerializedMemoryEntry[_] =>
                throw new IllegalArgumentException("should only call getValues on deserialized blocks")
            case DeserializedMemoryEntry(values, _, _) =>
                val x = Some(values)
                x.map(_.iterator)
        }
    }
    
  3. In the block-eviction logic, blocks are selected in the iteration order of the LinkedHashMap's entrySet, which I think is first-in, first-out:

    private[spark] def evictBlocksToFreeSpace(
         blockId: Option[BlockId],
         space: Long,
         memoryMode: MemoryMode): Long = {
       assert(space > 0)
       memoryManager.synchronized {
         var freedMemory = 0L
         val rddToAdd = blockId.flatMap(getRddId)
         val selectedBlocks = new ArrayBuffer[BlockId]
         def blockIsEvictable(blockId: BlockId, entry: MemoryEntry[_]): Boolean = {
           entry.memoryMode == memoryMode && (rddToAdd.isEmpty || rddToAdd != getRddId(blockId))
         }
         // This is synchronized to ensure that the set of entries is not changed
         // (because of getValue or getBytes) while traversing the iterator, as that
         // can lead to exceptions.
         entries.synchronized {
           val iterator = entries.entrySet().iterator()
           while (freedMemory < space && iterator.hasNext) {
             val pair = iterator.next()
             val blockId = pair.getKey
             val entry = pair.getValue
             if (blockIsEvictable(blockId, entry)) {
               // We don't want to evict blocks which are currently being read, so we need to obtain
               // an exclusive write lock on blocks which are candidates for eviction. We perform a
               // non-blocking "tryLock" here in order to ignore blocks which are locked for reading:
               if (blockInfoManager.lockForWriting(blockId, blocking = false).isDefined) {
                 selectedBlocks += blockId
                 freedMemory += pair.getValue.size
               }
             }
           }
         }
         ...
         if (freedMemory >= space) {
           logInfo(s"${selectedBlocks.size} blocks selected for dropping " +
             s"(${Utils.bytesToString(freedMemory)} bytes)")
           for (blockId <- selectedBlocks) {
             val entry = entries.synchronized { entries.get(blockId) }
             // This should never be null as only one task should be dropping
             // blocks and removing entries. However the check is still here for
             // future safety.
             if (entry != null) {
               dropBlock(blockId, entry)
             }
           }
          ...
         }
       }
     }
    
  4. So, is Spark's eviction policy FIFO or LRU?

2 Answers:

Answer 0 (score: 1)

The LinkedHashMap constructor used on this line:

    private val entries = new LinkedHashMap[BlockId, MemoryEntry[_]](32, 0.75f, true)

is the constructor that creates a LinkedHashMap in access order:

    LinkedHashMap(int initialCapacity, float loadFactor, boolean accessOrder)

The accessOrder flag is true, which means the keys are ordered by how recently they were accessed.

In other words, the eviction policy is LRU. The blocks are kept in access order in the entries LinkedHashMap, and blocks are selected for eviction by iterating over the LinkedHashMap's entrySet, which means the first block to be evicted is the least recently used one.
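
A quick way to convince yourself is to run java.util.LinkedHashMap on its own. The sketch below is not Spark code (the block names are made up); it only shows that with accessOrder = true, iterating the entrySet starts from the least recently used key, which is exactly the order evictBlocksToFreeSpace walks:

    import java.util.LinkedHashMap

    object AccessOrderDemo {
      def main(args: Array[String]): Unit = {
        // Same constructor arguments as MemoryStore's entries map:
        // initial capacity 32, load factor 0.75, accessOrder = true.
        val entries = new LinkedHashMap[String, Long](32, 0.75f, true)
        entries.put("rdd_0_0", 100L)
        entries.put("rdd_0_1", 100L)
        entries.put("rdd_0_2", 100L)

        // Accessing a key moves it to the tail of the iteration order.
        entries.get("rdd_0_0")

        // Iteration now starts with the least recently used key:
        // prints rdd_0_1, rdd_0_2, rdd_0_0
        val iterator = entries.entrySet().iterator()
        while (iterator.hasNext) {
          println(iterator.next().getKey)
        }
      }
    }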

Answer 1 (score: 0)

I had the same question before, and the answer is quite tricky: in the code you pasted here, there is no explicit "promotion" operation. But in fact LinkedHashMap is a special data structure that maintains LRU order by itself: because the map is constructed with accessOrder = true, every successful entries.get(blockId) (as in getValues above) moves that entry to the tail of the internal linked list, so the promotion happens inside LinkedHashMap rather than in Spark's own code.
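
To make the implicit promotion visible, here is the canonical LRU-cache idiom built on the same constructor; this is a minimal sketch with illustrative names (LruCache is not a Spark class):

    import java.util.{LinkedHashMap, Map => JMap}

    // A tiny LRU cache: LinkedHashMap does all the bookkeeping itself.
    class LruCache[K, V](maxEntries: Int)
        extends LinkedHashMap[K, V](32, 0.75f, true) {
      // put() consults this hook; the "eldest" entry is the least recently
      // used one, because accessOrder = true makes get() move entries to the tail.
      override def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
        size() > maxEntries
    }

    object LruCacheDemo extends App {
      val cache = new LruCache[String, Int](2)
      cache.put("a", 1)
      cache.put("b", 2)
      cache.get("a")          // promotes "a" inside get(); no explicit code needed
      cache.put("c", 3)       // evicts "b", the least recently used entry
      println(cache.keySet()) // prints [a, c]
    }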