缓存策略 - LRU算法简析及应用

vlambda
2021-04-22

缓存策略 - LRU算法简析及应用

缓存的概念

原始意义是指访问速度比一般随机存取存储器（RAM）快的一种RAM，通常它不像系统主存那样使用DRAM技术，而使用昂贵但较快速的SRAM技术，如今概念扩展至更广泛的层面，如我们日常接触到的浏览器缓存等，缓存存在的意义是为了提高访问数据的效率，早期主要是指代的硬件层面，如

缓存置换策略

很明显，缓存空间是有限的，所以当缓存空间满了以后，但又有新的内容需要加进来缓存时，如何去处理掉一些旧内容来腾出空间放新的内容，是一个需要我们去考虑的事情，目前的一些解决问题的算法主要是有以下几种：

最久未使用算法（LFU，Least Frequency Used）
先进先出算法（FIFO，First in First out）
最近最少使用算法（LRU，Least Recently used）
非最近使用算法（NMRU）

这些算法在不同层次的缓存上执行时拥有不同的效率和代价，需根据具体场合选择最合适的一种

衡量缓存的指标主要有两个：延迟和命中率。同时也存在其他一些次级因素影响缓存的性能^[1]

内存的平均引用时间为：

其中：

= 内存平均引用时间

= 未命中率 = 1 - (命中率)

= 未命中时访问主内存需要的时间 (或者在多层缓存中对下级缓存的访问时间)

= 延迟，即命中时引用缓存的时间

= 各种次级因素, 如多处理器系统中的队列效应

每种置换策略都是在命中率和置换之间妥协

LRU算法概述

LRU：如果数据最近被访问过，那么将来被访问的几率也更高，即它是根据数据的历史访问记录来执行淘汰数据策略的

我们希望LRU算法整体的表现为查找快、插入快、删除快，且要有顺序，在常用的数据结构中，哈希表查找快，但没有顺序，双向链表插入、删除快且有顺序，但查找慢，因此我们可以结合两者的优点来实现一种新的数据结构哈希链表来实现LRU算法

数据淘汰规则如下：

新数据插入到链表头部；
每当缓存命中（即缓存数据被访问），则将数据移到链表头部；
当链表满的时候，将链表尾部的数据丢弃

大致过程如下：

[2]

在内存空间 T为空时，依次进入 A、 B、 C是可以满足的，此时 A、 B、 C占满了内存空间 T，此时新旧顺序为 C->B->A
当新加入一个 D，因为内存空间 T满了，按照 LRU算法规则，会将 A这个最早进入内存的数据给淘汰，此时新旧顺序为 D->C->B
当新使用了 B， B将作为最新的访问被放在链表的头部，此时 C成了最旧的内容，此时新旧顺序为 B->D->C
当再次往内存空间 T中添加内容 E，此时 T满了，所以需要淘汰最旧的内容来放置 E，因此 C被淘汰，此时新旧顺序为 E->B->D

LRU算法的实现

LRU的实现均参照自LeetCode#146LRU缓存机制^[3]

Map + 双向链表

首先先实现一个简化版的双向链表

// 节点
class LinkNode {
  constructor(key, val) {
    this.key = key
    this.val = val
    this.next = null
    this.prev = null
  }
}

class DoubleLinkedList {
  constructor() {
    // 初始化双向链表的头尾节点
    this.head = new LinkNode(0, 0)
    this.tail = new LinkNode(0, 0)

    this.head.next = this.tail
    this.tail.prev = this.head

    // 链表元素数量
    this.size = 0
  }

  insert(node) {
    node.prev = this.tail.prev
    node.next = this.tail
    this.tail.prev.next = node
    this.tail.prev = node
    ++this.size
  }
  // 删除节点
  remove(node) {
    node.prev.next = node.next
    node.next.prev = node.prev
    --this.size
  }
  // 删除链表的第一个节点，并返回对应节点
  removeFirst() {
    // 如果只有头尾节点，则没有可删除的节点
    if (this.head.next === this.tail) {
      return null
    }
    const firstNode = this.head.next
    this.remove(firstNode)
    return firstNode
  }
  getSize() {
    return this.size
  }
}

class LRUCache {
  // 以正整数作为容量 capacity 初始化 LRU 缓存
  constructor(capacity = 0) {
    this.hash = new Map()
    this.cache = new DoubleLinkedList()
    this.capacity = capacity
  }

  get(key) {
    if (!this.hash.has(key)) return -1
    // 此段逻辑目的是为了将该数据提升至最新访问
    const node = this.hash.get(key)
    // 先将该节点从缓存中删除
    this.cache.remove(node)
    // 再将其插入队尾
    this.cache.insert(node)

    return node.val
  }

  put(key, value) {
    if (this.hash.has(key)) {
      const node = this.hash.get(key)
      // 如果是已存在的数据，则先删除
      this.hash.delete(key)
      this.cache.remove(node)
    } else {
      const cacheSize = this.cache.getSize()
      // 当缓存容量已满
      if (cacheSize >= this.capacity) {
        // 则需要删除链表的第一个节点，此时该节点为最旧的访问数据
        const removeNode = this.cache.removeFirst()
        // 删除哈希表中对应的key
        this.hash.delete(removeNode.key)
      }
    }
    // 插入哈希表和链表
    const nNode = new LinkNode(key, value)
    this.hash.set(key, nNode)
    this.cache.insert(nNode)
  }
}

const lRUCache = new LRUCache(2)

lRUCache.put(1, 1); // 缓存是 {1=1}
lRUCache.put(2, 2); // 缓存是 {1=1, 2=2}
console.log('lRUCache.get(1)---', lRUCache.get(1));    // 返回 1
lRUCache.put(3, 3); // 该操作会使得关键字 2 作废，缓存是 {1=1, 3=3}
console.log('lRUCache.get(2)---', lRUCache.get(2));    // 返回 -1
lRUCache.put(4, 4); // 该操作会使得关键字 1 作废，缓存是 {4=4, 3=3}
console.log('lRUCache.get(1)---', lRUCache.get(1));    // 返回 -1
console.log('lRUCache.get(3)---', lRUCache.get(3));    // 返回 3
console.log('lRUCache.get(4)---', lRUCache.get(4));    // 返回 4

Map + Set

class LRUCache {
  // 以正整数作为容量 capacity 初始化 LRU 缓存
  constructor(capacity = 0) {
    this.hash = new Map()
    this.cache = new Set()
    this.capacity = capacity
  }
  // 插入key和数据
  _insert(key, val) {
    this.hash.set(key, val)
    this.cache.add(key)
  }
  // 删除key
  _remove(key) {
    this.hash.delete(key)
    this.cache.delete(key)
  }

  get(key) {
    if(!this.hash.has(key)) return -1
    // 此段逻辑目的是为了将该数据提升至最新访问
    // 先将该节点从缓存中删除
    this.cache.delete(key)
    // 再将其插入队尾
    this.cache.add(key)

    return this.hash.get(key)
  }

  put(key, value) {
    if(this.hash.has(key)) {
      // 如果是已存在的数据，则先删除
      this._remove(key)
    } else {
      const cacheSize = this.cache.size
      // 当缓存容量已满
      if(cacheSize >= this.capacity) {
        // 获取到Set对象内按照插入顺序返回的最旧一个数据
        // https://developer.mozilla.org/zh-CN/docs/Web/JavaScript/Reference/Global_Objects/Set/values
        const removeKey = this.cache.values().next().value
        // 删除对应的key
        this._remove(removeKey)
        this.hash.delete(removeKey) 
        this.cache.delete(removeKey)
      }
    }
    
    // 插入对应的key和val
    this._insert(key, value)
  }
}

const lRUCache = new LRUCache(2)

lRUCache.put(1, 1); // 缓存是 {1=1}
lRUCache.put(2, 2); // 缓存是 {1=1, 2=2}
console.log('lRUCache.get(1)---', lRUCache.get(1));    // 返回 1
lRUCache.put(3, 3); // 该操作会使得关键字 2 作废，缓存是 {1=1, 3=3}
console.log('lRUCache.get(2)---', lRUCache.get(2));    // 返回 -1
lRUCache.put(4, 4); // 该操作会使得关键字 1 作废，缓存是 {4=4, 3=3}
console.log('lRUCache.get(1)---', lRUCache.get(1));    // 返回 -1
console.log('lRUCache.get(3)---', lRUCache.get(3));    // 返回 3
console.log('lRUCache.get(4)---', lRUCache.get(4));    // 返回 4

LRU算法在前端的应用

`Vue -> keep-alive`

keep-alive在Vue中作为内置组件，主要目的是为了实现组件的缓存，当组件切换时缓存组件实例，而不是对组件进行销毁，该组件有一个属性max（v2.5.0+），表示最多可以缓存多少组件实例。一旦这个数字达到了，在新实例被创建之前，已缓存组件中最久没有被访问的实例会被销毁掉，该逻辑就用到了LRU算法

源码解析：

// v2.6.11
/* @flow */

// ...

function pruneCacheEntry (
  cache: VNodeCache,
  key: string,
  keys: Array<string>,
  current?: VNode
) {
  const cached = cache[key]
  if (cached && (!current || cached.tag !== current.tag)) {
    cached.componentInstance.$destroy()
  }
  // 清除key的缓存
  cache[key] = null
  remove(keys, key)
}

export default {
  // ...

  render () {
    const slot = this.$slots.default
    const vnode: VNode = getFirstComponentChild(slot)
    const componentOptions: ?VNodeComponentOptions = vnode && vnode.componentOptions
    if (componentOptions) {
      // ...

      /**
       * 从这里开始是LRU算法逻辑
       * 
       * 如果命中缓存，则从缓存中获取 vnode 的组件实例，并且调整 key 的顺序放入 keys 数组的末尾
       */
      const { cache, keys } = this
      const key: ?string = vnode.key == null
        ? componentOptions.Ctor.cid + (componentOptions.tag ? `::${componentOptions.tag}` : '')
        : vnode.key
      // 判断如果在缓存中存在当前key
      if (cache[key]) {
        // 把缓存内的组件实例作为当前组件实例
        vnode.componentInstance = cache[key].componentInstance
        // 则先删除keys中的key，再将其加入到keys的最后面，作为最近一次访问的key
        remove(keys, key)
        keys.push(key)
      } else {
        // 如果不在当前缓存中
        cache[key] = vnode
        // 则将当前key加入到keys
        keys.push(key)
        // 如果发现缓存keys的数量大于最大可缓存数量，则执行清除最旧访问的逻辑
        if (this.max && keys.length > parseInt(this.max)) {
          pruneCacheEntry(cache, keys[0], keys, this._vnode)
        }
      }
      // ...
    }
    return vnode || (slot && slot[0])
  }
}

vue3中的实现大体一致，只是其中的cache由Object.create(null)换成了new Map()，keys由[]换成了new Set()

`React -> react-cache`

react-cache^[4]作为一个基础包存在在react项目中，其中就用到了LRU，以下为源码，可以看到使用的是双向链表来实现

// https://github.com/facebook/react/blob/master/packages/react-cache/src/LRU.js

/**
 * Copyright (c) Facebook, Inc. and its affiliates.
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 *
 * @flow
 */

import * as Scheduler from 'scheduler';

// Intentionally not named imports because Rollup would
// use dynamic dispatch for CommonJS interop named imports.
const {
  unstable_scheduleCallback: scheduleCallback,
  unstable_IdlePriority: IdlePriority,
} = Scheduler;

type Entry<T> = {|
  value: T,
  onDelete: () => mixed,
  previous: Entry<T>,
  next: Entry<T>,
|};

export function createLRU<T>(limit: number) {
  let LIMIT = limit;

  // Circular, doubly-linked list
  let first: Entry<T> | null = null;
  let size: number = 0;

  let cleanUpIsScheduled: boolean = false;

  function scheduleCleanUp() {
    if (cleanUpIsScheduled === false && size > LIMIT) {
      // The cache size exceeds the limit. Schedule a callback to delete the
      // least recently used entries.
      cleanUpIsScheduled = true;
      scheduleCallback(IdlePriority, cleanUp);
    }
  }

  function cleanUp() {
    cleanUpIsScheduled = false;
    deleteLeastRecentlyUsedEntries(LIMIT);
  }

  function deleteLeastRecentlyUsedEntries(targetSize: number) {
    // Delete entries from the cache, starting from the end of the list.
    if (first !== null) {
      const resolvedFirst: Entry<T> = (first: any);
      let last = resolvedFirst.previous;
      while (size > targetSize && last !== null) {
        const onDelete = last.onDelete;
        const previous = last.previous;
        last.onDelete = (null: any);

        // Remove from the list
        last.previous = last.next = (null: any);
        if (last === first) {
          // Reached the head of the list.
          first = last = null;
        } else {
          (first: any).previous = previous;
          previous.next = (first: any);
          last = previous;
        }

        size -= 1;

        // Call the destroy method after removing the entry from the list. If it
        // throws, the rest of cache will not be deleted, but it will be in a
        // valid state.
        onDelete();
      }
    }
  }

  function add(value: Object, onDelete: () => mixed): Entry<Object> {
    const entry = {
      value,
      onDelete,
      next: (null: any),
      previous: (null: any),
    };
    if (first === null) {
      entry.previous = entry.next = entry;
      first = entry;
    } else {
      // Append to head
      const last = first.previous;
      last.next = entry;
      entry.previous = last;

      first.previous = entry;
      entry.next = first;

      first = entry;
    }
    size += 1;
    return entry;
  }

  function update(entry: Entry<T>, newValue: T): void {
    entry.value = newValue;
  }

  function access(entry: Entry<T>): T {
    const next = entry.next;
    if (next !== null) {
      // Entry already cached
      const resolvedFirst: Entry<T> = (first: any);
      if (first !== entry) {
        // Remove from current position
        const previous = entry.previous;
        previous.next = next;
        next.previous = previous;

        // Append to head
        const last = resolvedFirst.previous;
        last.next = entry;
        entry.previous = last;

        resolvedFirst.previous = entry;
        entry.next = resolvedFirst;

        first = entry;
      }
    } else {
      // Cannot access a deleted entry
      // TODO: Error? Warning?
    }
    scheduleCleanUp();
    return entry.value;
  }

  function setLimit(newLimit: number) {
    LIMIT = newLimit;
    scheduleCleanUp();
  }

  return {
    add,
    update,
    access,
    setLimit,
  };
}

参考资料

[1]

缓存文件置换机制: https://zh.wikipedia.org/wiki/%E5%BF%AB%E5%8F%96%E6%96%87%E4%BB%B6%E7%BD%AE%E6%8F%9B%E6%A9%9F%E5%88%B6

[2]

LRU 缓存淘汰算法原理与应用: https://leecason.github.io/articles/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84%E4%B8%8E%E7%AE%97%E6%B3%95/LRU%20%E7%BC%93%E5%AD%98%E6%B7%98%E6%B1%B0%E7%AE%97%E6%B3%95%E5%8E%9F%E7%90%86%E4%B8%8E%E5%BA%94%E7%94%A8.html

[3]

LeetCode#146LRU缓存机制: https://leetcode-cn.com/problems/lru-cache/

[4]

react-cache: https://github.com/facebook/react/blob/master/packages/react-cache/src/LRU.js

vlambda博客
学习文章列表