vlambda Blog

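Functional Programming: Memoization

Memoization is a functional programming (FP) technique for improving application performance. It speeds a program up by caching a function's results, so that calling the function again with the same argument does not pay for the computation a second time.

1. First, we use Dictionary as the cache structure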

    // Requires: using System; using System.Collections.Generic; using System.Threading;

    public static Func<T, R> Memoize<T, R>(Func<T, R> func) where T : IComparable
    {
        // One cache per memoized function, keyed by the argument.
        var cache = new Dictionary<T, R>();
        return arg =>
        {
            if (cache.TryGetValue(arg, out R cached))
                return cached;
            return cache[arg] = func(arg);
        };
    }

    public static string GetString(string name)
    {
        // The timestamp shows whether a result was computed or served from the cache.
        return $"return date {DateTime.Now:yyyy-MM-dd HH:mm:ss} string {name}";
    }

    // Usage:
    var getStrMemoize = Memoize<string, string>(GetString);
    Console.WriteLine(getStrMemoize("A"));
    Thread.Sleep(3000);
    Console.WriteLine(getStrMemoize("B"));
    Thread.Sleep(3000);
    Console.WriteLine(getStrMemoize("A"));

Output:

    return date 2020-12-31 08:37:12 string A
    return date 2020-12-31 08:37:15 string B
    return date 2020-12-31 08:37:12 string A

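The third line is identical to the first, timestamp included: it is the value that was cached in the Dictionary.

This works in single-threaded code. The calls run sequentially, so the Dictionary never sees concurrent access. Dictionary is not a thread-safe collection, however, so calling the memoized function from several threads in parallel is unsafe.

2. Next, we improve it with the thread-safe collection ConcurrentDictionary (the comments in the method explain the behavior, so it is not repeated here)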

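    // Requires: using System; using System.Collections.Concurrent; using System.Threading;

    // Notes on ConcurrentDictionary.GetOrAdd:
    // For modifications and write operations, ConcurrentDictionary<TKey,TValue> uses
    // fine-grained locking to ensure thread safety. (Read operations are performed in a
    // lock-free manner.) However, the valueFactory delegate is called outside the locks,
    // to avoid the problems that can arise from executing unknown code under a lock.
    // GetOrAdd is therefore not atomic with regard to all other operations on the
    // ConcurrentDictionary<TKey,TValue> class.
    // Because another thread can insert a key/value while valueFactory is generating a
    // value, you cannot assume that just because valueFactory executed, the value it
    // produced will be the one inserted into the dictionary and returned.
    // If GetOrAdd is called simultaneously on different threads, valueFactory may be
    // called multiple times, but only one key/value pair is added to the dictionary.
    // The return value depends on whether the key is present in the dictionary and on
    // whether another thread inserts a key/value after GetOrAdd is called but before
    // valueFactory generates a value.
    // (If the current thread finds that the key is missing, it generates the value; if
    // another thread finishes writing that key first, the current thread sees the
    // existing entry when it tries to write, and the already-written value wins.)

    /// <summary>
    /// Memoizes func using a thread-safe collection.
    /// </summary>
    /// <typeparam name="T">Argument (cache key) type.</typeparam>
    /// <typeparam name="R">Result type.</typeparam>
    /// <param name="func">The function to memoize.</param>
    /// <returns>A memoized wrapper around func.</returns>
    public static Func<T, R> MemoizeThreadSafe<T, R>(Func<T, R> func) where T : IComparable
    {
        var cache = new ConcurrentDictionary<T, R>();
        return arg => cache.GetOrAdd(arg, key => func(key));
    }

    // Usage:
    var getStrMemoize = MemoizeThreadSafe<string, string>(GetString);
    Console.WriteLine(getStrMemoize("A"));
    Thread.Sleep(3000);
    Console.WriteLine(getStrMemoize("B"));
    Thread.Sleep(3000);
    Console.WriteLine(getStrMemoize("A"));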

Output:

    return date 2020-12-31 08:42:46 string A
    return date 2020-12-31 08:42:49 string B
    return date 2020-12-31 08:42:46 string A

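As the comments explain, ConcurrentDictionary is a thread-safe collection, but GetOrAdd is not an atomic operation. When a key is initialized for the first time, several threads may run the initialization at once, and all but one of those computations are wasted work.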
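The duplicate initialization can be made visible with a small experiment. The sketch below is illustrative only: the class name GetOrAddRaceDemo, the method Run, and the local function SlowFactory are hypothetical names that do not appear in the original article. A Barrier holds each factory call until both threads are inside the factory, which forces the race described above instead of leaving it to timing.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical demo class, not part of the original article.
public static class GetOrAddRaceDemo
{
    // Returns how often the value factory ran and how many entries were stored.
    public static (int FactoryCalls, int Entries) Run()
    {
        var cache = new ConcurrentDictionary<string, int>();
        int calls = 0;

        // Each factory call waits here until both threads have entered the factory.
        using var bothInsideFactory = new Barrier(2);

        int SlowFactory(string key)
        {
            Interlocked.Increment(ref calls);
            bothInsideFactory.SignalAndWait();
            return key.Length;
        }

        // Both threads ask for the same key that is not yet in the dictionary.
        var t1 = new Thread(() => { cache.GetOrAdd("A", key => SlowFactory(key)); });
        var t2 = new Thread(() => { cache.GetOrAdd("A", key => SlowFactory(key)); });
        t1.Start();
        t2.Start();
        t1.Join();
        t2.Join();

        return (calls, cache.Count);
    }
}
```

Calling GetOrAddRaceDemo.Run() should report two factory calls but only one stored entry: the dictionary stays consistent, yet the function being memoized was evaluated twice for the same key.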

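3. To keep the non-atomic GetOrAdd from repeating the initialization, we introduce lazy initialization (explained in detail in the comments):

Before looking at the improved method, let's look at how the Lazy class is used: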

    public class user
    {
        public string name { get; set; }
    }

    // Usage:
    Lazy<user> user = new Lazy<user>();
    if (!user.IsValueCreated)
        Console.WriteLine("user not created.");

    // The first access to Value creates the instance.
    user.Value.name = "test";
    if (user.IsValueCreated)
        Console.WriteLine("user created.");

Output:

    user not created.
    user created.

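The following is an abridged snippet of the Lazy class. It shows that the instance is not actually created until the object is used, that is, until Value is read: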

    [NonSerialized]
    private Func<T> m_valueFactory;

    private object m_boxed;

    public T Value
    {
        get
        {
            // Abridged: the full implementation first returns the value already held
            // in m_boxed, if there is one, and honors the configured thread-safety mode.
            return LazyInitValue();
        }
    }

    private T LazyInitValue()
    {
        Boxed boxed = null;
        try
        {
            boxed = CreateValue();
            m_boxed = boxed;
        }
        finally { }
        return boxed.m_value;
    }

    private Boxed CreateValue()
    {
        Boxed boxed = null;
        if (m_valueFactory != null) // () => func(arg)
        {
            try
            {
                Func<T> factory = m_valueFactory;
                boxed = new Boxed(factory());
            }
            catch (Exception ex)
            {
                throw;
            }
        }
        // Abridged: the branch that creates T with its default constructor is omitted.
        return boxed;
    }

    [Serializable]
    class Boxed
    {
        internal Boxed(T value) { m_value = value; }
        internal T m_value;
    }

Now let's look at the improved method:

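    // Requires: using System; using System.Collections.Concurrent;

    // Notes on Lazy<T>:
    // Use lazy initialization to defer the creation of a large or resource-intensive
    // object, or the execution of a resource-intensive task, particularly when that
    // creation or execution might not occur during the lifetime of the program.
    // To prepare for lazy initialization, create an instance of Lazy<T>.
    // The type argument of the Lazy<T> object specifies the type of the object you
    // want to initialize lazily.
    // The constructor used to create the Lazy<T> object determines the characteristics
    // of the initialization.
    // Lazy initialization occurs the first time the Lazy<T>.Value property is accessed.

    /// <summary>
    /// Works around the non-atomic GetOrAdd and its repeated initialization
    /// by caching Lazy values, which defer the actual computation.
    /// </summary>
    /// <typeparam name="T">Argument (cache key) type.</typeparam>
    /// <typeparam name="R">Result type.</typeparam>
    /// <param name="func">The function to memoize.</param>
    /// <returns>A memoized wrapper around func.</returns>
    public static Func<T, R> MemoizeLazyThreadSafe<T, R>(Func<T, R> func) where T : IComparable
    {
        var cache = new ConcurrentDictionary<T, Lazy<R>>();
        // Several Lazy<R> wrappers may be created under contention, but only one is
        // stored, and func runs only when that stored wrapper's Value is first read.
        return arg => cache.GetOrAdd(arg, key => new Lazy<R>(() => func(key))).Value;
    }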
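To check that the Lazy-based version evaluates the function once per key even under contention, the following sketch starts several threads that all request the same key at the same moment. The class name LazyMemoizeDemo, the method CountFuncCalls, and the lambda slowLength are hypothetical names introduced for illustration. The memoizer is repeated so that the block compiles on its own, and it uses a `notnull` constraint in place of the article's IComparable constraint, which is an assumption made only to keep the sketch free of nullable warnings.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical demo class, not part of the original article.
public static class LazyMemoizeDemo
{
    // Same shape as MemoizeLazyThreadSafe above, repeated so the sketch is self-contained.
    public static Func<T, R> MemoizeLazyThreadSafe<T, R>(Func<T, R> func) where T : notnull
    {
        var cache = new ConcurrentDictionary<T, Lazy<R>>();
        return arg => cache.GetOrAdd(arg, key => new Lazy<R>(() => func(key))).Value;
    }

    // Returns how many times the underlying function ran for a single key.
    public static int CountFuncCalls(int threadCount)
    {
        int calls = 0;
        Func<string, int> slowLength = s =>
        {
            Interlocked.Increment(ref calls);
            Thread.Sleep(50); // widen the window in which threads could overlap
            return s.Length;
        };
        Func<string, int> memoized = MemoizeLazyThreadSafe(slowLength);

        // Release all threads together so they contend for the same key.
        using var startGate = new Barrier(threadCount);
        var threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                startGate.SignalAndWait();
                memoized("A");
            });
            threads[i].Start();
        }
        foreach (Thread t in threads)
            t.Join();

        return calls;
    }
}
```

With the default Lazy thread-safety mode, LazyMemoizeDemo.CountFuncCalls(8) should return 1: extra Lazy wrappers may be allocated and discarded, but only the wrapper that was stored ever runs the function.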

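At this point both the thread-safety problem and the duplicate-initialization problem are solved. Having removed the repeated computation, though, we now have to consider the memory the cache consumes. The ConcurrentDictionary we created stays strongly referenced by the returned delegate, so the GC cannot reclaim it. The memory remains occupied, and the more distinct arguments the function is called with, the larger the footprint grows.

4. To address this, we introduce an expiration time and release cached values once it has passed.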

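    // Requires: using System; using System.Collections.Concurrent;
    //           using System.Runtime.CompilerServices;

    public static Func<T, R> MemoizeWeakWithTtl<T, R>(Func<T, R> func, TimeSpan ttl)
        where T : class, IEquatable<T>
        where R : class
    {
        // Maps a hash code to one canonical key instance, because ConditionalWeakTable
        // compares keys by reference rather than by Equals.
        // Note: this dictionary holds strong references to the canonical keys.
        var keyStore = new ConcurrentDictionary<int, T>();

        T ReduceKey(T obj)
        {
            var oldObj = keyStore.GetOrAdd(obj.GetHashCode(), obj);
            return obj.Equals(oldObj) ? oldObj : obj;
        }

        // Each entry stores the result together with its expiration time.
        var cache = new ConditionalWeakTable<T, Tuple<R, DateTime>>();

        Tuple<R, DateTime> FactoryFunc(T key) =>
            new Tuple<R, DateTime>(func(key), DateTime.Now + ttl);

        return arg =>
        {
            var key = ReduceKey(arg);
            var value = cache.GetValue(key, FactoryFunc);
            if (value.Item2 >= DateTime.Now)
                return value.Item1;

            // Expired: recompute and replace the entry.
            // Note: Remove followed by Add is not atomic; Add throws if another
            // thread has re-added the key in between.
            value = FactoryFunc(key);
            cache.Remove(key);
            cache.Add(key, value);
            return value.Item1;
        };
    }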
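To isolate the expiry logic from the weak-reference machinery, here is a simpler sketch that keeps only the TTL check, using a plain ConcurrentDictionary. It is not the method above: the class name TtlMemoizeSketch, the method MemoizeWithTtl, and the injectable clock parameter utcNow are assumptions introduced for illustration, the clock being there only so that expiry can be exercised without sleeping. Under these assumptions, expired entries are replaced on the next access rather than removed in the background, and two threads that miss at the same time may both recompute.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical sketch, not part of the original article.
public static class TtlMemoizeSketch
{
    // Convenience overload that uses the system clock.
    public static Func<T, R> MemoizeWithTtl<T, R>(Func<T, R> func, TimeSpan ttl)
        where T : notnull
        => MemoizeWithTtl(func, ttl, () => DateTime.UtcNow);

    // utcNow is an injectable clock (an assumption of this sketch).
    public static Func<T, R> MemoizeWithTtl<T, R>(
        Func<T, R> func, TimeSpan ttl, Func<DateTime> utcNow)
        where T : notnull
    {
        // Each entry keeps the value and the moment it stops being valid.
        var cache = new ConcurrentDictionary<T, (R Value, DateTime ExpiresAt)>();

        return arg =>
        {
            DateTime now = utcNow();

            // Serve the cached value only while it has not expired.
            if (cache.TryGetValue(arg, out var entry) && entry.ExpiresAt > now)
                return entry.Value;

            // Missing or expired: recompute and overwrite the entry.
            var fresh = (Value: func(arg), ExpiresAt: now + ttl);
            cache[arg] = fresh;
            return fresh.Value;
        };
    }
}
```

A call such as TtlMemoizeSketch.MemoizeWithTtl<string, string>(GetString, TimeSpan.FromSeconds(5)) would return the same string for "A" for five seconds and a freshly computed one after that. Keys that are never requested again stay in the dictionary, so this variant bounds staleness rather than memory.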

Another approach is to use the WeakReference type for weak references (the following is a usage example):

    // Requires: using System; using System.Collections.Generic;

    public class Cache
    {
        static Dictionary<int, WeakReference> _cache;

        // Counts how many objects had to be recreated after being collected.
        int regenCount = 0;

        public Cache(int count)
        {
            _cache = new Dictionary<int, WeakReference>();

            // Fill the cache with short weak references to new Data objects.
            for (int i = 0; i < count; i++)
            {
                _cache.Add(i, new WeakReference(new Data(i), false));
            }
        }

        public int Count { get { return _cache.Count; } }

        public int RegenerationCount { get { return regenCount; } }

        public Data this[int index]
        {
            get
            {
                Data d = _cache[index].Target as Data;
                if (d == null)
                {
                    // The object was reclaimed, so create a new one.
                    Console.WriteLine("Regenerate object at {0}: Yes", index);
                    d = new Data(index);
                    _cache[index].Target = d;
                    regenCount++;
                }
                else
                {
                    // The object is still alive and can be reused.
                    Console.WriteLine("Regenerate object at {0}: No", index);
                }

                return d;
            }
        }
    }

    public class Data
    {
        private byte[] _data;
        private string _name;

        public Data(int size)
        {
            _data = new byte[size * 1024];
            _name = size.ToString();
        }

        // Simple property.
        public string Name { get { return _name; } }
    }

    // Usage:
    int cacheSize = 50;
    Random r = new Random();
    Cache c = new Cache(cacheSize);

    string DataName = "";
    GC.Collect(0);

    // Read randomly chosen entries from the cache.
    for (int i = 0; i < c.Count; i++)
    {
        int index = r.Next(c.Count);
        DataName = c[index].Name;
    }

    // The P2 format already appends a percent sign, so the literal % prints a second one.
    double regenPercent = c.RegenerationCount / (double)c.Count;
    Console.WriteLine("Cache size: {0}, Regenerated: {1:P2}%", c.Count, regenPercent);

Output:

    Regenerate object at 46: Yes
    Regenerate object at 5: Yes
    Regenerate object at 6: Yes
    Regenerate object at 31: Yes
    Regenerate object at 1: Yes
    Regenerate object at 33: Yes
    Regenerate object at 11: Yes
    Regenerate object at 5: No
    Regenerate object at 37: Yes
    Regenerate object at 15: Yes
    Regenerate object at 25: Yes
    Regenerate object at 14: No
    Regenerate object at 16: Yes
    Regenerate object at 20: Yes
    Regenerate object at 10: Yes
    Regenerate object at 14: No
    Regenerate object at 17: Yes
    Regenerate object at 28: Yes
    Regenerate object at 7: Yes
    Regenerate object at 34: Yes
    Regenerate object at 45: Yes
    Regenerate object at 33: No
    Regenerate object at 29: Yes
    Regenerate object at 32: Yes
    Regenerate object at 32: No
    Regenerate object at 4: No
    Regenerate object at 42: Yes
    Regenerate object at 6: No
    Regenerate object at 16: No
    Regenerate object at 36: Yes
    Regenerate object at 12: Yes
    Regenerate object at 9: Yes
    Regenerate object at 43: Yes
    Regenerate object at 12: No
    Regenerate object at 49: Yes
    Regenerate object at 37: No
    Regenerate object at 36: No
    Regenerate object at 44: Yes
    Regenerate object at 22: Yes
    Regenerate object at 31: No
    Regenerate object at 1: No
    Regenerate object at 24: No
    Regenerate object at 23: Yes
    Regenerate object at 38: Yes
    Regenerate object at 6: No
    Regenerate object at 31: No
    Regenerate object at 28: No
    Cache size: 50, Regenerated: 66.00%%

A complete memoizer built on WeakReference is not implemented here.