Question

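OK, I'm not quite sure what the best name for this problem is :) But assume this scenario: you go out and fetch some web pages (from various URLs) and cache them locally. The caching part is easy to solve, even with multiple threads.

However, suppose one thread starts fetching a URL and, a few milliseconds later, another thread wants the same URL. Is there a good pattern for making the second thread wait for the first one to fetch the page, insert it into the cache, and return it, so that you don't make multiple requests? The overhead should be small enough to be worth it even for requests that take about 300-700 ms, and it must not lock requests for other URLs.

Basically, when requests for the same URL arrive in close succession, I want the second request to "piggyback" on the first.

I had a loose idea: keep a dictionary and, when you start fetching a page, insert an object keyed by the URL and lock it. If an object already exists for a matching key, a later thread takes that object, locks on it, and then tries to read the URL from the actual cache.

I'm a bit unsure of the details needed to make this truly thread-safe, though; using ConcurrentDictionary might be part of it...

Is there a common pattern or solution for scenarios like this?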

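Breakdown of the wrong behavior:

Thread 1: checks the cache, the page isn't there, so it starts fetching the URL

Thread 2: starts fetching the same URL, because it is still not in the cache

Thread 1: finishes and inserts into the cache, returns the page

Thread 2: finishes and inserts into the cache (or discards its copy), returns the page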

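Breakdown of the correct behavior:

Thread 1: checks the cache, the page isn't there, so it starts fetching the URL

Thread 2: wants the same URL, but sees that it is currently being fetched, so it waits on Thread 1

Thread 1: finishes and inserts into the cache, returns the page

Thread 2: notices that Thread 1 is done and returns the page that Thread 1 fetched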

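Edit

Most of the answers seem to misunderstand the problem and only address caching, which, as I said, is not the problem. The problem is the external web fetch: when a second fetch for the same URL starts before the first one has completed and been cached, it should use the result of the first fetch instead of making its own request.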

Answer 1

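Edit: My code is pretty ugly now, but it uses a separate lock per URL. This allows different URLs to be fetched concurrently, while each URL is fetched only once.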

using System;
using System.Collections;

public class UrlFetcher
{
    static Hashtable cache = Hashtable.Synchronized(new Hashtable());

    public static String GetCachedUrl(String url)
    {
        // exactly 1 fetcher is created per URL
        InternalFetcher fetcher = (InternalFetcher)cache[url];
        if( fetcher == null )
        {
            lock( cache.SyncRoot )
            {
                fetcher = (InternalFetcher)cache[url];
                if( fetcher == null )
                {
                    fetcher = new InternalFetcher(url);
                    cache[url] = fetcher;
                }
            }
        }
        // blocks all threads requesting the same URL until the fetch completes
        return fetcher.Contents;
    }

    // Note: FetchFromWeb is not shown in the original answer; it is assumed to be
    // a static method on UrlFetcher that downloads the page and returns its text.

    /// <summary>Each fetcher has its own lock and is initialized with null contents.
    /// The first thread to call fetcher.Contents will cause the fetch to occur, and
    /// block until completion.</summary>
    private class InternalFetcher
    {
        // private lock object, so outside code cannot lock on the fetcher itself
        private readonly object sync = new object();
        private readonly String url;
        // volatile so the unlocked first check below sees the published value
        private volatile String contents;

        public InternalFetcher(String url)
        {
            this.url = url;
            this.contents = null;
        }

        public String Contents
        {
            get
            {
                if( contents == null )
                {
                    lock( sync ) // one lock per InternalFetcher, i.e. per URL
                    {
                        if( contents == null )
                        {
                            contents = FetchFromWeb(url);
                        }
                    }
                }
                return contents;
            }
        }
    }
}

Answer 2

You can use ConcurrentDictionary<K,V> and a variant of double-checked locking:

public static string GetUrlContent(string url)
{
    object value1 = _cache.GetOrAdd(url, new object());

    if (value1 == null)    // null check only required if content
        return null;       // could legitimately be a null string

    var urlContent = value1 as string;
    if (urlContent != null)
        return urlContent;    // got the content

    // value1 isn't a string which means that it's an object to lock against
    lock (value1)
    {
        object value2 = _cache[url];

        // at this point value2 will *either* be the url content
        // *or* the object that we already hold a lock against
        if (value2 != value1)
            return (string)value2;    // got the content

        urlContent = FetchContentFromTheWeb(url);    // todo
        _cache[url] = urlContent;
        return urlContent;
    }
}

private static readonly ConcurrentDictionary<string, object> _cache =
                                  new ConcurrentDictionary<string, object>();
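
A closely related variant, sketched here in Java (the language Answer 6 uses) rather than C#, stores a future per URL instead of a lock object, so later callers simply wait on the first caller's result. This is only a minimal sketch under assumptions, not the answerer's code: `CoalescingCache` and `loader` are hypothetical names, and `loader` stands in for the real web fetch.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch: coalesces concurrent requests for the same URL into one fetch. */
final class CoalescingCache {
    // One future per URL; it is both the "fetch in progress" marker and the cached result.
    private final ConcurrentHashMap<String, CompletableFuture<String>> entries =
            new ConcurrentHashMap<>();
    // Assumption: stands in for the real (slow) web fetch.
    private final Function<String, String> loader;

    CoalescingCache(Function<String, String> loader) {
        this.loader = loader;
    }

    String get(String url) {
        CompletableFuture<String> existing = entries.get(url);
        if (existing == null) {
            CompletableFuture<String> mine = new CompletableFuture<>();
            existing = entries.putIfAbsent(url, mine);
            if (existing == null) {
                // This thread won the race and performs the only fetch for this URL.
                return fetchInto(url, mine);
            }
        }
        // Another thread is fetching (or has fetched) this URL: wait for its result.
        return existing.join();
    }

    private String fetchInto(String url, CompletableFuture<String> mine) {
        try {
            String content = loader.apply(url);
            mine.complete(content);
            return content;
        } catch (RuntimeException e) {
            // Drop the failed entry so a later request can retry, then wake the waiters.
            entries.remove(url, mine);
            mine.completeExceptionally(e);
            throw e;
        }
    }
}
```

The fetch runs outside any map-wide lock, so requests for other URLs are not held up; only callers asking for the same URL wait on the same future.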

Answer 3

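Would the Semaphore please stand up! Stand up! Stand up!

With a Semaphore you can easily synchronize your threads.

Consider two cases:

You are trying to load a page that is currently being cached.
You are saving the cache to a file from which a page is being loaded.

In both cases you will run into trouble.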

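It is just like the readers-writers problem, a common race-condition problem in operating systems. While a thread is rebuilding the cache or starting to cache a page, no thread should read from it. If a thread is already reading, the writer should wait until the read finishes and then replace the cache, and no two threads should cache the same page into the same file. Consequently, all readers can read from the cache at any time, as long as no writer is writing to it.

You should read up on semaphores using the samples on MSDN; they are very easy to use. A thread that wants to do something simply calls the semaphore: if the resource can be granted, the thread does its work; otherwise it sleeps and is woken up when the resource is ready.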
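
One way to read this suggestion is a semaphore with a single permit per URL, so that only one thread at a time can fetch a given page while other URLs stay unaffected. The following is a rough Java sketch under that assumption; `SemaphoreGuardedCache` and `loader` are hypothetical names, and `loader` stands in for the real fetch and is assumed never to return null.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

/** Hypothetical sketch: one single-permit semaphore per URL guards the fetch. */
final class SemaphoreGuardedCache {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Semaphore> guards = new ConcurrentHashMap<>();
    // Assumption: stands in for the real web fetch and never returns null.
    private final Function<String, String> loader;

    SemaphoreGuardedCache(Function<String, String> loader) {
        this.loader = loader;
    }

    String get(String url) {
        String cached = cache.get(url);
        if (cached != null) {
            return cached; // fast path: no semaphore needed for a cache hit
        }
        // Creating a Semaphore is cheap, so doing it inside computeIfAbsent is fine.
        Semaphore guard = guards.computeIfAbsent(url, k -> new Semaphore(1));
        guard.acquireUninterruptibly(); // threads wanting the same URL queue up here
        try {
            cached = cache.get(url); // re-check: the previous permit holder may have filled it
            if (cached == null) {
                cached = loader.apply(url);
                cache.put(url, cached);
            }
            return cached;
        } finally {
            guard.release();
        }
    }
}
```

A thread that had to wait for the permit finds the page in the cache on its re-check and returns without fetching.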

Answer 4

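Disclaimer: This might be a naive answer. Please forgive me if it is.

I'd recommend using a shared dictionary object, protected by a lock, to keep track of the URLs that are currently being fetched or have already been fetched.

On every request, check the URL against this object.
If an entry for the URL exists, check the cache. (An entry means that one of the threads has already fetched it or is fetching it right now.)
If it is available in the cache, use it; otherwise put the current thread to sleep for a while and check again. (If it is not in the cache, some thread is still fetching it, so wait for it to finish.)
If no entry is found in the dictionary object, add the URL to it and send the request. Once you get the response, add it to the cache.

This logic should work; however, you still need to handle cache expiration and the removal of entries from the dictionary object.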
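
The steps above can be sketched roughly as follows in Java. This is only an illustration under assumptions: `PollingCache`, `loader`, and `pollMillis` are hypothetical names, a concurrent set replaces the locked dictionary, and `loader` stands in for the real request and is assumed never to return null.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch of the "register the URL, then poll the cache" approach. */
final class PollingCache {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    // URLs that are being fetched or have been fetched (the shared tracking object).
    private final Set<String> claimed = ConcurrentHashMap.newKeySet();
    // Assumption: stands in for the real web request and never returns null.
    private final Function<String, String> loader;
    private final long pollMillis;

    PollingCache(Function<String, String> loader, long pollMillis) {
        this.loader = loader;
        this.pollMillis = pollMillis;
    }

    String get(String url) {
        while (true) {
            String cached = cache.get(url);
            if (cached != null) {
                return cached;
            }
            if (claimed.add(url)) {
                // No entry yet: this thread registers the URL and sends the request.
                try {
                    String content = loader.apply(url);
                    cache.put(url, content);
                    return content;
                } catch (RuntimeException e) {
                    claimed.remove(url); // let a waiting thread retry the fetch
                    throw e;
                }
            }
            // Entry exists but the cache is still empty: another thread is fetching.
            pause(pollMillis);
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a fetch", e);
        }
    }
}
```

Polling keeps the code simple, but a waiter may sleep up to `pollMillis` longer than necessary, and entries are never removed here, so expiration still has to be added as noted above.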

Answer 5

This applies not only to concurrent caches but to all caches:

"A cache with a bad policy is another name for a memory leak" (Raymond Chen)
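
As one possible illustration of a bounded policy, a size-limited LRU map can be built in Java on top of `LinkedHashMap`. This is just a sketch; `BoundedCache`, `lru`, and `maxEntries` are hypothetical names, and a real cache would probably also want time-based expiration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: a size-bounded, least-recently-used cache. */
final class BoundedCache {
    private BoundedCache() {
    }

    static <K, V> Map<K, V> lru(final int maxEntries) {
        // accessOrder = true makes iteration order "least recently accessed first".
        Map<K, V> map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Called after each insertion; returning true evicts the eldest entry.
                return size() > maxEntries;
            }
        };
        // LinkedHashMap is not thread-safe (even get() reorders entries), so wrap it.
        return Collections.synchronizedMap(map);
    }
}
```

With `maxEntries` set to 2, inserting a third entry evicts whichever of the first two was touched least recently, so the cache cannot grow without bound.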

Answer 6

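My solution is to use an AtomicBoolean to control access to the database when the cache entry has expired or does not exist.

At any given moment, only one thread (I call it read-th) can access the database; the other threads spin until read-th returns the data and writes it to the cache.

Here is the code, implemented in Java: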

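import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Note: Singleton and log are not defined in this snippet; they are assumed to
// come from the answerer's own project or libraries.
public class CacheBreakDownDefender<K, R> {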

/**
 * false = do not write null to cache when get null value from database;
 */
private final boolean writeNullToCache;

/**
 * cache different query key
 */
private final ConcurrentHashMap<K, AtomicBoolean> selectingDBTagMap = new ConcurrentHashMap<>();


public static <K, R> CacheBreakDownDefender<K, R> getInstance(Class<K> keyType, Class<R> resultType) {
    return Singleton.get(keyType.getName() + resultType.getName(), () -> new CacheBreakDownDefender<>(false));
}

public static <K, R> CacheBreakDownDefender<K, R> getInstance(Class<K> keyType, Class<R> resultType, boolean writeNullToCache) {
    return Singleton.get(keyType.getName() + resultType.getName(), () -> new CacheBreakDownDefender<>(writeNullToCache));
}

private CacheBreakDownDefender(boolean writeNullToCache) {
    this.writeNullToCache = writeNullToCache;
}

public R readFromCache(K key, Function<K, ? extends R> getFromCache, Function<K, ? extends R> getFromDB, BiConsumer<K, R> writeCache) throws InterruptedException {
    R result = getFromCache.apply(key);
    if (result == null) {
        final AtomicBoolean selectingDB = selectingDBTagMap.computeIfAbsent(key, x -> new AtomicBoolean(false));
        if (selectingDB.compareAndSet(false, true)) {
            // This thread won the flag and is the only one allowed to query the database.
            try {
                // Re-check the cache: another thread may have filled it between
                // our first lookup and winning the flag.
                result = getFromCache.apply(key);
                if (result == null) {
                    result = getFromDB.apply(key);
                    if (result != null || writeNullToCache) {
                        writeCache.accept(key, result);
                    }
                }
            } finally {
                selectingDB.set(false);
                selectingDBTagMap.remove(key);
            }
        } else {
            // Another thread is loading this key: spin until it clears the flag,
            // then read whatever it wrote to the cache.
            while (selectingDB.get()) {
                TimeUnit.MILLISECONDS.sleep(0L);
            }
            return getFromCache.apply(key);
        }
    }
    return result;
}

public static void main(String[] args) throws InterruptedException {

    Map<String, String> map = new ConcurrentHashMap<>();
    CacheBreakDownDefender<String, String> instance = CacheBreakDownDefender.getInstance(String.class, String.class, true);

    for (int i = 0; i < 9; i++) {
        int finalI = i;
        new Thread(() -> {
            String kele = null;
            try {
                if (finalI == 6) {
                    kele = instance.readFromCache("kele2", map::get, key -> "helloword2", map::put);
                } else
                    kele = instance.readFromCache("kele", map::get, key -> "helloword", map::put);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            log.info("result= {}", kele);
        }).start();
    }
    TimeUnit.SECONDS.sleep(2L);
}

}

Pattern for concurrent cache sharing

6 answers: