F# async actions with conflicts / ordering / mutual exclusion

Time: 2018-11-01 09:47:11

Tags: f# async-await mutex

F# makes it easy to define asynchronous computations with the async builder. You can write an entire program and then pass it to Async.RunSynchronously.

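The problem I have is that some async actions must not run at the same time. They should be forced to wait until other async actions have finished, much like with a mutex. However, I don't want to simply chain them serially, because that is inefficient.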

Concrete example: a download cache

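Suppose I want to fetch some remote files through a local file cache. My application calls fetchFile : string -> Async<string> in many places, but if fetchFile is called concurrently for the same URL, multiple writes can corrupt the cache. Instead, fetchFile should behave as follows: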

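  • If there is no cached copy, download the file into the cache, then read the cached contents
  • If the cache is currently being written, wait for the write to finish, then read the contents
  • If the cache exists and is complete, just read the cached contents
  • Calls to fetchFile for two different URLs should run in parallel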

I'm imagining some kind of stateful DownloadManager class that requests can be sent to and that orders them internally.

How do F# programmers typically implement this kind of logic with async?


Hypothetical usage:

let dm = new DownloadManager()

let urls = [
  "https://www.google.com"; 
  "https://www.google.com"; 
  "https://www.wikipedia.org"; 
  "https://www.google.com"; 
  "https://www.bing.com"; 
]

let results = 
  urls
  |> Seq.map dm.Download
  |> Async.Parallel
  |> Async.RunSynchronously
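For comparison, the hypothetical DownloadManager above could be sketched with a plain in-memory lock, which the question says is enough. This is only a sketch and not the approach taken by the answers below: the constructor parameter fetch (the real download function) is an assumption, so unlike the usage above the class is not constructed without arguments, and keeping one shared Task<string> per URL is a swapped-in technique rather than a mailbox.

```fsharp
open System.Collections.Generic
open System.Threading.Tasks

/// Hypothetical sketch: `fetch` is an assumed download function supplied by the caller.
/// Each URL gets exactly one Task; later callers for the same URL await that same Task,
/// while different URLs download in parallel.
type DownloadManager(fetch: string -> Async<string>) =
    // URL -> the single Task that downloads it; only accessed under `lock`.
    let inFlight = Dictionary<string, Task<string>>()

    member this.Download(url: string) : Async<string> =
        async {
            // The lock guards only the lookup/insert, never the download itself.
            let task =
                lock inFlight (fun () ->
                    match inFlight.TryGetValue url with
                    | true, existing -> existing
                    | false, _ ->
                        let started = Async.StartAsTask(fetch url)
                        inFlight.[url] <- started
                        started)
            return! Async.AwaitTask task
        }
```

Because each URL's task stays in the dictionary after it finishes, the dictionary also acts as the cache. A failed download would stay cached as well, which a real implementation would probably want to handle.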

Note: I previously asked this question about how to run async actions in a semi-parallel way, but I now realize that approach is hard to compose.

Note: I don't need to worry about several instances of the application running at once. An in-memory lock is sufficient.

3 Answers:

Answer 0 (score: 3)

Update

Petricek suggested using Async.StartChild, which is a better approach, so I changed lazyDownload to asyncDownload.


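You can use a MailboxProcessor as a download manager that handles the cache. MailboxProcessor is an F# construct that processes a queue of messages one at a time, which ensures that there are no conflicts.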
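A minimal sketch of that one-message-at-a-time guarantee, using a hypothetical counter agent that is not part of the answer: many concurrent Post calls are queued, and the agent's loop handles them in order, so its state needs no lock.

```fsharp
/// Hypothetical messages for a counter agent (illustration only).
type CounterMsg =
    | Incr
    | Get of AsyncReplyChannel<int>

let counter =
    MailboxProcessor.Start(fun inbox ->
        // The state `n` is only touched by this loop, one message at a time.
        let rec loop n =
            async {
                let! msg = inbox.Receive()
                match msg with
                | Incr -> return! loop (n + 1)
                | Get reply ->
                    reply.Reply n
                    return! loop n
            }
        loop 0)

// Post 1000 increments from parallel computations.
[ 1 .. 1000 ]
|> List.map (fun _ -> async { counter.Post Incr })
|> Async.Parallel
|> Async.RunSynchronously
|> ignore

// The queue is FIFO, so this Get is handled after every Incr posted above.
let total = counter.PostAndReply Get
```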

First, you need a processor that can maintain state:

let stateFull hndl initState =
    MailboxProcessor.Start(fun inbox ->
        let rec loop state : Async<unit> = async {
            try         let! f        = inbox.Receive()
                        let! newState = f state
                        return! loop newState
            with e ->   return! loop (hndl e state)
        }
        loop initState
    )

The first parameter is the error handler and the second is the initial state, in this case a Map<string, Async<string>>. This is our downloadManager:

let downloadManager = 
    stateFull (fun e s -> printfn "%A" e ; s) (Map.empty : Map<string, _>)

To call the mailbox we need to use PostAndReply:

let applyReplyS f (agent: MailboxProcessor<'a->Async<'a>>) = 
    agent.PostAndReply(fun (reply:AsyncReplyChannel<'r>) -> 
        fun v -> async {
            let st, r = f v
            reply.Reply r
            return st 
        })
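To make the folder contract concrete (a function from the current state to a pair of new state and reply), here is a sketch that uses stateFull and applyReplyS with a simpler, hypothetical state: a Map<string, int> of hit counts. Both helpers are repeated, with stateFull only re-indented, so the block runs on its own; bump and tally are illustrative names, not part of the answer.

```fsharp
// Same helper as above, re-indented: the agent's messages are state-update functions.
let stateFull hndl initState =
    MailboxProcessor.Start(fun inbox ->
        let rec loop state : Async<unit> =
            async {
                try
                    let! f = inbox.Receive()
                    let! newState = f state
                    return! loop newState
                with e ->
                    return! loop (hndl e state)
            }
        loop initState)

// Same helper as above: apply a folder `state -> state * reply` inside the agent.
let applyReplyS f (agent: MailboxProcessor<'a -> Async<'a>>) =
    agent.PostAndReply(fun (reply: AsyncReplyChannel<'r>) ->
        fun v -> async {
            let st, r = f v
            reply.Reply r
            return st
        })

// Hypothetical folder: bump the count for `key`, return (new state, new count).
let bump key (counts: Map<string, int>) =
    let n = (counts |> Map.tryFind key |> Option.defaultValue 0) + 1
    counts |> Map.add key n, n

// Hypothetical agent holding the hit counts.
let tally = stateFull (fun e s -> printfn "%A" e; s) (Map.empty : Map<string, int>)

let first = tally |> applyReplyS (bump "x")
let second = tally |> applyReplyS (bump "x")
let other = tally |> applyReplyS (bump "y")
```

folderCache below follows the same shape, with Async<string> values in place of counts.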

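applyReplyS takes a folder function that checks the cache and, if the URL is not found, adds an Async<string> for it; the folder returns the (possibly updated) cache together with that Async<string>.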

First, the asyncDownload function:

let asyncDownload url = 
    async { 
        let started = System.DateTime.UtcNow.Ticks
        do! Async.Sleep 30
        let finished = System.DateTime.UtcNow.Ticks
        let r = sprintf "Downloaded  %A it took: %dms %s" (started / 10000L) ((finished - started) / 10000L) url
        printfn "%s" r
        return r
    }

It is just a dummy function that returns a string with timing information.

Now the folder function that checks the cache:

let folderCache url cache  =
    cache 
    |> Map.tryFind url
    |> Option.map(fun ld -> cache, ld)
    |> Option.defaultWith (fun () -> 
        let ld = asyncDownload url |> Async.StartChild |> Async.RunSynchronously
        cache |> Map.add url ld, ld
    )

And finally, our download function:

let downloadUrl url =
    downloadManager 
    |> applyReplyS (folderCache url)

// val downloadUrl: url: string -> Async<string>

Test

let s = System.DateTime.UtcNow.Ticks
printfn "started %A" (s / 10000L)
let res = 
    List.init 50 (fun i -> i, downloadUrl (string <| i % 5) )
    |> List.groupBy (snd >> Async.RunSynchronously)
    |> List.map (fun (t, ts) -> sprintf "%s - %A" t (ts |> List.map fst ) )

let f = System.DateTime.UtcNow.Ticks
printfn "finish  %A" (f / 10000L)

printfn "elapsed %dms" ((f - s) / 10000L)

res |> printfn "Result: \n%A"

This produces the following output:

started 63676683215256L
Downloaded  63676683215292L it took: 37ms "2"
Downloaded  63676683215292L it took: 36ms "3"
Downloaded  63676683215292L it took: 36ms "1"
Downloaded  63676683215291L it took: 38ms "0"
Downloaded  63676683215292L it took: 36ms "4"
finish  63676683215362L
elapsed 106ms
Result: 
["Downloaded  63676683215291L it took: 38ms "0" - [0; 5; 10; 15; 20; 25; 30; 35; 40; 45]";
 "Downloaded  63676683215292L it took: 36ms "1" - [1; 6; 11; 16; 21; 26; 31; 36; 41; 46]";
 "Downloaded  63676683215292L it took: 37ms "2" - [2; 7; 12; 17; 22; 27; 32; 37; 42; 47]";
 "Downloaded  63676683215292L it took: 36ms "3" - [3; 8; 13; 18; 23; 28; 33; 38; 43; 48]";
 "Downloaded  63676683215292L it took: 36ms "4" - [4; 9; 14; 19; 24; 29; 34; 39; 44; 49]"]

Answer 1 (score: 3)

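I agree with @AMieres that a mailbox processor is a good way to do this. My version of the code is less generic: it uses a mailbox processor directly for this purpose, so it may be a bit simpler.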

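Our mailbox processor has only one message: you ask it to download a URL, and it gives you back an asynchronous workflow that you can wait on to get the result:

type DownloadMessage = 
  | Download of string * AsyncReplyChannel<Async<string>>

We need a helper function to download the URL asynchronously:

let asyncDownload url = async {
  use wc = new System.Net.WebClient()
  printfn "Downloading: %s" url
  return! wc.AsyncDownloadString(System.Uri(url)) }

In the mailbox processor we keep a mutable cache (this is fine, because the mailbox processor handles messages synchronously, one at a time). When we receive a download request, we check whether the download is already in the cache. If it is not, we start the download as a child async and add it to the cache, so the cache contains asynchronous workflows that represent the results of the running downloads.

let downloadCache = MailboxProcessor.Start(fun inbox -> async {
  let cache = System.Collections.Generic.Dictionary<_, _>()
  while true do
    let! (Download(url, repl)) = inbox.Receive()
    if not (cache.ContainsKey url) then 
      let! proc = asyncDownload url |> Async.StartChild
      cache.Add(url, proc)
    repl.Reply(cache.[url]) })

To actually download through the cache, we just send a request to the mailbox processor and then wait for the result of the returned workflow (which may be shared by multiple requests).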
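Sending a request to downloadCache and awaiting the returned workflow can be sketched as follows, under some assumptions: downloadUsingCache is a hypothetical helper name, and asyncDownload is replaced by a hypothetical fake that sleeps instead of using WebClient, so the block runs offline. PostAndAsyncReply lets callers wait for the agent's reply without blocking a thread.

```fsharp
type DownloadMessage =
    | Download of string * AsyncReplyChannel<Async<string>>

// Hypothetical stand-in for the real download: counts starts, sleeps, returns a string.
let started = ref 0
let asyncDownload (url: string) =
    async {
        lock started (fun () -> started.Value <- started.Value + 1)
        do! Async.Sleep 30
        return "content of " + url
    }

// Same agent as in the answer: one child computation per URL, kept in the cache.
let downloadCache =
    MailboxProcessor.Start(fun inbox ->
        async {
            let cache = System.Collections.Generic.Dictionary<string, Async<string>>()
            while true do
                let! (Download(url, repl)) = inbox.Receive()
                if not (cache.ContainsKey url) then
                    let! proc = asyncDownload url |> Async.StartChild
                    cache.Add(url, proc)
                repl.Reply(cache.[url])
        })

// Hypothetical helper: ask the agent for the (possibly shared) workflow, then await it.
let downloadUsingCache url =
    async {
        let! res = downloadCache.PostAndAsyncReply(fun ch -> Download(url, ch))
        return! res
    }
```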

Answer 2 (score: 2)

Here is a simplified version based on @Tomas Petricek's answer.


Let's assume we have a download function that, given a URL, returns an Async<string>. Here is a dummy version:

let asyncDownload url = 
    async { 
        let started = System.DateTime.UtcNow.Ticks
        do! Async.Sleep 30
        let finished = System.DateTime.UtcNow.Ticks
        let r = sprintf "Downloaded  %A it took: %dms %s" (started / 10000L) ((finished - started) / 10000L) url
        printfn "%s" r
        return r
    }

In a module of our own, we have a few simple generic Mailbox helper functions:

module Mailbox =
    let iterA hndl f =
        MailboxProcessor.Start(fun inbox ->
            async {
                while true do
                    try       let!   msg = inbox.Receive()
                              do!  f msg
                    with e -> hndl e
            }
        )
    let callA hndl f = iterA hndl (fun ((replyChannel: AsyncReplyChannel<_>), msg) -> async {
        let! r = f msg
        replyChannel.Reply r
    })
    let call hndl f = callA hndl (fun msg -> async { return f msg } )
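As a small illustration of how these helpers fit together (not from the answer), Mailbox.call can wrap any ordinary function. The module is repeated, only re-indented, so the block runs on its own; squarer is a hypothetical agent name.

```fsharp
module Mailbox =
    // Run `f` on every message, one at a time; exceptions go to `hndl`.
    let iterA hndl f =
        MailboxProcessor.Start(fun inbox ->
            async {
                while true do
                    try
                        let! msg = inbox.Receive()
                        do! f msg
                    with e ->
                        hndl e
            })
    // Messages carry a reply channel; the async result of `f` is sent back on it.
    let callA hndl f =
        iterA hndl (fun ((replyChannel: AsyncReplyChannel<_>), msg) ->
            async {
                let! r = f msg
                replyChannel.Reply r
            })
    // Same as callA, for a plain synchronous function.
    let call hndl f = callA hndl (fun msg -> async { return f msg })

// Hypothetical agent that squares numbers; the handler just prints exceptions.
let squarer : MailboxProcessor<AsyncReplyChannel<int> * int> =
    Mailbox.call (printfn "%A") (fun (x: int) -> x * x)

let nine = squarer.PostAndReply(fun reply -> (reply, 3))
```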

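The purpose of this "library" is to simplify the more typical uses of MailboxProcessor. Although it may look complicated and hard to follow, what matters is what the functions do and how to use them. In particular, we will use Mailbox.call, which returns a mailbox agent that can reply with values. Its signature is: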

val call: 
   hndl: (exn -> unit) ->
   f   : ('a -> 'b)
      -> MailboxProcessor<AsyncReplyChannel<'b> * 'a>

The first parameter is the exception handler and the second is the function that produces the value. This is how we define downloadManager:

let downloadManager = 
    let dict = new System.Collections.Generic.Dictionary<string, _>()
    Mailbox.call (printfn "%A") (fun url ->         
        if dict.ContainsKey url then dict.[url] else
        let result = asyncDownload url |> Async.StartChild |> Async.RunSynchronously
        dict.Add(url, result)
        result
    )

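Our cache is a Dictionary. If the URL is not in it yet, we call asyncDownload and start it as a child computation. Because we use Async.StartChild, we don't have to wait for the download to finish; we just return an async that waits for it to complete. Requests for a URL that is already in the dictionary get that same async back, so they all share one download.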
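The property this relies on is that the Async<'T> obtained by starting Async.StartChild can be awaited any number of times, and every await observes the result of one single run. A small sketch with a hypothetical counted computation (runs, work and shared are illustrative names, not from the answer):

```fsharp
// Hypothetical work item that counts how many times it actually runs.
let runs = ref 0
let work =
    async {
        lock runs (fun () -> runs.Value <- runs.Value + 1)
        do! Async.Sleep 20
        return 42
    }

// Start the child once, as downloadManager does; `shared` is an Async<int>
// that waits for that single run instead of starting a new one.
let shared = work |> Async.StartChild |> Async.RunSynchronously

// Three concurrent awaits of the same child result.
let results =
    [ shared; shared; shared ]
    |> Async.Parallel
    |> Async.RunSynchronously
```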

To call the manager we use downloadManager.PostAndReply:

let downloadUrl url = downloadManager.PostAndReply(fun reply -> reply, url)

Here is a test:

let s = System.DateTime.UtcNow.Ticks
printfn "started %A" (s / 10000L)
let res = 
    List.init 50 (fun i -> i, downloadUrl (string <| i % 5) )
    |> List.groupBy (snd >> Async.RunSynchronously)
    |> List.map (fun (t, ts) -> sprintf "%s - %A" t (ts |> List.map fst ) )

let f = System.DateTime.UtcNow.Ticks
printfn "finish  %A" (f / 10000L)

printfn "elapsed %dms" ((f - s) / 10000L)

res |> printfn "Result: \n%A"

This produces:

started 63676682503885L
Downloaded  63676682503911L it took: 34ms 1
Downloaded  63676682503912L it took: 33ms 2
Downloaded  63676682503911L it took: 37ms 0
Downloaded  63676682503912L it took: 33ms 3
Downloaded  63676682503912L it took: 33ms 4
finish  63676682503994L
elapsed 109ms
Result: 
["Downloaded  63676682503911L it took: 37ms 0 - [0; 5; 10; 15; 20; 25; 30; 35; 40; 45]";
 "Downloaded  63676682503911L it took: 34ms 1 - [1; 6; 11; 16; 21; 26; 31; 36; 41; 46]";
 "Downloaded  63676682503912L it took: 33ms 2 - [2; 7; 12; 17; 22; 27; 32; 37; 42; 47]";
 "Downloaded  63676682503912L it took: 33ms 3 - [3; 8; 13; 18; 23; 28; 33; 38; 43; 48]";
 "Downloaded  63676682503912L it took: 33ms 4 - [4; 9; 14; 19; 24; 29; 34; 39; 44; 49]"]