F#: Using the CSV type provider asynchronously

Time: 2014-08-09 22:11:43

Tags: f# type-providers f#-data

I'm using the CSV type provider to collect some data from a series of files on Azure blob storage:

#r "../packages/FSharp.Data.2.0.9/lib/portable-net40+sl5+wp8+win8/FSharp.Data.dll"
open FSharp.Data

type censusDataContext = CsvProvider<"https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/AK.TXT">
type stateCodeContext = CsvProvider<"https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/states.csv">

let stateCodes =  stateCodeContext.Load("https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/states.csv");

let fetchStateData (stateCode:string)=
        let uri = System.String.Format("https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/{0}.TXT",stateCode)
        censusDataContext.Load(uri).Rows

let usaData = stateCodes.Rows 
                |> Seq.collect(fun r -> fetchStateData(r.Abbreviation))
                |> Seq.length

I now want to run these downloads asynchronously, and I've run into a problem with AsyncLoad:

let fetchStateDataAsync(stateCode:string)=
    async{
        let uri = System.String.Format("https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/{0}.TXT",stateCode)
        let! stateData =  censusDataContext.AsyncLoad(uri)
        return stateData.Rows
    }

let usaData = stateCodes.Rows 
                |> Seq.collect(fun r -> fetchStateDataAsync(r.Abbreviation))
                |> Seq.length

The error message is:

The type 'Async<seq<CsvProvider<...>.Row>>' is not compatible with the type 'seq<'a>'

Forgive my lack of async knowledge, but do I have to use something other than Seq.collect when applying an async function?

Thanks in advance.

1 Answer:

Answer 0: (score: 4)

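The problem is that turning the code into an asynchronous computation (by wrapping it in an async { .. } block) changes the result type from seq<Row> to Async<seq<Row>>. That is, you now get an asynchronous computation that will eventually complete and return a sequence, rather than the sequence itself. Seq.collect expects its function to return a sequence, which is why the compiler reports the mismatch.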
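
To make the type change concrete, here is a minimal sketch that reproduces the same mismatch without any network access. The function fakeFetchAsync is a hypothetical stand-in for fetchStateDataAsync and is not part of the original code; it only returns a few integers wrapped in Async.

```fsharp
// Hypothetical stand-in for fetchStateDataAsync (an assumption, not the real function):
// it returns its result wrapped in Async, exactly like the CSV version does.
let fakeFetchAsync (stateCode: string) : Async<seq<int>> =
    async { return Seq.init stateCode.Length id }

// This line would NOT compile: Seq.collect needs a function returning seq<'a>,
// but fakeFetchAsync returns Async<seq<int>>.
// let broken = [ "AK"; "AL" ] |> Seq.collect fakeFetchAsync

// Unwrapping each computation before collecting type-checks,
// but it blocks on every item, so nothing runs concurrently.
let rowCount =
    [ "AK"; "AL" ]
    |> Seq.collect (fun code -> fakeFetchAsync code |> Async.RunSynchronously)
    |> Seq.length
```

This version compiles, but it gives up the benefit of making the function asynchronous in the first place, which is why combining the computations before running them is usually the better route.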

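To fix this, you need to start the computations somehow and wait for their results. There are several options, such as running them sequentially, one after another. Probably the easiest option (and perhaps the best one, depending on what you want to do) is to run the computations in parallel: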

let getAll = 
  stateCodes.Rows 
  |> Seq.map(fun r -> fetchStateDataAsync(r.Abbreviation))
  |> Async.Parallel

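This gives you a single asynchronous computation that runs all the downloads and returns an array of results. You can run it synchronously (blocking the current thread) and get the results: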

getAll |> Async.RunSynchronously
       |> Seq.collect id
       |> Seq.length
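
The same pipeline can be tried without the blob storage files. The sketch below is only an illustration: fetchRowsStub is a hypothetical stand-in for fetchStateDataAsync that waits briefly and returns made-up strings, and the list of state codes is hard-coded.

```fsharp
// Hypothetical stand-in for fetchStateDataAsync (an assumption for illustration):
// it sleeps briefly instead of downloading and returns one string per character.
let fetchRowsStub (stateCode: string) : Async<seq<string>> =
    async {
        do! Async.Sleep 10
        return seq { for i in 1 .. stateCode.Length -> sprintf "%s-%d" stateCode i }
    }

// Async.Parallel turns a sequence of Async<'T> into one Async<'T[]>.
let getAllRows : Async<seq<string>[]> =
    [ "AK"; "AL"; "WY" ]
    |> Seq.map fetchRowsStub
    |> Async.Parallel

// Block until every computation has finished, then flatten and count.
let totalRows =
    getAllRows
    |> Async.RunSynchronously
    |> Seq.collect id
    |> Seq.length
```

Note that the result of Async.Parallel is an array with one element per input computation, in input order, so Seq.collect id is still needed to flatten it into a single sequence of rows.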

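If you want to run the downloads asynchronously in the background, you can do that too, but you need to specify what to do with the results. For example: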

async { 
  let! all = getAll
  all |> Seq.collect id |> Seq.length |> printfn "Length %d" }
|> Async.Start
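
For completeness, the sequential option mentioned earlier could look roughly like the sketch below. The helper runSequentially and the stand-in fetchRowsOneByOne are hypothetical names introduced here for illustration; neither comes from the original code or from FSharp.Data.

```fsharp
// Hypothetical helper (an assumption, not a library function): runs each
// computation only after the previous one has finished, keeping input order.
let runSequentially (jobs: seq<Async<'T>>) : Async<'T list> =
    async {
        let results = ResizeArray<'T>()
        for job in jobs do
            let! result = job
            results.Add result
        return List.ofSeq results
    }

// Hypothetical stand-in for fetchStateDataAsync, with no network access.
let fetchRowsOneByOne (stateCode: string) : Async<seq<string>> =
    async { return seq { for i in 1 .. stateCode.Length -> sprintf "%s-%d" stateCode i } }

let sequentialTotal =
    [ "AK"; "AL"; "WY" ]
    |> Seq.map fetchRowsOneByOne
    |> runSequentially
    |> Async.RunSynchronously
    |> Seq.collect id
    |> Seq.length
```

Running one download at a time is slower than Async.Parallel, but it may be preferable when the server should not receive many simultaneous requests.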