Execution flow of F# pipeline functions

Time: 2019-03-14 15:40:23

Tags: f#

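I'm new to F#. The following code takes all rows with a rating > 9.0 from a CSV file and writes them to a new file.

But I'm having a hard time working out how many for loops are needed to get the job done: are all the steps done in a single for loop, or in 5 for loops as marked in the comments below?

If it takes 5 for loops, the whole file should be loaded into memory once the 2nd loop has finished, yet the process consumes only 14 MB of memory the whole time, which is less than the size of the CSV file.

I know that StreamReader does not load the whole file into memory at once, but how does the following code actually execute?

Thanks in advance...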

open System.IO

let ratings = @"D:\Download\IMDB\csv\title.ratings.csv"
let rating9 = @"D:\Download\IMDB\csv\rating9.csv"

let readCsv reader =
    Seq.unfold (fun (r:StreamReader) ->         // 1st for loop
        match r.EndOfStream with
        | true -> None
        | false -> Some (r.ReadLine(), r)) reader

let toTuple = fun (s:string) ->
    let ary = s.Split(',')
    (string ary.[0], float ary.[1], int ary.[2])

using (new StreamReader(ratings)) (fun sr ->
    use sw = new StreamWriter(rating9)
    readCsv sr
    |> Seq.map toTuple                          // 2nd for loop
    |> Seq.filter (fun (_, r, _) -> r > 9.0)    // 3rd for loop
    |> Seq.sortBy (fun (_, r, _) -> r)          // 4th for loop
    |> Seq.iter (fun (t, r, s) ->               // 5th for loop
        sw.WriteLine(sprintf "%s,%.1f,%i" t r s)))

1 answer:

Answer 0 (score: 6)

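The missing piece in your understanding is that F#'s Seq is lazy. It does no more work than is needed, and in particular it does not consume a sequence until that is absolutely necessary. Seq.map and Seq.filter do not act like for loops; instead, they act like a transformation pipeline, stacking a new transformation on top of the existing ones. The first part of your code that actually runs through the whole sequence is Seq.sortBy (sorting a sequence requires knowing all of its values, so Seq.sortBy has to consume the entire sequence to do its job). By that point the Seq.filter step has already happened, so most lines of the CSV file have been thrown away, which is why the program consumes less memory than the total size of the original file.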
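The memory claim above can be checked with a minimal sketch. The counters `read` and `kept` are hypothetical names introduced here for tracing only, and the integer range stands in for the CSV rows; this is an illustration under those assumptions, not the code from the question.

```fsharp
// Hypothetical counters (not in the original code), used only for tracing.
let read = ref 0   // items pulled from the source
let kept = ref 0   // items that survive the filter and so reach sortBy

let pipeline =
    seq { 1 .. 1000 }
    |> Seq.map (fun i ->
        read.Value <- read.Value + 1
        i)
    |> Seq.filter (fun i -> i > 900)
    |> Seq.map (fun i ->
        kept.Value <- kept.Value + 1
        i)
    |> Seq.sortBy (fun i -> -i)

// Building the pipeline has not consumed anything yet.
let readBeforeRun = read.Value

// Forcing the sequence makes sortBy pull every item through map and filter.
let result = Seq.toList pipeline

printfn "read before run: %d" readBeforeRun
printfn "read: %d, buffered by sortBy: %d" read.Value kept.Value
```

Here all 1000 source items are read exactly once, but only the 100 that pass the filter ever reach Seq.sortBy, so only those need to be held in memory for sorting.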

Here is a practical demonstration of Seq being lazy, typed at the F# Interactive prompt. Watch:

> let s = seq {1..20} ;;
val s : seq<int>

> let t = s |> Seq.map (fun i -> printfn "Starting with %d" i; i) ;;
val t : seq<int>

> let u = t |> Seq.map (fun i -> i*2) ;;
val u : seq<int>

> let v = u |> Seq.map (fun i -> i - 1) ;;
val v : seq<int>

> let w = v |> Seq.filter (fun i -> i > 10) ;;
val w : seq<int>

> let x = w |> Seq.sortBy id ;;
val x : seq<int>

> let y = x |> Seq.iter (fun i -> printfn "Result: %d" i) ;;
Starting with 1
Starting with 2
Starting with 3
Starting with 4
Starting with 5
Starting with 6
Starting with 7
Starting with 8
Starting with 9
Starting with 10
Starting with 11
Starting with 12
Starting with 13
Starting with 14
Starting with 15
Starting with 16
Starting with 17
Starting with 18
Starting with 19
Starting with 20
Result: 11
Result: 13
Result: 15
Result: 17
Result: 19
Result: 21
Result: 23
Result: 25
Result: 27
Result: 29
Result: 31
Result: 33
Result: 35
Result: 37
Result: 39
val y : unit = ()

> let z = w |> Seq.iter (fun i -> printfn "Result: %d" i) ;;
Starting with 1
Starting with 2
Starting with 3
Starting with 4
Starting with 5
Starting with 6
Result: 11
Starting with 7
Result: 13
Starting with 8
Result: 15
Starting with 9
Result: 17
Starting with 10
Result: 19
Starting with 11
Result: 21
Starting with 12
Result: 23
Starting with 13
Result: 25
Starting with 14
Result: 27
Starting with 15
Result: 29
Starting with 16
Result: 31
Starting with 17
Result: 33
Starting with 18
Result: 35
Starting with 19
Result: 37
Starting with 20
Result: 39
val z : unit = ()

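Notice that even though Seq.sortBy needs to consume the entire sequence to do its work, no part of the Seq was being asked for at the time I created sequence x, so it did not actually start running through the values. Only y and z, which use Seq.iter, triggered a run through all the values. (But you can see that with y, the sortBy step had to run in its entirety before the iter step could run, whereas with z, where there is no sortBy step, the values passed through the transformation pipeline one at a time, and the next value only started being processed once the previous one had been completely processed.)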
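The same contrast between y and z can be captured in code rather than read off the console. The following is a minimal sketch; `trace` is a hypothetical helper introduced here, and it records events in a ResizeArray instead of printing them.

```fsharp
// Hypothetical helper: runs a tiny pipeline and returns the order of events.
let trace (useSort: bool) : string list =
    let log = ResizeArray<string>()
    let source =
        seq { 1 .. 3 }
        |> Seq.map (fun i ->
            log.Add(sprintf "in %d" i)
            i * 10)
    // Optionally put a sortBy stage between the source and the consumer.
    let staged = if useSort then Seq.sortBy id source else source
    staged |> Seq.iter (fun v -> log.Add(sprintf "out %d" v))
    List.ofSeq log

printfn "%A" (trace true)   // all "in" events come before any "out" event
printfn "%A" (trace false)  // "in" and "out" events alternate
```

With the sortBy stage, every "in" entry is logged before the first "out" entry; without it, each value is read and written before the next one is touched, which matches the two transcripts above.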