How to improve performance with F# idioms

Date: 2016-06-04 17:16:40

Tags: performance f#

I am following this course on Machine-Learning while learning F# at the same time. I did the following homework exercise, which is the first exercise of the second week:

  

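Run a computer simulation for flipping 1,000 virtual fair coins. Flip each coin independently 10 times. Focus on 3 coins as follows: c1 is the first coin flipped, crand is a coin chosen randomly from the 1,000, and cmin is the coin which had the minimum frequency of heads (pick the earlier one in case of a tie).

Let ν1, νrand, and νmin be the fraction of heads obtained for the 3 respective coins out of the 10 tosses. Run the experiment 100,000 times in order to get a full distribution of ν1, νrand, and νmin (note that crand and cmin will change from run to run).

What is the average value of νmin?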

I have produced the following code, which works fine and gives the correct answer:

let private rnd = System.Random()
let FlipCoin() = rnd.NextDouble() > 0.5
let FlipCoinNTimes N = List.init N (fun _ -> FlipCoin())
let FlipMCoinsNTimes M N = List.init M (fun _ -> FlipCoinNTimes N)

let ObtainFrequencyOfHeads tosses = 
    let heads = tosses |> List.filter (fun toss -> toss = true)
    float (List.length (heads)) / float (List.length (tosses))

let GetFirstRandMinHeadsFraction allCoinsLaunchs = 
    let first = ObtainFrequencyOfHeads(List.head (allCoinsLaunchs))
    let randomCoin = List.item (rnd.Next(List.length (allCoinsLaunchs))) allCoinsLaunchs
    let random = ObtainFrequencyOfHeads(randomCoin)

    let min = 
        allCoinsLaunchs
        |> List.map (fun coin -> ObtainFrequencyOfHeads coin)
        |> List.min
    (first, random, min)

module Exercice1 = 
    let GetResult() = 
        Seq.init 100000 (fun _ -> FlipMCoinsNTimes 1000 10)
        |> Seq.map (fun oneExperiment -> GetFirstRandMinHeadsFraction oneExperiment)
        |> Seq.map (fun (first, random, min) -> min)
        |> Seq.average

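However, it takes roughly 4 minutes to run on my machine. I know it is doing a lot of work, but I am wondering whether some modifications could optimize it.

As I am trying to learn F#, I am asking for optimizations that use F# idioms, not for changing the code to C style.

Feel free to suggest any improvement: style, good practices, etc.

[UPDATE]

I have written some code to compare the proposed solutions; it is accessible here.

These are the results: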

  

Base - result: 0.037510, elapsed time: 00:00:55.1274883, improvement: 0.99 x

Matthew Mcveigh - result: 0.037497, elapsed time: 00:00:15.1682052, improvement: 3.61 x

Fyodor Soikin - result: 0.037524, elapsed time: 00:01:29.7168787, improvement: 0.61 x

GuyCoder - result: 0.037645, elapsed time: 00:00:02.0883482, improvement: 26.25 x

GuyCoder MathNet - result: 0.037666, elapsed time: 00:00:24.7596117, improvement: 2.21 x

TheQuickBrownFox - result: 0.037494, elapsed time: 00:00:34.2831239, improvement: 1.60 x

The winner in terms of time improvement is GuyCoder, so I will accept his answer. However, I find his code harder to understand.
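The comparison code itself sits behind the link and is not reproduced here. As a rough illustration only, a timing harness that produces lines in the format above could look like the following sketch. The names `improvement`, `time` and `report` are hypothetical and are not taken from the linked code; the "improvement" factor is assumed to be baseline time divided by candidate time.

```fsharp
open System
open System.Diagnostics

// Speed-up factor of a candidate relative to the baseline (2.0 means twice as fast).
let improvement (baseline: TimeSpan) (elapsed: TimeSpan) =
    baseline.TotalSeconds / elapsed.TotalSeconds

// Run one candidate and return its result together with the elapsed wall-clock time.
let time (run: unit -> float) =
    let stopWatch = Stopwatch.StartNew()
    let result = run ()
    stopWatch.Stop()
    (result, stopWatch.Elapsed)

// Print one line per candidate in the same shape as the results listed above.
let report (name: string) (baseline: TimeSpan) (run: unit -> float) =
    let (result, elapsed) = time run
    printfn "%s - result: %f, elapsed time: %O, improvement: %.2f x"
        name result elapsed (improvement baseline elapsed)
```

Under these assumptions, the baseline would be timed once with `time` and its elapsed time passed to `report` for every other candidate.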

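3 Answers:

Answer 0: (score: 6)

Pre-allocating a large number of lists is heavy work; the algorithm can be processed online instead, e.g. via sequences or recursion. I transformed all the work into tail-recursive functions for some raw speed (the compiler will turn them into loops).

It is not guaranteed to be 100% correct, but hopefully it gives you the gist of where I was going with it: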

let private rnd = System.Random()
let flipCoin () = rnd.NextDouble() > 0.5

let frequencyOfHeads flipsPerCoin = 
    let rec countHeads numHeads i =
        if i < flipsPerCoin then
            let isHead = flipCoin ()
            countHeads (if isHead then numHeads + 1 else numHeads) (i + 1)
        else
            float numHeads

    countHeads 0 0 / float flipsPerCoin

let getFirstRandMinHeadsFraction numCoins flipsPerCoin = 
    let randomCoinI = rnd.Next numCoins

    let rec run first random min i =
        if i < numCoins then
            let frequency = frequencyOfHeads flipsPerCoin
            let first = if i = 0 then frequency else first
            let random = if i = randomCoinI then frequency else random
            let min = if min > frequency then frequency else min

            run first random min (i + 1)
        else
            (first, random, min)

    run 0.0 0.0 System.Double.MaxValue 0

module Exercice1 = 
    let getResult () = 
        let iterations, numCoins, numFlips = 100000, 1000, 10

        let getMinFromExperiment () =
            let (_, _, min) = getFirstRandMinHeadsFraction numCoins numFlips
            min

        let rec sumMinFromExperiments i sumOfMin =
            if i < iterations then
                sumMinFromExperiments (i + 1) (sumOfMin + getMinFromExperiment ())
            else
                sumOfMin

        let sum = sumMinFromExperiments 0 0.0
        sum / float iterations
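The answer mentions sequences as the other way to process the data online. A minimal sketch of that variant follows; it is not part of the original answer, the function names are illustrative, and the `System.Random` instance is passed in explicitly only to keep the snippet self-contained. It tracks only the minimum, not the first or random coin.

```fsharp
// Heads for one coin, counted without storing the individual flips.
let headsOfCoin (rnd: System.Random) (flipsPerCoin: int) =
    Seq.init flipsPerCoin (fun _ -> if rnd.NextDouble() > 0.5 then 1 else 0)
    |> Seq.sum

// Minimum head count over all coins of one experiment.
// Seq.init is lazy, so only one coin's count is alive at a time.
let minHeadsOfExperiment (rnd: System.Random) (numCoins: int) (flipsPerCoin: int) =
    Seq.init numCoins (fun _ -> headsOfCoin rnd flipsPerCoin)
    |> Seq.min

// Average of the minimum head fraction over all experiments.
let averageMinFraction (rnd: System.Random) (iterations: int) (numCoins: int) (flipsPerCoin: int) =
    Seq.init iterations (fun _ -> minHeadsOfExperiment rnd numCoins flipsPerCoin)
    |> Seq.averageBy (fun heads -> float heads / float flipsPerCoin)
```

This avoids the intermediate lists just as the recursive version does, but sequence enumeration has its own overhead, so it should be measured rather than assumed to match the tail-recursive loops.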

Answer 1: (score: 4)

Running your code on my computer and timing it, I get:

seconds: 68.481918
result: 0.47570994

Running my code on my computer and timing it, I get:

seconds: 14.003861
vOne: 0.498963
vRnd: 0.499793
vMin: 0.037675

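with vMin being closest to the correct answer of b, 0.01.

That is almost 5x faster.

I did not tinker with each method and data structure to figure out what worked and why; I just used many decades of experience to guide me. Clearly, not storing the intermediate values but only the results is a big improvement. Specifically, coinTest returns just the number of heads as an int, not a list of the results. Also, instead of getting a random number for each coin flip, it is advantageous to get one random number for each coin and then use each bit of that random number as a coin flip. That saves (number of flips - 1) calls to a function. I also avoided using float values until the very end; I don't think that saves time on the CPU, but it did simplify the thought process to think only in int, which allowed me to concentrate on other efficiencies. I know that may sound odd, but the less I have to think about, the better the answers I get. I also ran coinTest only when it was necessary, e.g. only for the first coin and only for the random coin, and used a coin with all tails as an exit condition.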

namespace Workspace

module main =

    [<EntryPoint>]
    let main argv = 

        let rnd = System.Random()
        let randomPick (limit : int) : int = rnd.Next(limit)   // [0 .. limit) it's a Python habit

        let numberOfCoins = 1000
        let numberOfFlips = 10
        let numberOfExperiements = 100000

        let coinTest (numberOfFlips : int) : int =
            let rec countHeads (flips : int) bitIndex (headCount : int) : int =
                if bitIndex < 0 then headCount
                else countHeads (flips >>> 1) (bitIndex-1) (headCount + (flips &&& 0x01))
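            // rnd.Next excludes its upper bound, so pass 2^numberOfFlips to keep the all-heads outcome reachable.
            // Bits 0 .. numberOfFlips-1 are then counted as the flips.
            countHeads (randomPick (pown 2 numberOfFlips)) (numberOfFlips - 1) 0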

        let runExperiement (numberOfCoins : int) (numberOfFlips : int) : (int * int * int) =
            let (randomCoin : int) = randomPick numberOfCoins
            let rec testCoin coinIndex (cFirst, cRnd, cMin, cFirstDone, cRanDone, cMinDone) : (int * int * int) =
                if (coinIndex < numberOfCoins) then
                    if (not cFirstDone || not cRanDone || not cMinDone) then
                        if (cFirstDone && cMinDone && (coinIndex <> randomCoin)) then
                             testCoin (coinIndex+1) (cFirst, cRnd, cMin, cFirstDone, cRanDone, cMinDone)
                        else
                            let headsTotal = coinTest numberOfFlips 
                            let (cFirst, cRnd, cMin, cFirstDone, cRanDone, cMinDone) =
                                let cFirst = if coinIndex = 0 then headsTotal else cFirst
                                let cRnd = if coinIndex = randomCoin then headsTotal else cRnd
                                let cMin = if headsTotal < cMin then headsTotal else cMin
                                let cRanDone = if (coinIndex >= randomCoin) then true else cRanDone
                                let cMinDone = if (headsTotal = 0) then true else cMinDone
                                (cFirst, cRnd, cMin, true, cRanDone, cMinDone)
                            testCoin (coinIndex+1) (cFirst, cRnd, cMin, cFirstDone, cRanDone, cMinDone)
                    else
                        (cFirst, cRnd, cMin)
                else
                    (cFirst, cRnd, cMin)
            testCoin 0 (-1,-1,10, false, false, false)

        let runExperiements (numberOfExperiements : int) (numberOfCoins : int) ( numberOfFlips : int) =
            let rec accumateExperiements index aOne aRnd aMin : (int * int * int) =
                // index is zero-based, so stop after exactly numberOfExperiements runs.
                if index >= numberOfExperiements then (aOne, aRnd, aMin)
                else
                    let (cOne,cRnd,cMin) = runExperiement numberOfCoins numberOfFlips
                    accumateExperiements (index + 1) (aOne + cOne) (aRnd + cRnd) (aMin + cMin)
            let (aOne, aRnd, aMin) = accumateExperiements 0 0 0 0
            let (vOne : double) = (double)(aOne) / (double)numberOfExperiements / (double)numberOfFlips
            let (vRnd : double) = (double)(aRnd) / (double)numberOfExperiements / (double)numberOfFlips
            let (vMin : double) = (double)(aMin) / (double)numberOfExperiements / (double)numberOfFlips
            (vOne, vRnd, vMin)

        let timeIt () = 
            let stopWatch = System.Diagnostics.Stopwatch.StartNew()
            let (vOne, vRnd, vMin) = runExperiements numberOfExperiements numberOfCoins numberOfFlips
            stopWatch.Stop()
            printfn "seconds: %f" (stopWatch.Elapsed.TotalMilliseconds / 1000.0)
            printfn "vOne: %A" vOne
            printfn "vRnd: %A" vRnd
            printfn "vMin: %A" vMin

        timeIt ()

        printf "Press any key to exit: "
        System.Console.ReadKey() |> ignore
        printfn ""

        0 // return an integer exit code

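========================================================================

This is just an intermediate answer, because I asked whether the OP had considered using MathNet Numerics for idiomatic F# and the OP wanted to see what that would look like. After running his version and this first-cut version on my machine, the OP's version is faster. OP: 75 seconds, mine: 84 seconds.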

namespace Workspace

open MathNet.Numerics.LinearAlgebra

module main =

    [<EntryPoint>]
    let main argv = 

        let rnd = System.Random()
        let flipCoin() = 
            let head = rnd.NextDouble() > 0.5
            if head then 1.0 else 0.0

        let numberOfCoins = 1000
        let numberOfFlips = 10
        let numberOfExperiements = 100000
        let numberOfValues = 3

        let randomPick (limit : int) : int = rnd.Next(limit)   // [0 .. limit) it's a Python habit
        let headCount (m : Matrix<float>) (coinIndex : int) : int = 
            System.Convert.ToInt32((m.Row coinIndex).Sum())

        let minHeads (m : Matrix<float>) (numberOfCoins : int) (numberOfFlips : int) : int =
            let rec findMinHeads currentCoinIndex minHeadsCount minHeadsIndex =
                match currentCoinIndex,minHeadsCount with
                | -1,_ -> minHeadsCount
                | _,0 -> minHeadsCount  // Can't get less than zero so stop searching.
                | _ ->
                    let currentMinHeadCount = (headCount m currentCoinIndex)
                    let nextIndex = currentCoinIndex - 1
                    if currentMinHeadCount < minHeadsCount 
                    then findMinHeads nextIndex currentMinHeadCount currentCoinIndex
                    else findMinHeads nextIndex minHeadsCount minHeadsIndex
            findMinHeads (numberOfCoins - 1) numberOfFlips -1

        // Return the values for cOne, cRnd, and cMin as int values. 
        // Will do division on final sum of experiments instead of after each experiment.
        let runExperiement (numberOfCoins : int) (numberOfFlips : int) : (int * int * int) =        
            let (flips : Matrix<float>) = DenseMatrix.init numberOfCoins numberOfFlips (fun i j -> flipCoin())
            let cOne = headCount flips 0
            let cRnd = headCount flips (randomPick numberOfCoins)
            let cMin = minHeads flips numberOfCoins numberOfFlips
            (cOne,cRnd,cMin)

        let runExperiements (numberOfExperiements : int) (numberOfCoins : int) (numberOfFlips : int) : (int [] * int [] * int []) =
            let (cOneArray : int[]) = Array.create numberOfExperiements 0
            let (cRndArray : int[]) = Array.create numberOfExperiements 0
            let (cMinArray : int[]) = Array.create numberOfExperiements 0
            for i = 0 to (numberOfExperiements - 1) do
                let (cOne,cRnd,cMin) = runExperiement numberOfCoins numberOfFlips
                cOneArray.[i] <- cOne 
                cRndArray.[i] <- cRnd 
                cMinArray.[i] <- cMin 
            (cOneArray, cRndArray, cMinArray)

        let (cOneArray, cRndArray, cMinArray) = runExperiements numberOfExperiements numberOfCoins numberOfFlips
        let (vOne : double) = (double)(Array.sum cOneArray) / (double)numberOfExperiements / (double)numberOfFlips
        let (vRnd : double) = (double)(Array.sum cRndArray) / (double)numberOfExperiements / (double)numberOfFlips
        let (vMin : double) = (double)(Array.sum cMinArray) / (double)numberOfExperiements / (double)numberOfFlips

        printfn "vOne: %A" vOne
        printfn "vRnd: %A" vRnd
        printfn "vMin: %A" vMin

        0 // return an integer exit code, as required by [<EntryPoint>]

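While coding I realized that I could do all of the calculations using just int; only the last calculation, which generates the percentages, needs float or double, and even then only because the list of answers is given as percentages. In theory the numbers could be compared as int to reach the same conclusion. If I used only int, I would have to create an int Matrix type, and that is more work than I want to do. When I have time I will switch the MathNet Matrix to an F# Array2D or something similar and check that. Note that if you tag this question with MathNet, the maintainer of MathNet (Christoph Rüegg) might answer.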
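The Array2D switch is only planned in the answer, not shown. As a rough sketch of what an int-only experiment on a plain F# Array2D might look like (this is an assumption, not the answer author's code; all names are illustrative and the `System.Random` instance is passed in explicitly):

```fsharp
// One flip as an int: 1 for heads, 0 for tails.
let flipCoinInt (rnd: System.Random) = if rnd.NextDouble() > 0.5 then 1 else 0

// Heads for one coin: sum of one row of the coins-by-flips grid.
let headCount (flips: int[,]) (coinIndex: int) = Array.sum flips.[coinIndex, *]

// Smallest head count over all coins (rows).
let minHeads (flips: int[,]) =
    Seq.init (Array2D.length1 flips) (headCount flips) |> Seq.min

// Returns the head counts for cOne, cRnd and cMin as int values;
// division into fractions is left to the caller, after summing over experiments.
let runExperiment (rnd: System.Random) (numberOfCoins: int) (numberOfFlips: int) =
    let flips = Array2D.init numberOfCoins numberOfFlips (fun _ _ -> flipCoinInt rnd)
    let cOne = headCount flips 0
    let cRnd = headCount flips (rnd.Next(numberOfCoins))
    let cMin = minHeads flips
    (cOne, cRnd, cMin)
```

Whether this is faster than the MathNet version has not been measured here; it mainly removes the float round-trip and the external dependency.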

I made a change to the minHeads method and it is faster by 5 seconds.

// faster
let minHeads (m : Matrix<float>) (numberOfCoins : int) (numberOfFlips : int) : int =
    let (mins : float[]) = m.FoldByRow((fun (x : float) y -> x + y), 0.0)
    let (minHead : float) = Array.min mins
    System.Convert.ToInt32(minHead)

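Answer 2: (score: 3)

I tried to find the smallest possible changes to your code to make it faster.

The biggest performance improvement I found came from changing the ObtainFrequencyOfHeads function so that it counts the true values in the collection instead of creating an intermediate filtered collection and then counting that. I did this by using fold: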
let ObtainFrequencyOfHeads tosses = 
    let heads = tosses |> List.fold (fun state t -> if t then state + 1 else state) 0
    float heads / float (List.length (tosses))

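Another improvement came from changing all of the lists into arrays. This was as simple as replacing every instance of List. with Array. (including in the new function above).

Some might say this is less functional, because it uses a mutable collection instead of an immutable one. However, we are not mutating any arrays; we are just using the fact that they are cheap to create, to check the length of, and to look up by index. We have removed a restriction on mutation, but we are still not using mutation. It is certainly idiomatic F# to use arrays for performance when required.

With both of these changes I got almost a 2x performance improvement in FSI.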
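For illustration, the question's helpers with both changes applied (fold-based counting and List. swapped for Array.) might look like the sketch below. The lower-case names and the explicit `System.Random` parameter are choices made here to keep the snippet self-contained; they are not from the original code.

```fsharp
let flipCoin (rnd: System.Random) = rnd.NextDouble() > 0.5
let flipCoinNTimes (rnd: System.Random) n = Array.init n (fun _ -> flipCoin rnd)
let flipMCoinsNTimes (rnd: System.Random) m n = Array.init m (fun _ -> flipCoinNTimes rnd n)

// Count the true values directly instead of building a filtered copy first.
let obtainFrequencyOfHeads (tosses: bool[]) =
    let heads = tosses |> Array.fold (fun state t -> if t then state + 1 else state) 0
    float heads / float (Array.length tosses)

let getFirstRandMinHeadsFraction (rnd: System.Random) (allCoins: bool[][]) =
    let first = obtainFrequencyOfHeads allCoins.[0]
    // Indexing an array is O(1), whereas List.item walks the linked list.
    let random = obtainFrequencyOfHeads allCoins.[rnd.Next(Array.length allCoins)]
    let min = allCoins |> Array.map obtainFrequencyOfHeads |> Array.min
    (first, random, min)
```

None of the arrays is written to after creation, so the code stays as free of mutation as the list version.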