Optimizing the longest Collatz sequence

Time: 2018-12-22 21:17:35

Tags: erlang

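I'm looking for a solution to the longest Collatz sequence problem in Erlang. I have already tried ETS, and my current solution uses maps, but the performance is worse than I expected. Are there any optimizations I could make to improve it?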


1 answer:

Answer 0 (score: 1)

Well, first let's tweak the call so that we can collect some simple statistics and compare different approaches:

-export([start/2, max_collatz/2]).

...

max_collatz(N, M) ->
    Map = maps:new(),
    Map1 = maps:put(1, 1, Map),
    s(N, M, 0, Map1).

start(N, M)->
    {T, Result} = timer:tc( fun() ->  max_collatz(N, M) end),
    io:format("~p seconds~n", [T / 1000000]),
    Result.
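
A single timer:tc measurement is noisy, so before reaching for a statistics library it can help to repeat the run a few times. The following is only a minimal sketch under assumptions: the module name bench_sketch and the function run/2 are hypothetical and not part of the original code; only timer:tc/1 comes from the standard library.

```erlang
-module(bench_sketch).
-export([run/2]).

%% Hypothetical helper: call Fun (arity 0) Times times and return
%% {Min, Median, Max} of the elapsed times in microseconds.
%% For an even number of samples the lower median is reported.
run(Fun, Times) when is_function(Fun, 0), is_integer(Times), Times > 0 ->
    %% timer:tc/1 returns {ElapsedMicroseconds, Value}; keep only the time.
    Samples = lists:sort([element(1, timer:tc(Fun)) || _ <- lists:seq(1, Times)]),
    Median = lists:nth((Times + 1) div 2, Samples),
    {hd(Samples), Median, lists:last(Samples)}.
```

For example, bench_sketch:run(fun() -> collatz:max_collatz(1, 100000) end, 30) would time 30 runs of whichever module is loaded. This gives a rough picture only; it does not estimate confidence intervals.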

Let's write it in a more idiomatic Erlang way:

-module(collatz).

-export([start/2, max_collatz/2]).

collatz_next(N) when N rem 2 =:= 0 ->
    N div 2;
collatz_next(N) ->
    3 * N + 1.

collatz_length(N, Map) ->
    case Map of
        #{N := L} -> {L, Map};
        _ ->
            {L, Map2} = collatz_length(collatz_next(N), Map),
            {L + 1, Map2#{N => L + 1}}
    end.

max_collatz(N, M) ->
    Map = lists:foldl(fun(X, Acc) -> {_, Acc2} = collatz_length(X, Acc), Acc2 end,
                      #{1 => 1}, lists:seq(N, M)),
    lists:max(maps:values(Map)).

start(N, M) ->
    {T, Result} = timer:tc(fun() -> max_collatz(N, M) end),
    io:format("~p seconds~n", [T / 1000000]),
    Result.

Then we can use, for example, eministat to compare the speed.

Clone and build it:

git clone https://github.com/jlouis/eministat.git
cd eministat
make

If you run into an error like this:

 DEPEND eministat.d
 ERLC   eministat.erl eministat_analysis.erl eministat_ds.erl eministat_plot.erl eministat_report.erl eministat_resample.erl eministat_ts.erl
compile: warnings being treated as errors
src/eministat_resample.erl:8: export_all flag enabled - all functions will be exported
erlang.mk:4940: recipe for target 'ebin/eministat.app' failed
make[1]: *** [ebin/eministat.app] Error 1
erlang.mk:4758: recipe for target 'app' failed
make: *** [app] Error 2

you can fix it with this patch:

diff --git src/eministat_resample.erl src/eministat_resample.erl
index 1adf401..0887b2c 100644
--- src/eministat_resample.erl
+++ src/eministat_resample.erl
@@ -5,7 +5,7 @@
 -include("eministat.hrl").

 -export([resample/3, bootstrap_bca/3]).
--compile(export_all).
+-compile([nowarn_export_all, export_all]).

 %% @doc resample/3 is the main resampler of eministat
 %% @end

Then run it:

$ erl -pa eministat/ebin/
Erlang/OTP 21 [erts-10.1] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]

Eshell V10.1  (abort with ^G)
1> c(collatzMaps), c(collatz).                                                                                                                                  
{ok,collatz}
2> eministat:x(95.0, eministat:s(orig, fun() -> collatzMaps:max_collatz(1, 100000) end, 30), eministat:s(new, fun() -> collatz:max_collatz(1, 100000) end, 30)).
x orig
+ new
+--------------------------------------------------------------------------+
|+    ++++++++ +++++   * +  +x+**+xxxx**x xxx xx+x xxx *x  x  +   x       x|
|        +   + +                   x x xx            x                     |
|        +                                                                 |
|                               |_______M___A__________|                   |
|      |________M_____A______________|                                     |
+--------------------------------------------------------------------------+
Dataset: x N=30 CI=95.0000
Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
Min:         1.76982e+5
1st Qu.      1.81610e+5
Median:      1.82954e+5
3rd Qu.      1.87030e+5
Max:         1.94944e+5
Average:     1.84280e+5 [      8.00350] (   1.82971e+5 ‥    1.85749e+5)
Std. Dev:       3999.87 [     -102.524] (      3128.74 ‥       5431.13)

Outliers: 0/0 = 0 (μ=1.84288e+5, σ=3897.35)
        Outlier variance:    3.22222e-2 (slight)

------

Dataset: + N=30 CI=95.0000
Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
Min:         1.69179e+5
1st Qu.      1.72501e+5
Median:      1.74614e+5
3rd Qu.      1.79850e+5
Max:         1.90638e+5
Average:     1.76517e+5 [      3.11862] (   1.74847e+5 ‥    1.78679e+5)
Std. Dev:       5343.46 [     -147.802] (      4072.99 ‥       7072.53)

Outliers: 0/0 = 0 (μ=1.76520e+5, σ=5195.66)
        Outlier variance:    9.43164e-2 (slight)

Difference at 95.0% confidence
        -7762.60 ± 2439.69
        -4.21240% ± 1.32391%
        (Student's t, pooled s = 4719.72)
------

ok

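So it now looks about 4% faster, which is not much. First, we can inline collatz_next/1, which is basically what you already have in your collatz/2 function. I like to be explicit, so I put the following between the -export attribute and the first function:

-compile({inline, [collatz_next/1]}).

The effect is small: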

Difference at 95.0% confidence
        -9895.27 ± 5524.91
        -5.24520% ± 2.92860%
        (Student's t, pooled s = 1.06882e+4)

Next, we can try unrolling lists:foldl/3, lists:seq/2 and lists:max/1 into explicit recursion, as in your s/4 function, but let's do it in a more idiomatic way.

max_collatz(N, M) ->
    max_collatz(N, M, 1, #{1 => 1}).

max_collatz(M, M, Max, _) -> Max;
max_collatz(N, M, Max, Map) ->
    case collatz_length(N + 1, Map) of
        {L, Map2} when L > Max ->
            max_collatz(N + 1, M, L, Map2);
        {_, Map2} ->
            max_collatz(N + 1, M, Max, Map2)
    end.

Better, but still not by much:

Difference at 95.0% confidence
        -1.78775e+4 ± 1980.35
        -9.66832% ± 1.07099%
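
One detail of the recursive version above: it evaluates N + 1 up to M, so the lower bound N is covered only when N is 1 (through the seeded map entry). For example, max_collatz(9, 10) as written returns 7, the chain length of 10, although the chain starting at 9 has 20 terms. If an inclusive range is wanted, a variant could look like the sketch below. The module name collatz_range_sketch and the helper loop/4 are hypothetical, and this variant was not part of the benchmarks in this answer.

```erlang
-module(collatz_range_sketch).
-export([max_collatz/2]).

collatz_next(N) when N rem 2 =:= 0 -> N div 2;
collatz_next(N) -> 3 * N + 1.

%% Memoised chain length (number of terms, so the length of 1 is 1).
collatz_length(N, Map) ->
    case Map of
        #{N := L} -> {L, Map};
        _ ->
            {L, Map2} = collatz_length(collatz_next(N), Map),
            {L + 1, Map2#{N => L + 1}}
    end.

%% Longest chain length over the inclusive range N..M.
max_collatz(N, M) when is_integer(N), is_integer(M), 1 =< N, N =< M ->
    loop(N, M, 0, #{1 => 1}).

%% Hypothetical helper: stop once N has passed M, otherwise evaluate N itself.
loop(N, M, Max, _Map) when N > M -> Max;
loop(N, M, Max, Map) ->
    {L, Map2} = collatz_length(N, Map),
    loop(N + 1, M, max(L, Max), Map2).
```

With this sketch, collatz_range_sketch:max_collatz(9, 10) returns 20, and for ranges starting at 1 it returns the same value as the version above.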

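Now that we have removed all external function calls, it is worth trying native compilation (external function calls usually ruin any benefit of native compilation). We can also add a small type hint for HiPE, but it seems to have almost no effect (type hints are usually worth trying for floating-point arithmetic, which is not the case here, and the heavy use of maps is probably also a problem).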

max_collatz(N, M) when N < M, is_integer(N), is_integer(M) ->
    max_collatz(N, M, 1, #{1 => 1}).

Not much better:

c(collatz, [native]).
...
Difference at 95.0% confidence
        -2.26703e+4 ± 2651.32
        -12.1721% ± 1.42354%
        (Student's t, pooled s = 5129.13)

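So it is time to try getting dirty. Storing data in the process dictionary is not recommended, but it is acceptable when it is done in a dedicated process.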

collatz_length(N) ->
    case get(N) of
        undefined -> 
            L = collatz_length(collatz_next(N)),
            put(N, L + 1),
            L + 1;
        L -> L
    end.

max_collatz(N, M) when N < M, is_integer(N), is_integer(M) ->
    P = self(),
    W = spawn_link(fun() ->
                           put(1, 1),
                           P ! {self(), max_collatz(N, M, 1)}
                   end),
    receive {W, Max} -> Max end.

max_collatz(M, M, Max) -> Max;
max_collatz(N, M, Max) ->
    case collatz_length(N + 1) of
        L when L > Max ->
            max_collatz(N + 1, M, L);
        _ ->
            max_collatz(N + 1, M, Max)
    end.

Yes, it is a dirty but workable solution, and it is worth it (even without native):

Difference at 95.0% confidence
        -1.98173e+5 ± 5450.92
        -80.9384% ± 2.22628%
        (Student's t, pooled s = 1.05451e+4)
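
The reason the dedicated process makes this acceptable is that a process dictionary belongs to exactly one process: whatever the worker stores with put/2 is invisible to the caller and disappears when the worker exits. A minimal sketch of that isolation follows; the module name pdict_sketch and the functions with_private_dict/1 and demo/0 are hypothetical and used only for illustration.

```erlang
-module(pdict_sketch).
-export([with_private_dict/1, demo/0]).

%% Hypothetical helper: run Fun (arity 0) in a fresh linked process, so
%% any put/2 calls it makes touch only that process's own dictionary.
with_private_dict(Fun) when is_function(Fun, 0) ->
    Parent = self(),
    Ref = make_ref(),
    spawn_link(fun() -> Parent ! {Ref, Fun()} end),
    receive {Ref, Result} -> Result end.

%% The worker caches a value in its dictionary; the caller's dictionary
%% is untouched. Assumes the caller has not stored the key 'cached' itself.
demo() ->
    Result = with_private_dict(fun() ->
                                       put(cached, 42),
                                       get(cached)
                               end),
    {Result, get(cached)}.
```

Under that assumption, pdict_sketch:demo() returns {42, undefined}: the worker saw its cached value, while the calling process did not.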

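So here we go from 3.6 s down to 0.93 s using a dirty trick, but in any case, if you need to do this kind of task, you would probably use a NIF written in C. This is not the kind of task where Erlang shines.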

> collatzMaps:start(1, 1000000).
3.576669 seconds
525
> collatz:start(1, 1000000).                                                                                                                                   
0.931186 seconds
525