Question

I have a fairly expensive function in MATLAB:

function [out] = f ( in1, in2, in3)

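It is often called with the same arguments. The function is deterministic, so for given input arguments its output will always be the same.

What is the simplest way to store the results for already-computed inputs inside the function, so that it can answer quickly if it is called again with the same inputs?

Is a persistent variable that maps (using containers.Map or some other class) the input set <in1, in2, in3> to a result the way to go?

Note that in my application any method that requires saving data to disk is out of the question.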

Answer 1

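Below is an idea for a CacheableFunction class.

It seems that all the answers to your main question point in the same direction: a persistent Map is the consensus way to cache results, and that is what I do as well.
If the inputs are arrays, they need to be hashed to a string or scalar to be used as the map key. There are many ways to hash three input arrays into one key; the solution below uses DataHash.
I chose to make it a class rather than a function like memoize, so that the input hashing function can be specified dynamically once instead of being hard-coded.
Depending on the form of the output, it also uses dzip/dunzip to reduce the footprint of the saved outputs.
Potential improvement: a clever way to decide which elements to remove from the persistent map when the memory footprint reaches some limit.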
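For readers who do not have DataHash installed, the hashing step can be sketched with the Java MessageDigest class that MATLAB can call when the JVM is running. This is only an illustration, not what DataHash does internally: the function name simpleArrayKey is made up here, and it assumes every input is a real, full, numeric array (typecast does not accept char, logical, sparse, or complex inputs).

```matlab
function key = simpleArrayKey(varargin)
% simpleArrayKey  Hypothetical helper: hash numeric arrays to one hex string.
% Assumes real, full, numeric inputs and a running JVM.
    md = java.security.MessageDigest.getInstance('MD5');
    for k = 1:numel(varargin)
        x = varargin{k};
        % Mix class and size into the hash so that, for example,
        % a 2x3 and a 3x2 array with the same values get different keys.
        header = sprintf('%s|%s|', class(x), mat2str(size(x)));
        md.update(uint8(header));
        if ~isempty(x)
            % Reinterpret the raw array bytes as uint8 and feed them in.
            md.update(typecast(x(:), 'uint8'));
        end
    end
    % digest() returns signed bytes; reinterpret and print as hex.
    digest = typecast(md.digest(), 'uint8');
    key = sprintf('%02x', digest);
end
```

A handle such as @(x) simpleArrayKey(x{:}) could then be passed as the hash function to the class below in place of @(x) DataHash(x), under the same assumptions about the inputs.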

Class definition

classdef CacheableFunction < handle
    properties
        exeFun
        hashFun
        cacheMap
        nOutputs
        zipOutput
    end

    methods
        function obj = CacheableFunction(exeFun, hashFun, nOutputs)
            obj.exeFun = exeFun;
            obj.hashFun = hashFun;
            obj.cacheMap = containers.Map;
            obj.nOutputs = nOutputs;
            obj.zipOutput = [];
        end

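        function [result] = evaluate(obj, varargin)
            % Returns a 1-by-nOutputs cell array with the outputs of exeFun.

            % Hash the whole argument list into a single map key.
            thisKey = obj.hashFun(varargin);

            if isKey(obj.cacheMap, thisKey)
                % Cache hit: return the stored outputs, decompressing if needed.
                if obj.zipOutput
                    result = cellfun(@(x) dunzip(x), obj.cacheMap(thisKey), 'UniformOutput', false);
                else
                    result = obj.cacheMap(thisKey);
                end
            else
                % Cache miss: run the function and collect all its outputs.
                [result{1:obj.nOutputs}] = obj.exeFun(varargin);

                % On the first miss, decide whether the outputs can be compressed.
                if isempty(obj.zipOutput)
                    obj.zipCheck(result);
                end

                if obj.zipOutput
                    obj.cacheMap(thisKey) = cellfun(@(x) dzip(x), result, 'UniformOutput', false);
                else
                    obj.cacheMap(thisKey) = result;
                end
            end
        end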


        function [] = zipCheck(obj,C)
            obj.zipOutput = all(cellfun(@(x) isreal(x) & ~issparse(x) & any(strcmpi(class(x), ...
                {'double','single','logical','char','int8','uint8',...
                 'int16','uint16','int32','uint32','int64','uint64'})), C));
        end

    end
end

Testing it out...

function [] = test_caching_perf()

A = CacheableFunction(@(x) long_annoying_function(x{:}), @(x) DataHash(x), 3);

B = rand(50, 50);
C = rand(50, 50);
D = rand(50, 50);

tic;
myOutput = A.evaluate(B, C, D);
toc

tic;
myOutput2 = A.evaluate(B, C, D);
toc

cellfun(@(x, y) all(x(:) == y(:)), myOutput, myOutput2)

end

function [A, B, C] = long_annoying_function(A, B, C)

    for ii = 1:5000000
        A = A+1;
        B = B+2;
        C = C+3;
    end
end

Results

>> test_caching_perf
Elapsed time is 16.781889 seconds.
Elapsed time is 0.011116 seconds.
ans =
    1     1     1

Answer 2

MATLAB now ships with a function made for exactly this purpose. The technique is called "memoization" and the function is named "memoize".

See: https://www.mathworks.com/help/matlab/ref/memoize.html
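A minimal usage sketch of memoize, assuming R2017a or later. The function expensiveSum is a hypothetical stand-in for the f in the question, and the cache size of 50 is an arbitrary choice; the block is written as a script with a local function at the end.

```matlab
% Wrap the function once; mf is a MemoizedFunction object.
mf = memoize(@expensiveSum);
mf.CacheSize = 50;               % keep results for up to 50 distinct input sets

A = rand(50); B = rand(50); C = rand(50);

tic; r1 = mf(A, B, C); toc       % first call runs expensiveSum
tic; r2 = mf(A, B, C); toc       % second call is answered from the cache

disp(isequal(r1, r2))

s = stats(mf);                   % struct with cache statistics
disp(s)
clearCache(mf);                  % drop all cached results

function out = expensiveSum(in1, in2, in3)
    % Hypothetical stand-in for the expensive function f.
    pause(0.5);
    out = in1 + in2 + in3;
end
```

The cache lives in memory only, so this fits the no-disk constraint stated in the question.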

Answer 3

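A persistent map is indeed a good way to cache results. Advantages I can think of:

No need to implement a hash function for every data type.
MATLAB matrices are copy-on-write, which can give some memory efficiency.
If memory usage is a concern, you can control how many results are cached.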
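The persistent-map idea, including a cap on the number of cached results, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the wrapper name cachedF, the limit of 100 entries, the first-in-first-out eviction, the mat2str key (which only works here because the inputs are assumed to be real numeric scalars), and the stand-in body of f.

```matlab
function out = cachedF(in1, in2, in3)
% cachedF  Hypothetical caching wrapper around f using persistent variables.
    persistent cache order
    maxEntries = 100;                      % assumed limit on cached results

    if ~isa(cache, 'containers.Map')
        % First call: create the map and the insertion-order list.
        cache = containers.Map('KeyType', 'char', 'ValueType', 'any');
        order = {};
    end

    % Build a text key; 17 digits are enough to distinguish any two doubles.
    key = mat2str([in1, in2, in3], 17);

    if isKey(cache, key)
        out = cache(key);                  % cache hit
        return
    end

    out = f(in1, in2, in3);                % cache miss: do the real work

    if cache.Count >= maxEntries
        remove(cache, order{1});           % evict the oldest entry
        order(1) = [];
    end
    cache(key) = out;
    order{end+1} = key;
end

function out = f(in1, in2, in3)
    % Hypothetical stand-in for the expensive function in the question.
    out = in1 + in2 + in3;
end
```

Calling clear cachedF resets the persistent variables and therefore empties the cache.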

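There is a File Exchange submission by David Young, "A multidimensional map class", which comes with a function memoize() that does exactly this. Its implementation uses a different mechanism (referenced local variables), but the idea is roughly the same. Compared with a persistent map inside each function, this memoize() function allows existing functions to be memoized without modification. As Oleg pointed out, using DataHash (or an equivalent) can further reduce memory usage.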
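To illustrate how an existing function can be memoized without modifying it, here is a small sketch based on a nested function that shares a map with its parent workspace. It is not David Young's implementation and does not use MapN; the names memoizeSketch, fun, and keyFun are made up, it handles a single output only, and the caller is assumed to supply a keyFun that turns the inputs into a char key.

```matlab
function g = memoizeSketch(fun, keyFun)
% memoizeSketch  Hypothetical wrapper: returns a caching version of fun.
%   fun    - handle to the function to cache (single output assumed)
%   keyFun - handle that maps the inputs to a char key
    cache = containers.Map('KeyType', 'char', 'ValueType', 'any');
    g = @lookup;

    function out = lookup(varargin)
        % cache is shared with the parent workspace and lives as long
        % as the returned handle g does.
        key = keyFun(varargin{:});
        if isKey(cache, key)
            out = cache(key);              % cache hit
        else
            out = fun(varargin{:});        % cache miss: call the original
            cache(key) = out;
        end
    end
end
```

For scalar numeric inputs, usage could be g = memoizeSketch(@f, @(a, b, c) mat2str([a b c], 17)); followed by out = g(in1, in2, in3);.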

PS: I have used the MapN class extensively and it is quite reliable. In fact, I submitted a bug report and the author fixed it promptly.

The cleanest way to cache function results in MATLAB
