Question

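I am trying to use MATLAB's copy-on-write (lazy copy) mechanism to make several cells of a cell array refer to the same large matrix, and I would like to know whether there is a way to check that I am doing it correctly.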

For example:

img = randn(500);
[dx,dy] = gradient(img);
S = cell(2,2);
S{1,1} = dx.^2;
S{2,2} = dy.^2;
S{1,2} = dx.*dy;
S{2,1} = S{1,2};  % should be a reference, as long as not modified

But look at the output of whos:

>> whos
  Name        Size               Bytes  Class     Attributes

  S           2x2              8000448  cell                
  dx        500x500            2000000  double              
  dy        500x500            2000000  double              
  img       500x500            2000000  double

I expected S to take up 6 MB, not 8 MB.

Is there a way to verify that there is no bug in the program, and that these two cells still refer to the same array at the end?

I am aware of the function memory, but unfortunately it only works on Windows (I am on MacOS).

Answer 1

One possible way to verify that two specific arrays actually share data is the following MEX-file, modified from Yair's Undocumented MATLAB Blog:

#include "mex.h"
#include <cstdint>
void mexFunction( int /*nlhs*/, mxArray* plhs[], int nrhs, mxArray const* prhs[]) {
   if (nrhs < 1) mexErrMsgTxt("One input required.");
   plhs[0] = mxCreateNumericMatrix(1, 1, mxUINT64_CLASS, mxREAL);
   std::uint64_t* out = static_cast<std::uint64_t*>(mxGetData(plhs[0]));
   out[0] = reinterpret_cast<std::uint64_t>(mxGetData(prhs[0]));
}

Save this as getaddr.cpp and compile it with

mex getaddr.cpp

This allows the following test:

img = randn(500);
[dx,dy] = gradient(img);
S = cell(2,2);
S{1,1} = dx.^2;
S{2,2} = dy.^2;
S{1,2} = dx.*dy;
S{2,1} = S{1,2};  % should be a reference, as long as not modified

assert(getaddr(S{1,2}) == getaddr(S{2,1}))

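This is not the same as getting a summary of the memory actually used by the cell array S (which I still think would be useful), but it does make it possible to verify that the memory is shared.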
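The address comparison works because, under copy-on-write, a copy keeps pointing at the original data until one of the two is written to. The following is a minimal standalone C++ model of that behaviour, not MATLAB's actual implementation: the class CowArray and its methods are hypothetical names, and a std::shared_ptr buffer stands in for the array data.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical copy-on-write array: copies share one buffer until one is written to.
class CowArray {
public:
    explicit CowArray(std::size_t n)
        : buf_(std::make_shared<std::vector<double>>(n)) {}

    // Address of the underlying data, analogous to the value getaddr returns.
    const double* addr() const { return buf_->data(); }

    double get(std::size_t i) const {
        assert(i < buf_->size());
        return (*buf_)[i];
    }

    // A write first detaches this handle from a shared buffer (the lazy copy).
    void set(std::size_t i, double v) {
        assert(i < buf_->size());
        if (buf_.use_count() > 1)
            buf_ = std::make_shared<std::vector<double>>(*buf_);
        (*buf_)[i] = v;
    }

private:
    std::shared_ptr<std::vector<double>> buf_;
};
```

In this model, `CowArray b = a;` leaves `a.addr() == b.addr()`, and the first `b.set(...)` makes the two addresses differ. If MATLAB behaves the same way, the assert above would start failing as soon as one of the two cells is modified.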

Answer 2

"Is there a way to verify that there is no bug in the program, and that these two cells still refer to the same array at the end?"

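I would try measuring how long the assignment takes. Copying a pointer is faster than copying the data, so the two cases should scale differently with the size of the array.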

This shows the difference:

i=500:500:5000;
t=zeros(2,length(i));
for ct=1:length(i)
    img = randn(i(ct));
    [dx,dy] = gradient(img);
    S = cell(2,2);
    S{1,1} = dx.^2;
    S{2,2} = dy.^2;
    S{1,2} = dx.*dy;
    tic;
    S{2,1} = S{1,2};  % should be a reference, as long as not modified
    t(1,ct)=toc;
    tic
    S{2,1} = S{1,2}+1; 
    t(2,ct)=toc;
end
B=(i.^2)*8;
figure(1);clf
subplot(1,2,1);
plot(t(1,:),B,'.')
xlabel('time(s)');ylabel('Bytes');
title(sprintf('reference: no relation'))

subplot(1,2,2);
a=sum(B.*t(2,:))/sum(t(2,:).^2);
plot(t(2,:),B,'.',t(2,:),a*t(2,:))
xlabel('time(s)');ylabel('Bytes');
title(sprintf('datacopy: %.2f GB/s',a/1E9))

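So there is no bug in the program. MATLAB reports the wrong memory usage for the cell array: whos appears to count the shared matrix once for every cell that refers to it.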

MEX-files and memory

So I read this article: http://undocumentedmatlab.com/blog/matlabs-internal-memory-representation

In MATLAB 2018a I could not reproduce its results. printmem works with the pointer obtained from format debug, but getaddr and printaddr no longer return that same pointer.

A=1:10
>Structure address = 7d9a3eb0
>m = 1
>n = 10
>pr = 74ed5f20
printaddr(A)
>000000007D894640

with this as printaddr:

/* printaddr.cpp */
#include "mex.h"
void mexFunction( int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
   if (nrhs < 1) mexErrMsgTxt("One input required.");
   printf("%p\n", prhs[0]);
}

Answer 3

Edit:

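Before this edit, my answer used an undocumented function that behaves unexpectedly and whose signature is not stable across MATLAB versions, so here I provide an extended version of @CrisLuengo's answer instead.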

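We can use a hash map, filled by a recursive function check_shared, to store the unique addresses of the data elements together with their associated mxArray, and then sum up the data sizes. Note that this only detects sharing within the cell array: it cannot detect variables outside the cell array that have the same address as one of its elements.*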

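#include "mex.h"
#include <cstdint>
#include <unordered_map>
#include <utility>
typedef std::unordered_map<void *,const mxArray *> TableType;

// Recursively record the data address of every array reachable from arr.
TableType check_shared(const mxArray* arr, TableType table = TableType())
{
    // Unpopulated cells and struct fields come back as null pointers; skip them.
    if (arr == nullptr) return table;
    switch (mxGetClassID(arr)) {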
        case mxCELL_CLASS:
            for(mwIndex i = 0; i < mxGetNumberOfElements (arr); i++) {
                table  = check_shared(mxGetCell (arr,i), std::move(table));
            }
            break;
        case mxSTRUCT_CLASS:
            for (int i = 0; i < mxGetNumberOfFields (arr); i++) {
                for (mwIndex j = 0; j < mxGetNumberOfElements (arr); j++) {
                    table = check_shared(mxGetFieldByNumber (arr, j, i), std::move(table));
                }
            }
            break;
        case mxVOID_CLASS:
        case mxFUNCTION_CLASS:
        case mxUNKNOWN_CLASS:
            return table;
    }
    if (!mxIsEmpty (arr)) {
        void* data = mxGetData(arr);
        table[data] = arr;
    }
    return table;
}
uint64_t actual_size(const TableType& table)
{
    uint64_t sz = 0;
    for (const auto& entry : table) {
        const mxArray * arr = entry.second;
        sz += mxGetElementSize (arr) * mxGetNumberOfElements (arr);
    }
    return sz;
}

void mexFunction(int nlhs, mxArray *plhs[],
                 int nrhs, const mxArray *prhs[])
{
    TableType table = check_shared(prhs[0]);
    plhs[0] = mxCreateNumericMatrix(1,1, mxUINT64_CLASS, mxREAL );
    uint64_t* result = static_cast<uint64_t*>(mxGetData (plhs[0]));
    result[0] = actual_size(table);
}

(*) Basic data types such as cell, struct, and numeric arrays are supported. For unknown data structures and classdef objects, the function returns zero.
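The accounting idea behind check_shared can also be tried outside MATLAB. The following is a minimal standalone C++ sketch, not MEX code: Array, naive_size, shared_aware_size and make_example are hypothetical names, and a std::shared_ptr buffer stands in for the data of an mxArray. It assumes, as the whos output in the question suggests, that the naive total counts every cell in full.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an mxArray: a handle to a buffer that may be shared.
struct Array {
    std::shared_ptr<std::vector<double>> data;
};

// Naive total: every handle is counted in full, shared or not.
std::uint64_t naive_size(const std::vector<Array>& cells) {
    std::uint64_t total = 0;
    for (const Array& a : cells) {
        assert(a.data != nullptr);
        total += a.data->size() * sizeof(double);
    }
    return total;
}

// Sharing-aware total: each distinct data address is counted only once.
std::uint64_t shared_aware_size(const std::vector<Array>& cells) {
    std::unordered_map<const void*, std::uint64_t> seen;
    for (const Array& a : cells) {
        assert(a.data != nullptr);
        seen[a.data->data()] = a.data->size() * sizeof(double);
    }
    std::uint64_t total = 0;
    for (const auto& entry : seen) total += entry.second;
    return total;
}

// Mirrors the 2x2 cell array from the question: the two off-diagonal
// cells refer to the same 500x500 buffer.
std::vector<Array> make_example() {
    const std::size_t n = 500 * 500;
    Array dx2{std::make_shared<std::vector<double>>(n)};
    Array dy2{std::make_shared<std::vector<double>>(n)};
    Array dxdy{std::make_shared<std::vector<double>>(n)};
    return {dx2, dxdy, dxdy, dy2};
}
```

For the example data, naive_size gives 8,000,000 bytes and shared_aware_size gives 6,000,000 bytes, which matches the 8 MB reported and the 6 MB expected in the question (ignoring the per-cell overhead that whos also includes).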

How can I see the actual memory used by a variable in MATLAB?
