Question

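I am trying to run someone else's MATLAB code that uses the splitapply function, which is only available in R2018a. I am currently using R2015a; is there a simple (if less efficient) alternative implementation that serves the same purpose and that I could use temporarily?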

Answer 1

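The documented usage of splitapply also relies on findgroups. Both were introduced in R2015b ^[1].

You can use the third output of unique in place of findgroups, and a simple loop in place of splitapply. Here is an example that assumes data is a column vector; you can easily adapt it to matrix data.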

% With splitapply
g = findgroups( data );           
m = splitapply( @mean, data, g ); % Your function in place of mean here

% Without splitapply (pre-R2015b)
[~, ~, g] = unique( data ); % Get group indices
m = zeros(max(g), 1);       % Initialise the output matrix
for ii = 1:max(g)
    m(ii) = mean( data( g == ii ) ); % Your function in place of mean here
end

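From some quick tests, I found that the two methods are comparable in speed on reasonably sized arrays. For ~100 groups and ~1e6 elements in data, I found the loop approach to be about 4 times slower, but still fast.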
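As another loop-free option on releases before R2015b, accumarray can apply a function per group when the data is a vector and the function returns a scalar. The following is only a minimal sketch; the variables keys and vals are hypothetical sample data, not taken from the question.

```matlab
% Hypothetical sample data: one grouping key and one value per observation
keys = [3; 1; 3; 2; 1; 3];
vals = [10; 20; 30; 40; 60; 50];

% Third output of unique gives the group index of each observation
[groupKeys, ~, g] = unique(keys);

% Apply mean to the values of each group (the function must return a scalar)
m = accumarray(g, vals, [], @mean);

disp([groupKeys, m]);
```

Here row k of m holds the mean of the values whose key equals groupKeys(k). For matrix data or functions with several inputs, the explicit loop above is probably easier to adapt.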

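^{[1] Note: The MathWorks documentation defaults to the latest release, which is why you thought splitapply was introduced in R2018a. However, the bottom of each function's doc page states when it was introduced. In the case of splitapply, we see "Introduced in R2015b".}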

Answer 2

Actually, splitapply was already introduced in R2015b.

As stated in the splitapply documentation, the function combines two steps in the Split-Apply-Combine Workflow.

[Image: diagram of the split-apply-combine process, from the splitapply online documentation]

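Basically, the input data is first grouped with the function findgroups, and splitapply then applies the function to each group of data.

Unfortunately, findgroups was also introduced in R2015b, so the main problem is finding a way to implement it.

Implementing a "general" version of findgroups that can handle many different kinds of datasets could take a lot of time.

To begin with, you can implement it in a form that matches the specific dataset you have to work with.

Basically, you can implement a simplified version of it using the unique function.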

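The idea is to use it to retrieve:

the list of unique entries in the dataset: these will be the groups
the indices of the entries in the dataset that correspond to these groups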

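Once you have the group indices for the dataset, you can use them to select the matching values of the dataset and pass them as input to the function you need to apply.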
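The idea can be sketched for two grouping variables by calling unique on each one and then once more, with the 'rows' option, on the paired indices. This is only an illustration under assumptions: gender, smoker and value are hypothetical sample data, and the result is a plain group index rather than the table that findgroups returns.

```matlab
% Hypothetical sample data (not the patients dataset)
gender = {'F'; 'M'; 'F'; 'M'; 'F'; 'M'};
smoker = [false; false; true; true; false; true];
value  = [10; 20; 30; 40; 50; 60];

% Group index of each entry within each grouping variable
[genderNames, ~, gIdx] = unique(gender);   % 1 = 'F', 2 = 'M'
[smokerVals,  ~, sIdx] = unique(smoker);   % 1 = false, 2 = true

% One group per distinct (gender, smoker) combination that occurs
[combos, ~, G] = unique([gIdx(:), sIdx(:)], 'rows');

% Apply the function to the values of each combined group
nGroups   = size(combos, 1);
groupMean = zeros(nGroups, 1);
for k = 1:nGroups
    groupMean(k) = mean(value(G == k));    % your function in place of mean
end
```

Row k of combos identifies the group: genderNames{combos(k,1)} and smokerVals(combos(k,2)) give its labels. Only combinations that actually occur in the data receive a group.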

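In the following, you can find an example of a possible implementation that reproduces the example provided in the online help of splitapply.

Of course, this is not a "general" implementation that works with "every" dataset; it only works for the specific input of the example, but I hope it can be a starting point.

Online example of splitapply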

Excerpt of the on-line documentation

load patients
meanBMIFcn = @(h,w)mean((w ./ (h.^2)) * 703);
DT = table(Height,Weight);
GT = table(Gender,Smoker);
[G,results] = findgroups(GT);
meanBMI = splitapply(meanBMIFcn,DT,G);
results.meanBMI = meanBMI

Output

results=4×3 table
     Gender     Smoker    meanBMI
    ________    ______    _______

    'Female'    false     21.672 
    'Female'    true      21.669 
    'Male'      false     26.578 
    'Male'      true      26.458

Possible implementation

clear w

% Find the unique entries in the first dataset
[uni_list_1,~,uni_idx_1]=unique(Gender);
n_group_1=length(uni_list_1);

% Find the unique entries in the second dataset
[uni_list_2,~,uni_idx_2]=unique(Smoker);
n_group_2=length(uni_list_2);

% Get the indices of the occurrences of the combination of the two
% entities (stored as logical masks in a nested struct)
for g1=1:n_group_1
   for g2=1:n_group_2
      data_set.(uni_list_1{g1}).(['cond_' num2str(uni_list_2(g2))])=(uni_idx_1 == g1) & (uni_idx_2 == g2);
   end
end

% Define the function to be applied
meanBMIFcn = @(h,w)mean((w ./ (h.^2)) * 703);

% Extract the data matching the desired conditions and use them as input to
% the desired function
for g1=1:length(uni_list_1)
   for g2=1:length(uni_list_2)
      height=Height(data_set.(uni_list_1{g1}).(['cond_' num2str(uni_list_2(g2))]));
      weight=Weight(data_set.(uni_list_1{g1}).(['cond_' num2str(uni_list_2(g2))]));
      result.data_set.(uni_list_1{g1}).(['cond_' num2str(uni_list_2(g2))])=meanBMIFcn(height,weight)
   end
end

Output

The output takes the form of a struct whose fields are the groups and the additional conditions.

>> result
result = 
    data_set: [1x1 struct]
>> result.data_set
ans = 
    Female: [1x1 struct]
      Male: [1x1 struct]
>> result.data_set.Female
ans = 
    cond_0: 21.6721
    cond_1: 21.6686
>> result.data_set.Male
ans = 
    cond_0: 26.5775
    cond_1: 26.4584

Answer 3

You can have a look at grpstats in the Statistics Toolbox.
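As a rough sketch of how grpstats might be applied to the BMI example from the splitapply documentation, assuming the Statistics Toolbox is installed and the patients sample data is available: because grpstats summarizes one data variable at a time, the BMI is computed per patient first (the variable name BMI is introduced here for illustration only).

```matlab
load patients

% Per-patient BMI, same formula as meanBMIFcn in the documentation example
BMI = (Weight ./ (Height.^2)) * 703;

% Group by the combination of Gender and Smoker; 'gname' also returns the group labels
[meanBMI, groupNames] = grpstats(BMI, {Gender, Smoker}, {'mean', 'gname'});
```

The order of the groups may differ from the order that findgroups produces, so it is safer to read the labels in groupNames than to rely on row position.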

Alternative to splitapply in MATLAB
