Question

I read the following table from MySQL into a Matlab cell array:

Nyse = fetch(conn,'SELECT ticker,date,utcsec,bid,ofr FROM HFE.Quotes where ex="N" order by utcsec,bid;');

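The Nyse cell array contains 1000000 rows. I want to compute the median bid for each second, where the second is recorded as a string in the utcsec column. I did it the following way: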

utcsec=cell2mat(Nyse(:,3));
bid=cell2mat(Nyse(:,4));
NyseBid=grpstats(bid,utcsec,{'median'});

The problem is that grpstats takes about 70 seconds to finish the task. How can I optimize the code so that it runs faster?

A sample string in the utcsec column is '09:30:00'.

Answer 1

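I suggest you check out this question and these answers, since it is a highly related problem.

To apply the results of that thread to this problem, I would use this MEX function I wrote, which takes an array of group IDs and extracts the rows belonging to each group. This allows efficient aggregation by group.

As I understand it, utcsec is essentially a group ID and bid is the array to aggregate. The code would be: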

utcsec = Nyse(:,3);            %utcsec here should be a cell array of strings
bid = cell2mat(Nyse(:,4));     %numeric bids, as defined in the question
[unique_utcsec, map] = mg_getRowsWithKey(utcsec);  %call to my magic function
         %unique_utcsec contains unique strings in utcsec
         %map shows us which rows correspond to each unique second

median_bid = zeros(length(unique_utcsec), size(bid,2));

for i = 1:length(unique_utcsec)  %iterate over each utc second
    median_bid(i,:) = median(bid(map{i},:),1);  %calculate the median for that second
end

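In my tests, this code was much faster than Matlab's grpstats implementation. That thread also contains other approaches that do not rely on MEX. The MEX C++ code should be compiled with: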

mex -largeArrayDims mg_getRowsWithKey.cpp

The function mg_getRowsWithKey can then be called like any Matlab function. mg_getRowsWithKey is written in C++ using the STL (e.g. map).
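If you would rather not compile MEX code, here is a minimal sketch of one non-MEX approach. It uses the built-in unique and accumarray instead of mg_getRowsWithKey. The sample data is made up for illustration, and I have not benchmarked it against the MEX version, so treat any speed-up as something to measure on your own data.

```matlab
% Hypothetical sample data standing in for Nyse(:,3) and cell2mat(Nyse(:,4))
utcsec = {'09:30:00'; '09:30:00'; '09:30:01'; '09:30:00'; '09:30:01'};
bid    = [10.0; 10.5; 11.0; 10.25; 11.5];

% Map each time string to an integer group index.
% unique_utcsec holds the sorted distinct strings; groupIdx(k) is the
% position of utcsec{k} within unique_utcsec.
[unique_utcsec, ~, groupIdx] = unique(utcsec);

% Collect the bids sharing a group index and take the median of each group.
% groupIdx(:) forces a column vector, which accumarray expects for 1-D subscripts.
median_bid = accumarray(groupIdx(:), bid, [], @median);
```

As in the loop above, median_bid(i) is the median bid for the second stored in unique_utcsec{i}.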
