Question

I have a dataset similar to the following:

bthd = sort(floor(1+(10-1).*rand(10,1)));
bthd2 = sort(floor(1+(10-1).*rand(10,1)));
bthd3 = sort(floor(1+(10-1).*rand(10,1)));

Depth = [bthd;bthd2;bthd3];
Jday = [repmat(733774,10,1);repmat(733775,10,1);repmat(733776,10,1)];

temp = 10+(30-10).*rand(30,1);

Data = [Jday,Depth,temp];

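Here 'Data' is a matrix whose first column is the Julian date, second column is the depth, and third column is the temperature. I would like to find the first and last temperature value for each unique Jday. This can be obtained with: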

Data = [Jday,Depth,temp];

[~,~,b] = unique(Data(:,1),'rows');

for j = 1:length(unique(b));
    top_temp(j) = temp(find(b == j,1,'first'));
    bottom_temp(j) = temp(find(b == j,1,'last'));
end

However, my dataset is very large, and this loop makes the run time too long. Can anyone suggest a vectorized solution?

Answer 1

Use diff:

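% for example (equal days must be contiguous, i.e. Jday is sorted)
Jday = [1 1 1 2 2 3 3 3 5 5 6 7 7 7];
% temp is the temperature vector, aligned element by element with Jday
last = find( [diff(Jday) 1] );   % last index of each day; for a column vector use [diff(Jday); 1]
first = [1 last(1:end-1)+1];     % each day starts right after the previous one ends
top_temp = temp(first);
bottom_temp = temp(last);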

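Note that this solution assumes Jday is sorted. If it is not, sort Jday before applying the procedure above, and reorder temp with the same permutation so the temperatures stay aligned with their days.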
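The sorting step can be sketched as follows. This is a minimal example, assuming Jday and temp are column vectors of equal length as in the question. The sample values and the names order, JdaySorted, tempSorted, firstIdx and lastIdx are made up for illustration. Because sort keeps equal elements in their original relative order, the first and last entries of each day are preserved.

```matlab
% Illustrative data (assumed values): the days are NOT contiguous here
Jday = [733775; 733774; 733775; 733774; 733776; 733775];
temp = [15; 11; 16; 12; 20; 17];

% Sort by day and apply the same permutation to temp
[JdaySorted, order] = sort(Jday);
tempSorted = temp(order);

% Column-vector form of the diff trick: a nonzero difference marks the
% last row of a day, and the final row always closes the last day
lastIdx  = find([diff(JdaySorted); 1]);
firstIdx = [1; lastIdx(1:end-1) + 1];

top_temp    = tempSorted(firstIdx);   % first temperature of each day
bottom_temp = tempSorted(lastIdx);    % last temperature of each day
```

The results are ordered by ascending Jday, with one entry per unique day.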

Answer 2

You should be able to do this with the occurrence option of the unique function:

[~, topidx, ~] = unique(Data(:, 1), 'first', 'legacy');
[~, bottomidx, ~] = unique(Data(:, 1), 'last', 'legacy');

top_temp = temp(topidx);
bottom_temp = temp(bottomidx);

If you are using MATLAB R2013a, you need the 'legacy' option. If you are running R2012b or earlier, you should be able to drop it.
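One way to gain confidence in the vectorized version is to compare it with the loop from the question on a small input. The sketch below assumes a MATLAB release that accepts the 'legacy' flag exactly as written in the answer. The sample values and the names top_vec, bottom_vec, top_loop and bottom_loop are illustrative only.

```matlab
% Small deterministic stand-in for the question's data (assumed values)
Jday = [733774; 733774; 733774; 733775; 733775; 733776];
temp = [12; 14; 19; 21; 25; 30];

% Vectorized version, using the same flags as the answer
[~, topidx]    = unique(Jday, 'first', 'legacy');
[~, bottomidx] = unique(Jday, 'last', 'legacy');
top_vec    = temp(topidx);
bottom_vec = temp(bottomidx);

% Loop from the question, preallocated, used here as the reference
[~, ~, b] = unique(Jday);
n = max(b);
top_loop    = zeros(n, 1);
bottom_loop = zeros(n, 1);
for j = 1:n
    top_loop(j)    = temp(find(b == j, 1, 'first'));
    bottom_loop(j) = temp(find(b == j, 1, 'last'));
end

% Both approaches should agree element by element
assert(isequal(top_vec(:), top_loop));
assert(isequal(bottom_vec(:), bottom_loop));
```

If the assertions pass on a representative sample, the loop can be dropped in favor of the two unique calls.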

Finding the first and last value for each unique Julian date
