Question

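As a newcomer to MATLAB, I keep finding myself writing C++-style loops instead of taking advantage of matrix-based operations. Today I ran into another such problem.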

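Suppose I have two two-column tables (cell arrays), where the first column is a user ID and the second column is the corresponding value. However, they have different numbers of rows. That is: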

Table 1

1 2.56
2 7.4
3 7.7
...
100 83.4

Table 2

1 7.1
3 1.4
4 4.4
...
76 7.2

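Although the row counts differ, the two tables actually share some common IDs. Now I want to form a new N-by-3 cell array, where N is the number of common IDs, the first column is the ID, and the second and third columns are the values from Table 1 and Table 2, respectively. That is: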

New Table

1 2.56 7.1
3 7.7 1.4
...

Again, I could do this with a loop, but I would really like to learn the MATLAB way of doing it.

Answer 1

Personally, I think using loops is fine. But how about using intersect?

A = [1 2.56;2 7.4;3 7.7];
B = [1 7.1; 3 1.4;4 4.4];

[C, ia, ib] = intersect(A(:,1),B(:,1));   % C: common IDs, with A(ia,1) == B(ib,1) == C

D = [A(ia,:), B(ib,2:end)]                % ID and value from A, then value(s) from B
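The answer above works on numeric matrices, while the question uses cell arrays. A minimal sketch, assuming every cell holds a single number, converts with cell2mat, applies the same intersect call, and converts back with num2cell to get the N-by-3 cell array. The names C1, C2, and result are illustrative. Note that intersect returns each common ID only once, so repeated IDs are collapsed.

```matlab
% Example data from the question (two-column cell arrays, one number per cell)
C1 = {1 2.56; 2 7.4; 3 7.7; 100 83.4};
C2 = {1 7.1; 3 1.4; 4 4.4; 76 7.2};

% Convert to numeric matrices (only valid if every cell is a numeric scalar)
A = cell2mat(C1);
B = cell2mat(C2);

% ia and ib index the rows of A and B holding the common IDs, in sorted ID order
[~, ia, ib] = intersect(A(:,1), B(:,1));

% Build the N-by-3 numeric result, then convert it back to a cell array
result = num2cell([A(ia,:), B(ib,2)]);
```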

Answer 2

Check out ismember, which is very efficient for sorted inputs, as in this case:

A = [1 2.56;
     2  7.4;
     3  7.7];
B = [1  7.1;
     3  1.4;
     4  4.4];

[tf,locb] = ismember(A(:,1),B(:,1))

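You only need these two outputs: tf is a map of the rows in A whose IDs also exist in B, and locb is the location in B of each element of A (i.e. it has the same length as A, with zeros where there is no match).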
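To make the two outputs concrete, this is what ismember returns for the A and B defined above (the data is repeated so the snippet runs on its own):

```matlab
A = [1 2.56; 2 7.4; 3 7.7];
B = [1 7.1; 3 1.4; 4 4.4];

[tf, locb] = ismember(A(:,1), B(:,1));
% tf   is logical [1; 0; 1]: IDs 1 and 3 of A also appear in B, ID 2 does not
% locb is [1; 0; 2]: ID 1 is row 1 of B, ID 3 is row 2 of B, 0 means no match
```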

So a common idiom is to index A with tf and B with locb(tf):

>> C = [A(tf,:) B(locb(tf),2:end)]
C =
    1.0000    2.5600    7.1000
    3.0000    7.7000    1.4000

When the inputs are sorted, consider the speed difference between intersect and ismember. Small data:

N = 1e5; A = [(1:N).' rand(N,1)]; B = [(1:N).' rand(N,1)];
>> tic; [tf,locb] = ismember(A(:,1),B(:,1)); toc
Elapsed time is 0.013419 seconds.
>> tic; [C, ia, ib] = intersect(A(:,1),B(:,1)); toc
Elapsed time is 0.050618 seconds.

While ismember is several times faster, there is no big advantage for small data. For large sorted data sets, however, use ismember:

N = 1e7; A = [(1:N).' rand(N,1)]; B = [(1:N).' rand(N,1)];
>> tic; [tf,locb] = ismember(A(:,1),B(:,1)); toc
Elapsed time is 0.892977 seconds.
>> tic; [C, ia, ib] = intersect(A(:,1),B(:,1)); toc
Elapsed time is 5.925537 seconds.

Note: if you want to truly take advantage of a priori knowledge that the inputs are sorted, there is an undocumented function called ismembc that skips the call to issorted, making it even faster than ismember. See also here.
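Because locb records where each matching ID sits in B, the ismember idiom pairs rows correctly even when B is not sorted; sorting only affects speed. A sketch on the question's cell arrays, with the rows of C2 deliberately shuffled and assuming each cell holds one number:

```matlab
C1 = {1 2.56; 2 7.4; 3 7.7; 100 83.4};
C2 = {76 7.2; 3 1.4; 1 7.1; 4 4.4};   % same data as the question, rows shuffled

A = cell2mat(C1);
B = cell2mat(C2);

[tf, locb] = ismember(A(:,1), B(:,1));        % locb(i) is the row of B matching A(i,1)
result = num2cell([A(tf,:) B(locb(tf),2)]);   % rows follow the order of C1
```

Here locb is [3; 0; 2; 0], so B is read in whatever order is needed to line up with A.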

Answer 3

This can be solved with the very versatile bsxfun function:

C1 = {1 2.56; 2 7.4; 3 7.7; 100 83.4};
C2 = {1 7.1; 3 1.4; 4 4.4; 76 7.2}; %// example data. Two-column cell arrays

comp = bsxfun(@eq, [C1{:,1}], [C2{:,1}].'); %'// test all pairs for equality
ind1 = any(comp,1); %// values of first col of C1 that coincide with some in C2
ind2 = any(comp,2); %// values of first col of C2 that coincide with some in C1
result = horzcat(C1(ind1,1), C1(ind1,2), C2(ind2,2)); %// build result

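Note that:

[C1{:,1}] is used to convert the first column of cell array C1 into a (numeric) row vector. Here is why this works.
The second argument of any specifies the dimension along which it operates.
ind1 and ind2 are logical indices.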
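One caveat on the logical-index version: ind1 and ind2 only line up row by row if the shared IDs appear in the same relative order in both arrays (for example, both sorted) and each ID occurs at most once. A minimal variation, using find on the same comp matrix to recover each matching (row in C2, row in C1) pair, avoids the ordering assumption:

```matlab
C1 = {1 2.56; 2 7.4; 3 7.7; 100 83.4};
C2 = {76 7.2; 3 1.4; 1 7.1; 4 4.4};   % shuffled rows: the any-based version would misalign here

comp = bsxfun(@eq, [C1{:,1}], [C2{:,1}].');   % comp(i,j) is true when C2 row i matches C1 row j
[i2, i1] = find(comp);                        % row and column subscripts of every match
result = [C1(i1,:) C2(i2,2)];                 % C1(i1,:) is N-by-2, C2(i2,2) is N-by-1
```

Since find scans comp column by column, the result rows come out in the order of C1.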

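Also, numeric arrays (matrices) in Matlab are more efficient than cell arrays. If every cell in your cell array contains a single number (as in your case), consider using a numeric array instead. Cell arrays are useful when each cell must hold objects of different sizes or types (for example, if one column contains numbers and another contains strings).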
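If the data arrives as a cell array of scalars, converting it once with cell2mat (and back with num2cell when a cell result is really needed) is straightforward. A small sketch; the variable names are illustrative:

```matlab
C1 = {1 2.56; 2 7.4; 3 7.7; 100 83.4};   % cell array, one number per cell
M1 = cell2mat(C1);                        % 4-by-2 double matrix
C1back = num2cell(M1);                    % back to a 4-by-2 cell array
```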

If you use numeric arrays, the code also becomes slightly simpler in this case:

C1 = [1 2.56; 2 7.4; 3 7.7; 100 83.4];
C2 = [1 7.1; 3 1.4; 4 4.4; 76 7.2]; %// example data. Two-column matrices

comp = bsxfun(@eq, C1(:,1).', C2(:,1)); %'// test all pairs for equality
ind1 = any(comp,1); %// values of first col of C1 that coincide with some in C2
ind2 = any(comp,2); %// values of first col of C2 that coincide with some in C1
result = [C1(ind1,1) C1(ind1,2) C2(ind2,2)]; %// build result
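On MATLAB R2016b and later, implicit expansion lets == broadcast a row against a column directly, so the bsxfun call can be written as a plain comparison. This is the same computation as the numeric version above, only with the newer syntax:

```matlab
C1 = [1 2.56; 2 7.4; 3 7.7; 100 83.4];
C2 = [1 7.1; 3 1.4; 4 4.4; 76 7.2];

comp = C1(:,1).' == C2(:,1);   % 1-by-4 row vs 4-by-1 column expands to 4-by-4 (R2016b+)
ind1 = any(comp, 1);           % rows of C1 whose ID appears in C2
ind2 = any(comp, 2);           % rows of C2 whose ID appears in C1
result = [C1(ind1,1) C1(ind1,2) C2(ind2,2)];
```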

Answer 4

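Traditionally, I would consider @ysakamto's answer the best. However, if you use a newer version of Matlab (from last year?), they have added the table data type, which supports SQL-style operations such as join. http://www.mathworks.com/help/matlab/ref/join.html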
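A minimal sketch of the table approach, assuming R2013b or later (when table was introduced). For keeping only the IDs present in both tables, innerjoin is the closer fit, because join expects every key in the left table to have a match in the right table. The variable names ID, Val1, and Val2 are illustrative.

```matlab
T1 = table([1; 2; 3; 100], [2.56; 7.4; 7.7; 83.4], 'VariableNames', {'ID', 'Val1'});
T2 = table([1; 3; 4; 76],  [7.1; 1.4; 4.4; 7.2],  'VariableNames', {'ID', 'Val2'});

T = innerjoin(T1, T2, 'Keys', 'ID');   % keep only rows whose ID appears in both tables
result = table2cell(T);                % N-by-3 cell array {ID, Val1, Val2}
```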

MATLAB-ish way to keep the rows with common entries between two cell arrays?
