Question

I am trying to merge two matrices that share the same values in several columns.

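The matrices below illustrate my problem and serve as an MWE. My real data is much longer, size(500000, 4), so I am looking for an efficient way to merge them. The data is option data, where c holds the call data and p the put data, with columns 1:4: date, strike, expiration, bid price. In the end I would like a matrix with columns 1:5: date, strike, expiration, call price, put price. As the MWE shows, the two matrices differ in length, but each combination of columns 1:3 (date, strike, expiration) exists only once within each matrix.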

c = [7356011 300 7356081 1.15;
     7356011 400 7356081 1.56;
     7356011 500 7356081 1.79;
     7356011 300 7356088 1.25;
     7356011 400 7356088 1.67;
     7356011 500 7356088 1.89;
     7356011 600 7356088 1.92;
     7356012 300 7356081 0.79;
     7356012 400 7356081 0.99;
     7356012 500 7356081 1.08;
     7356012 300 7356088 0.81;
     7356012 400 7356088 0.90;
     7356012 500 7356088 1.07];

p = [7356011 300 7356081 1.35;
     7356011 400 7356081 1.15;
     7356011 500 7356081 1.03;
     7356011 300 7356088 1.56;
     7356011 400 7356088 1.15;
     7356011 500 7356088 1.03;
     7356012 300 7356081 1.25;
     7356012 400 7356081 1.19;
     7356012 500 7356081 1.02;
     7356012 300 7356088 1.14;
     7356012 400 7356088 0.98;
     7356012 500 7356088 0.76;
     7356012 600 7356088 0.56;
     7356012 700 7356088 0.44];

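I tried building an ID for each row with strcat and num2str, which gives for example 'ID(1) = 73560113007356081', but this takes very long on large data. I also tried to find a solution with unique and ismember, but ran into problems with the multiple columns.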

The desired output is:

7356011 300 7356081 1.15 1.35
7356011 400 7356081 1.56 1.15
7356011 500 7356081 1.79 1.03
7356011 300 7356088 1.25 1.56
7356011 400 7356088 1.67 1.15
7356011 500 7356088 1.89 1.03
7356011 600 7356088 1.92 NaN
7356012 300 7356081 0.79 1.25
7356012 400 7356081 0.99 1.19
7356012 500 7356081 1.08 1.02
7356012 300 7356088 0.81 1.14
7356012 400 7356088 0.90 0.98
7356012 500 7356088 1.07 0.76
7356012 600 7356088 NaN 0.56
7356012 700 7356088 NaN 0.44

Thanks for your help.

Answer 1

You don't need a loop; use intersect instead.

[~,ic,ip] = intersect(c(:, 1:3),p(:, 1:3),'rows');
m = [c(ic, :), p(ip,end)];
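
To make the behaviour of those two lines concrete, here is a minimal sketch on made-up toy inputs (a and b are hypothetical, not the question's data). It assumes the key sits in the first two columns and the value in the last one. intersect with 'rows' behaves like an inner join, so keys found in only one matrix are dropped.

```matlab
% Hypothetical toy inputs: columns are key1, key2, value
a = [1 10 0.5; 1 20 0.6; 2 10 0.7];
b = [1 20 1.6; 2 10 1.7; 3 10 1.8];

% Match on the two key columns only; ia and ib index the matching rows
[~, ia, ib] = intersect(a(:, 1:2), b(:, 1:2), 'rows');

% Matched rows of a, with the value column of b appended
m = [a(ia, :), b(ib, end)];
disp(m)
```

The rows with keys (1, 10) and (3, 10) do not appear in m, which is why a NaN-padded variant is needed when unmatched rows must be kept.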

Edit: if you also want NaN in the positions where the rows do not intersect, as the other poster does:

function m = merge(c, p, nc, np)
    % Check the number of input arguments; np defaults to nc
    if nargin == 3
        np = nc;
    elseif nargin ~= 4
        error('Please enter either 3 or 4 arguments')
    end

    % Make sure the key column lists are shaped the same (row vectors)
    nc = reshape(nc, 1, []);
    np = reshape(np, 1, []);

    % ...and have the same number of elements
    if numel(nc) ~= numel(np)
        error('Please ensure arguments 3 and 4 have the same number of elements')
    end

    %The columns that aren't being compared
    NotNC = find(~ismember(1:size(c,2), nc));
    NotNP = find(~ismember(1:size(p,2), np));

    %Find the matching rows
    [matches,ic,ip] = intersect(c(:, nc),p(:, np),'rows');

    %Put together matching rows with the other data not included in the match
    m1 = [matches, c(ic, NotNC), p(ip, NotNP)];

    %Find rows that did not match
    NotIC = find(~ismember(1:size(c,1), ic));
    NotIP = find(~ismember(1:size(p,1), ip));

    %Put together data not in the matched set
    m2 = [c(NotIC, nc), c(NotIC, NotNC), nan(length(NotIC), size(NotNP,2))];
    m3 = [p(NotIP, np), nan(length(NotIP), size(NotNC,2)), p(NotIP, NotNP)];

    %merge all three lists
    m = [m1; m2; m3];

end

Answer 2

OK, I wasn't sure whether p is always the larger one, so I'll write two solutions with an if.

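% Note: rows that exist only in the smaller matrix are not added here.
if size(c, 1) > size(p, 1)
    % Start from c and append a NaN column for the put price
    newm = [c NaN(size(c, 1), 1)];
    % Compare the key columns only; loc holds the matching row of p
    [row, loc] = ismember(c(:, 1:3), p(:, 1:3), 'rows');

    newm(row, end) = p(loc(row), end);
else
    % Start from p and insert a NaN column for the call price
    newm = [p(:, 1:3) NaN(size(p, 1), 1) p(:, end)];

    [row, loc] = ismember(p(:, 1:3), c(:, 1:3), 'rows');

    newm(row, 4) = c(loc(row), end);
end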

Update:

This code works for your current example.

[row_p, row_c] = ismember(p(:,1:3), c(:,1:3), 'rows');

newm = [];

for ii = 1:length(row_p)
    if row_p(ii) == 1
        newm = [newm; p(ii, 1:3) c(row_c(ii), end) p(ii, end)];
    else
        newm = [newm; p(ii, 1:3) NaN p(ii, end)];
    end
end

[row_c, row_p] = ismember(c(:,1:3), p(:,1:3), 'rows');

for ii = 1:length(row_c)
    if row_c(ii) == 1
        newm = [newm; c(ii, 1:3) c(ii, end) p(row_p(ii), end)];
    else
        newm = [newm; c(ii, 1:3) c(ii, end) NaN];
    end
end

newm = unique(newm, 'rows');
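
The loops above grow newm on every iteration, which may get slow at the 500000-row scale mentioned in the question. Below is a possible loop-free sketch of the same outer-join idea. It uses a few rows taken from the question's c and p so that it runs on its own, and it assumes, as the question states, that each key combination occurs at most once per matrix. The names keys, locC and locP are illustrative only.

```matlab
% A few rows from the question's data (date, strike, expiration, price)
c = [7356011 500 7356088 1.89; 7356011 600 7356088 1.92; 7356012 500 7356088 1.07];
p = [7356011 500 7356088 1.03; 7356012 500 7356088 0.76; 7356012 600 7356088 0.56];

% All distinct key rows that occur in either matrix
keys = union(c(:, 1:3), p(:, 1:3), 'rows');

% Preallocate: keys plus two price columns filled with NaN
m = [keys, nan(size(keys, 1), 2)];

% Locate every row of c and p in the combined key list
[~, locC] = ismember(c(:, 1:3), keys, 'rows');
[~, locP] = ismember(p(:, 1:3), keys, 'rows');

% Fill in call prices (column 4) and put prices (column 5)
m(locC, 4) = c(:, 4);
m(locP, 5) = p(:, 4);

% Order by date, then expiration, then strike, as in the desired output
m = sortrows(m, [1 3 2]);
disp(m)
```

Because the result matrix is allocated once and filled by indexing, no row is appended inside a loop and no final unique call is needed.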

Join matrices with the same values in different vectors in MATLAB
