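I have two sample datasets, A and B below, that I would like to join in MATLAB to create C. The keys are 'product' and 'year', but the problem is that the product numbers in dataset B only match the first 4 digits of those in A. Is there a way to join on numbers that "almost" match like this?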
A
product  tariff  year
202341   2       1999
202341   4       2000
202341   20      2008
202355   9       1999
202355   16      2000
438811   0       1999
438891   8       1999
438891   3       2001
671212   15      2005
671260   10      2005
and
B
product  avg_tariff  year
2023     5,5         1999
2023     10          2000
2023     20          2008
4388     4           1999
4388     3           2001
6712     12,5        2005
The join should produce matrix C:
C
product  tariff  year  avg_tariff
202341   2       1999  5,5
202341   4       2000  10
202341   20      2008  20
202355   9       1999  5,5
202355   16      2000  10
438811   0       1999  4
438891   8       1999  4
438891   3       2001  3
671212   15      2005  12,5
671260   10      2005  12,5
Thanks in advance,
Oscar
Answer 0 (score: 1)
Since this question is related to the previous one you asked, I will reuse that code and update it for the new data:
product tariff year
202341 2 1999
202341 4 2000
202341 20 2008
202355 9 1999
202355 16 2000
438811 0 1999
438891 8 1999
438891 3 2001
671212 15 2005
671260 10 2005
product avg_tariff year
2023 5.5 1999
2023 10 2000
2023 20 2008
4388 4 1999
4388 3 2001
6712 12.5 2005
The code (using the dataset class from the Statistics Toolbox):
%# read A, and build dataset
fid = fopen('a.csv','rt');
C = textscan(fid, '%s%f%f', 'Delimiter',' ', 'MultipleDelimsAsOne',true, 'HeaderLines',1);
fclose(fid);
dA = dataset({C{1} 'product'}, {C{2} 'tariff'}, {C{3} 'year'});
%# read B, and build dataset
fid = fopen('b.csv','rt');
C = textscan(fid, '%s%f%f', 'Delimiter',' ', 'MultipleDelimsAsOne',true, 'HeaderLines',1);
fclose(fid);
dB = dataset({C{1} 'product'}, {C{2} 'avg_tariff'}, {C{3} 'year'});
%# truncate productA
dA.productLong = dA.product;
dA.product = cellfun(@(s)s(:,1:end-2), cellstr(dA.product), 'UniformOutput',false);
%# inner join (keep only rows that exist in both datasets)
ds = join(dA, dB, 'keys',{'product' 'year'}, 'type','inner', 'MergeKeys',true);
%# restore the long product number as first column, and sort by it
ds.product = ds.productLong;
ds.productLong = [];
ds = sortrows(ds, 'product')
The result, as expected:
ds =
product tariff year avg_tariff
'202341' 2 1999 5.5
'202341' 4 2000 10
'202341' 20 2008 20
'202355' 9 1999 5.5
'202355' 16 2000 10
'438811' 0 1999 4
'438891' 8 1999 4
'438891' 3 2001 3
'671212' 15 2005 12.5
'671260' 10 2005 12.5
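The same prefix join can also be sketched with the newer table type and innerjoin (available from R2013b onward) instead of dataset. This is only a minimal sketch under stated assumptions: the sample data is typed in inline rather than read from a.csv and b.csv, product codes are stored as numbers and are always six digits long (so dropping the last two digits yields the 4-digit prefix), and the helper column name prefix is made up for illustration.

```matlab
% Sample data from the question, entered inline (assumption: numeric codes)
A = table([202341;202341;202341;202355;202355;438811;438891;438891;671212;671260], ...
          [2;4;20;9;16;0;8;3;15;10], ...
          [1999;2000;2008;1999;2000;1999;1999;2001;2005;2005], ...
          'VariableNames', {'product','tariff','year'});
B = table([2023;2023;2023;4388;4388;6712], ...
          [5.5;10;20;4;3;12.5], ...
          [1999;2000;2008;1999;2001;2005], ...
          'VariableNames', {'product','avg_tariff','year'});

% Hypothetical helper key: drop the last two digits of each 6-digit code
A.prefix = floor(A.product / 100);

% In B the product column already is the 4-digit prefix, so rename it
B.prefix = B.product;
B.product = [];

% Inner join on (prefix, year), then drop the helper key and sort
C = innerjoin(A, B, 'Keys', {'prefix','year'});
C.prefix = [];
C = sortrows(C, {'product','year'});
```

Keeping the full product code in its own column and joining on a separate helper column avoids the overwrite-and-restore step, at the cost of one extra temporary variable in each table.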
Answer 1 (score: 0)
Load the data with textscan, treating every column (including the product numbers) as strings:
fidA = fopen('A.txt');
fidB = fopen('B.txt');
A = textscan(fidA,'%s%s%s','delimiter',' ');
B = textscan(fidB,'%s%s%s','delimiter',' ');
fclose(fidA);
fclose(fidB);
Build a key for each row, keeping only the first 4 characters of the product in A:
for i = 1:length(A{1})
rowKeyA{i} = [A{1}{i}(1:4),A{3}{i}]; %product(1:4),year
end
for i = 1:length(B{1})
rowKeyB{i} = [B{1}{i},B{3}{i}]; %product,year
end
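Now just find the matches between rowKeyA and rowKeyB:
for i = 1:length(rowKeyA)
j = find(strcmp(rowKeyB,rowKeyA{i}),1); %first matching row in B, if any
if ~isempty(j)
fprintf('%s %s %s\n',rowKeyA{i},A{2}{i},B{2}{j}); %key, tariff, avg_tariff
end
end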
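If the columns are numeric rather than strings, the key-matching loop can be replaced by a single ismember call with the 'rows' option. The following is a rough sketch, not the answerer's code: it assumes six-digit product codes, uses the question's sample data typed in inline, and all variable names (productA, keyA, avgA and so on) are hypothetical.

```matlab
% Columns from the question's sample data, entered inline for illustration
productA = [202341;202341;202341;202355;202355;438811;438891;438891;671212;671260];
tariffA  = [2;4;20;9;16;0;8;3;15;10];
yearA    = [1999;2000;2008;1999;2000;1999;1999;2001;2005;2005];
productB = [2023;2023;2023;4388;4388;6712];
avgB     = [5.5;10;20;4;3;12.5];
yearB    = [1999;2000;2008;1999;2001;2005];

% Two-column keys: [4-digit prefix, year]
keyA = [floor(productA / 100), yearA];
keyB = [productB, yearB];

% found(k) is true if row k of A has a match; idx(k) is the matching row of B
[found, idx] = ismember(keyA, keyB, 'rows');

% Rows of A without a match keep NaN as their average tariff
avgA = nan(size(productA));
avgA(found) = avgB(idx(found));

% Joined matrix: product, tariff, year, avg_tariff
C = [productA, tariffA, yearA, avgA];
```

Unlike an inner join, this keeps every row of A and marks unmatched rows with NaN; filtering with C(found,:) would give the inner-join behaviour.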