How do I pass a parameter to the map function when calling the mapreduce function?

Asked: 2016-04-30 11:10:08

Tags: matlab hadoop mapreduce

I have a mapreduce call whose output should be fed into a second mapreduce call. The code is as follows:

function clustering = parallel_clustering_kmeans(data)
%% find first clustering from all chunks
result = mapreduce(data,@k_means_Mapper,@k_means_Reducer);
result = readall(result);
index = result{:,1};
index = cell2mat(cellfun(@str2num,strrep(index,',',' '),'un',0));
clustering = mapreduce(data,@k_means_Mapper_second,@k_means_Reducer);
end

The first call, result = mapreduce(data,@k_means_Mapper,@k_means_Reducer);, works fine. However, I want to pass index as an argument to k_means_Mapper_second.

The mapper has to look like this:

function k_means_Mapper_second(data,index,intermidiateValuesOut)
distance = zeros(size(data,1),size(index,1));
parfor i = 1:size(data,1)
    for j = 1:size(index,1)
        distance(i,j) =  sum((data(i,:)-index(j,:)).^2).^0.5;
    end
end
for i = 1:size(distance,1)
    x = distance(i,:);
    [~,ind] = min(x);
    key = combine_values(data(i,:));
    addmulti(intermidiateValuesOut,key,ind);
end
end

My question is how to pass index as an argument to k_means_Mapper_second in the last line:

clustering = mapreduce(data,@k_means_Mapper_second,@k_means_Reducer);

Thanks in advance.

1 answer:

Answer 0: (score: 2)

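You need to wrap your function in an anonymous function that accepts all the inputs mapreduce supplies (data, info, and the intermediate key-value store), but then calls your function with only the values it needs. Here the wrapper ignores info and passes index in its place. For your case, it would look like: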

mapperFunc = @(data, info, interim)k_means_Mapper_second(data, index, interim);
clustering = mapreduce(data, mapperFunc, @k_means_Reducer);
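
One detail worth knowing is that an anonymous function captures the value of index at the moment the handle is created, so changing index afterwards does not affect the mapper. The snippet below is only a small sketch of that behaviour and does not call mapreduce: describe is a hypothetical stand-in for k_means_Mapper_second, and the empty arrays passed for info and the store are placeholders.

```matlab
% Extra value we want the mapper to see (two cluster centres, one per row).
index = [1 2; 3 4];

% Hypothetical stand-in for k_means_Mapper_second(data, index, store).
% It only reports the sizes it received so the result is easy to check.
describe = @(data, index, store) ...
    sprintf('%d rows, %d centres', size(data, 1), size(index, 1));

% Wrapper with the three-input signature that mapreduce passes to a mapper.
% The value of index is captured here, when the handle is created.
mapperFunc = @(data, info, store) describe(data, index, store);

% Reassigning index later does not change what mapperFunc captured.
index = zeros(5, 2);

% Call the wrapper the way mapreduce would, with placeholder info and store.
msg = mapperFunc(ones(3, 2), [], []);
disp(msg)   % 3 rows, 2 centres
```

So if index is recomputed between runs, recreate mapperFunc before calling mapreduce again.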