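This function splits a text into different (possibly balanced) chunks (I discussed it in detail here). Here is the code (see the problem description below):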
pair<off_t,off_t>* getSplits() {
    struct stat st;
    off_t size;
    if (stat(file_name.c_str(), &st) == 0)
        size = st.st_size;
    int nMappers = size > nWorkers ? nWorkers : size; //if workers are greater than file size
    pair<off_t,off_t> splits [nMappers];
    double split_size = (double) size / nMappers;
    off_t acc = 0 ;
    ff::ParallelFor pf( ff_realNumCores() );
    string prova = file_name;
    pf.parallel_for(0,nMappers,[&splits,split_size, prova, size](const long i) {
        ifstream ifs (prova , ifstream::in);
        off_t begin = ceil((double) i*split_size);
        off_t end = ceil((double) (i+1)*split_size-1);
        char c;
        string s;
        if(begin>0){
            //if char before the first one is different from ' ' or '\n'
            //then the split begins in the middle of a word (bad)
            ifs.seekg(begin-1,ios::beg);
            ifs.get(c);
            if(c!=' ' && c!='\n'){
                getline(ifs,s,' ');
                begin+=s.length();
            }
            if(begin>end)
                end=begin;
        }
        ifs.seekg(end,ios::beg);
        ifs.get(c);
        if(c!=' ' && c!='\n' && end != size){
            getline(ifs,s,' ');
            end+=s.length();
        }
        splits[i] = {begin, end};
    });
    pair<off_t,off_t> *p = splits;
    for(int i=0;i<nWorkers;i++){
        cout<<"begin="<<p[i].first<<" end="<<splits[i].second<<endl;
    }
    return p;
}
This is how I call it and print the contents:
pair<off_t,off_t> *splits = input_format->getSplits();
for(int i=0; i<nWorkers; i++){
    cout<<"outside split begin="<<splits[i].first<<" second="<<splits[i].second<<endl;
    this->ff_send_out(new MapTask<MIK,MIV,MOK,MOV> (record_reader->clone(),splits[i],map_func));
}
The problem is that if I print the contents of p inside getSplits() (the last for loop), then the result is correct:
begin=0 end=13
begin=14 end=14
begin=15 end=21
begin=22 end=28
outside split begin=0 second=13
outside split begin=14 second=14
outside split begin=15 second=21
outside split begin=22 second=28
But if I don't (i.e. I remove the printing for loop), then the result is wrong (only the first pair is correct)!
outside split begin=0 second=13
outside split begin=140152066182136 second=140152054622976
outside split begin=140152066227112 second=29521758
outside split begin=140152054622960 second=2564825869
How is this possible?
Answer 0 (score: 0)
You are returning the address of a local variable, so you have a dangling pointer.
I suggest using std::vector<std::pair<off_t, off_t>> instead.
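A minimal sketch of the vector suggestion, assuming a hypothetical `getSplitsVec` that takes the file size as a parameter (instead of calling `stat()`) so it stays self-contained. It computes only the even partition; the question's word-boundary adjustment would go inside the loop unchanged. The point is ownership: the vector's storage lives on the heap and is returned by value, so nothing dangles after the function returns.

```cpp
#include <cassert>
#include <cmath>
#include <sys/types.h>  // off_t
#include <utility>
#include <vector>

using Split = std::pair<off_t, off_t>;

// Hypothetical stand-in for getSplits(): size is passed in, and only the
// even partition is computed (word-boundary adjustment omitted).
std::vector<Split> getSplitsVec(off_t size, int nWorkers) {
    assert(nWorkers > 0 && size > 0);
    // Never create more splits than there are bytes in the file.
    const int nMappers = size > nWorkers ? nWorkers : static_cast<int>(size);
    // Sized up front: each iteration writes a distinct element, so the same
    // body could run inside a parallel_for that captures `splits` by reference.
    std::vector<Split> splits(nMappers);
    const double split_size = static_cast<double>(size) / nMappers;
    for (int i = 0; i < nMappers; ++i) {
        const off_t begin = static_cast<off_t>(std::ceil(i * split_size));
        const off_t end = static_cast<off_t>(std::ceil((i + 1) * split_size - 1));
        splits[i] = {begin, end};
    }
    // Returned by value (moved or elided): the caller owns the data.
    return splits;
}
```

With a vector the caller can also loop up to `splits.size()` instead of `nWorkers`, which avoids reading past the last split when `nMappers` is smaller than `nWorkers`.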
Answer 1 (score: 0)
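You are returning a pointer into the stack (which is where your function defines splits).
Use a global/static variable, or, better, pass in a pointer to an array and fill it inside the function.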
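A minimal sketch of the "pass a buffer in" suggestion, assuming a hypothetical `fillSplits` that receives the file size directly. The caller owns the array (so its lifetime is the caller's), the function never writes more than `cap` entries, and it returns how many it actually filled. Unlike a static/global buffer, each caller gets its own storage, so concurrent or repeated calls do not overwrite each other's results.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <sys/types.h>  // off_t
#include <utility>

using Split = std::pair<off_t, off_t>;

// Hypothetical variant: fills the caller-owned array `out` (capacity `cap`)
// and returns the number of splits written. Word-boundary adjustment omitted.
std::size_t fillSplits(off_t size, int nWorkers, Split* out, std::size_t cap) {
    assert(out != nullptr && nWorkers > 0 && size > 0);
    const int nMappers = size > nWorkers ? nWorkers : static_cast<int>(size);
    // Never write past the end of the caller's buffer.
    const std::size_t n = std::min(static_cast<std::size_t>(nMappers), cap);
    const double split_size = static_cast<double>(size) / nMappers;
    for (std::size_t i = 0; i < n; ++i) {
        out[i].first = static_cast<off_t>(std::ceil(i * split_size));
        out[i].second = static_cast<off_t>(std::ceil((i + 1) * split_size - 1));
    }
    return n;
}
```

The caller would declare something like `Split buf[MAX_SPLITS];` (a hypothetical constant) and iterate only over the returned count.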