我在使用R-package中的Armadillo
博客中的C ++ OpenMP
库时出现了堆栈使用后范围错误的问题,我无法弄清楚出了什么问题。完整的gcc
日志是来自R-package的CRAN GCC ASAN检查的here。我已经保留了下面日志的相关部分
==33791==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffd03364940 at pc 0x7ff8127abc07 bp 0x7ffd03364680 sp 0x7ffd03364670
WRITE of size 4 at 0x7ffd03364940 thread T0
#0 0x7ff8127abc06 in arma::Mat<double>::Mat(double*, unsigned int, unsigned int, bool, bool) /data/gannet/ripley/R/test-3.5/RcppArmadillo/include/armadillo_bits/Mat_meat.hpp:1215
#1 0x7ff8129fb0c2 in GMA<logistic>::solve() [clone ._omp_fn.0] /data/gannet/ripley/R/test-3.5/RcppArmadillo/include/armadillo_bits/Col_meat.hpp:411
#2 0x7ff825ae2cde in GOMP_parallel (/lib64/libgomp.so.1+0xdcde)
#3 0x7ff812a0c9f8 in GMA<logistic>::solve() ddhazard/GMA_solver.cpp:83
#4 0x7ff81276421d in ddhazard_fit_cpp(...
Address 0x7ffd03364940 is located in stack of thread T0 at offset 416 in frame
#0 0x7ff8129fa82f in GMA<logistic>::solve() [clone ._omp_fn.0] ddhazard/GMA_solver.cpp:83
This frame has 5 object(s):
[32, 40) 'dest'
[96, 104) 'src'
[160, 176) 'ans'
[224, 384) 'my_X_cross'
[416, 576) '<unknown>' <== Memory access at offset 416 is inside this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope /data/gannet/ripley/R/test-3.5/RcppArmadillo/include/armadillo_bits/Mat_meat.hpp:1215 in arma::Mat<double>::Mat(double*, unsigned int, unsigned int, bool, bool)
Shadow bytes around the buggy address:
0x1000206648d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1000206648e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1000206648f0: 00 00 00 00 f1 f1 f1 f1 00 f2 f2 f2 f2 f2 f2 f2
0x100020664900: 00 f2 f2 f2 f2 f2 f2 f2 f8 f8 f2 f2 f2 f2 f2 f2
0x100020664910: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x100020664920: 00 00 00 00 f2 f2 f2 f2[f8]f8 f8 f8 f8 f8 f8 f8
0x100020664930: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f3 f3 f3 f3
0x100020664940: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020664950: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020664960: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020664970: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==33791==ABORTING
导致错误的WRITE
位于dynamichazard/src/ddhazard/GMA_solver.cpp,特别是此OpenMP
块
#ifdef _OPENMP
int n_threads = std::max(1, std::min(omp_get_max_threads(),
(int)r_set.n_elem / 1000 + 1));
#pragma omp parallel num_threads(n_threads) if(n_threads > 1)
{
#endif
arma::mat my_X_cross(q, q, arma::fill::zeros);
#ifdef _OPENMP
#pragma omp for schedule(static)
#endif
for(arma::uword i = 0; i < r_set.n_elem; i++){
auto trunc_eta = T::truncate_eta(
is_event[i], eta[i], exp(eta[i]), at_risk_length[i]);
h_1d[i] = w[i] * T::d_log_like(
is_event[i], trunc_eta, at_risk_length[i]);
double h_2d_neg = - w[i] * T::dd_log_like(
is_event[i], trunc_eta, at_risk_length[i]);
sym_mat_rank_one_update(h_2d_neg, X_t.unsafe_col(i), my_X_cross);
}
#ifdef _OPENMP
#pragma omp critical(gma_lock)
{
#endif
X_cross += my_X_cross;
#ifdef _OPENMP
}
}
#endif
据我所知,错误发生在致电sym_mat_rank_one_update
的X_t.unsafe_col(i)
来电时。该函数的声明是
void sym_mat_rank_one_update(const double, const arma::vec&, arma::mat&);
它应该触发对include/armadillo_bits/Col_meat.hpp第411行arma::col<double>
构造函数的调用,该构造函数继承include/armadillo_bits/Mat_meat.hpp第1215行中的arma::mat<double>
构造函数。我收集这是因为unsigned int
构造函数是
arma::mat<double>
之一发生了4位写入
template<typename eT>
inline
Mat<eT>::Mat(eT* aux_mem, const uword aux_n_rows, const uword aux_n_cols, const bool copy_aux_mem, const bool strict)
: n_rows ( aux_n_rows )
, n_cols ( aux_n_cols )
, n_elem ( aux_n_rows*aux_n_cols )
, vec_state( 0 )
, mem_state( copy_aux_mem ? 0 : ( strict ? 2 : 1 ) )
, mem ( copy_aux_mem ? 0 : aux_mem )
{
arma_extra_debug_sigprint_this(this);
if(copy_aux_mem == true)
{
init_cold();
arrayops::copy( memptr(), aux_mem, n_elem );
}
}
其中
template<typename eT>
class Mat : public Base< eT, Mat<eT> >
{
public:
typedef eT elem_type; //!< the type of elements stored in the matrix
typedef typename get_pod_type<eT>::result pod_type; //!< if eT is std::complex<T>, pod_type is T; otherwise pod_type is eT
const uword n_rows; //!< number of rows (read-only)
const uword n_cols; //!< number of columns (read-only)
const uword n_elem; //!< number of elements (read-only)
const uhword vec_state; //!< 0: matrix layout; 1: column vector layout; 2: row vector layout
const uhword mem_state;
...
请参阅include/armadillo_bits/Mat_bones.hpp,注意arma::uword
是unsigned int
。 但是,我无法弄清楚为什么会导致堆栈使用后范围。
Morpho
包中存在类似的错误。请参阅the current CRAN log here和src/createL.cpp。
以上检查是在CRAN上进行的。 As far as I can tell,在Fedora 26上使用gcc
7.2,以下config.site
用于构建R
CXX="g++ -fsanitize=address,undefined,bounds-strict -fno-omit-frame-pointer"
CFLAGS="-g -O2 -Wall -pedantic -mtune=native -fsanitize=address"
FFLAGS="-g -O2 -mtune=native"
FCFLAGS="-g -O2 -mtune=native"
CXXFLAGS="-g -O2 -Wall -pedantic -mtune=native"
MAIN_LDFLAGS=-fsanitize=address,undefined
此外,使用以下~/.R/Makevars
CC = gcc -std=gnu99 -fsanitize=address,undefined -fno-omit-frame-pointer
F77 = gfortran -fsanitize=address
FC = gfortran -fsanitize=address
FCFLAGS = -g -O2 -mtune=native -fbounds-check
FFLAGS = -g -O2 -mtune=native -fbounds-check
clang
5.0.0和valgrind
在同一台计算机上不会发生此错误。此外,我无法在具有gcc
版本6.3和clang
版本4.0.0的本地Ubuntu 17.04上重现它们。
我会努力制作一个。