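Armadillo, OpenMP, and stack-use-after-scope

Time: 2017-12-05 09:19:01

Tags: openmp rcpp armadillo r-package address-sanitizer

I am getting a stack-use-after-scope error from C++ code that uses OpenMP and the Armadillo library in an R package, and I cannot figure out what is wrong. The complete gcc log comes from the CRAN gcc-ASAN check of the R package (linked as "here" in the original post). I have kept the relevant part of the log below.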

==33791==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffd03364940 at pc 0x7ff8127abc07 bp 0x7ffd03364680 sp 0x7ffd03364670
WRITE of size 4 at 0x7ffd03364940 thread T0
    #0 0x7ff8127abc06 in arma::Mat<double>::Mat(double*, unsigned int, unsigned int, bool, bool) /data/gannet/ripley/R/test-3.5/RcppArmadillo/include/armadillo_bits/Mat_meat.hpp:1215
    #1 0x7ff8129fb0c2 in GMA<logistic>::solve() [clone ._omp_fn.0] /data/gannet/ripley/R/test-3.5/RcppArmadillo/include/armadillo_bits/Col_meat.hpp:411
    #2 0x7ff825ae2cde in GOMP_parallel (/lib64/libgomp.so.1+0xdcde)
    #3 0x7ff812a0c9f8 in GMA<logistic>::solve() ddhazard/GMA_solver.cpp:83
    #4 0x7ff81276421d in ddhazard_fit_cpp(...

Address 0x7ffd03364940 is located in stack of thread T0 at offset 416 in frame
    #0 0x7ff8129fa82f in GMA<logistic>::solve() [clone ._omp_fn.0] ddhazard/GMA_solver.cpp:83

  This frame has 5 object(s):
    [32, 40) 'dest'
    [96, 104) 'src'
    [160, 176) 'ans'
    [224, 384) 'my_X_cross'
    [416, 576) '<unknown>' <== Memory access at offset 416 is inside this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope /data/gannet/ripley/R/test-3.5/RcppArmadillo/include/armadillo_bits/Mat_meat.hpp:1215 in arma::Mat<double>::Mat(double*, unsigned int, unsigned int, bool, bool)
Shadow bytes around the buggy address:
  0x1000206648d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000206648e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000206648f0: 00 00 00 00 f1 f1 f1 f1 00 f2 f2 f2 f2 f2 f2 f2
  0x100020664900: 00 f2 f2 f2 f2 f2 f2 f2 f8 f8 f2 f2 f2 f2 f2 f2
  0x100020664910: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x100020664920: 00 00 00 00 f2 f2 f2 f2[f8]f8 f8 f8 f8 f8 f8 f8
  0x100020664930: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f3 f3 f3 f3
  0x100020664940: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100020664950: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100020664960: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100020664970: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==33791==ABORTING

The WRITE that causes the error is in dynamichazard/src/ddhazard/GMA_solver.cpp, specifically in this OpenMP block:

#ifdef _OPENMP
      int n_threads = std::max(1, std::min(omp_get_max_threads(),
                                           (int)r_set.n_elem / 1000 + 1));
#pragma omp parallel num_threads(n_threads) if(n_threads > 1)
{
#endif
      arma::mat my_X_cross(q, q, arma::fill::zeros);

#ifdef _OPENMP
#pragma omp for schedule(static)
#endif
      for(arma::uword i = 0; i < r_set.n_elem; i++){
        auto trunc_eta = T::truncate_eta(
          is_event[i], eta[i], exp(eta[i]), at_risk_length[i]);
        h_1d[i] =  w[i] * T::d_log_like(
          is_event[i], trunc_eta, at_risk_length[i]);
        double h_2d_neg = -  w[i] * T::dd_log_like(
          is_event[i], trunc_eta, at_risk_length[i]);
        sym_mat_rank_one_update(h_2d_neg, X_t.unsafe_col(i), my_X_cross);
      }

#ifdef _OPENMP
#pragma omp critical(gma_lock)
{
#endif
      X_cross += my_X_cross;

#ifdef _OPENMP
}
}
#endif
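
For readers who want the shape of this block without Armadillo, here is a minimal sketch of the same pattern: a private accumulator per thread, a statically scheduled loop, and a merge inside a named critical section. The function weighted_sum_of_squares and the lock name sketch_lock are hypothetical and not part of dynamichazard; the scalar accumulator stands in for my_X_cross.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper (not from dynamichazard): computes sum_i w[i] * x[i]^2
// with one private accumulator per thread, merged under a critical section.
double weighted_sum_of_squares(const std::vector<double>& x,
                               const std::vector<double>& w) {
  assert(x.size() == w.size());
  const long n = static_cast<long>(x.size());
  double total = 0.0;

#ifdef _OPENMP
  // Same heuristic as in the question: roughly one thread per 1000 items,
  // capped by omp_get_max_threads().
  int n_threads = std::max(1, std::min(omp_get_max_threads(),
                                       static_cast<int>(n) / 1000 + 1));
#pragma omp parallel num_threads(n_threads) if(n_threads > 1)
  {
#endif
    double my_total = 0.0;  // private to each thread, plays the role of my_X_cross

#ifdef _OPENMP
#pragma omp for schedule(static)
#endif
    for (long i = 0; i < n; i++) {
      my_total += w[i] * x[i] * x[i];
    }

#ifdef _OPENMP
#pragma omp critical(sketch_lock)
    {
#endif
      total += my_total;  // merge step, one thread at a time
#ifdef _OPENMP
    }
  }
#endif
  return total;
}
```

The sketch compiles with or without -fopenmp because every pragma and every extra brace sits inside an #ifdef _OPENMP guard, as in the original block.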

As far as I can tell, the error occurs in the X_t.unsafe_col(i) argument of the call to sym_mat_rank_one_update. The function's declaration is

void sym_mat_rank_one_update(const double, const arma::vec&, arma::mat&);
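
The body of sym_mat_rank_one_update is not shown in this post. As a rough illustration only, and assuming the name means the usual symmetric rank-one update A += alpha * x * x^T, the operation could be sketched as follows. The function rank_one_update_sketch and the flat column-major buffer are hypothetical stand-ins for the Armadillo types; the real function may update only one triangle or call BLAS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for sym_mat_rank_one_update: A += alpha * x * x^T,
// with A stored column-major as a flat n-by-n buffer. This is an assumption
// about what the real function computes, not its actual implementation.
void rank_one_update_sketch(double alpha, const std::vector<double>& x,
                            std::vector<double>& A) {
  const std::size_t n = x.size();
  assert(A.size() == n * n);
  for (std::size_t j = 0; j < n; j++) {
    for (std::size_t i = 0; i < n; i++) {
      A[i + j * n] += alpha * x[i] * x[j];  // element (i, j) of alpha * x * x^T
    }
  }
}
```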

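It should trigger a call to the arma::Col<double> constructor at line 411 of include/armadillo_bits/Col_meat.hpp, which in turn calls the inherited arma::Mat<double> constructor at line 1215 of include/armadillo_bits/Mat_meat.hpp. I gather this because a write of size 4 occurs, which matches one of the unsigned int members of arma::Mat<double>. The constructor is
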
template<typename eT>
inline
Mat<eT>::Mat(eT* aux_mem, const uword aux_n_rows, const uword aux_n_cols, const bool copy_aux_mem, const bool strict)
  : n_rows   ( aux_n_rows                            )
  , n_cols   ( aux_n_cols                            )
  , n_elem   ( aux_n_rows*aux_n_cols                 )
  , vec_state( 0                                     )
  , mem_state( copy_aux_mem ? 0 : ( strict ? 2 : 1 ) )
  , mem      ( copy_aux_mem ? 0 : aux_mem            )
  {
  arma_extra_debug_sigprint_this(this);

  if(copy_aux_mem == true)
    {
    init_cold();

    arrayops::copy( memptr(), aux_mem, n_elem );
    }
  }

where

template<typename eT>
class Mat : public Base< eT, Mat<eT> >
  {
  public:

  typedef eT                                elem_type;  //!< the type of elements stored in the matrix
  typedef typename get_pod_type<eT>::result  pod_type;  //!< if eT is std::complex<T>, pod_type is T; otherwise pod_type is eT

  const uword  n_rows;    //!< number of rows     (read-only)
  const uword  n_cols;    //!< number of columns  (read-only)
  const uword  n_elem;    //!< number of elements (read-only)
  const uhword vec_state; //!< 0: matrix layout; 1: column vector layout; 2: row vector layout
  const uhword mem_state; 
  ...

See include/armadillo_bits/Mat_bones.hpp, and note that arma::uword is unsigned int here. However, I cannot figure out why this would lead to a stack-use-after-scope error.
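
For context, standard C++ lifetime rules say that a temporary bound to a const reference parameter lives until the end of the full-expression containing the call. The sketch below shows that pattern with hypothetical stand-in types (ColumnView and TinyMatrix are not Armadillo classes and only loosely mimic a non-owning column view). It does not explain the sanitizer report; it only illustrates why passing the result of unsafe_col(i) directly to a function looks valid on paper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view: it stores a pointer and a length instead of
// copying the data, loosely like a column built on auxiliary memory.
struct ColumnView {
  const double* mem;
  unsigned int n_elem;
  ColumnView(const double* aux_mem, unsigned int n) : mem(aux_mem), n_elem(n) {}
};

// Takes the view by const reference, like a const arma::vec& parameter.
double sum_view(const ColumnView& v) {
  double s = 0.0;
  for (unsigned int i = 0; i < v.n_elem; i++) s += v.mem[i];
  return s;
}

// Hypothetical matrix stand-in with column-major storage.
struct TinyMatrix {
  std::vector<double> data;
  unsigned int n_rows;
  ColumnView unsafe_col(unsigned int j) const {
    return ColumnView(data.data() + static_cast<std::size_t>(j) * n_rows, n_rows);
  }
};

double sum_of_column(const TinyMatrix& X, unsigned int j) {
  // The temporary ColumnView lives until the end of this full-expression,
  // so sum_view() reads it while it is still alive.
  return sum_view(X.unsafe_col(j));
}
```

One thing that could be tried in the real code, as an experiment rather than a confirmed fix, is to bind the result of unsafe_col(i) to a named local variable before the call, so that the view's lifetime spans the whole loop body explicitly.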

There is a similar error in the Morpho package. See the current CRAN log (linked as "here" in the original post) and src/createL.cpp.

Setup

The check above is run on CRAN. As far as I can tell, it uses gcc 7.2 on Fedora 26, and R is built with the following config.site:

CXX="g++ -fsanitize=address,undefined,bounds-strict -fno-omit-frame-pointer"
CFLAGS="-g -O2 -Wall -pedantic -mtune=native -fsanitize=address"
FFLAGS="-g -O2 -mtune=native"
FCFLAGS="-g -O2 -mtune=native"
CXXFLAGS="-g -O2 -Wall -pedantic -mtune=native"
MAIN_LDFLAGS=-fsanitize=address,undefined

Further, the following ~/.R/Makevars is used:

CC = gcc -std=gnu99 -fsanitize=address,undefined -fno-omit-frame-pointer
F77 = gfortran -fsanitize=address
FC = gfortran -fsanitize=address
FCFLAGS = -g -O2 -mtune=native -fbounds-check
FFLAGS = -g -O2 -mtune=native -fbounds-check

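The error does not occur with clang 5.0.0 or valgrind on the same machine. Moreover, I cannot reproduce it on my local Ubuntu 17.04 machine with gcc 6.3 and clang 4.0.0.

Minimal, complete, and verifiable example

I will try to make one.

0 answers:

No answers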