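I have a regular expression that causes a segmentation fault.
After some testing, I found that the [\s\S]*\s+ part of the regular expression causes problems when the string is larger than about 15 KB, so sometimes it works and sometimes it crashes.
Here is the C++ code, compiled with g++ (gcc v.6.0.3):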
#include <regex>
#include <fstream>
#include <iterator>
#include <string>
#include <iostream>

int main (int argc, char *argv[]) {
    std::regex regex(
        R"([\s\S]*\s+)",
        std::regex_constants::icase
    );

    // Read the whole file into a single string.
    std::ifstream ifs("/home/input.txt");
    const std::string input(
        (std::istreambuf_iterator<char>(ifs)),
        (std::istreambuf_iterator<char>())
    );
    std::cout << "input size: " << input.size() << std::endl;

    // regex_match requires the pattern to match the entire input.
    bool reg_match = std::regex_match(input, regex);
    std::cout << "matched: " << reg_match << std::endl;
}
What is going on here? Why does it happen with this pattern, and why does it depend on the input size?
Update:
When the program is compiled with -fsanitize=address, running the binary produces this error:
g++ -std=c++11 /home/app/src/test.cpp -o /home/app/bin/test -fsanitize=address
ASAN:DEADLYSIGNAL
=================================================================
==37041==ERROR: AddressSanitizer: stack-overflow on address 0x7ffbff8edff8 (pc 0x55afae25781b bp 0x7ffbff8ee010 sp 0x7ffbff8edff0 T0)
#0 0x55afae25781a in bool __gnu_cxx::operator==<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, __gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&) (/home/app/bin/test+0x1981a)
#1 0x55afae2587bd in std::__detail::_Executor<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::__cxx11::regex_traits<char>, true>::_M_dfs(std::__detail::_Executor<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::__cxx11::regex_traits<char>, true>::_Match_mode, long) (/home/app/bin/test+0x1a7bd)
#2 0x55afae25e2d2 in std::__detail::_Executor<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::__cxx11::regex_traits<char>, true>::_M_rep_once_more(std::__detail::_Executor<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::__cxx11::regex_traits<char>, true>::_Match_mode, long) (/home/app/bin/test+0x202d2)
.
.
.
#251 0x55afae25e2d2 in std::__detail::_Executor<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::__cxx11::regex_traits<char>, true>::_M_rep_once_more(std::__detail::_Executor<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::__cxx11::regex_traits<char>, true>::_Match_mode, long) (/home/app/bin/test+0x202d2)
SUMMARY: AddressSanitizer: stack-overflow (/home/app/bin/test) in std::__detail::_Executor<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::__cxx11::regex_traits<char>, true>::_M_dfs(std::__detail::_Executor<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::__cxx11::regex_traits<char>, true>::_Match_mode, long)
==37017==ABORTING
Answer 0 (score: 0)
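I don't have a complete answer, but for some reason a stack overflow occurs while matching your regex. This is usually caused by either too much data on the stack or too many levels of recursion. Looking at your program, I don't see any large objects on the stack (a string object on the stack is small, because its data lives on the heap). However, the state machines used for regex matching are known to make a lot of recursive function calls, which is consistent with your long AddressSanitizer output, where the _M_dfs and _M_rep_once_more frames repeat. If the recursion gets deeper as more characters are consumed, that would also explain why only large inputs crash. You have a few options (I would try them in this order):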
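1. Pass std::regex_constants::optimize when constructing the regex, to encourage the regex code to spend more time optimizing the state machine it builds for regex processing.
2. Rewrite the regex. The [\s\S]* part looks a bit unconventional.
3. Increase the stack size. This can be done with ulimit -s <stack size in kB>.
4. Compile the program with split stacks (-fsplit-stack). Note that this comes at the expense of some performance.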