Using regex_iterator to walk through the tags of an HTML file

Time: 2014-11-26 17:04:10

Tags: html c++ regex

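I'm writing a web browser and trying to use regex_iterator to walk through the tags of an HTML document, with the eventual goal of building a document tree. First, I need a regular expression that matches HTML tags. The following should print every HTML tag: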

#include <string>
#include <regex>
#include <iostream>

int main()
{

    std::string s("<!DOCTYPE html><head></head><body><div class='container' id='someId'><p>Here's a p tag</p><p>Here's another p tag</p></div></body>");
    std::regex e("[someRegularExpression]");
    std::regex_iterator<std::string::iterator> htmlTagRover ( s.begin(), s.end(), e );
    std::regex_iterator<std::string::iterator> offend;
    while (htmlTagRover != offend)
    {
        std::cout << htmlTagRover->str() << std::endl;
        ++htmlTagRover; // advance to the next match; without this the loop never ends
    }

    return 0;
}

where [someRegularExpression] stands for a regular expression that matches HTML tags. When I try to run the program, I get the following error:

/home/svzQOJ/ccEMKoqM.o: In function `main':
prog.cpp:(.text.startup+0xd1): undefined reference to `std::regex_iterator<__gnu_cxx::__normal_iterator<char*, std::string>, char, std::regex_traits<char> >::regex_iterator(__gnu_cxx::__normal_iterator<char*, std::string>, __gnu_cxx::__normal_iterator<char*, std::string>, std::basic_regex<char, std::regex_traits<char> > const&, std::bitset<11u>)'
prog.cpp:(.text.startup+0xdc): undefined reference to `std::regex_iterator<__gnu_cxx::__normal_iterator<char*, std::string>, char, std::regex_traits<char> >::regex_iterator()'
prog.cpp:(.text.startup+0x1af): undefined reference to `std::regex_iterator<__gnu_cxx::__normal_iterator<char*, std::string>, char, std::regex_traits<char> >::operator!=(std::regex_iterator<__gnu_cxx::__normal_iterator<char*, std::string>, char, std::regex_traits<char> > const&)'
prog.cpp:(.text.startup+0x1be): undefined reference to `std::regex_iterator<__gnu_cxx::__normal_iterator<char*, std::string>, char, std::regex_traits<char> >::operator->()'
collect2: error: ld returned 1 exit status

Any idea why?

1 answer:

Answer 0 (score: 0)

According to the linked reference, you don't need to spell out <std::string::iterator>; to use regular expressions with std::string, use std::sregex_iterator (note the s).