C++ :: Boost :: Regex: iterating over submatches

Time: 2010-04-27 03:55:19

Tags: c++ regex boost

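I am using named capture groups with Boost Regex / Xpressive.

I would like to iterate over all submatches and get both the value and the KEY of each one (i.e. the "type" in what["type"]).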

sregex pattern = sregex::compile(  "(?P<type>href|src)=\"(?P<url>[^\"]+)\""    );

sregex_iterator cur( web_buffer.begin(), web_buffer.end(), pattern );
sregex_iterator end;

for( ; cur != end; ++cur ){
    smatch const &what = *cur;

    //I know how to access using a string key: what["type"]
    std::cout << what[0] << " [" << what["type"] << "] [" << what["url"] <<"]"<< std::endl;

    /*I know how to iterate, using an integer key, but I would
      like to also get the original KEY into a variable, i.e.
      in case of what[1], get both the value AND "type"
    */
    for (std::size_t i = 0; i < what.size(); ++i) {
        std::cout << "{} = [" << what[i] << "]" << std::endl;
    }

    std::cout << std::endl;
}

2 answers:

Answer 0: (score: 3)

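With Boost 1.54.0 this is even harder, because the capture names are not stored in the results at all. Instead, Boost only hashes each capture name and stores the hash (an int) together with the pointers associated with the original string.

I wrote a small class derived from boost::smatch that saves the capture names and provides iterators over them.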

class namesaving_smatch : public smatch
{
public:
    namesaving_smatch(const regex& pattern)
    {
        std::string pattern_str = pattern.str();
        regex capture_pattern("\\?P?<(\\w+)>");
        auto words_begin = sregex_iterator(pattern_str.begin(), pattern_str.end(), capture_pattern);
        auto words_end = sregex_iterator();

        for (sregex_iterator i = words_begin; i != words_end; i++)
        {
            std::string name = (*i)[1].str();
            m_names.push_back(name);
        }
    }

    ~namesaving_smatch() { }

    std::vector<std::string>::const_iterator names_begin() const
    {
        return m_names.begin();
    }

    std::vector<std::string>::const_iterator names_end() const
    {
        return m_names.end();
    }

private:
    std::vector<std::string> m_names;
};

The class takes the regular expression containing the named capture groups in its constructor. Use the class like this:

namesaving_smatch results(re);
if (regex_search(input, results, re))
    for (auto it = results.names_begin(); it != results.names_end(); ++it)
        cout << *it << ": " << results[*it].str() << endl;
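
The class above yields the names, but the question also asks which name belongs to what[i]. A minimal sketch of one way to get that, using only the standard library: scan the pattern string once and record a name per group index. capture_names is a hypothetical helper, not part of Boost, and it assumes groups are numbered left to right by opening parenthesis (named and unnamed alike) and that names use the (?P<name>...) or (?<name>...) spelling.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a Boost API): element i of the result is the
// name of capture group i, or "" for group 0 and for unnamed groups.
// Assumes left-to-right numbering by opening parenthesis.
std::vector<std::string> capture_names(const std::string& pattern)
{
    std::vector<std::string> names(1);  // slot 0: the whole match has no name
    bool in_class = false;              // true while inside [...]
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '\\') { ++i; continue; }  // skip the escaped character
        if (in_class) { if (c == ']') in_class = false; continue; }
        if (c == '[') { in_class = true; continue; }
        if (c != '(') continue;
        if (i + 1 >= pattern.size() || pattern[i + 1] != '?') {
            names.push_back("");  // plain numbered group
            continue;
        }
        // "(?P<name>" or "(?<name>"; other "(?..." forms do not capture.
        std::size_t j = i + 2;
        if (j < pattern.size() && pattern[j] == 'P') ++j;
        if (j >= pattern.size() || pattern[j] != '<') continue;
        const std::size_t close = pattern.find('>', j + 1);
        if (close == std::string::npos) continue;
        const std::string name = pattern.substr(j + 1, close - j - 1);
        bool is_word = !name.empty();
        for (char ch : name) {
            if (!(std::isalnum(static_cast<unsigned char>(ch)) || ch == '_')) {
                is_word = false;  // rejects lookbehind such as (?<= or (?<!
            }
        }
        if (is_word) names.push_back(name);
    }
    return names;
}
```

With the result stored in a vector called names, the loop over what[i] from the question can print names[i] next to each value, provided names.size() equals what.size(). Corner cases such as a literal ] at the start of a character class or branch-reset groups are not handled.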

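Answer 1: (score: 2)

After looking at this for more than an hour, I feel fairly safe saying "it can't be done, captain". Even inside the Boost code, the lookup is done by iterating over the private named_marks_ vector. It simply isn't set up to allow this. I think your best bet is to iterate over the names you think should be there and catch the exception for the ones that aren't found.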

const_reference at_(char_type const *name) const
{
    for(std::size_t i = 0; i < this->named_marks_.size(); ++i)
    {
        if(this->named_marks_[i].name_ == name)
        {
            return this->sub_matches_[ this->named_marks_[i].mark_nbr_ ];
        }
    }
    BOOST_THROW_EXCEPTION(
        regex_error(regex_constants::error_badmark, "invalid named back-reference")
    );
    // Should never execute, but if it does, this returns
    // a "null" sub_match.
    return this->sub_matches_[this->sub_matches_.size()];
}
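
The suggestion above (try each name you expect and catch the exception for the missing ones) could be sketched roughly as follows. probe_named_groups is a hypothetical helper, not a Boost function. It is written as a template so that it only assumes a match type, such as Xpressive's smatch, whose operator[] accepts a const char* name, returns something with a str() member, and throws an exception derived from std::exception for an unknown name, as the excerpt shows.

```cpp
#include <exception>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a Boost API): returns (name, value) pairs for
// every expected name that the match object knows about.
// Assumption: what[name] throws for an unknown name.
template <class Match>
std::vector<std::pair<std::string, std::string>>
probe_named_groups(const Match& what, const std::vector<std::string>& expected)
{
    std::vector<std::pair<std::string, std::string>> found;
    for (const std::string& name : expected) {
        try {
            // The lookup throws before anything is appended,
            // so unknown names leave no partial entry behind.
            found.emplace_back(name, what[name.c_str()].str());
        } catch (const std::exception&) {
            // Name is not defined in the pattern: skip it.
        }
    }
    return found;
}
```

The list of expected names still has to come from somewhere, for example a hard-coded list or names extracted from the pattern string, since the match object itself does not expose them.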