Spirit grammar to break up a string by number of characters

Asked: 2016-06-19 20:00:18

Tags: c++ parsing boost boost-spirit

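I am learning to write Spirit grammars, and I am trying to create a basic base 16 to base 64 converter that takes a string representing hex, for example: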

49276d206b696c

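parses out 6 or fewer characters at a time (fewer if the string length is not an exact multiple of 6), and generates a base64-encoded string from the input. One grammar I thought might work is something like this: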

// 6 characters 
`(qi::char_("0-9a-fA-F") >> qi::char_("0-9a-fA-F") >> 
 qi::char_("0-9a-fA-F") >> qi::char_("0-9a-fA-F") >>
 qi::char_("0-9a-fA-F") >> qi::char_("0-9a-fA-F")[/*action*/]) | 


// or 5 characters
(qi::char_("0-9a-fA-F") >> qi::char_("0-9a-fA-F") >> 
qi::char_("0-9a-fA-F") >> qi::char_("0-9a-fA-F") >> 
qi::char_("0-9a-fA-F")[/*action*/]) | ...`

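and so on, all the way down to one character, or else a different rule defined for each character count. I think there must be a better way to specify the grammar. I read about Spirit's repeat directive and thought perhaps I could do something like `+(boost::spirit::repeat(1, 6)[qi::char_("0-9a-fA-F")][/*action on characters*/])`, but the compiler throws an error on this because of the semantic action part of the grammar. Is there a simpler way to specify a grammar that operates on 6 or fewer characters at a time?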

Edit: Here is what I have done so far... base16convertergrammar.hpp

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>    

#include <string>
#include <iostream>

namespace grammar {
    namespace qi = boost::spirit::qi;

    void toBase64(const std::string& p_input, std::string& p_output)
    {   
        if (p_input.length() < 6)
        {   
            // pad length
        }   

        // use back inserter and generator to append to end of p_output.
    }   

    template <typename Iterator>
    struct Base16Grammar : qi::grammar<Iterator, std::string()>
    {
        Base16Grammar() : Base16Grammar::base_type(start, "base16grammar"),
            m_base64String()
        {
            // get six characters at a time and send them off to be encoded
            // if there is less than six characters just parse what we have
            start = +(boost::spirit::repeat(1, 6)[qi::char_("0-9a-fA-F")][boost::phoenix::bind(toBase64, qi::_1,
                boost::phoenix::ref(m_base64String))]);
        }

        qi::rule<Iterator, std::string()> start;

        std::string m_base64String;
    };
}

And here is the usage... base16converter.cpp

#include "base16convertergrammar.hpp"

const std::string& convertHexToBase64(const std::string& p_hexString)
{
    grammar::Base16Grammar<std::string::const_iterator> g;
    bool r = boost::spirit::qi::parse(p_hexString.begin(), p_hexString.end(), g); 
}


int main(int argc, char** argv)
{
    std::string test("49276d206b696c6c");
    convertHexToBase64(test);
}

2 answers:

Answer 0 (score: 2)

First, `repeat()[]` exposes a vector, so the attribute passed to your semantic action is a `vector<char>`, not a string:

void toBase64(const std::vector<char>& p_input, std::string& p_output)

Second, please don't do all that work by hand. You did not tell us what the input means, but since you want to group it in sixes, I assume you want each group interpreted as /something/. You could, for example, use int_parser:

Live On Coliru

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>    

#include <cassert>   // assert
#include <cstdint>   // uint64_t
#include <string>
#include <iostream>

namespace grammar {
    namespace qi = boost::spirit::qi;
    namespace px = boost::phoenix;

    template <typename Iterator>
    struct Base16Grammar : qi::grammar<Iterator, std::string()>
    {
        Base16Grammar() : Base16Grammar::base_type(start, "base16grammar")
        {
            start = +qi::int_parser<uint64_t, 16, 1, 6>() [ qi::_val += to_string(qi::_1) + "; " ];
        }

      private:
        struct to_string_f { template <typename T> std::string operator()(T const& v) const { return std::to_string(v); } };
        px::function<to_string_f> to_string;

        qi::rule<Iterator, std::string()> start;
    };
}

std::string convertHexToBase64(const std::string& p_hexString)
{
    grammar::Base16Grammar<std::string::const_iterator> g;
    std::string result;
    bool r = boost::spirit::qi::parse(p_hexString.begin(), p_hexString.end(), g, result); 
    assert(r);
    return result;
}

int main()
{
    for (std::string test : {"49276d206b696c6c"})
        std::cout << test << " -> " << convertHexToBase64(test) << "\n";
}
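
To make the grouping concrete, here is a minimal standard-library sketch (no Spirit involved) of what I would expect `+int_parser<uint64_t, 16, 1, 6>` to yield for the sample input: greedy groups of up to six hex digits, each read as one integer. The name `parseHexGroups` and its `maxDigits` parameter are hypothetical and only for illustration. One deliberate difference, as an assumption of this sketch: it throws on a non-hex character, whereas a Spirit parser would simply stop matching there.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): split `hex` into consecutive
// groups of at most `maxDigits` hex digits and convert each group to an
// unsigned integer. The last group may be shorter than the others.
inline std::vector<std::uint64_t> parseHexGroups(const std::string& hex,
                                                 std::size_t maxDigits = 6) {
    // A uint64_t holds at most 16 hex digits.
    if (maxDigits == 0 || maxDigits > 16)
        throw std::invalid_argument("maxDigits must be in [1, 16]");

    std::vector<std::uint64_t> groups;
    for (std::size_t pos = 0; pos < hex.size(); pos += maxDigits) {
        const std::size_t end = std::min(hex.size(), pos + maxDigits);
        assert(end > pos);  // every iteration consumes at least one digit

        std::uint64_t value = 0;
        for (std::size_t i = pos; i < end; ++i) {
            const unsigned char ch = static_cast<unsigned char>(hex[i]);
            if (!std::isxdigit(ch))
                throw std::invalid_argument("invalid hex digit");
            const unsigned digit =
                std::isdigit(ch) ? ch - '0' : std::tolower(ch) - 'a' + 10;
            value = value * 16 + digit;
        }
        groups.push_back(value);
    }
    return groups;
}
```

For "49276d206b696c6c" this splits the input into "49276d", "206b69" and "6c6c", giving the values 4794221, 2124649 and 27756.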

Answer 1 (score: 1)

Going out on a limb, I am guessing that you simply want to transcode hex-encoded binary into base64.

Since you are already using Boost:

Live On Coliru

#include <boost/archive/iterators/base64_from_binary.hpp>
#include <boost/archive/iterators/insert_linebreaks.hpp>
#include <boost/archive/iterators/transform_width.hpp>

// for hex decoding
#include <boost/iterator/function_input_iterator.hpp>

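#include <cctype>     // std::isxdigit, std::isdigit, std::tolower
#include <cstdint>    // uint8_t
#include <stdexcept>  // std::runtime_error
#include <string>
#include <iostream>
#include <functional>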

std::string convertHexToBase64(const std::string &hex) {
    struct get_byte_f {
        using result_type = uint8_t;

        std::string::const_iterator hex_it;

        result_type operator()() {
            auto nibble = [](uint8_t ch) {
                if (!std::isxdigit(ch)) throw std::runtime_error("invalid hex input");
                return std::isdigit(ch) ? ch - '0' : std::tolower(ch) - 'a' + 10;
            };

            auto hi = nibble(*hex_it++);
            auto lo = nibble(*hex_it++);
            return hi << 4 | lo;
        }
    } get_byte{ hex.begin() };

    using namespace boost::archive::iterators;

    using It = boost::iterators::function_input_iterator<get_byte_f, size_t>;

    typedef insert_linebreaks<    // insert line breaks every 72 characters
        base64_from_binary<       // convert binary values to base64 characters
            transform_width<      // retrieve 6 bit integers from a sequence of 8 bit bytes
            It, 6, 8> >,
        72> B64;                  // compose all the above operations in to a new iterator

    return { B64(It{get_byte, 0}), B64(It{get_byte, hex.size()/2}) };
}

int main() {
    for (std::string test : {
            "49276d206b696c6c",
            "736f6d65206c656e67746879207465787420746f2073686f77207768617420776f756c642068617070656e206174206c696e6520777261700a"
        })
    {
        std::cout << " === hex: " << test << "\n" << convertHexToBase64(test) << "\n";
    }
}

Prints

 === hex: 49276d206b696c6c
SSdtIGtpbGw
 === hex: 736f6d65206c656e67746879207465787420746f2073686f77207768617420776f756c642068617070656e206174206c696e6520777261700a
c29tZSBsZW5ndGh5IHRleHQgdG8gc2hvdyB3aGF0IHdvdWxkIGhhcHBlbiBhdCBsaW5lIHdy
YXAK
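
Note that the first result above, `SSdtIGtpbGw`, is 11 characters long, so the iterator pipeline appears to emit no `=` padding. If padded output is needed, the following is a minimal standard-library sketch of the same transcoding, under stated assumptions: it uses no Boost, inserts no line breaks, requires an even number of hex digits, and the names `hexNibble` and `hexToBase64Padded` are hypothetical. It relies on the same arithmetic the question was reaching for: 6 hex digits are 3 bytes, which are 24 bits, which are 4 base64 characters.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: numeric value of one hex digit; throws otherwise.
inline unsigned hexNibble(char c) {
    const unsigned char ch = static_cast<unsigned char>(c);
    if (!std::isxdigit(ch)) throw std::invalid_argument("invalid hex digit");
    return std::isdigit(ch) ? ch - '0' : std::tolower(ch) - 'a' + 10;
}

// Hypothetical helper: transcode hex-encoded bytes to base64 with '=' padding.
inline std::string hexToBase64Padded(const std::string& hex) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    if (hex.size() % 2 != 0)
        throw std::invalid_argument("odd number of hex digits");

    std::string out;
    out.reserve((hex.size() / 2 + 2) / 3 * 4);

    // Consume up to 6 hex digits (3 bytes) per iteration.
    for (std::size_t i = 0; i < hex.size(); i += 6) {
        const std::size_t digits = std::min<std::size_t>(6, hex.size() - i);
        std::uint32_t group = 0;
        for (std::size_t j = 0; j < digits; ++j)
            group = (group << 4) | hexNibble(hex[i + j]);
        group <<= 4 * (6 - digits);  // left-align a short final group in 24 bits

        // n input bytes carry n + 1 significant base64 characters;
        // the remaining positions of the 4-character block are padding.
        const std::size_t bytes = digits / 2;
        for (std::size_t k = 0; k < 4; ++k) {
            if (k <= bytes)
                out += alphabet[(group >> (18 - 6 * k)) & 0x3F];
            else
                out += '=';
        }
    }
    assert(out.size() % 4 == 0);  // padded base64 is always a multiple of 4
    return out;
}
```

With this sketch, `hexToBase64Padded("49276d206b696c6c")` returns `SSdtIGtpbGw=`, which is the first result above plus one padding character.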