Parsing a chemical formula with mixtures of elements

Asked: 2017-03-21 20:18:20

Tags: c++ boost-spirit boost-spirit-qi chemistry

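I would like to use boost::spirit to extract the stoichiometry of compounds made of several elements from a brute formula. Within a given compound, my parser should be able to distinguish three kinds of chemical element patterns:

  • a natural element, made of a mixture of isotopes in natural abundance
  • a pure isotope
  • a mixture of isotopes in non-natural abundance

Those patterns are then used to parse compounds such as:

  • "C" --> natural carbon made of C[12] and C[13] in natural abundance
  • "CH4" --> methane made of natural carbon and hydrogen
  • "C2H{H[1](0.8)H[2](0.2)}6" --> ethane made of natural C and non-natural H, the latter made of 80% hydrogen and 20% deuterium
  • "U[235]" --> pure uranium 235

Obviously, the chemical element patterns can appear in any order (e.g. CH[1]4 and H[1]4C ...) and with any frequency.

I wrote a parser that comes quite close to doing the job, but I still face one problem.

Here is my code: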

template <typename Iterator>
struct ChemicalFormulaParser : qi::grammar<Iterator,isotopesMixture(),qi::locals<isotopesMixture,double>>
{
    ChemicalFormulaParser(): ChemicalFormulaParser::base_type(_start)
    {

        namespace phx = boost::phoenix;

        // Semantic action for handling the case of pure isotope    
        phx::function<PureIsotopeBuilder> const build_pure_isotope = PureIsotopeBuilder();
        // Semantic action for handling the case of pure isotope mixture   
        phx::function<IsotopesMixtureBuilder> const build_isotopes_mixture = IsotopesMixtureBuilder();
        // Semantic action for handling the case of natural element   
        phx::function<NaturalElementBuilder> const build_natural_element = NaturalElementBuilder();

        phx::function<UpdateElement> const update_element = UpdateElement();

        // XML database that store all the isotopes of the periodical table
        ChemicalDatabaseManager<Isotope>* imgr=ChemicalDatabaseManager<Isotope>::Instance();
        const auto& isotopeDatabase=imgr->getDatabase();
        // Loop over the database to the spirit symbols for the isotopes names (e.g. H[1],C[14]) and the elements (e.g. H,C)
        for (const auto& isotope : isotopeDatabase) {
            _isotopeNames.add(isotope.second.getName(),isotope.second.getName());
            _elementSymbols.add(isotope.second.getProperty<std::string>("symbol"),isotope.second.getProperty<std::string>("symbol"));
        }

        _mixtureToken = "{" >> +(_isotopeNames >> "(" >> qi::double_ >> ")") >> "}";
        _isotopesMixtureToken = (_elementSymbols[qi::_a=qi::_1] >> _mixtureToken[qi::_b=qi::_1])[qi::_pass=build_isotopes_mixture(qi::_val,qi::_a,qi::_b)];

        _pureIsotopeToken = (_isotopeNames[qi::_a=qi::_1])[qi::_pass=build_pure_isotope(qi::_val,qi::_a)];
        _naturalElementToken = (_elementSymbols[qi::_a=qi::_1])[qi::_pass=build_natural_element(qi::_val,qi::_a)];

        _start = +( ( (_isotopesMixtureToken | _pureIsotopeToken | _naturalElementToken)[qi::_a=qi::_1] >>
                      (qi::double_|qi::attr(1.0))[qi::_b=qi::_1])[qi::_pass=update_element(qi::_val,qi::_a,qi::_b)] );

    }

    //! Defines the rule for matching a prefix
    qi::symbols<char,std::string> _isotopeNames;
    qi::symbols<char,std::string> _elementSymbols;

    qi::rule<Iterator,isotopesMixture()> _mixtureToken;
    qi::rule<Iterator,isotopesMixture(),qi::locals<std::string,isotopesMixture>> _isotopesMixtureToken;

    qi::rule<Iterator,isotopesMixture(),qi::locals<std::string>> _pureIsotopeToken;
    qi::rule<Iterator,isotopesMixture(),qi::locals<std::string>> _naturalElementToken;

    qi::rule<Iterator,isotopesMixture(),qi::locals<isotopesMixture,double>> _start;
};

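Basically, each individual element pattern can be parsed correctly with its respective semantic action, which produces a map between the isotopes that make up the compound and their corresponding stoichiometry. The problem starts when parsing the following compound: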

CH{H[1](0.9)H[2](0.4)}

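In that case, the semantic action build_isotopes_mixture returns false, because 0.9 + 0.4 makes no sense as a sum of ratios. Hence I would have expected, and wanted, my parser to fail for this compound. However, because the _start rule uses the alternative operator for the three kinds of chemical element pattern, the parser manages to parse it by 1) throwing away the {H[1](0.9)H[2](0.4)} part, 2) keeping the preceding H, and 3) parsing it with _naturalElementToken. Is my grammar not clear enough to be expressed as a parser? How can I use the alternative operator in such a way that, when a match has been found but the semantic action returned false, the parser stops?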

1 Answer:

Answer 0 (score: 3)

  

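"How can I use the alternative operator in such a way that, when a match has been found but the semantic action returned false, the parser stops?"

In general, you do so by adding an expectation point, which prevents backtracking.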

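In this case, you are actually conflating several tasks:

  1. matching input
  2. interpreting matched input
  3. validating matched input

Spirit excels at matching input and has great facilities for interpreting it (mostly in the sense of AST creation). However, things get "nasty" when validating on the fly.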

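A piece of advice I often repeat is to consider separating the concerns whenever possible. I would consider:

  1. building a direct AST representation of the input first,
  2. transforming/normalizing/expanding/canonicalizing it into a more convenient or meaningful domain representation,
  3. doing the final validations on the result.

This gives you the most expressive code while keeping it highly maintainable.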
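
The three steps can be sketched in plain C++, independently of Spirit. This is only an illustration under assumptions: the types `ElementTerm` and `FormulaAst` and the functions `expand` and `is_valid` are hypothetical names, not part of the original code, and for brevity the validation here inspects the AST rather than the expanded result.

```cpp
#include <cmath>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Step 1: a direct AST of what was written in the formula.
struct ElementTerm {
    std::string symbol;                                   // e.g. "H"
    std::vector<std::pair<std::string, double>> isotopes; // empty means natural element
    double count;                                         // stoichiometric count
};
using FormulaAst = std::vector<ElementTerm>;

// Step 2: expand the AST into a domain representation that maps each
// isotope name (or natural element symbol) to its total amount.
std::map<std::string, double> expand(FormulaAst const& ast) {
    std::map<std::string, double> amounts;
    for (auto const& term : ast) {
        if (term.isotopes.empty()) {
            amounts[term.symbol] += term.count;
        } else {
            for (auto const& iso : term.isotopes)
                amounts[iso.first] += iso.second * term.count;
        }
    }
    return amounts;
}

// Step 3: validation kept apart from parsing. Every explicit mixture
// must have fractions that add up to 1 within a tolerance.
bool is_valid(FormulaAst const& ast, double tolerance = 1e-5) {
    for (auto const& term : ast) {
        if (term.isotopes.empty())
            continue;
        double total = 0.0;
        for (auto const& iso : term.isotopes)
            total += iso.second;
        if (std::abs(1.0 - total) >= tolerance)
            return false;
    }
    return true;
}
```

With this split, the grammar only has to fill a `FormulaAst`; rejecting a mixture such as 0.9 + 0.4 becomes an ordinary function call after parsing and no longer interacts with backtracking.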

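Because I don't understand the problem domain well enough, and the code sample is not nearly complete enough to infer it, I will not try to give a full sample of what I have in mind. Instead, I'll do my best to sketch the expectation point approach I mentioned at the outset.

Mock Up Sample To Compile

This took the most time. (Consider doing the leg work for the people who are going to help you.)

Live On Coliru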

      #include <boost/fusion/adapted/std_pair.hpp>
      #include <boost/spirit/include/qi.hpp>
      #include <boost/spirit/include/phoenix.hpp>
      #include <iostream>
      #include <list>
      #include <map>
      #include <stdexcept>
      #include <string>
      
      namespace qi = boost::spirit::qi;
      
      struct DummyBuilder {
          using result_type = bool;
      
          template <typename... Ts>
          bool operator()(Ts&&...) const { return true; }
      };
      
      struct PureIsotopeBuilder     : DummyBuilder {  };
      struct IsotopesMixtureBuilder : DummyBuilder {  };
      struct NaturalElementBuilder  : DummyBuilder {  };
      struct UpdateElement          : DummyBuilder {  };
      
      struct Isotope {
          std::string getName() const { return _name; }
      
          Isotope(std::string const& name = "unnamed", std::string const& symbol = "?") : _name(name), _symbol(symbol) { }
      
          template <typename T> std::string getProperty(std::string const& name) const {
              if (name == "symbol")
                  return _symbol;
              throw std::domain_error("no such property (" + name + ")");
          }
      
        private:
          std::string _name, _symbol;
      };
      
      using MixComponent    = std::pair<Isotope, double>;
      using isotopesMixture = std::list<MixComponent>;
      
      template <typename Isotope>
      struct ChemicalDatabaseManager {
          static ChemicalDatabaseManager* Instance() {
              static ChemicalDatabaseManager s_instance;
              return &s_instance;
          }
      
          auto& getDatabase() { return _db; }
        private:
          std::map<int, Isotope> _db {
              { 1, { "H[1]",   "H" } },
              { 2, { "H[2]",   "H" } },
              { 3, { "Carbon", "C" } },
              { 4, { "U[235]", "U" } },
          };
      };
      
      template <typename Iterator>
      struct ChemicalFormulaParser : qi::grammar<Iterator, isotopesMixture(), qi::locals<isotopesMixture, double> >
      {
          ChemicalFormulaParser(): ChemicalFormulaParser::base_type(_start)
          {
              using namespace qi;
              namespace phx = boost::phoenix;
      
              phx::function<PureIsotopeBuilder>     build_pure_isotope;     // Semantic action for handling the case of pure isotope
              phx::function<IsotopesMixtureBuilder> build_isotopes_mixture; // Semantic action for handling the case of pure isotope mixture
              phx::function<NaturalElementBuilder>  build_natural_element;  // Semantic action for handling the case of natural element
              phx::function<UpdateElement>          update_element;
      
              // XML database that store all the isotopes of the periodical table
              ChemicalDatabaseManager<Isotope>* imgr = ChemicalDatabaseManager<Isotope>::Instance();
              const auto& isotopeDatabase=imgr->getDatabase();
      
              // Loop over the database to the spirit symbols for the isotopes names (e.g. H[1],C[14]) and the elements (e.g. H,C)
              for (const auto& isotope : isotopeDatabase) {
                  _isotopeNames.add(isotope.second.getName(),isotope.second.getName());
                  _elementSymbols.add(isotope.second.template getProperty<std::string>("symbol"),isotope.second.template getProperty<std::string>("symbol"));
              }
      
              _mixtureToken         = "{" >> +(_isotopeNames >> "(" >> double_ >> ")") >> "}";
              _isotopesMixtureToken = (_elementSymbols[_a=_1] >> _mixtureToken[_b=_1])[_pass=build_isotopes_mixture(_val,_a,_b)];
      
              _pureIsotopeToken     = (_isotopeNames[_a=_1])[_pass=build_pure_isotope(_val,_a)];
              _naturalElementToken  = (_elementSymbols[_a=_1])[_pass=build_natural_element(_val,_a)];
      
              _start = +( ( (_isotopesMixtureToken | _pureIsotopeToken | _naturalElementToken)[_a=_1] >>
                          (double_|attr(1.0))[_b=_1]) [_pass=update_element(_val,_a,_b)] );
          }
      
        private:
          //! Defines the rule for matching a prefix
          qi::symbols<char, std::string> _isotopeNames;
          qi::symbols<char, std::string> _elementSymbols;
      
          qi::rule<Iterator, isotopesMixture()> _mixtureToken;
          qi::rule<Iterator, isotopesMixture(), qi::locals<std::string, isotopesMixture> > _isotopesMixtureToken;
      
          qi::rule<Iterator, isotopesMixture(), qi::locals<std::string> > _pureIsotopeToken;
          qi::rule<Iterator, isotopesMixture(), qi::locals<std::string> > _naturalElementToken;
      
          qi::rule<Iterator, isotopesMixture(), qi::locals<isotopesMixture, double> > _start;
      };
      
      int main() {
          using It = std::string::const_iterator;
          ChemicalFormulaParser<It> parser;
          for (std::string const input : {
                  "C",                        // --> natural carbon made of C[12] and C[13] in natural abundance
                  "CH4",                      // --> methane made of natural carbon and hydrogen
                  "C2H{H[1](0.8)H[2](0.2)}6", // --> ethane made of natural C and non-natural H made of 80% of hydrogen and 20% of deuterium
                  "C2H{H[1](0.9)H[2](0.2)}6", // --> invalid mixture (total is 110%?)
                  "U[235]",                   // --> pure uranium 235
              })
          {
              std::cout << " ============= '" << input << "' ===========\n";
              It f = input.begin(), l = input.end();
              isotopesMixture mixture;
              bool ok = qi::parse(f, l, parser, mixture);
      
              if (ok)
                  std::cout << "Parsed successfully\n";
              else
                  std::cout << "Parse failure\n";
      
              if (f != l)
                  std::cout << "Remaining input unparsed: '" << std::string(f, l) << "'\n";
          }
      }
      

As given, this just prints:

       ============= 'C' ===========
      Parsed successfully
       ============= 'CH4' ===========
      Parsed successfully
       ============= 'C2H{H[1](0.8)H[2](0.2)}6' ===========
      Parsed successfully
       ============= 'C2H{H[1](0.9)H[2](0.2)}6' ===========
      Parsed successfully
       ============= 'U[235]' ===========
      Parsed successfully
      

General remarks:

1. No need for locals, just use the regular placeholders:

        _mixtureToken         = "{" >> +(_isotopeNames >> "(" >> double_ >> ")") >> "}";
        _isotopesMixtureToken = (_elementSymbols >> _mixtureToken) [ _pass=build_isotopes_mixture(_val, _1, _2) ];
        
        _pureIsotopeToken     = _isotopeNames [ _pass=build_pure_isotope(_val, _1) ];
        _naturalElementToken  = _elementSymbols [ _pass=build_natural_element(_val, _1) ];
        
        _start = +( 
                ( (_isotopesMixtureToken | _pureIsotopeToken | _naturalElementToken) >>
                  (double_|attr(1.0)) ) [ _pass=update_element(_val, _1, _2) ] 
            );
        
        // ....
        qi::rule<Iterator, isotopesMixture()> _mixtureToken;
        qi::rule<Iterator, isotopesMixture()> _isotopesMixtureToken;
        qi::rule<Iterator, isotopesMixture()> _pureIsotopeToken;
        qi::rule<Iterator, isotopesMixture()> _naturalElementToken;
        qi::rule<Iterator, isotopesMixture()> _start;
        
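2. You will want to handle conflicts between names and symbols (possibly just by prioritizing one over the other).

3. Conforming compilers will require the template qualifier (unless I totally mis-guessed your data structure, in which case I don't know what the template argument to ChemicalDatabaseManager was supposed to mean).

   Hint: MSVC is not a standards-conforming compiler.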
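
To illustrate the remark about the template qualifier, here is a small standard C++ sketch. `Record` and `symbol_of` are hypothetical names that only mimic the shape of the mocked `Isotope::getProperty`; they are not part of the original code. When the type of the object depends on a template parameter, the compiler cannot know that `getProperty` names a member template, so `.template` is needed to disambiguate the `<` that follows:

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for the mocked Isotope type.
struct Record {
    explicit Record(std::string symbol) : _symbol(std::move(symbol)) {}

    // Member template, like the getProperty<T> used in the question.
    template <typename T>
    T getProperty(std::string const& name) const {
        return name == "symbol" ? T(_symbol) : T();
    }

  private:
    std::string _symbol;
};

// 'record' has a type that depends on R, so the 'template' keyword is
// required before the member template name.
template <typename R>
std::string symbol_of(R const& record) {
    return record.template getProperty<std::string>("symbol");
}
```

Writing `.template` where it is not strictly required is also accepted by conforming compilers, so adding it is the safe choice when in doubt.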

Live On Coliru

Expectation Point Sketch

Assuming that the "weights" need to add up to 100% inside the _mixtureToken rule, we can either make build_isotopes_mixture "not dummy" and add the validation:

        struct IsotopesMixtureBuilder {
            bool operator()(isotopesMixture&/* output*/, std::string const&/* elementSymbol*/, isotopesMixture const& mixture) const {
                using namespace boost::adaptors;
        
                // validate weights total only
                return std::abs(1.0 - boost::accumulate(mixture | map_values, 0.0)) < 0.00001;
            }
        };
        

However, as you noted, backtracking will thwart that. Instead, you can assert that any complete mixture adds up to 100%:

        _mixtureToken         = "{" >> +(_isotopeNames >> "(" >> double_ >> ")") >> "}" > eps(validate_weight_total(_val));
        

With something like:

        struct ValidateWeightTotal {
            bool operator()(isotopesMixture const& mixture) const {
                using namespace boost::adaptors;
        
                bool ok = std::abs(1.0 - boost::accumulate(mixture | map_values, 0.0)) < 0.00001;
                return ok;
                // or perhaps just :
                return ok? ok : throw InconsistentsWeights {};
            }
        
            struct InconsistentsWeights : virtual std::runtime_error {
                InconsistentsWeights() : std::runtime_error("InconsistentsWeights") {}
            };
        };
        

Live On Coliru

        #include <boost/fusion/adapted/std_pair.hpp>
        #include <boost/spirit/include/qi.hpp>
        #include <boost/spirit/include/phoenix.hpp>
        #include <boost/range/adaptors.hpp>
        #include <boost/range/numeric.hpp>
        #include <cmath>
        #include <iostream>
        #include <list>
        #include <map>
        #include <stdexcept>
        #include <string>
        
        namespace qi = boost::spirit::qi;
        
        struct DummyBuilder {
            using result_type = bool;
        
            template <typename... Ts>
            bool operator()(Ts&&...) const { return true; }
        };
        
        struct PureIsotopeBuilder     : DummyBuilder {  };
        struct NaturalElementBuilder  : DummyBuilder {  };
        struct UpdateElement          : DummyBuilder {  };
        
        struct Isotope {
            std::string getName() const { return _name; }
        
            Isotope(std::string const& name = "unnamed", std::string const& symbol = "?") : _name(name), _symbol(symbol) { }
        
            template <typename T> std::string getProperty(std::string const& name) const {
                if (name == "symbol")
                    return _symbol;
                throw std::domain_error("no such property (" + name + ")");
            }
        
          private:
            std::string _name, _symbol;
        };
        
        using MixComponent    = std::pair<Isotope, double>;
        using isotopesMixture = std::list<MixComponent>;
        
        struct IsotopesMixtureBuilder {
            bool operator()(isotopesMixture&/* output*/, std::string const&/* elementSymbol*/, isotopesMixture const& mixture) const {
                using namespace boost::adaptors;
        
                // validate weights total only
                return std::abs(1.0 - boost::accumulate(mixture | map_values, 0.0)) < 0.00001;
            }
        };
        
        struct ValidateWeightTotal {
            bool operator()(isotopesMixture const& mixture) const {
                using namespace boost::adaptors;
        
                bool ok = std::abs(1.0 - boost::accumulate(mixture | map_values, 0.0)) < 0.00001;
                return ok;
                // or perhaps just :
                return ok? ok : throw InconsistentsWeights {};
            }
        
            struct InconsistentsWeights : virtual std::runtime_error {
                InconsistentsWeights() : std::runtime_error("InconsistentsWeights") {}
            };
        };
        
        template <typename Isotope>
        struct ChemicalDatabaseManager {
            static ChemicalDatabaseManager* Instance() {
                static ChemicalDatabaseManager s_instance;
                return &s_instance;
            }
        
            auto& getDatabase() { return _db; }
          private:
            std::map<int, Isotope> _db {
                { 1, { "H[1]",   "H" } },
                { 2, { "H[2]",   "H" } },
                { 3, { "Carbon", "C" } },
                { 4, { "U[235]", "U" } },
            };
        };
        
        template <typename Iterator>
        struct ChemicalFormulaParser : qi::grammar<Iterator, isotopesMixture()>
        {
            ChemicalFormulaParser(): ChemicalFormulaParser::base_type(_start)
            {
                using namespace qi;
                namespace phx = boost::phoenix;
        
                phx::function<PureIsotopeBuilder>     build_pure_isotope;     // Semantic action for handling the case of pure isotope
                phx::function<IsotopesMixtureBuilder> build_isotopes_mixture; // Semantic action for handling the case of pure isotope mixture
                phx::function<NaturalElementBuilder>  build_natural_element;  // Semantic action for handling the case of natural element
                phx::function<UpdateElement>          update_element;
                phx::function<ValidateWeightTotal>    validate_weight_total;
        
                // XML database that store all the isotopes of the periodical table
                ChemicalDatabaseManager<Isotope>* imgr = ChemicalDatabaseManager<Isotope>::Instance();
                const auto& isotopeDatabase=imgr->getDatabase();
        
                // Loop over the database to the spirit symbols for the isotopes names (e.g. H[1],C[14]) and the elements (e.g. H,C)
                for (const auto& isotope : isotopeDatabase) {
                    _isotopeNames.add(isotope.second.getName(),isotope.second.getName());
                    _elementSymbols.add(isotope.second.template getProperty<std::string>("symbol"), isotope.second.template getProperty<std::string>("symbol"));
                }
        
                _mixtureToken         = "{" >> +(_isotopeNames >> "(" >> double_ >> ")") >> "}" > eps(validate_weight_total(_val));
                _isotopesMixtureToken = (_elementSymbols >> _mixtureToken) [ _pass=build_isotopes_mixture(_val, _1, _2) ];
        
                _pureIsotopeToken     = _isotopeNames [ _pass=build_pure_isotope(_val, _1) ];
                _naturalElementToken  = _elementSymbols [ _pass=build_natural_element(_val, _1) ];
        
                _start = +( 
                        ( (_isotopesMixtureToken | _pureIsotopeToken | _naturalElementToken) >>
                          (double_|attr(1.0)) ) [ _pass=update_element(_val, _1, _2) ] 
                    );
            }
        
          private:
            //! Defines the rule for matching a prefix
            qi::symbols<char, std::string> _isotopeNames;
            qi::symbols<char, std::string> _elementSymbols;
        
            qi::rule<Iterator, isotopesMixture()> _mixtureToken;
            qi::rule<Iterator, isotopesMixture()> _isotopesMixtureToken;
            qi::rule<Iterator, isotopesMixture()> _pureIsotopeToken;
            qi::rule<Iterator, isotopesMixture()> _naturalElementToken;
            qi::rule<Iterator, isotopesMixture()> _start;
        };
        
        int main() {
            using It = std::string::const_iterator;
            ChemicalFormulaParser<It> parser;
            for (std::string const input : {
                    "C",                        // --> natural carbon made of C[12] and C[13] in natural abundance
                    "CH4",                      // --> methane made of natural carbon and hydrogen
                    "C2H{H[1](0.8)H[2](0.2)}6", // --> ethane made of natural C and non-natural H made of 80% of hydrogen and 20% of deuterium
                    "C2H{H[1](0.9)H[2](0.2)}6", // --> invalid mixture (total is 110%?)
                    "U[235]",                   // --> pure uranium 235
                }) try 
            {
                std::cout << " ============= '" << input << "' ===========\n";
                It f = input.begin(), l = input.end();
                isotopesMixture mixture;
                bool ok = qi::parse(f, l, parser, mixture);
        
                if (ok)
                    std::cout << "Parsed successfully\n";
                else
                    std::cout << "Parse failure\n";
        
                if (f != l)
                    std::cout << "Remaining input unparsed: '" << std::string(f, l) << "'\n";
            } catch(std::exception const& e) {
                std::cout << "Caught exception '" << e.what() << "'\n";
            }
        }
        

Prints:

         ============= 'C' ===========
        Parsed successfully
         ============= 'CH4' ===========
        Parsed successfully
         ============= 'C2H{H[1](0.8)H[2](0.2)}6' ===========
        Parsed successfully
         ============= 'C2H{H[1](0.9)H[2](0.2)}6' ===========
        Caught exception 'boost::spirit::qi::expectation_failure'
         ============= 'U[235]' ===========
        Parsed successfully