How to parse invalid XML

Time: 2017-12-23 20:41:56

Tags: ruby xml mechanize mechanize-ruby

I'm working on a project where I request an XML document from a server and parse it to import the data into my system. I'm using Ruby 2.4.3.

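My problem is that the XML contains element tags whose names start with a digit, which is not allowed in XML. Nokogiri treats these tags as part of the previous tag's content, and that throws off the parsing of the rest of the document.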

I'm using Mechanize to request the data. Any ideas?

The only thing I can think of is writing a fully custom parser in Mechanize, which I'd rather not do.

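I'm also considering treating the XML as a string first and renaming these elements before parsing, but I'm not sure that's the cleanest approach. Any suggestions are greatly appreciated.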
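A minimal sketch of the string-renaming idea, assuming it is acceptable to prefix the offending names with an underscore (so `2ndBedroomArea` becomes `_2ndBedroomArea`, which is a legal XML name). The helper name `sanitize_numeric_tags` and the `prefix` parameter are hypothetical, not part of Mechanize or Nokogiri. The regex only looks at a `<` or `</` followed directly by a digit, so it could also touch CDATA or comment content that happens to contain that sequence.

```ruby
# Hypothetical helper: prefix element names that start with a digit
# so they become legal XML names. Works on the raw response body.
def sanitize_numeric_tags(xml, prefix: "_")
  # Group 1 captures the optional "/" of a closing tag,
  # group 2 captures the leading digit of the tag name.
  xml.gsub(%r{<(/?)(\d)}) { "<#{$1}#{prefix}#{$2}" }
end

raw = <<~XML
  <Rooms>
    <2ndBedroomArea>144</2ndBedroomArea>
    <FamilyRoomArea>368</FamilyRoomArea>
    <2ndBedroomDim>12 x 12</2ndBedroomDim>
  </Rooms>
XML

fixed = sanitize_numeric_tags(raw)
puts fixed

# The cleaned string can then be handed to the XML parser, e.g.:
#   doc = Nokogiri::XML(fixed)
```

When importing, the same prefix can be stripped from the element names again if the original names are needed as keys.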

Here is a sample of the data I'm parsing:

<Rooms>\r
          <2ndBedroomArea>144</2ndBedroomArea>\r
          <2ndKitchenArea>144</2ndKitchenArea>\r
          <3rdBedroomArea>168</3rdBedroomArea>\r
          <4thBedroomArea>156</4thBedroomArea>\r
          <FamilyRoomArea>368</FamilyRoomArea>\r
          <FormalDiningRoomArea>144</FormalDiningRoomArea>\r
          <GreatRoomArea>0</GreatRoomArea>\r
          <InformalDiningRoomArea>187</InformalDiningRoomArea>\r
          <KitchenArea>168</KitchenArea>\r
          <LaundryRoomArea>84</LaundryRoomArea>\r
          <LivingRoomArea>272</LivingRoomArea>\r
          <MasterBedroomArea>238</MasterBedroomArea>\r
          <OfficeArea>144</OfficeArea>\r
          <RecreationRoomArea>0</RecreationRoomArea>\r
          <2ndBedroomDim>12 x 12</2ndBedroomDim>\r
          <2ndKitchenDim>12 x 12</2ndKitchenDim>\r
          <3rdBedroomDim>12 x 14</3rdBedroomDim>\r
          <4thBedroomDim>13 x 12</4thBedroomDim>\r
          <FamilyRoomDim>16 x 23</FamilyRoomDim>\r
          <FormalDiningRoomDim>12 x 12</FormalDiningRoomDim>\r
          <GreatRoomDim>0 x 0</GreatRoomDim>\r
          <InformalDiningRoomDim>17 x 11</InformalDiningRoomDim>\r
          <KitchenDim>14 x 12</KitchenDim>\r
          <LaundryRoomDim>6 x 14</LaundryRoomDim>\r
          <LivingRoomDim>17 x 16</LivingRoomDim>\r
          <MasterBedroomDim>17 x 14</MasterBedroomDim>\r
          <OfficeDim>12 x 12</OfficeDim>
          <RecreationRoomDim>0 x 0</RecreationRoomDim>\r
          <2ndBedroomLen>12</2ndBedroomLen>\r
          <2ndKitchenLen>12</2ndKitchenLen>\r
          <3rdBedroomLen>12</3rdBedroomLen>\r
          <4thBedroomLen>13</4thBedroomLen>\r
          <FamilyRoomLen>16</FamilyRoomLen>\r
          <FormalDiningRoomLen>12</FormalDiningRoomLen>\r
          <GreatRoomLen>0</GreatRoomLen>\r
          <InformalDiningRoomLen>17</InformalDiningRoomLen>\r
          <KitchenLen>14</KitchenLen>\r
          <LaundryRoomLen>6</LaundryRoomLen>\r
          <LivingRoomLen>17</LivingRoomLen>\r
          <MasterBedroomLen>17</MasterBedroomLen>\r
          <OfficeLen>12</OfficeLen>\r
          <RecreationRoomLen>0</RecreationRoomLen>\r
          <2ndBedroomWid>12</2ndBedroomWid>\r
          <2ndKitchenWid>12</2ndKitchenWid>\r
          <3rdBedroomWid>14</3rdBedroomWid>\r
          <4thBedroomWid>12</4thBedroomWid>\r
          <FamilyRoomWid>23</FamilyRoomWid>\r
          <FormalDiningRoomWid>12</FormalDiningRoomWid>\r
          <GreatRoomWid>0</GreatRoomWid>\r
          <InformalDiningRoomWid>11</InformalDiningRoomWid>\r
          <KitchenWid>12</KitchenWid>\r
          <LaundryRoomWid>14</LaundryRoomWid>\r
          <LivingRoomWid>16</LivingRoomWid>\r
          <MasterBedroomWid>14</MasterBedroomWid>\r
          <OfficeWid>12</OfficeWid>\r
          <RecreationRoomWid>0</RecreationRoomWid>\r
          <5thBedroomArea>0</5thBedroomArea>\r
          <5thBedroomDim>0 x 0</5thBedroomDim>\r
          <5thBedroomLen>0</5thBedroomLen>\r
          <5thBedroomWid>0</5thBedroomWid>\r
          <6thBedroomArea>0</6thBedroomArea>\r
          <6thBedroomDim>0 x 0</6thBedroomDim>\r
          <6thBedroomLen>0</6thBedroomLen>\r
          <6thBedroomWid>0</6thBedroomWid>\r
        </Rooms>\r

1 answer:

Answer 0 (score: 0)

Nokogiri::HTML is more forgiving. It takes some tweaking, but it can parse this.