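I'm working on a project in which I request an XML document from a server and parse it to import the data into my system. I'm using Ruby 2.4.3.
My problem is that the XML contains element tags whose names start with a digit. Nokogiri treats these tags as part of the previous tag's content, which throws off the parsing of the rest of the document.
I'm using Mechanize to request the data. Any ideas?
The only thing I can think of is writing a fully custom parser in Mechanize, which I'd rather not do.
I'm also considering treating the XML as a string first so I can rename these elements, but I'm not sure whether that is the cleanest approach. Any suggestions are greatly appreciated.
Here is a sample of the data I'm parsing: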
<Rooms>
<2ndBedroomArea>144</2ndBedroomArea>
<2ndKitchenArea>144</2ndKitchenArea>
<3rdBedroomArea>168</3rdBedroomArea>
<4thBedroomArea>156</4thBedroomArea>
<FamilyRoomArea>368</FamilyRoomArea>
<FormalDiningRoomArea>144</FormalDiningRoomArea>
<GreatRoomArea>0</GreatRoomArea>
<InformalDiningRoomArea>187</InformalDiningRoomArea>
<KitchenArea>168</KitchenArea>
<LaundryRoomArea>84</LaundryRoomArea>
<LivingRoomArea>272</LivingRoomArea>
<MasterBedroomArea>238</MasterBedroomArea>
<OfficeArea>144</OfficeArea>
<RecreationRoomArea>0</RecreationRoomArea>
<2ndBedroomDim>12 x 12</2ndBedroomDim>
<2ndKitchenDim>12 x 12</2ndKitchenDim>
<3rdBedroomDim>12 x 14</3rdBedroomDim>
<4thBedroomDim>13 x 12</4thBedroomDim>
<FamilyRoomDim>16 x 23</FamilyRoomDim>
<FormalDiningRoomDim>12 x 12</FormalDiningRoomDim>
<GreatRoomDim>0 x 0</GreatRoomDim>
<InformalDiningRoomDim>17 x 11</InformalDiningRoomDim>
<KitchenDim>14 x 12</KitchenDim>
<LaundryRoomDim>6 x 14</LaundryRoomDim>
<LivingRoomDim>17 x 16</LivingRoomDim>
<MasterBedroomDim>17 x 14</MasterBedroomDim>
<OfficeDim>12 x 12</OfficeDim>
<RecreationRoomDim>0 x 0</RecreationRoomDim>
<2ndBedroomLen>12</2ndBedroomLen>
<2ndKitchenLen>12</2ndKitchenLen>
<3rdBedroomLen>12</3rdBedroomLen>
<4thBedroomLen>13</4thBedroomLen>
<FamilyRoomLen>16</FamilyRoomLen>
<FormalDiningRoomLen>12</FormalDiningRoomLen>
<GreatRoomLen>0</GreatRoomLen>
<InformalDiningRoomLen>17</InformalDiningRoomLen>
<KitchenLen>14</KitchenLen>
<LaundryRoomLen>6</LaundryRoomLen>
<LivingRoomLen>17</LivingRoomLen>
<MasterBedroomLen>17</MasterBedroomLen>
<OfficeLen>12</OfficeLen>
<RecreationRoomLen>0</RecreationRoomLen>
<2ndBedroomWid>12</2ndBedroomWid>
<2ndKitchenWid>12</2ndKitchenWid>
<3rdBedroomWid>14</3rdBedroomWid>
<4thBedroomWid>12</4thBedroomWid>
<FamilyRoomWid>23</FamilyRoomWid>
<FormalDiningRoomWid>12</FormalDiningRoomWid>
<GreatRoomWid>0</GreatRoomWid>
<InformalDiningRoomWid>11</InformalDiningRoomWid>
<KitchenWid>12</KitchenWid>
<LaundryRoomWid>14</LaundryRoomWid>
<LivingRoomWid>16</LivingRoomWid>
<MasterBedroomWid>14</MasterBedroomWid>
<OfficeWid>12</OfficeWid>
<RecreationRoomWid>0</RecreationRoomWid>
<5thBedroomArea>0</5thBedroomArea>
<5thBedroomDim>0 x 0</5thBedroomDim>
<5thBedroomLen>0</5thBedroomLen>
<5thBedroomWid>0</5thBedroomWid>
<6thBedroomArea>0</6thBedroomArea>
<6thBedroomDim>0 x 0</6thBedroomDim>
<6thBedroomLen>0</6thBedroomLen>
<6thBedroomWid>0</6thBedroomWid>
</Rooms>
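For reference, the string-renaming idea mentioned in the question could look roughly like the sketch below. XML element names may not begin with a digit, so the sketch prefixes every such tag name before the document reaches the parser. The `n_` prefix, the `sanitize_numeric_tags` helper, and the shortened sample are assumptions made for illustration, not part of the original feed or of any library. The Nokogiri step is guarded so the sketch still runs when the gem is not installed.

```ruby
# Hypothetical prefix; any string that begins with a letter or underscore
# produces a valid XML name.
NUMERIC_TAG_PREFIX = "n_"

# Matches "<" or "</" only when the next character is a digit, i.e. the
# start of an opening or closing tag whose name begins with a number.
NUMERIC_TAG_START = %r{(</?)(?=\d)}

# Returns a copy of the XML string in which every tag name that starts
# with a digit has been prefixed, e.g. <2ndBedroomArea> -> <n_2ndBedroomArea>.
def sanitize_numeric_tags(xml)
  xml.gsub(NUMERIC_TAG_START) { "#{$1}#{NUMERIC_TAG_PREFIX}" }
end

# Shortened version of the sample data.
sample = "<Rooms><2ndBedroomArea>144</2ndBedroomArea>" \
         "<KitchenArea>168</KitchenArea></Rooms>"

cleaned = sanitize_numeric_tags(sample)
puts cleaned

# Optional: hand the cleaned string to Nokogiri's XML parser.
begin
  require "nokogiri"
  doc = Nokogiri::XML(cleaned)
  puts doc.at_xpath("//n_2ndBedroomArea").text
rescue LoadError
  warn "nokogiri is not installed; skipping the parse step"
end
```

With Mechanize, the raw response string is normally available through `body` on the object returned by `agent.get(url)`, so it can be passed through the helper before parsing. One caveat: the pattern also rewrites a literal `<` followed by a digit inside CDATA sections or comments, so it is only safe for feeds like this sample where that does not occur.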
Answer 0 (score: 0)
Nokogiri::HTML is more forgiving. It takes some tweaking, but it can parse this document.