xmlstarlet: delete all elements except one from an XML data feed

Time: 2015-06-02 16:54:06

Tags: xml xpath xmlstarlet

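On my Debian VPS, I want to keep only the elements whose CategoryName is Mobile Phones and delete all the elements with other category names, such as Mobile Accessories, Laptops, etc. There are 20 different category names in total. The XML file is 800 MB.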

xmlstarlet el -u sd.xml
Products
Products/Product
Products/Product/Brand
Products/Product/CategoryName
Products/Product/CategoryPathAsString

Sample XML:

<Products>
<Product>
   <ProductID>92545172</ProductID>
   <ProductSKU>630348288360</ProductSKU>
   <ProductName>Self Snap Aux Connected Selfie Stick</ProductName>
   <ProductDescription>This product is charge free </ProductDescription>
   <ProductPrice>353.00</ProductPrice>
   <ProductPriceCurrency>INR</ProductPriceCurrency>
   <WasPrice>649.00</WasPrice>
   <DiscountedPrice>0.00</DiscountedPrice>
   <ProductURL>http://clk</ProductURL>
   <PID>8053</PID>
   <MID>159526</MID>
   <ProductImageLargeURL>http://</ProductImageLargeURL>
   <StockAvailability>in stock</StockAvailability>
   <Brand>Self Snap</Brand>
   <CategoryName>Camera Accessories</CategoryName>
   <CategoryPathAsString>Root|Cameras &amp; Accessories|Camera Accessories|</CategoryPathAsString>
</Product>
<Product>
   <ProductID>29911116</ProductID>
   <ProductSKU>647266238</ProductSKU>
   <ProductName>Philips 40PFL5059/V7 40 inches Full HD LED Television</ProductName>
   <ProductDescription>LED Display Resolution : 1920 x 1080</ProductDescription>
   <ProductPrice>30196.00</ProductPrice>
   <ProductPriceCurrency>INR</ProductPriceCurrency>
   <WasPrice>39800.00</WasPrice>
   <DiscountedPrice>0.00</DiscountedPrice>
   <ProductURL>http://clk</ProductURL>
   <PID>8053</PID>
   <MID>159526</MID>
   <ProductImageLargeURL>http://n1</ProductImageLargeURL>
   <StockAvailability>in stock</StockAvailability>
   <Brand>Philips</Brand>
   <CategoryName>Televisions</CategoryName>
   <CategoryPathAsString>Root|TVs, Audio &amp; Video|Televisions|</CategoryPathAsString>
</Product>
<Product>
   <ProductID>93959216</ProductID>
   <ProductSKU>683203029</ProductSKU>
   <ProductName>Micromax Canvas Beat A114R</ProductName>
   <ProductDescription>Type : MultiSim Sim : Dual SIM Os Version : Android </ProductDescription>
   <ProductPrice>7999.00</ProductPrice>
   <ProductPriceCurrency>INR</ProductPriceCurrency>
   <WasPrice>9990.00</WasPrice>
   <DiscountedPrice>0.00</DiscountedPrice>
   <ProductURL>http://clk</ProductURL>
   <PID>8053</PID>
   <MID>159526</MID>
   <ProductImageLargeURL>http://n1</ProductImageLargeURL>
   <StockAvailability>in stock</StockAvailability>
   <Brand>Micromax</Brand>
   <CategoryName>Mobile Phones</CategoryName>
   <CategoryPathAsString>Root|Mobiles &amp; Tablets|Mobile Phones|</CategoryPathAsString>
</Product>
</Products>

1 answer:

Answer 0 (score: 2)

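It isn't very clear without sample XML and the expected result XML. Assuming you want to delete the elements named CategoryName whose inner text is not equal to "Mobile Phones", you can try this XPath: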

/Products/Product/CategoryName[. != 'Mobile Phones']
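
To make the effect of this first expression concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree rather than xmlstarlet (a swapped-in tool, chosen only because the result is easy to inspect). The two-product sample is a shortened version of the feed in the question. Note that only the CategoryName child disappears; every Product element stays in the document.

```python
import xml.etree.ElementTree as ET

# Shortened sample modeled on the feed in the question (most fields omitted).
SAMPLE = """<Products>
<Product>
   <ProductID>29911116</ProductID>
   <CategoryName>Televisions</CategoryName>
</Product>
<Product>
   <ProductID>93959216</ProductID>
   <CategoryName>Mobile Phones</CategoryName>
</Product>
</Products>"""

root = ET.fromstring(SAMPLE)

# Mirrors deleting /Products/Product/CategoryName[. != 'Mobile Phones']:
# the CategoryName child is dropped, but its parent Product is left in place.
for product in root.findall("Product"):
    for category in product.findall("CategoryName"):
        if category.text != "Mobile Phones":
            product.remove(category)

remaining_products = len(root.findall("Product"))
remaining_categories = [c.text for c in root.iter("CategoryName")]
print(remaining_products, remaining_categories)
```

Both products survive, which is why this expression does not trim the feed down to a single category.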

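It turns out that you want to delete the <Product> elements whose child element <CategoryName> has a value not equal to "Mobile Phones". In that case, you can try the following XPath:

/Products/Product[CategoryName != 'Mobile Phones']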
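
With xmlstarlet, an expression that selects every Product whose CategoryName is not "Mobile Phones" would typically be passed to the -d (delete) option of the ed subcommand. Since the feed is about 800 MB, a streaming filter may be gentler on memory, so the following is a hedged alternative sketch in Python (xml.etree.ElementTree, not xmlstarlet). The function name filter_products is hypothetical, and the code assumes every Product is a direct child of the root element, as in the sample.

```python
import io
import xml.etree.ElementTree as ET


def filter_products(source, target, keep="Mobile Phones"):
    """Copy only the <Product> elements whose <CategoryName> equals `keep`.

    `source` is a path or binary file object; `target` is a writable
    binary file object. Returns the number of products written.
    """
    kept = 0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event: start of the root element
    # No XML declaration is written; UTF-8 is the default encoding for XML.
    target.write(f"<{root.tag}>\n".encode("utf-8"))
    for event, elem in context:
        if event == "end" and elem.tag == "Product":
            if elem.findtext("CategoryName") == keep:
                elem.tail = None  # serialize the element without trailing text
                target.write(ET.tostring(elem, encoding="utf-8"))
                target.write(b"\n")
                kept += 1
            root.remove(elem)  # drop the processed product to free memory
    target.write(f"</{root.tag}>\n".encode("utf-8"))
    return kept


# Trimmed-down products taken from the sample in the question.
SAMPLE = b"""<Products>
<Product>
   <ProductID>92545172</ProductID>
   <CategoryName>Camera Accessories</CategoryName>
   <CategoryPathAsString>Root|Cameras &amp; Accessories|Camera Accessories|</CategoryPathAsString>
</Product>
<Product>
   <ProductID>93959216</ProductID>
   <CategoryName>Mobile Phones</CategoryName>
   <CategoryPathAsString>Root|Mobiles &amp; Tablets|Mobile Phones|</CategoryPathAsString>
</Product>
</Products>"""

out = io.BytesIO()
count = filter_products(io.BytesIO(SAMPLE), out)
result = ET.fromstring(out.getvalue())
print(count, [p.findtext("ProductID") for p in result])
```

For the real file, the same function could be called with a path and an output file opened in binary mode, for example filter_products("sd.xml", open("mobile_phones.xml", "wb")), where mobile_phones.xml is a hypothetical output name.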