I've been searching Stack Overflow, but I couldn't find anyone who has run into this kind of problem.
I want to do something like this:
Input string:
<?xml version="1.0" encoding="UTF-8" ?>
<List>
<Object>
<Section>Fruit</Section>
<Category>Bananas</Category>
<Brand>Chiquita</Brand>
<Obs><p>
Vende-se a peças ou o conjunto.</p><br>
</Obs>
</Object>
</List>
What I want is to remove the HTML tags, such as <p>, <br>, etc., so that it ends up like this:
<?xml version="1.0" encoding="UTF-8" ?>
<List>
<Object>
<Section>Fruit</Section>
<Category>Bananas</Category>
<Brand>Chiquita</Brand>
<Obs>
Vende-se a peças ou o conjunto.
</Obs>
</Object>
</List>
I've been playing with JSoup, but I can't seem to get it to work properly.
Here is my code:
Whitelist whitelist = Whitelist.none();
String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?><List><Object><Section>Fruit</Section><Category>Bananas</Category><Brand>Chiquita</Brand><Obs><p>Vende-se a peças ou o conjunto.</p><br></Obs></Object></List>";
whitelist.addTags(new String[]{"?xml", "List", "Object", "Section", "Category", "Brand", "Obs"});
String safe = Jsoup.clean(xml, whitelist);
This is the result I get:
FruitBananasChiquitaVende-se a peças ou o conjunto.
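Thanks in advance.
Answer 0 (score: 3)
The tag names are lowercase: Jsoup.clean() parses the input with the HTML parser, which converts tag names to lowercase, so the whitelist has to use lowercase names as well: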
whitelist.addTags(new String[] { "?xml", "list", "object", "section",
"category", "brand", "obs" });
Output:
<list>
<object>
<section>
Fruit
</section>
<category>
Bananas
</category>
<brand>
Chiquita
</brand>
<obs>
Vende-se a peças ou o conjunto.
</obs></object>
</list>
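For newer jsoup releases, where `Whitelist` has been deprecated in favour of `Safelist` (assumed here: jsoup 1.14.1 or later on the classpath), the same fix can be written as a small self-contained class. The class and method names are hypothetical. The key point is unchanged: the names passed to `addTags` must be lowercase. Note that `Jsoup.clean()` returns only the cleaned body content, and the HTML parser does not read `<?xml ... ?>` as a tag, so the declaration does not survive cleaning and `"?xml"` is left out of the safelist here.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

class CleanWithSafelist {
    // Assumes jsoup >= 1.14.1, where Safelist replaces the deprecated Whitelist.
    // Tag names must be lowercase: Jsoup.clean() runs the HTML parser,
    // which lowercases tag names before they are checked against the safelist.
    static String clean(String xml) {
        Safelist safelist = Safelist.none()
                .addTags("list", "object", "section", "category", "brand", "obs");
        // Elements not in the safelist (here <p> and <br>) are dropped; their text is kept.
        return Jsoup.clean(xml, safelist);
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?><List><Object>"
                + "<Section>Fruit</Section><Category>Bananas</Category>"
                + "<Brand>Chiquita</Brand><Obs><p>Vende-se a peças ou o conjunto.</p><br></Obs>"
                + "</Object></List>";
        System.out.println(clean(xml));
    }
}
```

Running `main` should print the lowercase XML tags without the `<p>` and `<br>` markup; the exact whitespace depends on the pretty-printer of the jsoup version in use.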
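Answer 1 (score: 2)
You can do this with unwrap():
Example:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

final String input = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"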
+ "<List>\n"
+ " <Object>\n"
+ " <Section>Fruit</Section>\n"
+ " <Category>Bananas</Category>\n"
+ " <Brand>Chiquita</Brand>\n"
+ " <Obs><p>\n"
+ "Vende-se a peças ou o conjunto.</p><br>\n"
+ " </Obs>\n"
+ " </Object>\n"
+ "</List>";
Document doc = Jsoup.parse(input, "", Parser.xmlParser()); // XML parser!
doc.select("p").unwrap(); // unwraps all p tags
doc.select("br").unwrap(); // unwraps all br tags
System.out.println(doc.outerHtml());
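It's better to use the XML parser here rather than the HTML parser: it keeps the <?xml ... ?> declaration and does not apply HTML structure rules to the XML tags.
Output: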
<?xml version="1.0" encoding="UTF-8" ?>
<list>
<object>
<section>
Fruit
</section>
<category>
Bananas
</category>
<brand>
Chiquita
</brand>
<obs>
Vende-se a peças ou o conjunto.
</obs> </object>
</list>
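The example above unwraps only `p` and `br`. If the embedded HTML can contain other tags, one option is to invert the rule and unwrap every element that is not part of the XML vocabulary. The sketch below assumes jsoup on the classpath and Java 9+ (for `Set.of`); the class name and the `XML_TAGS` set are hypothetical and would have to list your real XML element names. Tag names are compared in lowercase because, depending on the jsoup version, the XML parser may or may not preserve the original case.

```java
import java.util.Locale;
import java.util.Set;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

class UnwrapNonXmlTags {
    // Hypothetical list of the element names that make up the XML document itself;
    // any other element is assumed to be embedded HTML and gets unwrapped.
    private static final Set<String> XML_TAGS =
            Set.of("list", "object", "section", "category", "brand", "obs");

    static String strip(String input) {
        Document doc = Jsoup.parse(input, "", Parser.xmlParser());
        // getAllElements() returns a list snapshot, so changing the tree inside the loop is safe.
        for (Element el : doc.getAllElements()) {
            if (el == doc) {
                continue; // the Document root is not a real tag
            }
            String name = el.tagName().toLowerCase(Locale.ROOT);
            if (!XML_TAGS.contains(name)) {
                el.unwrap(); // drop the tag itself but keep its children (e.g. the text)
            }
        }
        return doc.outerHtml();
    }

    public static void main(String[] args) {
        String input = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
                + "<List><Object><Section>Fruit</Section><Category>Bananas</Category>"
                + "<Brand>Chiquita</Brand><Obs><p>\nVende-se a peças ou o conjunto.</p><br>\n</Obs>"
                + "</Object></List>";
        System.out.println(strip(input));
    }
}
```

Listing the allowed XML tags instead of the unwanted HTML tags means an unexpected `<b>` or `<span>` inside `<Obs>` is stripped too. The trade-off is that any new XML element has to be added to the set; otherwise its tag is silently unwrapped (its text is kept).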