Question

I am trying to scrape some content from a website. I tried JSoup:

List<String> songs = new ArrayList<String>();
for (Element s : doc.select("#core")) {
    System.out.println(s.html());
    songs.add(s.text());
}
for (String chord : songs) {
    System.out.println(chord);
}

#core is the <pre> tag. Inside this <pre> tag I have a div like the one below:

<pre>
Intro: G - Em - C - D
G
Would you dance,
Em
If I asked you to dance?
C
Would you run,
D
And never look back?
G
Would you cry,
Em
If you saw me crying?
C D G
Would you save my soul tonight?
<div id="part1">
<div class="inner">
G D C
I can be your hero baby
G D C
I can kiss away the pain
G D C
I will stand by you forever
G D C
You can take my breath away
</div>
</div>
</pre>

When I scrape this content, Jsoup does not keep the correct formatting. Is there a way to get the content of the div tag?

Answer 1

If you want to fetch the content without parsing it, you can do something like this:

Connection.Response response = Jsoup.connect("URL_HERE").execute();
System.out.println(response.body()); //This will keep the format as it is from the server.
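Because response.body() is just a String, the markup inside the inner div can be sliced out with plain string operations, so nothing gets reformatted. This is a minimal sketch, assuming the inner div contains no nested <div> of its own (true for the snippet in the question). innerMarkup is a hypothetical helper, and the sample string is a shortened stand-in for the real response body with illustrative chord spacing:

```java
public class RawDivExtractor {

    // Hypothetical helper: returns the raw markup between openTag and the first
    // closeTag after it, or null if either tag is missing. Assumes the element
    // has no nested element with the same closing tag.
    static String innerMarkup(String body, String openTag, String closeTag) {
        int start = body.indexOf(openTag);
        if (start < 0) {
            return null; // opening tag not found
        }
        start += openTag.length();
        int end = body.indexOf(closeTag, start);
        if (end < 0) {
            return null; // closing tag not found
        }
        // substring copies every space and line break exactly as it appears in body
        return body.substring(start, end);
    }

    public static void main(String[] args) {
        // Stand-in for response.body(): a shortened version of the question's markup
        String body = "<pre>Intro: G - Em - C - D\n"
                + "<div id=\"part1\">\n<div class=\"inner\">\n"
                + "G   D   C\nI can be your hero baby\n"
                + "</div>\n</div>\n</pre>";
        String inner = innerMarkup(body, "<div class=\"inner\">", "</div>");
        System.out.print(inner);
    }
}
```

In real code, body would be response.body() from the snippet above.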

If you want to parse the content afterwards, do this:

Document doc = response.parse(); // parse() returns the parsed org.jsoup.nodes.Document

If you want to remove certain elements, you have to parse the content. But once you parse it, any formatting in it will be lost.

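The workaround is to escape the elements whose whitespace you want to preserve. Take a look at this answer from the author of Jsoup: https://stackoverflow.com/a/5830454/1138559. In your case, though, you would have to escape the contents of <pre>, because it also contains HTML elements.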
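Escaping the contents of <pre> can be sketched with the standard library alone: before the HTML reaches Jsoup, replace &, < and > inside each <pre> block with entities, so the parser sees that block as text rather than as nested <div> elements. This is a minimal sketch, not the code from the linked answer. escapePreContents is a hypothetical helper, and the regex assumes simple, non-nested <pre> blocks, which does not hold for HTML in general:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PreEscaper {

    // Matches each <pre> or <pre ...> block. DOTALL lets '.' cross line breaks;
    // the reluctant '.*?' stops at the first </pre>, so nested <pre> is not supported.
    private static final Pattern PRE_BLOCK = Pattern.compile(
            "(<pre(?:\\s[^>]*)?>)(.*?)(</pre>)",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Replaces the characters an HTML parser would treat as markup; '&' goes first
    // so the entities produced for '<' and '>' are not escaped a second time.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Hypothetical pre-processing step: escape everything between <pre> and </pre>,
    // leaving the rest of the document untouched.
    static String escapePreContents(String html) {
        Matcher m = PRE_BLOCK.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(1) + escapeHtml(m.group(2)) + m.group(3);
            // quoteReplacement stops '$' or '\' in the page text being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<pre>G   D   C\n<div class=\"inner\">I can be your hero baby</div>\n</pre>";
        System.out.println(escapePreContents(html));
    }
}
```

The returned string can then be passed to Jsoup.parse(String) in place of the raw response body.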

