I am trying to parse the mp3 files on a website with jsoup. I wrote the code this way:
String url = "http://www.xeno-canto.org/explore?query=Haemorhous+mexicanus+&dir=0&order=loc";
try {
    Document doc = Jsoup.connect(url).get();
    System.out.println(doc.title());
    Elements h1s = doc.getElementsByTag("div>audio[src]");
    Element thisOne = null;
    for (Iterator it = h1s.iterator(); it.hasNext();)
    {
        thisOne = (Element) it.next();
        System.out.println(thisOne.html());
    }
The HTML has many nested <div> layers, and I don't know whether I can query the content under the <audio> tag this way. The HTML looks like this:
<div class="jp-player jp-player-219351" id="p_xc_audio_219351_883" style="width: 0px; height: 0px;"><img id="jp_poster_17" style="width: 0px; height: 0px; display: none;"><audio id="jp_audio_17" preload="none" src="http://www.xeno-canto.org/sounds/uploaded/RFTXRYBVBX/XC219351-House%20Finch%20calls%20and%20then%20calls%20in%20flight%20-CA%2C%20TRV%2C%20March%2003%2C%20%E2%80%8E2012%2C%201045%20AM.mp3"></audio></div>
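My goal is to parse and download the mp3 file behind every audio[src] tag, but after many attempts it still does not work. I hope someone can give me a hint.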
Answer 0 (score: 0)
You can do this with the following code:
String url = "http://www.xeno-canto.org/explore?query=Haemorhous+mexicanus+&dir=0&order=loc";
try {
    Document doc = Jsoup.connect(url).get();
    System.out.println(doc.title());
    Elements h1s = doc.select(".jp-type-single");
    System.out.println("Number of results: " + h1s.size());
    for (Element element : h1s) {
        String mp3Url = element.attr("data-xc-filepath");
        System.out.println("mp3 url: " + mp3Url);
    }
} catch (Exception ex) {
    ex.printStackTrace();
}
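Some suggestions:
- doc.getElementsByTag(String tagName) accepts a tag name, not a CSS query. To select with a CSS query, use select(String cssQuery) instead.
- Don't use an Iterator to loop over Elements; use a for-each loop instead, since Elements extends java.util.ArrayList<Element> and the loop is simpler.

Answer 1 (score: 0)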
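Thank you very much, Davide! It took me only 2 minutes to parse all of these mp3 files. I added standard IO to save the files, and the whole code looks like this: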
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Mp3Downloader {
    public static void main(String[] args) {
        int fileNum = 0;
        // Walk result pages 1 through 7 via the pg query parameter.
        for (int page = 1; page <= 7; page++) {
            String url = "http://www.xeno-canto.org/explore?query=Haemorhous+mexicanus+&dir=0&order=loc&pg=" + page;
            try {
                Document doc = Jsoup.connect(url).get();
                System.out.println(doc.title());
                Elements h1s = doc.select(".jp-type-single");
                System.out.println("Number of results: " + h1s.size());
                for (Element element : h1s) {
                    String mp3Url = element.attr("data-xc-filepath");
                    System.out.println("mp3 url: " + mp3Url);
                    fileNum++;
                    // try-with-resources closes both streams even if the copy fails.
                    try (InputStream is = new URL(mp3Url).openStream();
                         OutputStream outstream = new FileOutputStream("/users/pelican/downloads/" + fileNum + "file.mp3")) {
                        byte[] buffer = new byte[4096];
                        int len;
                        // read() returns -1 at end of stream.
                        while ((len = is.read(buffer)) != -1) {
                            outstream.write(buffer, 0, len);
                        }
                    }
                }
            } catch (Exception ex) {
                // A failure anywhere on a page skips the rest of that page.
                ex.printStackTrace();
            }
        }
    }
}
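The numbered names above (1file.mp3, 2file.mp3, ...) discard the descriptive file name that is already part of each mp3 URL. As a possible variation, here is a minimal sketch that derives the name from the URL and lets java.nio do the copying. The class name Mp3Download, the helpers fileNameFromUrl and download, and the example.org URL are all hypothetical and only serve as illustration; the sketch also assumes the decoded name is a legal file name on your file system, which is not checked.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class Mp3Download {

    // Returns the percent-decoded last path segment of a URL,
    // e.g. ".../XC1-House%20Finch.mp3" becomes "XC1-House Finch.mp3".
    static String fileNameFromUrl(String url) {
        String path = URI.create(url).getPath(); // getPath() decodes %XX escapes
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Streams the resource at url into targetDir, overwriting an existing file.
    static Path download(String url, Path targetDir) throws IOException {
        Path target = targetDir.resolve(fileNameFromUrl(url));
        try (InputStream in = new URL(url).openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical URL, used only to show the name derivation.
        System.out.println(fileNameFromUrl("http://example.org/sounds/XC1-House%20Finch.mp3"));
        // Optional: java Mp3Download <mp3-url> <existing-directory>
        if (args.length == 2) {
            System.out.println("saved to " + download(args[0], Paths.get(args[1])));
        }
    }
}
```

Inside the loop over the selected elements, a call such as download(mp3Url, Paths.get("/users/pelican/downloads")) could then stand in for the manual buffer loop. Catching IOException per file rather than per page would let one failed download be skipped without losing the rest of the page.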