How do I parse audio files and download them with jsoup?

Asked: 2016-09-04 08:12:41

Tags: java jsoup html-parsing

I am trying to parse the mp3 files on a website with jsoup. I wrote the code this way:

String url = "http://www.xeno-canto.org/explore?query=Haemorhous+mexicanus+&dir=0&order=loc";
try {
    Document doc = Jsoup.connect(url).get();

    System.out.println(doc.title());
    Elements h1s = doc.getElementsByTag("div>audio[src]");

    Element thisOne = null;
    for(Iterator it = h1s.iterator(); it.hasNext();)
    {
        thisOne = (Element)it.next();
        System.out.println(thisOne.html());
    }
} catch (IOException ex) {
    ex.printStackTrace();
}

The HTML has many nested <div> layers, and I don't know whether I can query the content under the <audio> tag this way. The HTML looks like this:

<div class="jp-player jp-player-219351" id="p_xc_audio_219351_883" style="width: 0px; height: 0px;"><img id="jp_poster_17" style="width: 0px; height: 0px; display: none;"><audio id="jp_audio_17" preload="none" src="http://www.xeno-canto.org/sounds/uploaded/RFTXRYBVBX/XC219351-House%20Finch%20calls%20and%20then%20calls%20in%20flight%20-CA%2C%20TRV%2C%20March%2003%2C%20%E2%80%8E2012%2C%201045%20AM.mp3"></audio></div>

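My goal is to parse and download the mp3 files under all audio[src] tags, but after many attempts it still does not work. I hope someone can give me a hint.

2 answers:

Answer 0 (score: 0)

You can do this with the following code: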

String url = "http://www.xeno-canto.org/explore?query=Haemorhous+mexicanus+&dir=0&order=loc";
try {
    Document doc = Jsoup.connect(url).get();

    System.out.println(doc.title());
    Elements h1s = doc.select(".jp-type-single");
    System.out.println("Number of results: " + h1s.size());
    for (Element element : h1s) {
        String mp3Url = element.attr("data-xc-filepath");
        System.out.println("mp3 url: " + mp3Url);
    }
} catch (Exception ex) {
    ex.printStackTrace();
}
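
The code above passes the value of data-xc-filepath straight on as the mp3 URL. The answer does not show what that value looks like, so it is only an assumption that it might be relative ("/sounds/...") or protocol-relative ("//host/...") rather than absolute. If it is, it has to be resolved against the page URL before it can be downloaded. jsoup offers element.absUrl("data-xc-filepath") for this; the minimal sketch below does the same with only java.net.URI. The class name UrlResolve and the example paths are hypothetical.

```java
import java.net.URI;

class UrlResolve {

    // Resolve an attribute value against the URL of the page it came from.
    // Absolute URLs are returned unchanged; relative and protocol-relative
    // values inherit the missing parts (scheme, host) from pageUrl.
    static String resolve(String pageUrl, String attrValue) {
        return URI.create(pageUrl).resolve(attrValue.trim()).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.xeno-canto.org/explore?query=Haemorhous+mexicanus+&dir=0&order=loc";
        // Hypothetical attribute values, for illustration only.
        System.out.println(resolve(page, "//www.xeno-canto.org/sounds/a.mp3"));
        System.out.println(resolve(page, "/sounds/a.mp3"));
        System.out.println(resolve(page, "http://example.org/a.mp3"));
    }
}
```

Note that URI.create throws IllegalArgumentException for values containing illegal characters such as unencoded spaces, so real input may need encoding first.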

Some suggestions:

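Answer 1 (score: 0)

Thanks a lot, Davide! It took only 2 minutes to parse all of these mp3 files successfully! I added standard IO to save the files, and the full code looks like this: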

public static void main(String[] args) {
    int fileNum = 0;

    // Loop over result pages 1 to 7.
    for (int page = 1; page <= 7; page++) {
        String url = "http://www.xeno-canto.org/explore?query=Haemorhous+mexicanus+&dir=0&order=loc&pg=" + page;

        try {
            Document doc = Jsoup.connect(url).get();

            System.out.println(doc.title());
            Elements h1s = doc.select(".jp-type-single");
            System.out.println("Number of results: " + h1s.size());
            for (Element element : h1s) {
                String mp3Url = element.attr("data-xc-filepath");
                System.out.println("mp3 url: " + mp3Url);
                fileNum++;

                File target = new File("/users/pelican/downloads/" + fileNum + "file.mp3");
                // try-with-resources closes both streams even if the copy fails.
                try (InputStream is = new URL(mp3Url).openStream();
                     OutputStream outstream = new FileOutputStream(target)) {
                    byte[] buffer = new byte[4096];
                    int len;
                    // read() returns -1 at end of stream.
                    while ((len = is.read(buffer)) != -1) {
                        outstream.write(buffer, 0, len);
                    }
                }
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
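
The code above names the files 1file.mp3, 2file.mp3 and so on. As a possible variation, the sketch below derives the file name from the last path segment of the mp3 URL and copies the stream with Files.copy instead of a manual buffer loop. The class name Mp3Download, the fallback name "unnamed", and the rule of replacing every character outside A-Z, a-z, 0-9, '.', '_' and '-' with an underscore are all my own assumptions, not part of the original answer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class Mp3Download {

    // Build a file-system-friendly name from the last path segment of the URL.
    static String fileNameFromUrl(String url) {
        // getPath() returns the path with percent-escapes decoded (%20 becomes a space).
        String path = URI.create(url).getPath();
        String last = path.substring(path.lastIndexOf('/') + 1);
        if (last.isEmpty()) {
            return "unnamed"; // hypothetical fallback for URLs ending in '/'
        }
        // Replace anything that is not a letter, digit, dot, underscore or hyphen.
        return last.replaceAll("[^A-Za-z0-9._-]", "_");
    }

    // Download url into targetDir and return the path of the saved file.
    static Path download(String url, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(fileNameFromUrl(url));
        try (InputStream in = URI.create(url).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

Inside the loop this would be called as Mp3Download.download(mp3Url, Paths.get("/users/pelican/downloads")). Two recordings that share the same last path segment would overwrite each other, so keeping the counter as a prefix may still be worthwhile.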