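Need help parsing the URL of an mp3 file from HTML using jsoup

Time: 2018-12-20 16:25:04

Tags: android jsoup html-parsing

I have an async method that parses data with jsoup and fills the fields of a POJO class. I am trying to parse the URL of the mp3 file for each individual chapter of the book from this page in a for-each loop, but every query I have tried has failed.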

http://www.loyalbooks.com/book/adventures-of-huckleberry-finn-by-mark-twain

A single element in the page source looks like this, and the number in the id changes from chapter to chapter:

<div class="jp-free-media" style="font-size:xx-small;">(<a id="jp_playlist_1_item_0_mp3" href="http://www.archive.org/download/huckleberry_mfs_librivox/huckleberry_finn_01_twain_64kb.mp3" tabindex="1">download</a>)</div>

My AsyncTask, which searches for the mp3 URLs in mLines2:

public class FillBook extends AsyncTask<Void, Void, SingleBook> {

private String link;
private String imgLink;
private String title;
ArrayList<String> tmpChapters = new ArrayList<>();
private SingleBook book;

public FillBook(String link, String imgLink, String title) {

    this.link = link;
    this.imgLink = imgLink;
    this.title = title;
}

@Override
protected SingleBook doInBackground(Void... params) {

    Document doc = null;
    book = new SingleBook(imgLink, title, false, false, null, new ArrayList<String>());


    Elements mLines;
    Elements mLines2;

    try {
         doc = Jsoup.connect(link).get();

    } catch (IOException | RuntimeException e) {
        e.printStackTrace();
    }
    if (doc != null) {


        mLines = doc.getElementsByClass("book-description");


        for (Element mLine : mLines) {
            String description= mLine.text();
            book.setDescription(description);

        }

        mLines2 = doc.select(".jp-free-media");
        for (Element mLine2 : mLines2) {
            tmpChapters.add(mLine2.attr("href"));
        }
    }else
        System.out.println("ERROR");

    book.setChapters(tmpChapters);
    return book;

}

protected void onPostExecute(SingleBook book) {

    super.onPostExecute(book);

            Toast.makeText(BookActivity.this, book.getChapters().get(0), Toast.LENGTH_LONG).show();
            Picasso.get().load(book.getImgUrl()).into(bookCover);
            nameAndAuthor.setText(book.getTitleAndAuthor());
            bookDescription.setText(book.getDescription());

In the end I get an empty ArrayList. How can I get the string http://www.archive.org/download/huckleberry_mfs_librivox/huckleberry_finn_01_twain_64kb.mp3, given that the next chapter will have id="jp_playlist_1_item_1_mp3"?

1 answer:

Answer 0 (score: 0)

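Tiarait from Russian Stack Overflow helped find the solution. The key point is that the elements above are created by JavaScript, so they are not present in the HTML that jsoup downloads. I needed to take the document body and extract the following array from it by splitting the text.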

var audioPlaylist = new Playlist("1", [{name:"Chapter 01",free:true,mp3:"http://www.archive.org/download/huckleberry_mfs_librivox/huckleberry_finn_01_twain_64kb.mp3"},{name:"Chapter 02",free:true,mp3:"http://www.archive.org/download/huckleberry_mfs_librivox/huckleberry_finn_02_twain_64kb.mp3"}, ...

The doInBackground method should be changed to this:

@Override
protected SingleBook doInBackground(Void... params) {

Document doc = null;
book = new SingleBook(imgLink, title, false, false, null, new ArrayList<String>());


Elements mLines;

try {
    doc = Jsoup.connect(link).get();

} catch (IOException | RuntimeException e) {
    e.printStackTrace();
}
if (doc != null) {


    mLines = doc.getElementsByClass("book-description");


    for (Element mLine : mLines) {
        String description= mLine.text();
        book.setDescription(description);

    }


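    // The chapter links are built by JavaScript, so read them from the
    // inline script text instead of selecting DOM elements.
    String arr = "";
    String html = doc.body().html();
    // Keep only the text that follows the opening of the playlist array.
    if (html.contains("var audioPlaylist = new Playlist(\"1\", ["))
        arr = html.split("var audioPlaylist = new Playlist\\(\"1\", \\[")[1];
    // Cut the text off at the first closing bracket of the array.
    if (arr.contains("]"))
        arr = arr.split("\\]")[0];
    //-----------------------------------------
    // Several entries: split them apart and take the quoted value after mp3: from each.
    if (arr.contains("},{")) {
        for (String mLine2 : arr.split("\\},\\{")) {
            if (mLine2.contains("mp3:\""))
                tmpChapters.add(mLine2.split("mp3:\"")[1].split("\"")[0]);
        }
    // Single entry: there is nothing to split, so read its mp3 value directly.
    } else if (arr.contains("mp3:\""))
        tmpChapters.add(arr.split("mp3:\"")[1].split("\"")[0]);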
}else
    System.out.println("ERROR");

book.setChapters(tmpChapters);
return book;

}
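
As a possible alternative to the chained split calls above, a regular expression can collect every mp3 value in one pass. This is only a sketch under assumptions: the class name PlaylistMp3Extractor and the method extractMp3Urls are hypothetical and not part of the original answer, and the pattern assumes the page keeps writing entries in the form mp3:"...". Unlike the split version, it scans the whole string it is given rather than only the playlist array, so it would also pick up any other mp3:"..." text on the page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original answer.
class PlaylistMp3Extractor {

    // Matches mp3:"<url>" (optional whitespace after the colon) and
    // captures the text between the quotes. Assumes the playlist script
    // keeps this exact key name and uses double quotes.
    private static final Pattern MP3_ENTRY =
            Pattern.compile("mp3:\\s*\"([^\"]+)\"");

    // Returns every captured URL in the order it appears in the input.
    static List<String> extractMp3Urls(String html) {
        List<String> urls = new ArrayList<>();
        if (html == null) {
            return urls;
        }
        Matcher matcher = MP3_ENTRY.matcher(html);
        while (matcher.find()) {
            urls.add(matcher.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Sample text modelled on the playlist snippet shown in the answer.
        String sample = "var audioPlaylist = new Playlist(\"1\", ["
                + "{name:\"Chapter 01\",free:true,mp3:\"http://www.archive.org/download/huckleberry_mfs_librivox/huckleberry_finn_01_twain_64kb.mp3\"},"
                + "{name:\"Chapter 02\",free:true,mp3:\"http://www.archive.org/download/huckleberry_mfs_librivox/huckleberry_finn_02_twain_64kb.mp3\"}]";
        for (String url : extractMp3Urls(sample)) {
            System.out.println(url);
        }
    }
}
```

Inside doInBackground this could be used as tmpChapters.addAll(PlaylistMp3Extractor.extractMp3Urls(doc.body().html())); which also avoids depending on whether the entries are separated by "},{" exactly.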