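I have an async method that parses data with jsoup and fills in the fields of a POJO class. I'm trying to use a for-each loop to parse the mp3 file URLs of the individual chapters of the book on this page, but every query I've tried has failed.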
http://www.loyalbooks.com/book/adventures-of-huckleberry-finn-by-mark-twain
A single element in the page source looks like this, and the number in the id changes from chapter to chapter:
<div class="jp-free-media" style="font-size:xx-small;">(<a id="jp_playlist_1_item_0_mp3" href="http://www.archive.org/download/huckleberry_mfs_librivox/huckleberry_finn_01_twain_64kb.mp3" tabindex="1">download</a>)</div>
My AsyncTask, which looks for the mp3 URLs in mLines2:
public class FillBook extends AsyncTask<Void, Void, SingleBook> {
    private String link;
    private String imgLink;
    private String title;
    ArrayList<String> tmpChapters = new ArrayList<>();
    private SingleBook book;

    public FillBook(String link, String imgLink, String title) {
        this.link = link;
        this.imgLink = imgLink;
        this.title = title;
    }

    @Override
    protected SingleBook doInBackground(Void... params) {
        Document doc = null;
        book = new SingleBook(imgLink, title, false, false, null, new ArrayList<String>());
        Elements mLines;
        Elements mLines2;
        try {
            doc = Jsoup.connect(link).get();
        } catch (IOException | RuntimeException e) {
            e.printStackTrace();
        }
        if (doc != null) {
            mLines = doc.getElementsByClass("book-description");
            for (Element mLine : mLines) {
                String description = mLine.text();
                book.setDescription(description);
            }
            mLines2 = doc.select(".jp-free-media");
            for (Element mLine2 : mLines2) {
                tmpChapters.add(mLine2.attr("href"));
            }
        } else {
            System.out.println("ERROR");
        }
        book.setChapters(tmpChapters);
        return book;
    }

    @Override
    protected void onPostExecute(SingleBook book) {
        super.onPostExecute(book);
        Toast.makeText(BookActivity.this, book.getChapters().get(0), Toast.LENGTH_LONG).show();
        Picasso.get().load(book.getImgUrl()).into(bookCover);
        nameAndAuthor.setText(book.getTitleAndAuthor());
        bookDescription.setText(book.getDescription());
    }
}
In the end I get an empty ArrayList. How can I get the string http://www.archive.org/download/huckleberry_mfs_librivox/huckleberry_finn_01_twain_64kb.mp3, given that the next chapter will have id="jp_playlist_1_item_1_mp3"?
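Answer 0 (score: 0)
Tiarait from Russian Stack Overflow helped me find the solution. The key point is that the elements shown above are created by JavaScript, so they are not in the HTML that jsoup downloads. Instead, I need to take the document body and split the text to extract the following array: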
var audioPlaylist = new Playlist("1", [ {name:"Chapter 01",free:true,mp3:"http://www.archive.org/download/huckleberry_mfs_librivox/huckleberry_finn_01_twain_64kb.mp3"}, {name:"Chapter 02",free:true,mp3:"http://www.archive.org/download/huckleberry_mfs_librivox/huckleberry_finn_02_twain_64kb.mp3"}, ...
The doInBackground method should be changed to this:
@Override
protected SingleBook doInBackground(Void... params) {
    Document doc = null;
    book = new SingleBook(imgLink, title, false, false, null, new ArrayList<String>());
    Elements mLines;
    try {
        doc = Jsoup.connect(link).get();
    } catch (IOException | RuntimeException e) {
        e.printStackTrace();
    }
    if (doc != null) {
        mLines = doc.getElementsByClass("book-description");
        for (Element mLine : mLines) {
            String description = mLine.text();
            book.setDescription(description);
        }
        // Cut the playlist array literal out of the inline script.
        String arr = "";
        String html = doc.body().html();
        if (html.contains("var audioPlaylist = new Playlist(\"1\", [")) {
            arr = html.split("var audioPlaylist = new Playlist\\(\"1\", \\[")[1];
        }
        if (arr.contains("]")) {
            arr = arr.split("\\]")[0];
        }
        // Take the mp3 value from each {name:..., free:..., mp3:"..."} entry.
        if (arr.contains("},{")) {
            for (String mLine2 : arr.split("\\},\\{")) {
                if (mLine2.contains("mp3:\"")) {
                    tmpChapters.add(mLine2.split("mp3:\"")[1].split("\"")[0]);
                }
            }
        } else if (arr.contains("mp3:\"")) {
            tmpChapters.add(arr.split("mp3:\"")[1].split("\"")[0]);
        }
    } else {
        System.out.println("ERROR");
    }
    book.setChapters(tmpChapters);
    return book;
}
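The chained split calls above depend on the exact spacing of the script text (for example, entries separated by "},{" with no space). As a rough alternative, the same extraction could be sketched with a regular expression. This is only an illustration under assumptions: the class PlaylistMp3Extractor and the method extractMp3Urls are hypothetical names that are not part of the original code, and the sketch assumes the playlist keeps the mp3:"..." shape shown above and that no chapter name contains a "]" character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not from the original answer): pulls the mp3 URLs
// out of the inline "var audioPlaylist = new Playlist(...)" script text.
class PlaylistMp3Extractor {

    // Assumed start of the playlist declaration, as shown in the answer.
    private static final String PLAYLIST_MARKER = "var audioPlaylist = new Playlist(";

    // Matches mp3:"<url>" and captures the URL; allows optional whitespace after the colon.
    private static final Pattern MP3_ENTRY = Pattern.compile("mp3:\\s*\"([^\"]+)\"");

    // Returns the mp3 URLs found between the playlist marker and the first ']' after it.
    // Returns an empty list if the marker is missing.
    static List<String> extractMp3Urls(String html) {
        List<String> urls = new ArrayList<>();
        if (html == null) {
            return urls;
        }
        int start = html.indexOf(PLAYLIST_MARKER);
        if (start < 0) {
            return urls;
        }
        // Assumption: the first ']' after the marker closes the array literal.
        int end = html.indexOf(']', start);
        String region = (end < 0) ? html.substring(start) : html.substring(start, end);

        Matcher matcher = MP3_ENTRY.matcher(region);
        while (matcher.find()) {
            urls.add(matcher.group(1).trim());
        }
        return urls;
    }

    public static void main(String[] args) {
        String sample = "var audioPlaylist = new Playlist(\"1\", [ "
                + "{name:\"Chapter 01\",free:true,mp3:\"http://www.archive.org/download/huckleberry_mfs_librivox/huckleberry_finn_01_twain_64kb.mp3\"}, "
                + "{name:\"Chapter 02\",free:true,mp3:\"http://www.archive.org/download/huckleberry_mfs_librivox/huckleberry_finn_02_twain_64kb.mp3\"}"
                + "]);";
        for (String url : extractMp3Urls(sample)) {
            System.out.println(url);
        }
    }
}
```

Inside doInBackground, this would presumably replace the split block with something like tmpChapters.addAll(PlaylistMp3Extractor.extractMp3Urls(doc.body().html())). The regex tolerates a space between entries, which the "},{" split does not, but it is still plain text matching and will break if the site changes the script format.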