Question

I'm trying to get an ArrayList of mp3 links from a JavaScript array on this page:

The array looks like this:

var audioPlaylist = new Playlist("1", [
{name:"The 4D Doodler", free:true, 
mp3:"http://www.archive.org/download/short_scifi_001_0711/
4ddoodler_waldeyer_edm_64kb.mp3"},
{name:"Bread Overhead", free:true, 
mp3:"http://www.archive.org/download/short_scifi_001_0711/
bread_overhead_leiber_ms_64kb.mp3"},
{name:"Image of the Gods", free:true, 
mp3:"http://www.archive.org/download/short_scifi_001_0711/
imageofthegods_nourse_jk_64kb.mp3"},

...and so on

I'm trying to break it into strings with .split. Here is my AsyncTask class:

public class FillBook extends AsyncTask<Void, Void, List<String>> {

//site url to be passed into constructor
private String link;
private String imgLink;
private String title;
String description;
private List<String> tmpChapters = new ArrayList<>();
private List<SingleBook> books = new ArrayList<>();

public FillBook(String link, String imgLink, String title) {

    this.link = link;
    this.imgLink = imgLink;
    this.title = title;
}

@Override
protected List<String> doInBackground(Void... params) {

    //parsed doc will be stored in this field
    Document doc = null;

    //stores the raw html elements that hold the book description
    Elements mLines;

    try {
        //connect to the site
        doc = Jsoup.connect(link).get();

    } catch (IOException | RuntimeException e) {
        e.printStackTrace();
    }
    if (doc != null) {

        // getting all elements with classname "book-description"
        mLines = doc.getElementsByClass("book-description");

        //keeping the text of the book-description element (the last one wins if there are several)
        for (Element mLine : mLines) {
            description = mLine.text();
        }

        String arr = "";
        String html = doc.body().html();
        if (html.contains("var audioPlaylist = new Playlist(\"1\", ["))
            arr = html.split("var audioPlaylist = new Playlist\\(\"1\", \\[")[1];
        if (arr.contains("]"))
            arr = arr.split("\\]")[0];
        //-----------------------------------------
        if (arr.contains("},{")) {
            for (String mLine2 : arr.split("\\},\\{")) {
                if (mLine2.contains("mp3:\""))
                    tmpChapters.add(mLine2.split("mp3:\"")[1].split("\"")[0]);
            }
        } else if (arr.contains("mp3:\""))
            tmpChapters.add(arr.split("mp3:\"")[1].split("\"")[0]);
    } else
        System.out.println("ERROR");


    return tmpChapters;

}

@Override
protected void onPostExecute(List<String> tmpChapters) {
    super.onPostExecute(tmpChapters);
    Toast.makeText(BookActivity.this, "size "+ tmpChapters.size(), Toast.LENGTH_SHORT).show();

    if (tmpChapters.size() > 0) {
        try {
            Picasso.get().load(imgLink).into(bookCover);
            nameAndAuthor.setText(title);
            bookDescription.setText(description);
            for (int i = 0; i < tmpChapters.size(); i++) {
                books.add(new SingleBook(tmpChapters.get(i)));
            }
            if (listChapters.getAdapter() != null) {
                adapter.clear();
                adapter.addAll(books);
            } else {
                adapter = new CustomAdaterChapters(BookActivity.this,
                        R.layout.book_chapters_listview_item, books);
                listChapters.setAdapter(adapter);

            }

        } catch (RuntimeException e) {
            e.printStackTrace();
        }

    } else Toast.makeText(BookActivity.this, "NETWORK ERROR", Toast.LENGTH_LONG).show();

}

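I have a problem with the regex part. After execution I show a Toast to check the size of the array, which should be 43, but it shows only 1: the first of the 43 links. The splitting code isn't mine; a coder on another forum helped me with it, and it works, but only up to that point. I'm new to this and can't find the mistake; everything looks fine to me, but it doesn't work :) Please help me fix it.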

P.S. I added two logs, and it turns out the code up to that line is correct; the array is extracted as:

{name:"Chapter 01", free:true, 
mp3:"http://www.archive.org/download/huckleberry_mfs_librivox/
huckleberry_finn_01_twain_64kb.mp3"},
{name:"Chapter 02", free:true, 
mp3:"http://www.archive.org/download/huckleberry_mfs_librivox/
huckleberry_finn_02_twain_64kb.mp3"},
{name:"Chapter 03", free:true, 
mp3:"http://www.archive.org/download/huckleberry_mfs_librivox/
huckleberry_finn_03_twain_64kb.mp3"},
{name:"Chapter 04", free:true, 
mp3:"http://www.archive.org/download/huckleberry_mfs_librivox/
huckleberry_finn_04_twain_64kb.mp3"},

So the error must be somewhere in that part.

Answer 1

The problem is that your "arr" contains newlines. Remove them by adding this line, and everything will work:

        //-----------------------------------------
        arr = arr.replaceAll("\n", "");
        if (arr.contains("},{")) {
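To see why only one link comes back, here is a standalone sketch of the question's split logic with the answer's fix as an option. The class name SplitFixDemo, the removeNewlines flag and the example.org URLs are hypothetical, and the input is a shortened stand-in for the page's playlist, not the real page. It also strips "\r", an assumption in case the page is served with CRLF line endings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone copy of the split logic from the question.
// removeNewlines = false reproduces the original behaviour,
// removeNewlines = true applies the answer's fix.
class SplitFixDemo {

    static List<String> extract(String arr, boolean removeNewlines) {
        List<String> links = new ArrayList<>();
        if (removeNewlines) {
            // The answer's fix; "\r" is also removed in case of CRLF line endings (assumption).
            arr = arr.replaceAll("[\\r\\n]", "");
        }
        if (arr.contains("},{")) {
            // One piece per playlist entry; take the value after mp3:" up to the next quote.
            for (String entry : arr.split("\\},\\{")) {
                if (entry.contains("mp3:\""))
                    links.add(entry.split("mp3:\"")[1].split("\"")[0]);
            }
        } else if (arr.contains("mp3:\"")) {
            // With "},\n{" in the text, "},{" is never found, so only the first link is added.
            links.add(arr.split("mp3:\"")[1].split("\"")[0]);
        }
        return links;
    }

    public static void main(String[] args) {
        // Shortened, hypothetical playlist text with a newline after each "},"
        String arr = "\n{name:\"Chapter 01\", free:true, mp3:\"http://example.org/ch01.mp3\"},\n"
                + "{name:\"Chapter 02\", free:true, mp3:\"http://example.org/ch02.mp3\"},\n"
                + "{name:\"Chapter 03\", free:true, mp3:\"http://example.org/ch03.mp3\"}\n";
        System.out.println("without fix: " + extract(arr, false).size());
        System.out.println("with fix:    " + extract(arr, true).size());
    }
}
```

Without the fix the for loop never runs and the else if branch adds just the first mp3 link, which matches the Toast showing a size of 1.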

But have you considered using Gson for this?

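// Imports needed by the test below (JUnit 4, jsoup, Gson):
// import com.google.gson.Gson;
// import java.io.IOException;
// import java.util.regex.Matcher;
// import java.util.regex.Pattern;
// import org.jsoup.Jsoup;
// import org.jsoup.nodes.Document;
// import org.junit.Test;

@Test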
public void testGson() throws IOException {

    Document doc = Jsoup.connect("http://www.loyalbooks.com/book/adventures-of-huckleberry-finn-by-mark-twain").get();

    String regex = "new Playlist.*?(\\[.*?\\])";
    String string = doc.html();

    Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE | Pattern.DOTALL);
    Matcher matcher = pattern.matcher(string);
    if (matcher.find() && matcher.groupCount() == 1) {
        String json = matcher.group(1);
        System.out.println(json);

        Gson gson = new Gson();
        PlaylistElement[] playlist = gson.fromJson(json, PlaylistElement[].class);
        System.out.println(playlist.length);

    } else {
        System.out.println("No match found");
    }

}


private static class PlaylistElement {
    private String name;
    private boolean free;
    private String mp3;
}
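If you would rather not add Gson, a dependency-free alternative (a swapped-in technique, not the Gson approach above) is to isolate the array with the same "new Playlist" pattern and then collect every mp3:"..." value with a second regex. The class name Mp3RegexExtractor is hypothetical, and the sketch assumes every entry stores its link as mp3:"..." and that no "]" appears inside the array before its real end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: regex-only extraction of the mp3 links, so line breaks
// between playlist entries do not matter.
class Mp3RegexExtractor {

    // Same idea as the answer's regex: the first [...] after "new Playlist".
    private static final Pattern PLAYLIST =
            Pattern.compile("new Playlist.*?(\\[.*?\\])", Pattern.DOTALL);

    // Captures everything between mp3:" and the next double quote.
    private static final Pattern MP3 = Pattern.compile("mp3:\\s*\"([^\"]+)\"");

    static List<String> extract(String html) {
        List<String> links = new ArrayList<>();
        Matcher playlist = PLAYLIST.matcher(html);
        if (!playlist.find()) {
            return links; // no playlist on the page
        }
        Matcher m = MP3.matcher(playlist.group(1));
        while (m.find()) {
            // Drop any line breaks that ended up inside the URL itself.
            links.add(m.group(1).replaceAll("[\\r\\n]", ""));
        }
        return links;
    }
}
```

In the AsyncTask, the whole split section could then be replaced by tmpChapters.addAll(Mp3RegexExtractor.extract(html)), keeping the html string the question already builds.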

Can't split a String containing an array into separate links with .split
