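I have been experimenting with jsoup, but I can't get the information I want out of a website and I need some help. For example, I have this website, and I would like to extract information like this: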
-ROCK
--ACID ROCK
--PSYCHEDELIC ROCK
--BLUES ROCK
-----Aerosmith
----------One Way Street
-----AC/DC
----------Ain't No Fun (Waiting Round to Be a Millionaire)
In other words, I want a list of genres, where each genre contains a list of artists, and each artist has a list of songs:
-Genre1
--Artist1
---Song1
---Song2
---Song3
--Artist2
---Song1
-Genre2
...
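The nested outline above could be held in memory as a map of genres to artists to songs. This is only a minimal sketch: the class name `MusicLibrary` and its methods are made up for illustration and are not part of jsoup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical container for the genre -> artist -> songs hierarchy.
public class MusicLibrary {
    // LinkedHashMap keeps genres and artists in the order they were added.
    private final Map<String, Map<String, List<String>>> genres = new LinkedHashMap<>();

    // Adds a song, creating the genre and artist entries if they are missing.
    public void addSong(String genre, String artist, String song) {
        genres.computeIfAbsent(genre, g -> new LinkedHashMap<>())
              .computeIfAbsent(artist, a -> new ArrayList<>())
              .add(song);
    }

    public Map<String, Map<String, List<String>>> asMap() {
        return genres;
    }

    // Renders the hierarchy in the dash-prefixed layout used in the question.
    public String render() {
        StringBuilder sb = new StringBuilder();
        genres.forEach((genre, artists) -> {
            sb.append("-").append(genre).append("\n");
            artists.forEach((artist, songs) -> {
                sb.append("--").append(artist).append("\n");
                for (String song : songs) {
                    sb.append("---").append(song).append("\n");
                }
            });
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        MusicLibrary lib = new MusicLibrary();
        lib.addSong("BLUES ROCK", "Aerosmith", "One Way Street");
        lib.addSong("BLUES ROCK", "AC/DC", "Ain't No Fun (Waiting Round to Be a Millionaire)");
        System.out.print(lib.render());
    }
}
```

With a structure like this, the parsing step only has to call `addSong` once per song it finds, and the printing step stays separate from the scraping.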
Here is what I have so far (sorry for the messy code):
package parser;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class HTMLParser {
    public static void main(String[] args) {
        // Small inline HTML document used for local experiments
        String htmlString = "<!DOCTYPE html>"
                + "<html>"
                + "<head>"
                + "<title>Music</title>"
                + "</head>"
                + "<body>"
                + "<table><tr><td><h1>Artists</h1></td></tr>"
                + "</table>"
                + "</body>"
                + "</html>";
        Document html = Jsoup.parse(htmlString);

        String genre = "genres";
        String artist = "artist";
        String album = "album";
        String song = "song";

        //Document htmlFile = null;
        //Element div = html.body().getElementsByAttributeValueMatching(genre, null);
        // String div = html.body().getElementsByAttributeValueMatching(genre, null).text();
        //String cssClass = div.className();

        List<String> genreList = new ArrayList<>();

        // Note: these values are read from the inline document above,
        // not from the page fetched below
        String title = html.title();
        String h1 = html.body().getElementsByTag("h1").text();
        String h2 = html.body().getElementsByClass(genre).text();
        String gen = html.body().getAllElements().text();

        Document doc;
        try {
            // Fetch the live page; title, h1 and h2 are overwritten, gen is not
            doc = Jsoup.connect("http://allmusic.com/").get();
            title = doc.title();
            h1 = doc.text();
            h2 = doc.text();
        } catch (IOException e) {
            e.printStackTrace();
        }

        System.out.println("Title : " + title);
        //System.out.println("h1 : "+ h1);
        //System.out.println("h2 : "+ h2);
        System.out.println("gen : all elements : " + gen);
    }
}
This is my output:
Title : AllMusic | Record Reviews, Streaming Songs, Genres & Bands
gen : all elements : Artists Artists Artists Artists Artists Artists
So far I have not gotten anywhere. I don't know how to extract the information I need (for example the genres, the artist names, and so on).
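To make the goal concrete, here is a rough sketch of the kind of extraction I am after, run against a small inline document instead of the real site. The markup and the selectors (`div.genre`, `div.artist`, `li.song`, the `h2`/`h3` headings) are assumptions invented for this example; the real page's structure would have to be inspected in a browser and the selectors adjusted. The class name `GenreExtractor` is also hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class GenreExtractor {
    // Builds genre -> artist -> songs from HTML that follows the ASSUMED markup:
    // <div class="genre"><h2>..</h2><div class="artist"><h3>..</h3><li class="song">..</li>
    public static Map<String, Map<String, List<String>>> extract(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, Map<String, List<String>>> result = new LinkedHashMap<>();
        for (Element genre : doc.select("div.genre")) {
            Element genreName = genre.selectFirst("h2");
            if (genreName == null) {
                continue; // skip genre blocks without a heading
            }
            Map<String, List<String>> artists = new LinkedHashMap<>();
            for (Element artist : genre.select("div.artist")) {
                Element artistName = artist.selectFirst("h3");
                if (artistName == null) {
                    continue; // skip artist blocks without a heading
                }
                // eachText() returns the text of every matched element
                List<String> songs = new ArrayList<>(artist.select("li.song").eachText());
                artists.put(artistName.text(), songs);
            }
            result.put(genreName.text(), artists);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup, not copied from any real site
        String html = "<div class=\"genre\"><h2>BLUES ROCK</h2>"
                + "<div class=\"artist\"><h3>Aerosmith</h3>"
                + "<ul><li class=\"song\">One Way Street</li></ul></div>"
                + "<div class=\"artist\"><h3>AC/DC</h3>"
                + "<ul><li class=\"song\">Ain't No Fun (Waiting Round to Be a Millionaire)</li></ul></div>"
                + "</div>";
        System.out.println(extract(html));
    }
}
```

For a live page, `Jsoup.parse(html)` would be replaced by `Jsoup.connect(url).get()`. If the content is filled in by JavaScript after the page loads, jsoup will not see it, since it only parses the HTML returned by the server.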