I want to get people's names from their image tags, and I am trying to use jsoup to do it. This is what I have so far:
/**
 * Created by AakarshM on 9/28/2016.
 */
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;

public class JSOUPMAIN {
    public static void main(String[] args) throws IOException {
        try {
            String url = "http://www.posh24.com/celebrities";
            Document doc = Jsoup.connect(url).get();
            Elements paragraphs = doc.select("div.channelListEntry");
            for (Element p : paragraphs) {
                // text() returns the combined text of the element and all of its
                // children, which is why other values are printed next to the name.
                System.out.println(p.text());
            }
        } catch (IOException e) {
            // Report the failure instead of swallowing it silently.
            e.printStackTrace();
        }
    }
}
This at least gives me something: it prints the name, but with additional information. For example:
4 +12 Zayn Malik
I don't need the extra information. How can I fix this?
Answer 0 (score: 1)
You should be able to get the name from the "alt" attribute of the image. See this.
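
As a minimal sketch of the "alt" idea, the snippet below parses an inline HTML fragment instead of fetching the live page. The markup in SAMPLE_HTML, the class name AltNames, and the method namesFromAlt are assumptions made for illustration; only the div.channelListEntry class and the "alt" attribute come from this thread, and the real page structure may differ.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class AltNames {
    // Hypothetical markup: one list entry whose image carries the name in "alt".
    static final String SAMPLE_HTML =
            "<div class=\"channelListEntry\">"
          + "<div class=\"image\"><img src=\"/img/1\" alt=\"Zayn Malik\"></div>"
          + "<span>4</span> <span>+12</span>"
          + "</div>";

    // Collects the alt text of every image found inside a div.channelListEntry.
    static List<String> namesFromAlt(String html) {
        Document doc = Jsoup.parse(html);
        List<String> names = new ArrayList<>();
        for (Element img : doc.select("div.channelListEntry img")) {
            names.add(img.attr("alt"));
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : namesFromAlt(SAMPLE_HTML)) {
            System.out.println(name);
        }
    }
}
```

Reading the attribute rather than calling text() on the whole entry avoids picking up the other values that sit next to the name.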
Answer 1 (score: 1)
Sample code
String userAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36";
Document doc = Jsoup.connect("http://www.posh24.com/celebrities").userAgent(userAgent).timeout(10000).get();
// For each profile image, print the alt text (the name) and the absolute image URL.
for (Element image : doc.select("#webx_center > div > div > div > a > div.image > img")) {
    System.out.println(image.attr("alt") + "\n\t" + image.attr("abs:src"));
}
Output
Rita Ora
http://cdn.posh24.com/images/:profile/0a749b802defbf357e7ccf1361ccabef5
Justin Bieber
http://cdn.posh24.com/images/:profile/081e091efd98b96e82e81a8490a0fb4dd
Rob Kardashian
http://cdn.posh24.com/images/:profile/083354e61b44581df09f38aaffd5fe901
....
Side note: for a short introduction on how to obtain a CSS selector, see this answer: https://stackoverflow.com/a/39632003/1661938
Answer 2 (score: 0)
Try doc.select("div.channelListEntry div.name");
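
A rough sketch of how that selector could be used, run against an inline fragment rather than the live site. The markup in SAMPLE_HTML, the class name NameDivs, and the method namesFromNameDiv are hypothetical; the sketch assumes each entry wraps the name in a div with class "name", as the selector above suggests.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class NameDivs {
    // Hypothetical markup: two entries, each with extra values and a div.name.
    static final String SAMPLE_HTML =
            "<div class=\"channelListEntry\">"
          + "<span>4</span> <span>+12</span><div class=\"name\">Zayn Malik</div>"
          + "</div>"
          + "<div class=\"channelListEntry\">"
          + "<span>1</span> <span>+3</span><div class=\"name\">Rita Ora</div>"
          + "</div>";

    // Returns only the text of div.name elements nested in a div.channelListEntry.
    static List<String> namesFromNameDiv(String html) {
        Document doc = Jsoup.parse(html);
        List<String> names = new ArrayList<>();
        for (Element nameDiv : doc.select("div.channelListEntry div.name")) {
            names.add(nameDiv.text());
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : namesFromNameDiv(SAMPLE_HTML)) {
            System.out.println(name);
        }
    }
}
```

Narrowing the selector to the element that holds only the name means text() no longer includes the sibling values.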