How can I parse this with Jsoup?

Time: 2016-09-28 05:32:55

Tags: java html

From: page to get the names from

Photo of target

I want to get people's names from their image tags, and I am trying to use Jsoup to do it. This is what I have so far:

/**
 * Created by AakarshM on 9/28/2016.
 */

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

public class JSOUPMAIN {

    public static void main(String[] args) {
        try {
            String url = "http://www.posh24.com/celebrities";
            Document doc = Jsoup.connect(url).get();
            // text() returns all the text inside each entry, not just the name.
            Elements paragraphs = doc.select("div.channelListEntry");
            for (Element p : paragraphs) {
                System.out.println(p.text());
            }
        } catch (IOException e) {
            // Report the failure instead of silently swallowing it.
            e.printStackTrace();
        }
    }
}

This at least gives me something: it prints the name, but with extra information attached. For example:

4 +12 Zayn Malik

I don't need the extra information. How can I fix this?

3 answers:

Answer 0: (score: 1)

You should be able to get the name from the image's "alt" attribute. See this
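To illustrate the "alt" suggestion above, here is a minimal offline sketch. It assumes each person's image is an img element whose alt text is the name; the HTML fragment, the class name AltNameSketch, and the helper namesFromAlt are hypothetical and only stand in for the real page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class AltNameSketch {

    // Collects the alt text of every <img> element that has an alt attribute.
    static List<String> namesFromAlt(String html) {
        Document doc = Jsoup.parse(html);
        List<String> names = new ArrayList<>();
        for (Element img : doc.select("img[alt]")) {
            names.add(img.attr("alt"));
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical fragment, not copied from the real page.
        String html = "<div class=\"image\"><img alt=\"Zayn Malik\" src=\"zayn.jpg\"></div>"
                + "<div class=\"image\"><img alt=\"Rita Ora\" src=\"rita.jpg\"></div>";
        for (String name : namesFromAlt(html)) {
            System.out.println(name);
        }
    }
}
```

On the live page the selector would probably need to be narrowed so that unrelated images are skipped.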

Answer 1: (score: 1)

Sample code

String userAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36";

Document doc = Jsoup.connect("http://www.posh24.com/celebrities").userAgent(userAgent).timeout(10000).get();

for (Element image : doc.select("#webx_center > div > div > div > a > div.image > img")) {
    System.out.println(image.attr("alt") + "\n\t" + image.attr("abs:src"));
}

Output

Rita Ora
    http://cdn.posh24.com/images/:profile/0a749b802defbf357e7ccf1361ccabef5
Justin Bieber
    http://cdn.posh24.com/images/:profile/081e091efd98b96e82e81a8490a0fb4dd
Rob Kardashian
    http://cdn.posh24.com/images/:profile/083354e61b44581df09f38aaffd5fe901
....

Side note: for a short introduction to obtaining CSS selectors, see this answer: https://stackoverflow.com/a/39632003/1661938

Answer 2: (score: 0)

Try doc.select("div.channelListEntry div.name");
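A rough sketch of how that selector could be used, assuming each div.channelListEntry contains a nested div.name holding only the person's name. The markup below is a hypothetical stand-in for the real page (the position and change class names are invented), and NameSelectorSketch and namesFromEntries are illustrative names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class NameSelectorSketch {

    // Returns the text of each div.name nested inside a div.channelListEntry.
    static List<String> namesFromEntries(String html) {
        Document doc = Jsoup.parse(html);
        List<String> names = new ArrayList<>();
        for (Element name : doc.select("div.channelListEntry div.name")) {
            names.add(name.text());
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical entry: rank, rank change, then the name.
        String html = "<div class=\"channelListEntry\">"
                + "<div class=\"position\">4</div>"
                + "<div class=\"change\">+12</div>"
                + "<div class=\"name\">Zayn Malik</div>"
                + "</div>";
        for (String name : namesFromEntries(html)) {
            System.out.println(name);
        }
    }
}
```

Because the selector matches only the inner div.name, the rank and rank-change text of the parent entry is left out of the result.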