How do I scrape search results with WebGrude?

Date: 2016-04-05 12:08:38

Tags: java web-scraping

I recently used WebGrude to scrape some content from web pages. I then tried to collect search results from eBay. Here is what I tried:

@Page("http://www.ebay.com/sch/{0}")
public class PirateBay {

    public static void main(String[] args) {
        //Search calls Browser, which loads the page on a PirateBay instance
        PirateBay search = PirateBay.search("iPhone");

        while (search != null) {
            search.magnets.forEach(System.out::println);
            search = search.nextPage();
        }
    }

    public static PirateBay search(String term) {
        return Browser.get(PirateBay.class, term);
    }

    private PirateBay() {
    }

    /*
     * This selector matches all magnet links. The result is added to this String list.
     * The default behaviour is to use the rendered html inside the matched tag, but here
     * we want to use the href value instead.
     */
    @Selector(value = "#ResultSetItems a[href*=magnet]", attr = "href")
    public List<String> magnets;

    /*
     * This selector matches a link to the next page of results, which can be mapped to a PirateBay instance.
     * The Link next gets the page on the href attribute of the link when method visit is called.
     */
    @Selector("a:has(img[alt=Next])")
    private Link<PirateBay> next;

    public PirateBay nextPage() {
        if (next == null)
            return null;
        return next.visit();
    }
}

But the result is empty. How can I use it to scrape the search results?

1 Answer:

Answer 0: (score: 0)

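The selector "#ResultSetItems a[href*=magnet]" selects links whose href attribute contains the string "magnet". eBay result links do not contain that string, so the selector matches nothing and the list stays empty.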

You can read more about attribute selectors here: attribute_selectors
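To make the "contains" test concrete, here is a minimal sketch in plain Java. It is only a simplified stand-in for what an `[href*=magnet]` selector checks (it ignores details such as case handling), and the class name, method name, and URLs are all made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class AttributeContainsDemo {

    // Simplified stand-in for [href*=needle]: keep only the hrefs that contain the needle.
    static List<String> hrefContains(List<String> hrefs, String needle) {
        return hrefs.stream()
                .filter(href -> href.contains(needle))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical hrefs, invented for this example.
        List<String> torrentSiteLinks = Arrays.asList(
                "magnet:?xt=urn:btih:abc123",
                "/torrent/42/details");
        List<String> ebayLinks = Arrays.asList(
                "http://www.ebay.com/itm/example-item/1",
                "http://www.ebay.com/itm/example-item/2");

        // One link contains "magnet", so one link is kept.
        System.out.println(hrefContains(torrentSiteLinks, "magnet"));
        // No eBay link contains "magnet", so the result is empty.
        System.out.println(hrefContains(ebayLinks, "magnet"));
    }
}
```

The second call returns an empty list, which is the same situation the question describes.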

What you want is "#ResultSetItems h3.lvtitle a"

To test your selectors, there is a nice REPL that uses Jsoup, the same library WebGrude uses: Try jsoup
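The same check can also be done locally. The sketch below assumes the jsoup library is on the classpath. The HTML fragment is invented and only mirrors the id and class names mentioned in this answer, so the real eBay markup may differ. The class name `SelectorCheck` is hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class SelectorCheck {

    // Invented fragment that imitates the structure the answer's selector expects.
    static final String HTML =
            "<ul id=\"ResultSetItems\">"
          + "<li><h3 class=\"lvtitle\"><a href=\"http://www.ebay.com/itm/1\">iPhone 6</a></h3></li>"
          + "<li><h3 class=\"lvtitle\"><a href=\"http://www.ebay.com/itm/2\">iPhone 5s</a></h3></li>"
          + "</ul>";

    // Counts how many elements a CSS selector matches in the sample fragment.
    static int countMatches(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        // The selector from the question finds nothing in this fragment.
        System.out.println(countMatches("#ResultSetItems a[href*=magnet]"));

        // The selector from the answer finds both title links.
        System.out.println(countMatches("#ResultSetItems h3.lvtitle a"));

        // Print the link text and href of each match.
        for (Element link : Jsoup.parse(HTML).select("#ResultSetItems h3.lvtitle a")) {
            System.out.println(link.text() + " -> " + link.attr("href"));
        }
    }
}
```

Once a selector returns the expected elements here, it can be pasted into the `@Selector` annotation of the WebGrude class.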