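Hello, I want to parse a shop, http://www.mercateo.com. So far I have used Selenium. It works well, but it is slow, and I would like to fix that. I found HtmlUnit and JSoup, but I think I have a problem with clicking a link and moving on to the next page.
I wrote a simple example with HtmlUnit: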
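import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

WebClient web = new WebClient();
HtmlPage page = web.getPage("http://news.yahoo.com/");
web.closeAllWindows();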
But I get a lot of warnings and errors:
WARNING: CSS warning: 'http://l.yimg.com/zz/combo?d/lib/yui/3.4.1/build/cssreset/cssreset-min.css&d/lib/yui/3.4.1/build/cssfonts/cssfonts-min.css&os/mit/media/p/presentation/grids/master-min-464195.css&os/mit/media/p/presentation/grids/desktop-min-841473.css&os/mit/media/p/presentation/base/master-min-470440.css&os/mit/media/p/presentation/base/desktop-min-341885.css&kx/ucs/uh/css/291/yunivhead-min.css&kx/ucs/uh/css/221/logo-min.css&kx/ucs/homepage/css/155/homepage-ie-min.css&kx/ucs/notif_v2/css/145/notifications_v2-min.css&kx/ucs/mailcount/css/37/mail_preview-min.css&kx/ucs/search/css/190/search_all-min.css&kx/ucs/search/css/190/search_buttons-min.css&kx/ucs/breakingnews/css/12/breaking_news-min.css&os/mit/media/m/header/header-desktop-min-630857.css&os/mit/media/m/navigation/navigation-desktop-min-603998.css&os/mit/media/m/linkbox/linkbox-min-248956.css&os/mit/media/m/ads/ads-min-892923.css&os/mit/media/m/heading/heading-min-214964.css&os/gm/m/footer/footer_sponsor-min-188629.css&os/gm/m/footer/footer_links-min-188629.css&os/mit/media/m/trending/trending-min-150139.css&os/gm/m/footer/footer_info-min-323669.css&os/gm/m/footer/footer_info-desktop-min-944911.css' [20:3604] Ignoring the following declarations in this rule.
sty 29, 2013 11:54:03 AM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://l.yimg.com/zz/combo?d/lib/yui/3.4.1/build/cssreset/cssreset-min.css&d/lib/yui/3.4.1/build/cssfonts/cssfonts-min.css&os/mit/media/p/presentation/grids/master-min-464195.css&os/mit/media/p/presentation/grids/desktop-min-841473.css&os/mit/media/p/presentation/base/master-min-470440.css&os/mit/media/p/presentation/base/desktop-min-341885.css&kx/ucs/uh/css/291/yunivhead-min.css&kx/ucs/uh/css/221/logo-min.css&kx/ucs/homepage/css/155/homepage-ie-min.css&kx/ucs/notif_v2/css/145/notifications_v2-min.css&kx/ucs/mailcount/css/37/mail_preview-min.css&kx/ucs/search/css/190/search_all-min.css&kx/ucs/search/css/190/search_buttons-min.css&kx/ucs/breakingnews/css/12/breaking_news-min.css&os/mit/media/m/header/header-desktop-min-630857.css&os/mit/media/m/navigation/navigation-desktop-min-603998.css&os/mit/media/m/linkbox/linkbox-min-248956.css&os/mit/media/m/ads/ads-min-892923.css&os/mit/media/m/heading/heading-min-214964.css&os/gm/m/footer/footer_sponsor-min-188629.css&os/gm/m/footer/footer_links-min-188629.css&os/mit/media/m/trending/trending-min-150139.css&os/gm/m/footer/footer_info-min-323669.css&os/gm/m/footer/footer_info-desktop-min-944911.css' [20:3996] Error in style rule. (Invalid token "*". Was expecting one of: <EOF>, <S>, <IDENT>, "}", ";".)
sty 29, 2013 11:54:03 AM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://l.yimg.com/zz/combo?d/lib/yui/3.4.1/build/cssreset/cssreset-min.css&d/lib/yui/3.4.1/build/cssfonts/cssfonts-min.css&os/mit/media/p/presentation/grids/master-min-464195.css&os/mit/media/p/presentation/grids/desktop-min-841473.css&os/mit/media/p/presentation/base/master-min-470440.css&os/mit/media/p/presentation/base/desktop-min-341885.css&kx/ucs/uh/css/291/yunivhead-min.css&kx/ucs/uh/css/221/logo-min.css&kx/ucs/homepage/css/155/homepage-ie-min.css&kx/ucs/notif_v2/css/145/notifications_v2-min.css&kx/ucs/mailcount/css/37/mail_preview-min.css&kx/ucs/search/css/190/search_all-min.css&kx/ucs/search/css/190/search_buttons-min.css&kx/ucs/breakingnews/css/12/breaking_news-min.css&os/mit/media/m/header/header-desktop-min-630857.css&os/mit/media/m/navigation/navigation-desktop-min-603998.css&os/mit/media/m/linkbox/linkbox-min-248956.css&os/mit/media/m/ads/ads-min-892923.css&os/mit/media/m/heading/heading-min-214964.css&os/gm/m/footer/footer_sponsor-min-188629.css&os/gm/m/footer/footer_links-min-188629.css&os/mit/media/m/trending/trending-min-150139.css&os/gm/m/footer/footer_info-min-323669.css&os/gm/m/footer/footer_info-desktop-min-944911.css' [20:3996] Ignoring the following declarations in this rule.
sty 29, 2013 11:54:03 AM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
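I can't find a method that lets me click a link (by XPath). JSoup is fine for parsing a page, but it doesn't seem to be good at moving between pages dynamically.
I need your help :) I don't know whether I can get the same results with another parser instead of Selenium.
Answer 0 (score: 0)
Visiting links on a website is not a problem for Jsoup.
Example: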
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect("http://first.com/").get(); // Connect to the 'root' link
Elements links = doc.select("a[href]"); // Select all links from the website
// As an example, connect to the first link of the website and parse its html:
doc = Jsoup.connect(links.first().absUrl("href")).get();
// Continue with the new website
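The example uses absUrl("href") rather than the raw attribute value because links on a page are often relative and have to be resolved against the page's own URL before they can be fetched. The sketch below illustrates that resolution step with the JDK's java.net.URI only. It is not Jsoup's implementation, and the class name ResolveHref and the sample URLs are made up for illustration.

```java
import java.net.URI;

class ResolveHref {
    // Resolve an href found on a page against the URL of that page.
    // Absolute hrefs are returned unchanged.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Hypothetical page URL, used only to show the three common cases.
        String page = "http://first.com/shop/index.html";
        System.out.println(resolve(page, "item?id=1"));          // relative to the current directory
        System.out.println(resolve(page, "/cart"));              // relative to the site root
        System.out.println(resolve(page, "http://other.com/x")); // already absolute
    }
}
```

With Jsoup the same idea applies to every link you want to follow: take the absolute URL of the element, connect to it, and repeat on the new Document.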
See also: Using Jsoup, how can I fetch each and every information resides in each link?
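As for the HtmlUnit output in the question: those lines are CSS parser warnings about the page's stylesheets, and the page is normally still loaded. If they are only noise for you, one common way to hide them is to turn off the logger for the HtmlUnit package. This is a minimal sketch that assumes HtmlUnit's logging ends up in java.util.logging, which the format of the log lines in the question suggests. The logger name is taken from those log lines, and the class name QuietHtmlUnitLogs is made up.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class QuietHtmlUnitLogs {
    // Keep a strong reference: java.util.logging holds loggers weakly,
    // so a level set on an otherwise unreferenced logger can be lost.
    static final Logger HTMLUNIT_LOGGER =
            Logger.getLogger("com.gargoylesoftware.htmlunit");

    // Turn off every message logged under the HtmlUnit package.
    static void silence() {
        HTMLUNIT_LOGGER.setLevel(Level.OFF);
    }

    public static void main(String[] args) {
        silence();
        // Create the WebClient and load pages after this point.
        System.out.println(HTMLUNIT_LOGGER.isLoggable(Level.WARNING));
    }
}
```

Call silence() once before creating the WebClient. If a different logging backend is on the classpath, the level has to be set in that backend's configuration instead.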