Question

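I'm new to web scraping, so my question may be a bit basic, but it has really been bothering me. I want to scrape some content from TripAdvisor, but when I run the following command in YQL, it returns nothing.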

select * from html where url="http://www.tripadvisor.com/Search?q=sunny+relax&geo=191#&ssrc=A&o=0.html"

Can anyone tell me why? Is there something wrong with my command?

Thanks in advance for your help.

Answer 1

This is because the "/Search" page is disallowed in http://www.tripadvisor.com/robots.txt, and YQL checks robots.txt before fetching a page.
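If you want to check this kind of rule yourself before writing a query, a rough sketch in Python might look like the following. It uses the standard library's urllib.robotparser. The SAMPLE_ROBOTS_TXT string is a made-up two-line excerpt that only mirrors the "/Search" rule mentioned above, not the real TripAdvisor file, and is_allowed is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt for illustration only; the real robots.txt is
# longer and may have changed.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /Search
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the rules in robots_txt permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


search_url = "http://www.tripadvisor.com/Search?q=sunny+relax&geo=191"
hotels_url = "http://www.tripadvisor.com/Hotels-g45963-Las_Vegas_Nevada-Hotels.html"

print(is_allowed(SAMPLE_ROBOTS_TXT, search_url))   # blocked by the /Search rule
print(is_allowed(SAMPLE_ROBOTS_TXT, hotels_url))   # no matching Disallow rule
```

To test against the live file instead, RobotFileParser also has set_url() and read(), which download and parse the robots.txt for you.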

You can try a different page and use XPath to select specific nodes, for example:

select * from html where xpath = '//div[@class="listing_title"]/a' and url = 'http://www.tripadvisor.com/Hotels-g45963-Las_Vegas_Nevada-Hotels.html'
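To see what that XPath expression selects, here is a small offline sketch in Python using the standard library's xml.etree.ElementTree, which supports a limited XPath subset. The SAMPLE_HTML markup, the hotel names, and the listing_links helper are all invented for illustration; the real page structure may differ, and real-world HTML usually needs a more forgiving parser than this one.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed markup that imitates the structure the query targets.
SAMPLE_HTML = """\
<html><body>
  <div class="listing_title"><a href="/Hotel_Review-example-1.html">Example Hotel One</a></div>
  <div class="listing_title"><a href="/Hotel_Review-example-2.html">Example Hotel Two</a></div>
  <div class="other"><a href="/ignored.html">Not a listing</a></div>
</body></html>
"""


def listing_links(markup):
    """Return (text, href) pairs for <a> children of div.listing_title."""
    root = ET.fromstring(markup)
    # ElementTree paths are relative to the root, hence the leading "."
    # in place of the bare "//" used in the YQL query.
    anchors = root.findall(".//div[@class='listing_title']/a")
    return [(a.text, a.get("href")) for a in anchors]


for title, href in listing_links(SAMPLE_HTML):
    print(title, href)
```

Only the two anchors inside div elements with class "listing_title" are returned; the third link is skipped because its parent has a different class.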