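I am trying to scrape search results from NYTimes. For example, I start my scraping process with this:
url = "http://query.nytimes.com/search/sitesearch/?action=click&contentCollection&region=TopBar&WT.nav=searchWidget&module=SearchSubmit&pgtype=Homepage#/%22big+data%22/30days/articles/1/allauthors/oldest/"
However, the HTML I can download with Python does not contain any search results. Is there a way to access the HTML as if I had opened the link in a web browser?
If I open the link in a web browser, this is part of the HTML I can see with "Inspect Element":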
<div class="searchResults" id="searchResults" style="display: none;">
<ol class="searchResultsList flush" style="display: block;">
<li class="story noThumb">
<div class="element2">
<h3>
<a href="http://www.nytimes.com/2014/07/16/technology/apple-and-ibm-in-broad-software-deal-for-businesses.html">Apple Joins With IBM on Business Software </a>
</h3>
<p class="summary">The applications, Mr. Cook said, will bring “<strong>big data</strong> analytics down to the fingertips” of Apple iPhone and iPad users in corporations. “IBM can ...</p>
<div class="storyMeta">
<span class="dateline">July 15, 2014</span> -
<span class="byline">By BRIAN X. CHEN and STEVE LOHR</span> -
<span class="section">Technology - article</span> -
<span class="printHeadline">Print Headline: "Apple Joins With IBM on Business Software"</span>
</div>
</div>
</li>
<li class="story">
The ideal output would be:
<a href="http://www.nytimes.com/2014/07/16/technology/apple-and-ibm-in-broad-software-deal-for-businesses.html">Apple Joins With IBM on Business Software </a>
Thanks!
Answer 0 (score: 2)
The actual request that returns the search results is an XHR. Simulate it in Python.
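To see why the downloaded page has no results, it may help to split the URL from the question. The sketch below (Python 3, standard library only) assumes the usual behavior of HTTP clients: everything after `#` is a fragment that stays on the client, so the search terms never reach the server and the page's JavaScript has to fetch the results separately. The variable names are illustrative.

```python
from urllib.parse import urlsplit, unquote_plus

# URL from the question; the search terms live after the '#'.
page_url = (
    "http://query.nytimes.com/search/sitesearch/"
    "?action=click&contentCollection&region=TopBar&WT.nav=searchWidget"
    "&module=SearchSubmit&pgtype=Homepage"
    "#/%22big+data%22/30days/articles/1/allauthors/oldest/"
)

parts = urlsplit(page_url)

# What an HTTP client requests: path plus query string, no fragment.
sent_to_server = parts.path + "?" + parts.query

# The fragment is only visible to code running in the browser.
fragment_fields = [unquote_plus(p) for p in parts.fragment.split("/") if p]

print(sent_to_server)
print(fragment_fields)
```

The query term, date range and sort order all appear only in `fragment_fields`, which is why a plain download of `page_url` returns the empty search shell.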
Example using requests:
import requests

# XHR endpoint that returns the search results as JSON
url = 'http://query.nytimes.com/svc/cse/v2pp/sitesearch.json'
params = {
    'query': "big data",
    'date_range_lower': '30daysago',
    'pt': 'article',
    'sort_order': 'a'
}
response = requests.get(url, params=params)
data = response.json()

# Each result is a dict; 'og:url' holds the article link
for result in data['results']['results']:
    print(result.get('og:url'))
Prints:
http://www.nytimes.com/2014/07/15/upshot/politically-18-year-olds-look-a-lot-like-people-in-their-20s.html
http://www.nytimes.com/2014/07/15/business/vw-to-add-suv-production-to-chattanooga-plant.html
http://www.nytimes.com/2014/07/15/business/media/germany-1-world-cup-fever-1000.html
http://www.nytimes.com/2014/07/15/business/international/winding-road-ahead-for-us-europe-trade-talks.html
http://www.nytimes.com/2014/07/15/business/daily-stock-market-activity.html
http://www.nytimes.com/2014/07/14/business/international/airlines-step-up-investment-to-meet-passenger-growth.html
http://www.nytimes.com/2014/07/15/business/international/eurozone-industrial-production-drops.html
http://www.nytimes.com/2014/07/14/business/international/airline-passengers-weigh-in-with-online-reviews.html
http://www.nytimes.com/2014/07/16/technology/a-deluge-of-comment-on-net-rules.html
http://www.nytimes.com/2014/07/16/upshot/as-growth-in-health-care-spending-slows-asking-if-a-trend-will-last.html
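To get output in the `<a href="...">title</a>` form the question asks for, the parsed JSON could be post-processed roughly as follows. This is a minimal sketch under stated assumptions: only the `results` -> `results` nesting and the `og:url` key come from the answer above; the `og:title` key, the helper names `extract_links` and `as_anchor`, and the `sample` dict are hypothetical and hand-written for illustration, not a real response.

```python
import html


def extract_links(data, title_key="og:title"):
    """Return (url, title) pairs from a parsed search response.

    title_key is an assumption; check the real JSON for the actual field name.
    """
    links = []
    for result in data.get("results", {}).get("results", []):
        url = result.get("og:url")
        if url:  # skip entries that carry no link
            links.append((url, result.get(title_key, "")))
    return links


def as_anchor(url, title):
    """Format one link as an HTML anchor, escaping special characters."""
    return '<a href="{}">{}</a>'.format(html.escape(url, quote=True), html.escape(title))


# Hand-written sample mimicking the assumed response shape (not real data).
sample = {
    "results": {
        "results": [
            {
                "og:url": "http://www.nytimes.com/2014/07/16/technology/apple-and-ibm-in-broad-software-deal-for-businesses.html",
                "og:title": "Apple Joins With IBM on Business Software",
            },
            {"og:title": "Entry without a URL"},
        ]
    }
}

for url, title in extract_links(sample):
    print(as_anchor(url, title))
```

With a live response, `sample` would be replaced by the `data` dict returned from `response.json()`.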