Scrapy pagination

Time: 2018-04-04 03:08:51

Tags: python pagination scrapy

I am facing a problem with pagination in Scrapy.

Here is the HTML:

<a href="" onclick="return false;" class="archive_page_info" 
id="next_achive_button" data-number_page_click="2">NEXT</a>

The Scrapy Python method:

#follow pagination links
next_page_url =   response.css("#next_achive_button").extract_first()
if next_page_url:
   next_page_url = response
   yield scrapy.Request(url=next_page_url, callback=self.parse)

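I need some help with this: clicking the NEXT button should take me to the next page. However, the link's href is empty and the element has onclick="return false;", and I don't know how to handle that. Could you give me some hints on how to solve this? Thanks.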

1 answer:

Answer 0 (score: 0)

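Learn how to use Inspect in Chrome, or Firebug if you use Mozilla Firefox.

In the Network tab, tick Preserve log, then click the next-page button; you will see this AJAX POST being fired.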

import requests

cookies = {
    '__unam': '7639673-16295793afa-1ab158d0-2',
    '__utma': '56229998.2107893981.1522926175.1522926175.1522926175.1',
    '__utmc': '56229998',
    '__utmz': '56229998.1522926175.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)',
    '__utmt': '1',
    '__utmb': '56229998.1.10.1522926175',
    '_first_pageview': '1',
    '__qca': 'P0-1006667184-1522926176233',
    '_jsuid': '585270328',
    'no_trackyy_100969001': '1',
    '__atuvc': '1%7C14',
    '__atuvs': '5ac6025f8fcb6eab000',
    '__tbc': '%7Bjzx%7DIafCBS3b0wpS60-QMtzjGoXcgB2LuqBv13vshDxFKXzUXsJfILJAOyJBA8fT0NrLuAw9JkikXT-lxGWsIpDKlbAJG-Kkoz0pLPzCOLd06VAHO90uO2kuCkU83cHKD7GRaOuzBb9gsuOCm70ShIsd5Q',
    '__pat': '-14400000',
    '__pvi': '%7B%22id%22%3A%22v-2018-04-05-16-02-58-224-d9oQ6Ns4C5cJ79uD-02aeb22c0032f00f6131c0dfebc6b934%22%2C%22domain%22%3A%22.therealdeal.com%22%2C%22time%22%3A1522926179784%7D',
    'xbc': '%7Bjzx%7DPVPoYpACRK8IQh-L66G6Lf11La8U3KDJG42A358oKni-AhQB0dxnTTq_CM95WKsZWHv9fY5JWLkSs5KImxmuRbiETxj07xc3lSSyb53w6bNyQuiiqqE20nVKEniUHDvl9zcfaHGMtBfOKaRmlxOx3TnX34PCjdEudjMUtEx_n9gwp4UEWknk1qUZNvvp7TLK-U4hyrWfMZZezw6MVfaRX5CZGW7Wg6zJ565EiqML9pJ9aeCUAUzgoy7pLjGXLxxtCBVOpfzQAi2b_SJnf2-Pe3KNCXlNvZ7Tr1GylPSVBkP1SYwS237iji2rMBo1YoeZ',
    '_eventqueue': '%7B%22heatmap%22%3A%5B%7B%22type%22%3A%22heatmap%22%2C%22href%22%3A%22%2Fnew-research%2Ftopics%2Fpeople%2F%22%2C%22x%22%3A795%2C%22y%22%3A3283%2C%22w%22%3A1366%7D%2C%7B%22type%22%3A%22heatmap%22%2C%22href%22%3A%22%2Fnew-research%2Ftopics%2Fpeople%2F%22%2C%22x%22%3A800%2C%22y%22%3A3268%2C%22w%22%3A1366%7D%5D%2C%22events%22%3A%5B%5D%7D',
}

headers = {
    'Origin': 'https://therealdeal.com',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Accept': 'text/html, */*; q=0.01',
    'Referer': 'https://therealdeal.com/new-research/topics/people/',
    'X-Requested-With': 'XMLHttpRequest',
    'Connection': 'keep-alive',
    'DNT': '1',
}

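# Form fields sent when the NEXT button is clicked.
# number_of_click_page is the page being requested ('3' in this capture).
data = [
  ('action', 'display_filtered_archives_of_trd_topics'),
  ('filtered_type', 'People'),
  ('number_of_click_page', '3'),
]

# Replay the captured POST against the WordPress AJAX endpoint.
response = requests.post('https://therealdeal.com/wp-admin/admin-ajax.php', headers=headers, cookies=cookies, data=data)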