I am facing a pagination problem with Scrapy.
This is the HTML:
<a href="" onclick="return false;" class="archive_page_info"
id="next_achive_button" data-number_page_click="2">NEXT</a>
The Scrapy Python method:
#follow pagination links
next_page_url = response.css("#next_achive_button").extract_first()
if next_page_url:
    next_page_url = response
    yield scrapy.Request(url=next_page_url, callback=self.parse)
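I need some help with this: when the NEXT button is clicked, the spider should move on to the next page. However, the link's href is empty and the anchor carries onclick="return false;", so there is no URL to follow, and I don't know how to handle that. Could you please give me some hints on how to solve this? Thanks.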
Answer 0 (score: 0)
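Learn how to use Inspect in Chrome, or Firebug if you use Mozilla Firefox.
Open the Network tab, enable Preserve log, then click the next page button; you will see this AJAX POST being fired: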
import requests
cookies = {
'__unam': '7639673-16295793afa-1ab158d0-2',
'__utma': '56229998.2107893981.1522926175.1522926175.1522926175.1',
'__utmc': '56229998',
'__utmz': '56229998.1522926175.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)',
'__utmt': '1',
'__utmb': '56229998.1.10.1522926175',
'_first_pageview': '1',
'__qca': 'P0-1006667184-1522926176233',
'_jsuid': '585270328',
'no_trackyy_100969001': '1',
'__atuvc': '1%7C14',
'__atuvs': '5ac6025f8fcb6eab000',
'__tbc': '%7Bjzx%7DIafCBS3b0wpS60-QMtzjGoXcgB2LuqBv13vshDxFKXzUXsJfILJAOyJBA8fT0NrLuAw9JkikXT-lxGWsIpDKlbAJG-Kkoz0pLPzCOLd06VAHO90uO2kuCkU83cHKD7GRaOuzBb9gsuOCm70ShIsd5Q',
'__pat': '-14400000',
'__pvi': '%7B%22id%22%3A%22v-2018-04-05-16-02-58-224-d9oQ6Ns4C5cJ79uD-02aeb22c0032f00f6131c0dfebc6b934%22%2C%22domain%22%3A%22.therealdeal.com%22%2C%22time%22%3A1522926179784%7D',
'xbc': '%7Bjzx%7DPVPoYpACRK8IQh-L66G6Lf11La8U3KDJG42A358oKni-AhQB0dxnTTq_CM95WKsZWHv9fY5JWLkSs5KImxmuRbiETxj07xc3lSSyb53w6bNyQuiiqqE20nVKEniUHDvl9zcfaHGMtBfOKaRmlxOx3TnX34PCjdEudjMUtEx_n9gwp4UEWknk1qUZNvvp7TLK-U4hyrWfMZZezw6MVfaRX5CZGW7Wg6zJ565EiqML9pJ9aeCUAUzgoy7pLjGXLxxtCBVOpfzQAi2b_SJnf2-Pe3KNCXlNvZ7Tr1GylPSVBkP1SYwS237iji2rMBo1YoeZ',
'_eventqueue': '%7B%22heatmap%22%3A%5B%7B%22type%22%3A%22heatmap%22%2C%22href%22%3A%22%2Fnew-research%2Ftopics%2Fpeople%2F%22%2C%22x%22%3A795%2C%22y%22%3A3283%2C%22w%22%3A1366%7D%2C%7B%22type%22%3A%22heatmap%22%2C%22href%22%3A%22%2Fnew-research%2Ftopics%2Fpeople%2F%22%2C%22x%22%3A800%2C%22y%22%3A3268%2C%22w%22%3A1366%7D%5D%2C%22events%22%3A%5B%5D%7D',
}
headers = {
'Origin': 'https://therealdeal.com',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'Accept': 'text/html, */*; q=0.01',
'Referer': 'https://therealdeal.com/new-research/topics/people/',
'X-Requested-With': 'XMLHttpRequest',
'Connection': 'keep-alive',
'DNT': '1',
}
data = [
('action', 'display_filtered_archives_of_trd_topics'),
('filtered_type', 'People'),
('number_of_click_page', '3'),
]
response = requests.post('https://therealdeal.com/wp-admin/admin-ajax.php', headers=headers, cookies=cookies, data=data)