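I am scraping a website that uses a "Load more" button. I need to extract the number underlined in yellow, but I get 0.
Here is the website. How can I extract the required information?
This is my code: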
import requests
from parsel import Selector

nexturl = 'https://www.tayara.tn/sc/immobilier/appartements'
response = requests.get(nexturl)
# Build the selector from the HTML text, since this is a requests response, not a Scrapy one
sel = Selector(text=response.text)
nbPages = sel.xpath('//div[@class="_1Nm7X TkLPj"]/text()').extract()
print(nbPages)
Answer 0 (score: 0)
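If you want to get the total count of listings, you need to simulate the XHR request that the page sends. For me it works as shown below, but you can try to modify it (remove unnecessary headers, prettify data, etc.):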
import requests
headers = {
'Origin': 'https://www.tayara.tn',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36',
'Content-Type': 'application/json',
'Accept': '*/*',
'Referer': 'https://www.tayara.tn/sc/immobilier/appartements',
'Connection': 'keep-alive',
}
data = '{"query":"query ListingsPage($page: Page, $filter: SearchFilter, $sortBy: SortOrder) {\\n listings: searchAds(page: $page, filter: $filter, sortBy: $sortBy) {\\n items {\\n uuid\\n title\\n price\\n currency\\n thumbnail\\n createdAt\\n state\\n category {\\n id\\n name\\n engName\\n __typename\\n }\\n user {\\n uuid\\n displayName\\n avatar(width: 96, height: 96) {\\n url\\n __typename\\n }\\n __typename\\n }\\n __typename\\n }\\n trackingInfo {\\n transactionId\\n listName\\n recommenderId\\n experimentId\\n variantId\\n __typename\\n }\\n totalCount\\n pageInfo {\\n startCursor\\n hasPreviousPage\\n endCursor\\n hasNextPage\\n __typename\\n }\\n __typename\\n }\\n}\\n","variables":{"page":{"count":36},"filter":{"queryString":null,"category":"2","regionId":null,"attributeFilters":[]},"sortBy":"CREATED_DESC"},"operationName":"ListingsPage"}'
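# Send the same GraphQL POST request that the page issues via XHR
response = requests.post('https://www.tayara.tn/graphql', headers=headers, data=data)
# The total number of listings is returned under data -> listings -> totalCount
print(response.json()['data']['listings']['totalCount'])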
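The cleanup suggested above (fewer headers, a prettified data) could look roughly like the sketch below. It builds the payload as a Python dict and lets requests serialize it through the json= argument, which also sets the Content-Type header. The helper names build_payload and fetch_total_count are hypothetical. It is also an assumption that the server accepts a query trimmed down to totalCount and a reduced header set; the endpoint and schema may have changed since the answer was written.

```python
import requests

# Endpoint taken from the answer above; it may no longer be valid.
GRAPHQL_URL = "https://www.tayara.tn/graphql"

# Trimmed version of the original query that asks only for totalCount.
# Assumption: the server accepts this subset of fields.
QUERY = """
query ListingsPage($page: Page, $filter: SearchFilter, $sortBy: SortOrder) {
  listings: searchAds(page: $page, filter: $filter, sortBy: $sortBy) {
    totalCount
  }
}
"""


def build_payload(category="2", count=36):
    """Return the GraphQL request body as a plain dict (hypothetical helper)."""
    return {
        "operationName": "ListingsPage",
        "query": QUERY,
        "variables": {
            "page": {"count": count},
            "filter": {
                "queryString": None,
                "category": category,
                "regionId": None,
                "attributeFilters": [],
            },
            "sortBy": "CREATED_DESC",
        },
    }


def fetch_total_count(category="2", timeout=10):
    """POST the query and return listings.totalCount (hypothetical helper)."""
    # Reduced header set; assumption: the remaining original headers are optional.
    headers = {
        "Referer": "https://www.tayara.tn/sc/immobilier/appartements",
        "User-Agent": "Mozilla/5.0",
    }
    response = requests.post(
        GRAPHQL_URL, json=build_payload(category), headers=headers, timeout=timeout
    )
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()["data"]["listings"]["totalCount"]
```

Calling print(fetch_total_count()) would then print the count for category "2", the value used in the original request. Keeping the payload as a dict avoids the hand-escaped JSON string and makes the variables easy to change.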