Why does scraping the value of a text element return 0?

Date: 2019-01-14 14:42:04

Tags: python-3.x web-scraping xmlhttprequest

I am scraping a website that uses a "Load more" button. I need to extract the number underlined in yellow, but I get 0.


Here is the website: https://www.tayara.tn/sc/immobilier/appartements. How can I extract the information I need?

Here is my code:

import requests
from parsel import Selector

nexturl = 'https://www.tayara.tn/sc/immobilier/appartements'
response = requests.get(nexturl)
# parsel's Selector takes the HTML as text, not a requests.Response object
sel = Selector(text=response.text)
# Select the text of the div that should hold the number
nbPages = sel.xpath('//div[@class="_1Nm7X TkLPj"]/text()').getall()
print(nbPages)
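One way to see why the XPath comes back empty is to inspect the raw HTML that `requests` downloads: if the target element is present but has no text (or is missing entirely), the number is filled in later by JavaScript, which matches the XHR explanation in the answer. A minimal standard-library sketch follows; the class `ClassTextCollector`, the helper `texts_for_class` and the sample markup in the usage note are hypothetical. The class names `_1Nm7X`/`TkLPj` appear to be auto-generated and may not be stable.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Collect the text content of every element whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while the parser is inside a matching element
        self.texts = []     # one entry per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if self.depth:
            self.depth += 1                      # nested element inside a match
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1                       # entering a new matching element
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data               # accumulate text of the current match


def texts_for_class(html, target_class):
    """Return the stripped text of each element carrying target_class in its class list."""
    parser = ClassTextCollector(target_class)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts]
```

For example, `texts_for_class('<div class="_1Nm7X TkLPj"><span>42</span> annonces</div>', "_1Nm7X")` gives `['42 annonces']`. Running it on `response.text` from the question's request would show whether the served HTML already contains the number; an empty string or an empty list points to client-side loading.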

1 Answer

Answer 0 (score: 0)

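If you want the total listings ("annonces") count, you need to emulate the XHR request that the page makes. The code below works for me, but you may want to adapt it (remove unnecessary headers, prettify `data`, etc.)…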

import requests

headers = {
    'Origin': 'https://www.tayara.tn',
    # 'br' omitted: requests only decodes Brotli responses if a Brotli package is installed
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.9',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36',
    'Content-Type': 'application/json',
    'Accept': '*/*',
    'Referer': 'https://www.tayara.tn/sc/immobilier/appartements',
    'Connection': 'keep-alive',
}

data = '{"query":"query ListingsPage($page: Page, $filter: SearchFilter, $sortBy: SortOrder) {\\n  listings: searchAds(page: $page, filter: $filter, sortBy: $sortBy) {\\n    items {\\n      uuid\\n      title\\n      price\\n      currency\\n      thumbnail\\n      createdAt\\n      state\\n      category {\\n        id\\n        name\\n        engName\\n        __typename\\n      }\\n      user {\\n        uuid\\n        displayName\\n        avatar(width: 96, height: 96) {\\n          url\\n          __typename\\n        }\\n        __typename\\n      }\\n      __typename\\n    }\\n    trackingInfo {\\n      transactionId\\n      listName\\n      recommenderId\\n      experimentId\\n      variantId\\n      __typename\\n    }\\n    totalCount\\n    pageInfo {\\n      startCursor\\n      hasPreviousPage\\n      endCursor\\n      hasNextPage\\n      __typename\\n      }\\n    __typename\\n  }\\n}\\n","variables":{"page":{"count":36},"filter":{"queryString":null,"category":"2","regionId":null,"attributeFilters":[]},"sortBy":"CREATED_DESC"},"operationName":"ListingsPage"}'

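response = requests.post('https://www.tayara.tn/graphql', headers=headers, data=data, timeout=30)
response.raise_for_status()  # fail loudly on HTTP errors instead of a confusing KeyError below

# totalCount is the total number of listings matching the filter
print(response.json()['data']['listings']['totalCount'])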
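The hand-escaped `data` string can also be built as a Python dict and passed with `json=`, which makes `requests` serialise it and set `Content-Type: application/json` itself. The sketch below assumes the endpoint still accepts the same `ListingsPage` operation and variables as in the answer; the trimmed query that asks only for `totalCount`, and the helper names `build_payload`, `extract_total_count` and `fetch_total_count`, are hypothetical. Which of the answer's headers the server really requires is not known, so add them back if the shorter request is rejected.

```python
# Hypothetical trimmed query: same operation as the answer, but only the field we need.
LISTINGS_COUNT_QUERY = """
query ListingsPage($page: Page, $filter: SearchFilter, $sortBy: SortOrder) {
  listings: searchAds(page: $page, filter: $filter, sortBy: $sortBy) {
    totalCount
  }
}
"""


def build_payload(category="2", count=36):
    """Build the GraphQL request body as a dict (same variables as the answer's data string)."""
    return {
        "query": LISTINGS_COUNT_QUERY,
        "variables": {
            "page": {"count": count},
            "filter": {"queryString": None, "category": category,
                       "regionId": None, "attributeFilters": []},
            "sortBy": "CREATED_DESC",
        },
        "operationName": "ListingsPage",
    }


def extract_total_count(body):
    """Pull totalCount out of a decoded GraphQL response, surfacing GraphQL-level errors."""
    if body.get("errors"):
        raise RuntimeError(f"GraphQL errors: {body['errors']}")
    return body["data"]["listings"]["totalCount"]


def fetch_total_count(category="2", timeout=30):
    """Send the query to the endpoint used in the answer and return totalCount (network call)."""
    import requests  # imported here so the helpers above can be used without requests

    resp = requests.post(
        "https://www.tayara.tn/graphql",
        json=build_payload(category=category),   # serialises the dict and sets Content-Type
        headers={
            "Referer": "https://www.tayara.tn/sc/immobilier/appartements",
            "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
        },
        timeout=timeout,
    )
    resp.raise_for_status()
    return extract_total_count(resp.json())
```

Calling `fetch_total_count()` performs the live request; `build_payload` and `extract_total_count` can be checked offline without touching the site.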