Question

The code is simple, but I'm having trouble figuring out the selector.

import csv
import time
from bs4 import BeautifulSoup
import requests

source = requests.get('https://website.com').text

soup = BeautifulSoup(source, 'lxml')

nextpage = soup.find("a", string="3").get('href')
print(nextpage)

This gives me the href for the "3" link, but when I try "Next" instead I get a None error.

The element is:

<a class="" href="https://website.com;page=2">Next ›</a>

What am I doing wrong? Is there another way to select the Next link?

The code below works:

nextpage = main_pagination.find_all('a', class_='')[3]

The problem with this code is that in another search the Next link might be at index [5] instead. I need a general solution for this page.

Answer 1

Here is another solution.

from simplified_scrapy import SimplifiedDoc,utils
html = '<a class="" href="https://website.com;page=2">Next ›</a>'
doc = SimplifiedDoc(html)
nextpage = doc.getElementByReg('Next',tag='a')
print(nextpage)

Result:

{'class': '', 'href': 'https://website.com;page=2', 'tag': 'a', 'html': 'Next ›'}

More examples are available here: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
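
If you would rather stay with BeautifulSoup, the likely cause of the None is that string="Next" only matches when the link's whole text is exactly "Next", while the text here is "Next ›". Passing a compiled regex matches on a substring instead, so it does not depend on the link's position in the pagination. A minimal sketch, assuming markup like the element in the question (the numbered links, the wrapping div and the find_next_href helper are made up for illustration):

```python
import re
from bs4 import BeautifulSoup

# Sample markup modelled on the link in the question.
# The numbered links and the wrapping <div> are assumptions.
HTML = """
<div class="pagination">
  <a class="" href="https://website.com;page=1">1</a>
  <a class="" href="https://website.com;page=2">2</a>
  <a class="" href="https://website.com;page=3">3</a>
  <a class="" href="https://website.com;page=2">Next ›</a>
</div>
"""

def find_next_href(html):
    """Return the href of the first <a> whose text contains 'Next', or None."""
    soup = BeautifulSoup(html, "html.parser")
    # A compiled regex is applied with search(), so "Next ›" qualifies.
    link = soup.find("a", string=re.compile(r"Next"))
    return link.get("href") if link is not None else None

soup = BeautifulSoup(HTML, "html.parser")
# Exact comparison: the text is "Next ›", not "Next", so nothing is found.
exact = soup.find("a", string="Next")
print(exact)
print(find_next_href(HTML))
```

The helper returns None rather than raising when there is no Next link, which is a convenient stop condition on the last page.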
