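I'm using the code below and trying to get the value at the end of the href. Is there a way to extract the href and find the value after page= using BeautifulSoup or a regex?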
from bs4 import BeautifulSoup
import requests
import json
import re
request = requests.get('https://www.goodreads.com/quotes/tag/fun?page=1')
soup = BeautifulSoup(request.text, 'html.parser')
findNext = soup.find("a", class_="next_page")
print(findNext)
It gives me this output:
<a class="next_page" href="/quotes/tag/fun?page=2" rel="next">next »</a>
Note: I want to extract the 2 from the output above, or whatever other number appears in its place.
Answer 0 (score: 1)
You can use a regex to find the page number:
from bs4 import BeautifulSoup
import re
import requests

request = requests.get('https://www.goodreads.com/quotes/tag/fun?page=1')
soup = BeautifulSoup(request.text, 'html.parser')
# The lookbehind matches the digits that directly follow "page="
page_nums = re.findall(r'(?<=page=)\d+', str(soup.find("a", class_="next_page")))[0]
print(page_nums)
Output:
2
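A slightly more defensive variant, as a rough sketch: run the regex on the href string alone and use re.search, so a link without page= yields None instead of an IndexError from [0]. The helper name page_from_href is made up for illustration, and it assumes the page value is always made of digits.

```python
import re

def page_from_href(href):
    """Return the number after 'page=' as an int, or None if it is absent."""
    # Hypothetical helper, not part of the answer above.
    # [?&] makes sure we match the query parameter named exactly "page".
    match = re.search(r'[?&]page=(\d+)', href or '')
    return int(match.group(1)) if match else None

print(page_from_href('/quotes/tag/fun?page=2'))
```

You would pass it the href taken from the tag, for example the value of the "href" attribute on the next_page link, and check for None when there is no next page.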
Answer 1 (score: 1)
from bs4 import BeautifulSoup
import requests
request = requests.get('https://www.goodreads.com/quotes/tag/fun?page=1')
soup = BeautifulSoup(request.text, 'html.parser')
findNext = soup.find("a", class_="next_page").attrs['href'].split('page=')[1]
print(findNext)
#Result is 2
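Splitting on 'page=' works for this link, but it would also pick up anything that follows the number if another parameter came after it. One possible alternative, sketched here with the standard library's urllib.parse rather than anything from the answer above, is to parse the query string properly. The function name page_param and the default argument are assumptions made for the example, and it assumes the page value is numeric.

```python
from urllib.parse import urlsplit, parse_qs

def page_param(href, default=None):
    """Read the 'page' query parameter from a URL or relative href."""
    # Hypothetical helper: urlsplit isolates the query string,
    # parse_qs turns it into a dict of lists, e.g. {'page': ['2']}.
    query = parse_qs(urlsplit(href).query)
    values = query.get('page')
    return int(values[0]) if values else default

print(page_param('/quotes/tag/fun?page=2'))
```

This keeps working if the site later adds more parameters to the link, in any order.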
Answer 2 (score: 0)
With a regex, you can do something like this:
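const url = "/quotes/tag/fun?page=2";
// Keep only the query string, i.e. everything after "?"
const urlParam = url.substring(url.indexOf('?') + 1);
// Capture whatever follows the first "="
const matches = urlParam.match(/=(.+)/);
let page;
if (matches) {
  page = matches[1];
}
console.log(page); // 2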
Answer 3 (score: 0)
var text = '<a class="next_page" href="/quotes/tag/fun?page=2" rel="next">next »</a>';
var regex = /(?<=href=\")[^\?]+\?page=(\d+)(?=\")/
var match = regex.exec(text);
console.log("**href => " + match[0] + " **page => " + match[1]);
Answer 4 (score: 0)
With JavaScript, you can use the URL constructor, .search to get the query string, String.prototype.split() with "=" as the separator, and Array.prototype.pop() to take the last piece:
var param = new URL('https://www.goodreads.com/quotes/tag/fun?page=1')
.search.split("=").pop();
console.log(param);