I used to scrape titles from websites regularly, but this time I can't get it to work and I don't know why. My code is below:
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen
import pandas as pd
import ssl
from time import sleep
from random import randint
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context
html = urlopen("https://officialblackwallstreet.com/directory/")
bsObj = soup(html.read())
bws_titles_bags = []
bws_names = bsObj.findAll(["a","title data-original-title"])
Result:
How can I retrieve the titles, for example "McClean Photography" and the others?
Thank you for your help
[titles][1]
[1]: https://i.stack.imgur.com/aJ8oy.png
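Answer 0 (score: 0)
The data is loaded dynamically via Ajax from a different URL, so the titles are not in the HTML that urlopen returns. You can use this example to get the titles: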
import json
import requests
from bs4 import BeautifulSoup
api_url = 'https://officialblackwallstreet.com/wp-admin/admin-ajax.php'
params = {
    'post_type': 'item',
    'type': 2,
    'page': 1,
    'ppp': 9,
    'action': 'post_list',
    'order': 'DESC',
    'orderby': 'date',
    'keyword': ''
}
data = requests.post(api_url, data=params).json()
# Uncomment this to print all the data:
# print(json.dumps(data, indent=4))
for m in data['markers']:
    # post_title may contain HTML entities, so let BeautifulSoup decode it
    print(BeautifulSoup(m['info']['post_title'], 'html.parser').text)
Prints:
McClean Photography
Zmena INC.
Hippie Adjacent
YourAdminOnline.com
Don’t Sweat The Technique
Asanee Coaching Services
Joy Street Design
Natural Ash Body
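
The example above only fetches the first page (`'page': 1`). If you want the other listings too, one option is to keep increasing `page` until nothing comes back. Below is a rough sketch of that idea, not something I have verified against this site: the helper names `extract_titles` and `collect_titles` are made up for illustration, and I am assuming the endpoint returns an empty `markers` list once the pages run out. It uses `html.unescape` from the standard library instead of BeautifulSoup to decode entities in the titles, which should be enough if the titles contain no markup.

```python
import html
from typing import Callable, Dict, List


def extract_titles(payload: Dict) -> List[str]:
    """Return the decoded titles found in one JSON response.

    Assumes the 'markers' -> 'info' -> 'post_title' layout used above.
    """
    titles = []
    for marker in payload.get("markers", []):
        raw = marker.get("info", {}).get("post_title", "")
        # Titles may contain HTML entities (e.g. &#8217;), so decode them.
        titles.append(html.unescape(raw).strip())
    return titles


def collect_titles(fetch_page: Callable[[int], Dict], max_pages: int = 50) -> List[str]:
    """Call fetch_page(1), fetch_page(2), ... until a page yields no titles.

    fetch_page is a hypothetical callable that returns the parsed JSON
    for a given page number; max_pages is a safety cap (an assumption).
    """
    all_titles: List[str] = []
    for page in range(1, max_pages + 1):
        titles = extract_titles(fetch_page(page))
        if not titles:
            break  # assumed stop condition: an empty page means we are done
        all_titles.extend(titles)
    return all_titles


# Possible usage with the request from the example above (needs `requests`):
#
#   def fetch_page(page):
#       return requests.post(api_url, data={**params, "page": page}).json()
#
#   for title in collect_titles(fetch_page):
#       print(title)
```

Passing the fetch step in as a function keeps the paging logic separate from the network call, so you can try it with canned data before hitting the site.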