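Web scraping - titles

Date: 2020-09-19 15:57:32

Tags: python web-scraping

I have scraped titles from websites many times before, but this time I cannot get it to work and I do not know why. My code is below.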


from bs4 import BeautifulSoup as soup
from urllib.request import urlopen
import pandas as pd
import ssl
from time import sleep
from random import randint

try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context


html = urlopen("https://officialblackwallstreet.com/directory/")
bsObj = soup(html.read(), "html.parser")  # name the parser explicitly
bws_titles_bags = []
bws_names = bsObj.findAll(["a","title data-original-title"])

Result:

How can I retrieve the titles, for example "McClean Photography" and the others?

Thank you for your help.

[titles][1]


  [1]: https://i.stack.imgur.com/aJ8oy.png

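1 answer:

Answer 0 (score: 0)

The data is loaded dynamically via Ajax from a different URL, so it is not present in the HTML that `urlopen` returns. You can use this example to get the titles: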

import json
import requests
from bs4 import BeautifulSoup


api_url = 'https://officialblackwallstreet.com/wp-admin/admin-ajax.php'

params = {
    'post_type':  'item',
    'type':    2,
    'page':    1,
    'ppp': 9,
    'action':  'post_list',
    'order':   'DESC',
    'orderby': 'date',
    'keyword': ''
}


data = requests.post(api_url, data=params).json()

# Uncomment this to print all data:
# print(json.dumps(data, indent=4))

for m in data['markers']:
    print(BeautifulSoup(m['info']['post_title'], 'html.parser').text)

Prints:

McClean Photography
Zmena INC.
Hippie Adjacent
YourAdminOnline.com
Don’t Sweat The Technique
Asanee Coaching Services
Joy Street Design
Natural Ash Body
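
The request above only fetches `page` 1. If you need more than the first batch, something along these lines might work. It is a minimal sketch, not a tested recipe: it assumes the endpoint accepts higher `page` values and returns an empty `markers` list once there are no more results, neither of which is confirmed here. The names `collect_titles`, `clean_title` and `fetch_with_requests` are made up for illustration, and tags are stripped with a simple regex plus `html.unescape` instead of BeautifulSoup.

```python
import re
from html import unescape
from typing import Callable, Dict, List

# Form fields copied from the request above; only 'page' changes per call.
BASE_PARAMS = {
    'post_type': 'item',
    'type': 2,
    'ppp': 9,
    'action': 'post_list',
    'order': 'DESC',
    'orderby': 'date',
    'keyword': '',
}


def clean_title(raw: str) -> str:
    """Drop simple HTML tags and decode entities such as &#8217;."""
    return unescape(re.sub(r'<[^>]+>', '', raw)).strip()


def collect_titles(fetch_page: Callable[[Dict], Dict],
                   max_pages: int = 3) -> List[str]:
    """Call fetch_page for pages 1..max_pages and gather the cleaned titles.

    fetch_page receives the form fields and must return the parsed JSON.
    """
    titles = []
    for page in range(1, max_pages + 1):
        data = fetch_page({**BASE_PARAMS, 'page': page})
        markers = data.get('markers') or []
        if not markers:
            # Assumption: no markers means we ran past the last page.
            break
        for m in markers:
            titles.append(clean_title(m['info']['post_title']))
    return titles


def fetch_with_requests(params: Dict) -> Dict:
    """POST the form fields to the same URL as above and parse the JSON."""
    import requests  # imported here so the helpers above work without it
    api_url = 'https://officialblackwallstreet.com/wp-admin/admin-ajax.php'
    resp = requests.post(api_url, data=params, timeout=30)
    resp.raise_for_status()
    return resp.json()


# Usage (performs real HTTP requests):
# for title in collect_titles(fetch_with_requests, max_pages=3):
#     print(title)
```

Passing the fetch function in as an argument keeps the paging loop separate from the network call, so you can try the loop against a saved response first and keep `max_pages` small to avoid hammering the site.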