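I am trying to parse a page to extract the details that sit under the same tag, but I can't get it to work. The script I tried is below; its output follows it: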
import re
import pytz
import requests
import datetime
from flask import url_for
from bs4 import BeautifulSoup
from urllib.parse import urljoin
bigbash_article_link = "http://www.espncricinfo.com/ci/content/squad/1134829.html"
r = requests.get(bigbash_article_link)
bigbash_article_html = r.text
soup = BeautifulSoup(bigbash_article_html, "html.parser")
items = soup.find_all("div",{"class":"large-7 medium-7 small-7 columns"})
items1 = soup.find_all("h3")
items2 = soup.find_all("span")
bigbash_article_dict = []
for div in items:
    a =div.find('img')['src']
    b = 'http://www.espncricinfo.com/'
    c = urljoin(b,a)
    print(c)
    #c[bigbash_article_dict]
    #print(bigbash_article_dict)
for div in items1:
    a =div.find('a').string
    print(a)
for div in items2:
    a =(div.find('span')).text
    print(a)
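I get an AttributeError when I try to parse the details inside the span tags. Is there a way to extract all of the parsed details into a single list of dictionaries? The output I currently get, ending in the traceback: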
http://www.espncricinfo.com/inline/content/image/1099912.html?alt=icon
http://www.espncricinfo.com/inline/content/image/751925.html?alt=icon
http://www.espncricinfo.com/inline/content/image/599004.html?alt=icon
http://www.espncricinfo.com/inline/content/image/549144.html?alt=icon
http://www.espncricinfo.com/inline/content/image/986769.html?alt=icon
http://www.espncricinfo.com/inline/content/image/1099468.html?alt=icon
http://www.espncricinfo.com/inline/content/image/1100136.html?alt=icon
http://www.espncricinfo.com/inline/content/image/1100133.html?alt=icon
http://www.espncricinfo.com/inline/content/image/721225.html?alt=icon
http://www.espncricinfo.com/inline/content/image/818215.html?alt=icon
http://www.espncricinfo.com/inline/content/image/443920.html?alt=icon
http://www.espncricinfo.com/inline/content/image/1080507.html?alt=icon
http://www.espncricinfo.com/inline/content/image/986785.html?alt=icon
http://www.espncricinfo.com/inline/content/image/517833.html?alt=icon
http://www.espncricinfo.com/inline/content/image/1099482.html?alt=icon
http://www.espncricinfo.com/inline/content/image/708777.html?alt=icon
http://www.espncricinfo.com/inline/content/image/1093893.html?alt=icon
http://www.espncricinfo.com/inline/content/image/818165.html?alt=icon
http://www.espncricinfo.com/inline/content/image/1099914.html?alt=icon
Virat Kohli
Moeen Ali
Murugan Ashwin
Yuzvendra Chahal
Aniket Choudhary
Nathan Coulter-Nile
Colin de Grandhomme
Quinton de Kock
Pavan Deshpande
AB de Villiers
Aniruddha Joshi
Sarfaraz Khan
Kulwant Khejroliya
Brendon McCullum
Mandeep Singh
Mohammed Siraj
Pawan Negi
Parthiv Patel
Navdeep Saini
Tim Southee
Manan Vohra
Washington Sundar
Chris Woakes
Umesh Yadav
Traceback (most recent call last):
  File "qwe.py", line 41, in <module>
    a =(div.find('span')).text
AttributeError: 'NoneType' object has no attribute 'text'
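Answer 0 (score: 3)
Try the following approach. I iterate over the li tags, so the image, name, and span details of each player are read together: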
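# Reuses `soup` and `urljoin` from the question's script.
# Each player sits in its own <li> inside this container.
details = soup.find("div", {"class": "large-20 medium-20 small-20 columns"})
players = details.find_all("li")  # renamed from `list` to avoid shadowing the built-in
domain = "http://www.espncricinfo.com/"
bigbash_article_list = []  # one dictionary per player
for li in players:
    bigbash_article_dict = {}  # fresh dictionary so keys do not leak between players
    image_div = li.find("div", {"class": "large-7 medium-7 small-7 columns"})
    image_present = image_div is not None
    # Fall back to a placeholder path when the player has no image.
    image_sub_path = "http://www.espncricinfo.com/dummyImage"
    if image_present:
        image_sub_path = image_div.find("img")["src"]
    bigbash_article_dict["image"] = urljoin(domain, image_sub_path)
    # The details column uses a different class when there is no image.
    if image_present:
        details_div = li.find("div", {"class": "large-13 medium-13 small-13 columns"})
    else:
        details_div = li.find("div", {"class": "large-13 medium-13 small-20 columns"})
    bigbash_article_dict["name"] = details_div.find("a").text.strip()
    for span in details_div.find_all("span"):
        info = span.text
        if ":" not in info:
            # A span without a colon is treated as the player's role.
            key, value = "Role", info.strip()
        else:
            # Split only on the first colon so values containing ":" stay whole.
            key, value = (part.strip() for part in info.split(":", 1))
        bigbash_article_dict[key] = value
    bigbash_article_list.append(bigbash_article_dict)
print(bigbash_article_list)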
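If you want to check the dictionary-building step without hitting the site, the span handling can be pulled out into small helpers. This is only a sketch: `span_to_pair` and `build_player` are hypothetical names I made up, and the sample values below are invented rather than taken from the real page. It assumes each span's text is either a bare role or a "key: value" string, as in the loop above.

```python
from urllib.parse import urljoin

BASE_URL = "http://www.espncricinfo.com/"  # same domain as in the code above


def span_to_pair(info, default_key="Role"):
    """Turn the text of one span into a (key, value) pair."""
    if ":" not in info:
        # No colon: store the whole text under the default key.
        return default_key, info.strip()
    # Split on the first colon only, so a value such as "10:30" stays intact.
    key, value = info.split(":", 1)
    return key.strip(), value.strip()


def build_player(name, image_src, span_texts):
    """Collect one player's details into a single dictionary."""
    player = {"name": name.strip(), "image": urljoin(BASE_URL, image_src)}
    for info in span_texts:
        key, value = span_to_pair(info)
        player[key] = value
    return player


# Invented sample values, used only to exercise the helpers.
players = [
    build_player("Player One", "/inline/content/image/1.html?alt=icon",
                 ["Batsman", "Age: 30 years"]),
    build_player(" Player Two ", "/inline/content/image/2.html?alt=icon",
                 ["Note: joined at 10:30"]),
]
print(players)
```

In the scraping loop you would pass `details_div.find('a').text`, the image `src`, and `[span.text for span in details_div.find_all('span')]` to `build_player`, then append the result to your list.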