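I am trying to scrape a website where the details I need are inside various div tags. I have tried several approaches, but I can't get it to work: every element sits inside a div, there are also span tags under the divs, and the code I write returns empty strings.
Here is my code: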
import requests
from bs4 import BeautifulSoup

unspsc_link = "https://order.besse.com/Orders/Search/ProductSearch?query=34431"
link = requests.get(unspsc_link).text
soup = BeautifulSoup(link, 'lxml')
prdItemNumbers = []
prdTitles = []
prdSubTitles = []
prdNDCs = []
prdUOM = []
prdForm = []
for row in soup.select('.row'):
    prdItemNumbers = row.select_one('.font-xs bg-teal')
    if prdItemNumbers is None:
        prdItemNumbers.append('N/A')
    else:
        prdItemNumbers.append(prdItemNumbers.text.strip().replace('\u200b',''))
    prdTitles = row.select_one('.header1')
    if prdTitles is None:
        prdTitles.append('N/A')
    else:
        prdTitles.append(prdTitles.text.strip())
    prdSubTitles = row.select_one('.header2')
    if prdSubTitles is None:
        prdSubTitles.append('N/A')
    else:
        prdSubTitles.append(prdSubTitles.text.strip())
    prdNDCs = row.select_one('.col-sm-5')
    if prdNDCs is None:
        prdNDCs.append('N/A')
    else:
        prdNDCs.append(prdNDCs.text.strip())
    prdUOM = row.select_one('.col-sm-3')
    if prdUOM is None:
        prdUOM.append('N/A')
    else:
        prdUOM.append(prdUOM.text.strip())
    prdForm = row.select_one('.col-sm-4')
    if prdForm is None:
        prdForm.append('N/A')
    else:
        prdForm.append(prdForm.text.strip())
It raises this error:
prdItemNumbers.append('N/A')
AttributeError: 'NoneType' object has no attribute 'append'
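The traceback can be reproduced without any scraping. In the loop above, prdItemNumbers = row.select_one(...) rebinds the name of the list to whatever select_one returned, and select_one returns None when nothing matches, so the following .append call is made on None. A minimal stdlib-only sketch, where the hypothetical fake_select_one (a plain dict lookup) stands in for the real HTML query:

```python
def fake_select_one(row, key):
    """Hypothetical stand-in for row.select_one(): None when nothing matches."""
    return row.get(key)

# The first row has no item number, the second one does.
rows = [{}, {"item": " A-1 "}]

def collect_buggy(rows):
    prdItemNumbers = []
    for row in rows:
        prdItemNumbers = fake_select_one(row, "item")  # the list is overwritten here
        if prdItemNumbers is None:
            prdItemNumbers.append("N/A")  # None.append -> AttributeError
        else:
            prdItemNumbers.append(prdItemNumbers.strip())
    return prdItemNumbers

def collect_fixed(rows):
    prdItemNumbers = []  # the list keeps its own (plural) name
    for row in rows:
        prdItemNumber = fake_select_one(row, "item")  # singular name for the lookup
        if prdItemNumber is None:
            prdItemNumbers.append("N/A")
        else:
            prdItemNumbers.append(prdItemNumber.strip())
    return prdItemNumbers

try:
    collect_buggy(rows)
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'append'

print(collect_fixed(rows))  # ['N/A', 'A-1']
```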
Answer:
This:
for row in soup.select('.row'):
    prdItemNumbers = row.select_one('.font-xs bg-teal')
    if prdItemNumbers is None:
        prdItemNumbers.append('N/A')
    else:
        prdItemNumbers.append(prdItemNumbers.text.strip().replace('\u200b',''))
should be:
for row in soup.select('.list-group-item'):
    # '.font-xs.bg-teal' (no space) matches one element that has both classes
    prdItemNumber = row.select_one('.font-xs.bg-teal')
    if prdItemNumber is None:
        prdItemNumbers.append('N/A')
    else:
        prdItemNumbers.append(prdItemNumber.text.strip().replace('\u200b',''))
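The None test should be done on prdItemNumber, the variable that holds the result of the current select_one attempt, not on the list you are appending to. In your version, prdItemNumbers = row.select_one(...) replaces the list with that result; when nothing matches, the result is None, and the next line calls None.append, which raises the AttributeError. The same principle applies to the other fields: keep all the list variable names plural and use singular names for the per-row lookups. Also, the parent class to loop over should be list-group-item, and '.font-xs bg-teal' (with a space) actually looks for a <bg-teal> element inside .font-xs; to match a single element carrying both classes, use '.font-xs.bg-teal'.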
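Since every field repeats the same "look up, else 'N/A'" steps, they can be folded into one helper. The sketch below is a hypothetical refactoring, not the answer's own code: the HTML snippet, the text_or_na helper and the SELECTORS mapping are made up for illustration, and it uses the built-in "html.parser" so lxml is not required. It also shows the difference between the descendant selector '.font-xs bg-teal' and the compound selector '.font-xs.bg-teal'.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: class names follow this thread, the real page may differ.
HTML = """
<ul>
  <li class="list-group-item">
    <span class="font-xs bg-teal">12345\u200b</span>
    <div class="header1">Widget</div>
  </li>
  <li class="list-group-item">
    <div class="header1">Gadget</div>
  </li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Descendant selector: a <bg-teal> tag somewhere inside .font-xs -> no match.
print(len(soup.select(".font-xs bg-teal")))  # 0
# Compound selector: one element that has both classes -> one match.
print(len(soup.select(".font-xs.bg-teal")))  # 1

def text_or_na(row, selector):
    """Text of the first element matching selector inside row, or 'N/A'."""
    element = row.select_one(selector)
    if element is None:
        return "N/A"
    # str.strip() does not remove the zero-width space, so drop it explicitly.
    return element.get_text(strip=True).replace("\u200b", "")

# Hypothetical field names; add '.header2', '.col-sm-5', etc. the same way.
SELECTORS = {"item_number": ".font-xs.bg-teal", "title": ".header1"}

columns = {name: [] for name in SELECTORS}
for row in soup.select(".list-group-item"):
    for name, selector in SELECTORS.items():
        columns[name].append(text_or_na(row, selector))

print(columns)  # {'item_number': ['12345', 'N/A'], 'title': ['Widget', 'Gadget']}
```

Keeping the results in one dict of lists means there is no separately named list that a lookup result could accidentally overwrite.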
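The content also appears to be loaded dynamically by an XHR POST request, which is likely why requests alone does not return the product details: it only receives the page before that data arrives. You can use Selenium to load the page, wait for the items to appear, and then parse page_source with BeautifulSoup as before: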
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Selenium 4 takes the driver path through a Service object; on Selenium 4.6+
# webdriver.Chrome() with no arguments can also locate the driver itself.
d = webdriver.Chrome(service=Service(r'C:\Users\HarrisQ\Documents\chromedriver.exe'))
try:
    d.get('https://order.besse.com/Orders/Search/ProductSearch?query=34431')
    # Wait up to 10 s for the dynamically loaded results to be present.
    WebDriverWait(d, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".list-group-item")))
    soup = BeautifulSoup(d.page_source, 'lxml')
finally:
    d.quit()  # close the browser even if the wait times out

prdItemNumbers = []
prdTitles = []
prdSubTitles = []
prdNDCs = []
prdUOMs = []
prdForms = []
for row in soup.select('.list-group-item'):
    # '.font-xs.bg-teal' (no space) matches one element that has both classes
    prdItemNumber = row.select_one('.font-xs.bg-teal')
    if prdItemNumber is None:
        prdItemNumbers.append('N/A')
    else:
        prdItemNumbers.append(prdItemNumber.text.strip().replace('\u200b',''))
    prdTitle = row.select_one('.header1')
    if prdTitle is None:
        prdTitles.append('N/A')
    else:
        prdTitles.append(prdTitle.text.strip())
    prdSubTitle = row.select_one('.header2')
    if prdSubTitle is None:
        prdSubTitles.append('N/A')
    else:
        prdSubTitles.append(prdSubTitle.text.strip())
    prdNDC = row.select_one('.col-sm-5')
    if prdNDC is None:
        prdNDCs.append('N/A')
    else:
        prdNDCs.append(prdNDC.text.strip())
    prdUOM = row.select_one('.col-sm-3')
    if prdUOM is None:
        prdUOMs.append('N/A')
    else:
        prdUOMs.append(prdUOM.text.strip())
    prdForm = row.select_one('.col-sm-4')
    if prdForm is None:
        prdForms.append('N/A')
    else:
        prdForms.append(prdForm.text.strip())
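Because every field appends either its text or 'N/A', the six lists stay the same length, and position i in each list describes the same product. A minimal sketch of turning them into one record per product and writing a CSV with the standard library; the sample values and the column names in FIELDS are hypothetical placeholders, not data from the site.

```python
import csv
import io

# Hypothetical sample values standing in for the six lists built by the loop.
prdItemNumbers = ["12345", "N/A"]
prdTitles = ["Widget", "Gadget"]
prdSubTitles = ["Blue", "N/A"]
prdNDCs = ["ndc-1", "N/A"]
prdUOMs = ["EA", "BX"]
prdForms = ["Tablet", "N/A"]

FIELDS = ["item_number", "title", "subtitle", "ndc", "uom", "form"]
columns = [prdItemNumbers, prdTitles, prdSubTitles, prdNDCs, prdUOMs, prdForms]

# zip() silently stops at the shortest list, so fail loudly if they are out of step.
if len({len(column) for column in columns}) != 1:
    raise ValueError("product lists have different lengths")

# zip(*columns) yields one tuple per product: (item, title, subtitle, ndc, uom, form).
records = [dict(zip(FIELDS, values)) for values in zip(*columns)]

# StringIO keeps the example self-contained; use open("products.csv", "w", newline="") for a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(records)
print(buffer.getvalue())
```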