BeautifulSoup: how to purposely append None when an element can't be found

Date: 2019-07-03 07:25:02

Tags: dataframe web-scraping beautifulsoup

How do I purposely append [None] when an element can't be found? I have an element that is sometimes present and sometimes not. (LINK HERE)

Current output in the df:

name                       tag
ZX Torsion                 Releasing Soon
Campus                     Restock
Campus                     Restock
Consortium Runner Mid 4D   Sold out
Ozweego                    Sold out
Ozweego                    Sold out
Yeezy Boost 350 V2 Infant  Sold out
Yeezy Boost 350 V2 Kids    Sold out
Yeezy Boost 350 V2         Sold out
Yung-1                     Sold out
Yung 1                     Sold out
A.R. Trainer               Sold out
A.R. Trainer               Sold out

Desired output:

name                      tag
ZX Torsion                Releasing Soon
Campus                    Restock
Campus                    Restock
Consortium Runner Mid 4D  null
Ozweego                   null
Ozweego                   null
Yeezy Boost 350 V2 Infant Sold out
Yeezy Boost 350 V2 Kids   Sold out
Yeezy Boost 350 V2        Sold out
Yung-1                    null
Yung 1                    null
A.R. Trainer              null
A.R. Trainer              null
....and so on

Working code:

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
from selenium import webdriver

urls = [
    'https://www.nakedcph.com/sneakers-by-adidas/s/37'
] 

baseURL = 'https://www.nakedcph.com'
final = []
with requests.Session() as s:
    for url in urls:
        driver = webdriver.Chrome('/Users/Documents/python/Selenium/bin/chromedriver')
        driver.get(url)
        soup = bs(driver.page_source, 'lxml')
        items  = soup.findAll("div", {"class" : lambda L: L and L.startswith('col-6 col-md-3 mb-5')})
        name = [item.find('span',{'class':'product-name d-block'}).text.strip() for item in items]
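        # note: tag is scraped in a separate pass over only the ribbon divs,
        # so it can be shorter than name and zip() will misalign the rows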
        tag = [item.find('svg').next_sibling.strip() for item in soup.findAll('div',{'class':'card-ribbon'})]
        results = list(zip(name,tag))
        df = pd.DataFrame(results)

driver.quit()
df

1 answer:

Answer 0 (score: 1):

You can use try/except. I've never folded that into a list comprehension, although I may go back and try (see the sketch after the output below). As a side note, your original code builds name and tag in two separate findAll passes of different lengths, so zip pairs tags with the wrong names; looping once per product card keeps them aligned:

import requests
import pandas as pd
from bs4 import BeautifulSoup as bs
from selenium import webdriver

urls = [
    'https://www.nakedcph.com/sneakers-by-adidas/s/37'
] 

baseURL = 'https://www.nakedcph.com'
final = []
with requests.Session() as s:
    for url in urls:
        driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
        driver.get(url)
        soup = bs(driver.page_source, 'lxml')
        items  = soup.findAll("div", {"class" : lambda L: L and L.startswith('col-6 col-md-3 mb-5')})

        name = []
        tag = []
        for each in items:
            name.append(each.find('span',{'class':'product-name d-block'}).text.strip())
            try:
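                # the ribbon text sits just after the <svg>; cards without a
                # ribbon make find('svg') return None and raise AttributeError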
                tag.append(each.find('svg').next_sibling.strip())
            except:
                tag.append(None)
        results = list(zip(name,tag))
        df = pd.DataFrame(results)

driver.quit()

Output:

print (df)
                            0               1
0                  ZX Torsion  Releasing Soon
1                      Campus         Restock
2                      Campus         Restock
3    Consortium Runner Mid 4D            None
4                     Ozweego            None
5                     Ozweego            None
6   Yeezy Boost 350 V2 Infant        Sold out
7     Yeezy Boost 350 V2 Kids        Sold out
8          Yeezy Boost 350 V2        Sold out
9                      Yung-1            None
10                     Yung 1            None
11               A.R. Trainer            None
12               A.R. Trainer            None
13             Adilette Pride            None
14                 Supercourt            None
15              Supercourt RX            None
16                 ZX 4000 4D            None
17         Yeezy Boost 700 V2        Sold out
18  Yeezy Boost 350 V2 Infant        Sold out
19    Yeezy Boost 350 V2 Kids        Sold out
20         Yeezy Boost 350 V2        Sold out
21         Yeezy Boost 700 V2        Sold out
22    Yeezy Boost 700 V2 Kids        Sold out
23  Yeezy Boost 700 V2 Infant        Sold out
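
A minimal sketch (not part of the original answer) of folding the fallback into a list comprehension: try/except is not allowed inside a comprehension, but moving the lookup into a small helper gives the same None fallback. The helper name ribbon_text is hypothetical.

def ribbon_text(item):
    """Return the ribbon label that follows the <svg>, or None when absent."""
    svg = item.find('svg')
    if svg is None or svg.next_sibling is None:
        return None
    return svg.next_sibling.strip()

tag = [ribbon_text(item) for item in items]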