Question

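I need some help web-scraping laptop prices, ratings, and product names from Flipkart into a CSV file using BeautifulSoup, Selenium, and Pandas. The problem is that when I try to append the scraped items to the empty lists, I get the error AttributeError: 'NoneType' object has no attribute 'text'.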

from selenium import webdriver
import pandas as pd
from bs4 import BeautifulSoup


chrome_option = webdriver.ChromeOptions()
driver = webdriver.Chrome(executable_path = "C:/Users/folder/PycharmProjects/chromedriver.exe")
#flipkart website
driver.get("https://www.flipkart.com/laptops/~cs-g5q3mw47a4/pr?sid=6bo%2Cb5g&collection-tab-name=Browsing&wid=13.productCard.PMU_V2_7")


products = []
prices = []
ratings = []


content = driver.page_source
soup = BeautifulSoup(content, 'lxml')
for item in soup.findAll('a', href = True, attrs={'class' : '_1fQZEK'}):
    name = item.find('div', attrs={'class' : '_4rR01T'})
    price = item.find('div', attrs={'class' : '_30jeq3 _1_WHN1'})
    rating = item.find('div', attrs={'class' : '_3LWZlK'})
    
    products.append(name.text)
    prices.append(price.text)
    ratings.append(rating.text)
    

    df = pd.DataFrame({'Product Name': products,
                        'Price': prices,
                        'Rating': ratings})

    df.to_csv(r"C:\Users\folder\Desktop\webscrape.csv", index=True, encoding= 'utf-8')

Answer 1

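You should use .get_text() or .contents instead of .text. Also, take care to handle NoneType: check that find() actually returned an element before reading its text.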
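A minimal sketch of the None check, assuming made-up markup and class names (`card`, `name`, `rating`) and a hypothetical helper `text_or_default`; none of these come from the Flipkart page. Note that on a Tag, .text and .get_text() return the same string, and both raise AttributeError when find() returned None, so the guard is the part that prevents the error:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a listing page; the second card has no rating.
SAMPLE_HTML = """
<a class="card" href="/p/1"><div class="name">Laptop A</div><div class="rating">4.3</div></a>
<a class="card" href="/p/2"><div class="name">Laptop B</div></a>
"""


def text_or_default(element, default=""):
    """Return the element's text, or `default` when find() returned None."""
    return element.get_text(strip=True) if element is not None else default


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
names, ratings = [], []
for card in soup.find_all("a", class_="card", href=True):
    # Every card contributes exactly one entry to each list, found or not.
    names.append(text_or_default(card.find("div", class_="name")))
    ratings.append(text_or_default(card.find("div", class_="rating")))

print(names)
print(ratings)
```

Because a placeholder is appended whenever an element is missing, the lists stay the same length and line up row by row.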

Answer 2

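Found the solution! After replacing .text with .get_text() and appending only when the element was actually found, the error was resolved. To avoid another error, ValueError: arrays must all be same length, print the len() of each list to confirm that the appended data all has the same length before passing it to the pandas DataFrame.

In this case, the len() of the ratings list remained 0 across all iterations of the for loop, so ratings are not included in the DataFrame df. The modified code is as follows: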

#--snip--
    
#empty list to be appended later with webscraped items
products = []
prices = []
ratings = []

for item in soup.findAll('a', href = True, attrs={'class' : '_1fQZEK'}):
    name = item.find('div', attrs={'class' : '_4rR01T'})
    price = item.find('div', attrs={'class' : '_30jeq3 _1_WHN1'})
    rating = item.find('div', attrs={'class' : '_3LWZlK'})
    # append the text, or '' when find() returned None,
    # so that both lists stay the same length
    products.append(name.get_text() if name else '')
    prices.append(price.get_text() if price else '')

# check the list lengths before creating the pandas DataFrame
print(f"Products: {len(products)}")
print(f"Prices: {len(prices)}")
print(f"Ratings: {len(ratings)}")

# ratings stays empty (no rating element matched), so it is left out
df = pd.DataFrame({'Product Name': products,
                   'Price': prices})
# write the DataFrame to CSV once, after the loop has finished
df.to_csv(r"C:\Users\folder\Desktop\samplescrape.csv", index=True, encoding='utf-8')
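
The length check described in this answer could also be automated rather than read off printed output. The sketch below assumes a hypothetical helper `check_equal_lengths` and made-up sample values; it uses only plain Python, so it runs without pandas:

```python
def check_equal_lengths(columns):
    """Return each column's length; raise ValueError if the lengths differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    return lengths


# Hypothetical scraped values; ratings is empty, as in the case described above.
products = ["Laptop A", "Laptop B"]
prices = ["45990", "52490"]
ratings = []

columns = {"Product Name": products, "Price": prices, "Rating": ratings}

# Keep only the columns that actually received data, then verify they line up.
non_empty = {name: values for name, values in columns.items() if values}
lengths = check_equal_lengths(non_empty)
print(lengths)
```

Once the check passes, `non_empty` can be passed to `pd.DataFrame(non_empty)`. Dropping an empty column silently is a design choice; raising on it instead may be preferable when a missing field signals a stale selector.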

Python Webscraping - AttributeError: 'NoneType' object has no attribute 'text'

2 answers: