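I need some help scraping laptop prices, ratings, and product names from Flipkart into a CSV file using BeautifulSoup, Selenium, and Pandas. The problem is that when I try to append the scraped items to the empty lists, I get the error AttributeError: 'NoneType' object has no attribute 'text'.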
from selenium import webdriver
import pandas as pd
from bs4 import BeautifulSoup
chrome_option = webdriver.ChromeOptions()
driver = webdriver.Chrome(executable_path = "C:/Users/folder/PycharmProjects/chromedriver.exe")
#flipkart website
driver.get("https://www.flipkart.com/laptops/~cs-g5q3mw47a4/pr?sid=6bo%2Cb5g&collection-tab-name=Browsing&wid=13.productCard.PMU_V2_7")
products = []
prices = []
ratings = []
content = driver.page_source
soup = BeautifulSoup(content, 'lxml')
for item in soup.findAll('a', href = True, attrs={'class' : '_1fQZEK'}):
    name = item.find('div', attrs={'class' : '_4rR01T'})
    price = item.find('div', attrs={'class' : '_30jeq3 _1_WHN1'})
    rating = item.find('div', attrs={'class' : '_3LWZlK'})
    products.append(name.text)
    prices.append(price.text)
    ratings.append(rating.text)
df = pd.DataFrame({'Product Name': products,
                   'Price': prices,
                   'Rating': ratings})
df.to_csv(r"C:\Users\folder\Desktop\webscrape.csv", index=True, encoding= 'utf-8')
Answer 0 (score: 0)
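You should use .contents or .get_text() instead of .text. Also, take care to handle NoneType: find() returns None when no element matches, so check the result before reading its text.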
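A minimal sketch of the None check mentioned above. It is the None returned by find(), rather than the choice between .text and .get_text(), that raises the AttributeError. The helper name text_or_default and the FakeTag stand-in class are hypothetical and exist only so the sketch runs without a browser; with BeautifulSoup you would pass the result of item.find(...) instead.

```python
from typing import Optional


class FakeTag:
    """Hypothetical stand-in for a bs4 Tag; only get_text() is modelled."""

    def __init__(self, text: str) -> None:
        self._text = text

    def get_text(self, strip: bool = False) -> str:
        return self._text.strip() if strip else self._text


def text_or_default(tag: Optional[FakeTag], default: str = "") -> str:
    """Return the tag's text, or `default` when find() returned None."""
    if tag is None:
        return default
    return tag.get_text(strip=True)


found = FakeTag("  4.3 ")  # what find() gives back on a match
missing = None             # what find() gives back when nothing matches

print(repr(text_or_default(found)))
print(repr(text_or_default(missing, default="N/A")))
```

Returning a default instead of skipping the item keeps every list the same length, which matters once the lists are combined into a DataFrame.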
Answer 1 (score: 0)
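Found a solution! After replacing .text with .get_text() and guarding against find() returning None, the error went away. To avoid a second error, ValueError: arrays must all be same length, print the len() of each list to confirm that the lengths match before passing the lists to the pandas DataFrame.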
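The length check described above could be wrapped in a small helper, sketched here under assumptions: the function names column_lengths and all_same_length are hypothetical, and the sample values are made up for illustration.

```python
from typing import Dict, List


def column_lengths(columns: Dict[str, List[str]]) -> Dict[str, int]:
    """Map each column name to the number of scraped values it holds."""
    return {name: len(values) for name, values in columns.items()}


def all_same_length(columns: Dict[str, List[str]]) -> bool:
    """True when every column has the same number of values."""
    return len(set(column_lengths(columns).values())) <= 1


# Made-up sample: the ratings list was never filled.
columns = {
    "Product Name": ["Laptop A", "Laptop B"],
    "Price": ["45990", "52490"],
    "Rating": [],
}

print(column_lengths(columns))
print(all_same_length(columns))
```

Running the check just before building the DataFrame shows which list is short instead of leaving that to the ValueError.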
In this case, len() of ratings turned out to be 0 on every iteration of the for loop, so the ratings are left out of the DataFrame df. The modified code is below:
#--snip--
#empty lists to be appended later with web-scraped items
products = []
prices = []
ratings = []
for item in soup.findAll('a', href = True, attrs={'class' : '_1fQZEK'}):
    name = item.find('div', attrs={'class' : '_4rR01T'})
    price = item.find('div', attrs={'class' : '_30jeq3 _1_WHN1'})
    rating = item.find('div', attrs={'class' : '_3LWZlK'})
    #append the info to the lists; use '' when find() returned None
    #so that both lists stay the same length
    products.append(name.get_text() if name else '')
    prices.append(price.get_text() if price else '')
#confirm the list lengths before building the DataFrame
print(f"Products: {len(products)}")
print(f"Prices: {len(prices)}")
print(f"Ratings: {len(ratings)}")
#creating pandas DataFrame (ratings stayed empty, so it is left out)
df = pd.DataFrame({'Product Name': products,
                   'Price': prices})
#sending the dataframe to csv
df.to_csv(r"C:\Users\folder\Desktop\samplescrape.csv", index=True, encoding= 'utf-8')
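An alternative sketch that sidesteps the length mismatch altogether: collect one dict per product instead of parallel lists, so a missing field becomes an empty cell rather than a shorter column. This is not what the answer above does. It swaps in the standard-library csv module so that it runs without pandas or a browser, and the values in scraped are made up for illustration.

```python
import csv
import io
from typing import List, Optional, Tuple

FIELDS = ["Product Name", "Price", "Rating"]

# Made-up values standing in for the text pulled from each product card;
# None marks a field whose element was not found.
scraped: List[Tuple[Optional[str], Optional[str], Optional[str]]] = [
    ("Laptop A", "45990", "4.3"),
    ("Laptop B", "52490", None),
]

# One dict per product: a missing field becomes '' instead of being skipped.
rows = [
    {field: (value if value is not None else "") for field, value in zip(FIELDS, card)}
    for card in scraped
]

# Write to an in-memory buffer here; a real script would open a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

The same rows list can also be passed to pd.DataFrame(rows), which builds one column per key.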