I wrote this code to scrape data from Flipkart's mobiles category. The problem I'm facing is an attribute error ("AttributeError: 'NoneType' object has no attribute 'text'") when an element is missing. How do I modify this code so that it works? If an element does not exist, I need to fill the data in as "Not Available". Please see the code below. I am a beginner programmer and would appreciate any help.
'''
import requests
from bs4 import BeautifulSoup
import csv
import re

base_url = "https://www.flipkart.com/search?q=mobiles&page="

def get_urls():
    with open("fliplart-data.csv", "a") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(
            ['Product_name', 'Price', 'Rating', 'Product-url'])
        for page in range(1, 510):
            page = base_url + str(page)
            response = requests.get(page).text
            soup = BeautifulSoup(response, 'lxml')
            for product_urls in soup.find_all('a', href=True, attrs={'class': '_1fQZEK'}):
                name = product_urls.find('div', attrs={'class': '_4rR01T'}).text
                price = product_urls.find('div', attrs={'class': '_30jeq3 _1_WHN1'}).text
                price = re.split("₹", price)
                price = price[-1]
                rating = product_urls.find('div', attrs={'class': '_3LWZlK'}).text
                item_url = soup.find('a', class_="_1fQZEK", target="_blank")['href']
                item_url = "https://www.flipkart.com" + item_url
                item_url = re.split("&", item_url)
                item_url = item_url[0]
                print(f'Product name is {name}')
                print(f'Product price is {price}')
                print(f'Product rating is {rating}')
                print(f'Product url is {item_url}')
                writer.writerow(
                    [name, price, rating, item_url])

get_urls()
'''
Answer 0 (score: 0)
It looks like what you want is to wrap each extraction in try/except exception handling: if an AttributeError is raised because the element is missing, the except block sets the value to "Not Available".
'''
import requests
from bs4 import BeautifulSoup
import csv
import re

base_url = "https://www.flipkart.com/search?q=mobiles&page="

def get_urls():
    csv_file = open("fliplart-data.csv", "a")
    writer = csv.writer(csv_file)
    writer.writerow(
        ['Product_name', 'Price', 'Rating', 'Product-url'])
    for page in range(1, 510):
        page = base_url + str(page)
        response = requests.get(page).text
        soup = BeautifulSoup(response, 'lxml')
        for product_urls in soup.find_all('a', href=True, attrs={'class': '_1fQZEK'}):
            # name
            try:
                name = product_urls.find('div', attrs={'class': '_4rR01T'}).text
            except Exception:
                name = "Not Available"
            # price
            try:
                price = product_urls.find('div', attrs={'class': '_30jeq3 _1_WHN1'}).text
                price = re.split("₹", price)
                price = price[-1]
            except Exception:
                price = "Not Available"
            # rating
            try:
                rating = product_urls.find('div', attrs={'class': '_3LWZlK'}).text
            except Exception:
                rating = "Not Available"
            # item_url
            try:
                item_url = soup.find('a', class_="_1fQZEK", target="_blank")['href']
                item_url = "https://www.flipkart.com" + item_url
                item_url = re.split("&", item_url)
                item_url = item_url[0]
            except Exception:
                item_url = "Not Available"
            print(f'Product name is {name}')
            print(f'Product price is {price}')
            print(f'Product rating is {rating}')
            print(f'Product url is {item_url}')
            writer.writerow(
                [name, price, rating, item_url])

get_urls()
'''
Output
Product name is intaek 5616
Product price is 789
Product rating is Not Available
Product url is https://www.flipkart.com/kxd-m1/p/itm89bbc238d6356?pid=MOBFUXKG3DYVZRQV
Also, in the scraped result, the product data does not match the URL printed alongside it. This is likely because `soup.find('a', class_="_1fQZEK", target="_blank")` searches the whole page and always returns the first matching anchor, rather than the anchor for the current product; using `product_urls['href']` inside the loop would give the per-product link. That may also be part of the problem you are seeing.
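As an alternative to one try/except per field, a small helper can check `find()`'s return value for `None` before touching `.text` (BeautifulSoup's `find()` returns `None` when nothing matches). A minimal sketch, where `safe_text` and the `FakeTag` stand-in are hypothetical names used here so the snippet runs without a network call or bs4 installed:

```python
def safe_text(tag, default="Not Available"):
    # BeautifulSoup's find() returns None on no match; guard before .text
    return tag.text if tag is not None else default

# Tiny stand-in for a BeautifulSoup Tag, just to exercise the helper.
class FakeTag:
    def __init__(self, text):
        self.text = text

print(safe_text(FakeTag("₹789")))  # -> ₹789
print(safe_text(None))             # -> Not Available
```

In the scraper this would read `name = safe_text(product_urls.find('div', attrs={'class': '_4rR01T'}))`, which keeps the loop body flat instead of wrapping every field in its own try/except.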