I'm new to web scraping, and I'm trying this code:
import requests
import bs4
from bs4 import BeautifulSoup
import pandas as pd
import time
page = requests.get("https://leeweebrothers.com/our-food/lunch-boxes/#")
soup = BeautifulSoup(page.text, "html.parser")
names = soup.find_all('h2') #name of food
rest = soup.find_all('span', {'class' : 'amount'}) # price of food
for div, a in zip(names, rest):
    print(div.text, a.text)  # print name / price on the same line
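It works well except for one problem, which I describe in the question linked below:
printing result of 2 for loops in same line
The problem is that the string "HONEY GLAZED CHICKEN WING" gets $0.00. This is an outlier returned by the site's shopping-cart widget, which shares span class='amount'.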
How can I remove this string and "move up" the other prices so that they line up with the corresponding food names?
Edit:
Sample output below:
Line1: HONEY GLAZED CHICKEN WING $0.00
Line2: CRISPY CHICKEN LUNCH BOX
Line3: $5.00
Line4: BREADED FISH LUNCH BOX
Line5: $5.00
The output I want is something like:
Line1: HONEY GLAZED CHICKEN WING $5.00
Line2: CRISPY CHICKEN LUNCH BOX $5.00
I'm looking for a solution that removes the $0.00 outlier price and moves the remaining prices up.
Answer 0 (score: 1)
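I think you may be asking the wrong question. You could eliminate the $0.00 outlier, but your prices would still not match the names.
To make sure your price and name lists come out in the same order, so that they match, it may be easier to first search for the divs that contain them: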
import requests
import bs4
from bs4 import BeautifulSoup
import pandas as pd
import time
page = requests.get("https://leeweebrothers.com/our-food/lunch-boxes/#")
soup = BeautifulSoup(page.text, "html.parser")
# all the divs that held the foods had this same style
divs = soup.find_all('div', {'style': 'max-height:580px;'})
names_and_prices = {
    # name: price
    div.find('h2').text: div.find('span', {'class': 'amount'}).text
    for div in divs
}
for name, price in names_and_prices.items():
    print(name, price)
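The key idea in this answer is to scope each lookup to the element that holds a single product, so a stray `amount` span outside any product (such as the cart widget) can never be paired with a name. Below is a minimal sketch of that idea under stated assumptions: the markup, the `product`/`cart` class names, and the dish names are all hypothetical, and the standard library's `xml.etree.ElementTree` stands in for BeautifulSoup only so the sketch runs without third-party packages (it needs well-formed markup; for the real page, keep using BeautifulSoup as above). It also collects a list of tuples instead of a dict, so two products that happen to share a name are both kept.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for the real page: two product
# containers plus a cart widget that reuses the "amount" class outside them.
SAMPLE_HTML = """
<body>
  <div class="cart"><span class="amount">$0.00</span></div>
  <div class="product">
    <h2>DISH A</h2>
    <span class="amount">$5.00</span>
  </div>
  <div class="product">
    <h2>DISH B</h2>
    <span class="amount">$4.50</span>
  </div>
</body>
"""


def pair_names_and_prices(markup):
    """Return (name, price) tuples, looking up both fields inside each product div."""
    root = ET.fromstring(markup.strip())
    pairs = []
    for product in root.iter("div"):
        if product.get("class") != "product":
            continue  # skip the cart widget and any other non-product div
        name = product.find("h2")
        price = product.find(".//span[@class='amount']")
        if name is None or price is None:
            continue  # a container without both fields cannot be paired
        pairs.append((name.text.strip(), price.text.strip()))
    return pairs


for name, price in pair_names_and_prices(SAMPLE_HTML):
    print(name, price)
```

Because the $0.00 span lives in a div that is not a product container, it is never visited, so nothing needs to be removed or shifted afterwards.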
Answer 1 (score: 1)
To get output in the form described above, you can try the following:
import requests
from bs4 import BeautifulSoup
page = requests.get("https://leeweebrothers.com/our-food/lunch-boxes/#")
soup = BeautifulSoup(page.text, "html.parser")
for items in soup.find_all(class_='product-cat-lunch-boxes'):
    name = items.find("h2").get_text(strip=True)
    price = items.find(class_="amount").get_text(strip=True)
    print(name, price)
Result:
HONEY GLAZED CHICKEN WING LUNCH BOX $5.00
CRISPY CHICKEN LUNCH BOX $4.50
BREADED FISH LUNCH BOX $4.50
EGG OMELETTE LUNCH BOX $4.50
FRIED TWO-JOINT WING LUNCH BOX $4.50
Answer 2 (score: 0)
Try this:
for div, a in zip(names, rest):
    if a.text.strip() and '$0.00' not in a.text:  # empty strings are falsy
        print(div.text, a.text)  # print name / price on the same line
    else:  # optional
        print('Outlier')  # optional
Note that this only works for outliers whose a.text contains "$0.00".
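Skipping the outlier inside the `zip` loop drops that printed line, but it does not "move up" the remaining prices: `zip` has already paired each name with the price at the same index. Filtering the price list before zipping does shift them up. The sketch below contrasts the two, using hypothetical plain strings in place of the scraped `.text` values. It assumes the $0.00 entries are the only extra prices and that names and real prices are otherwise in the same order; if that does not hold, pairing within each product container (as in Answers 0 and 1) is the safer choice.

```python
# Hypothetical stand-ins for the .text values of the scraped tags:
# the first price is the cart widget's $0.00, so every real price is shifted by one.
names = ["DISH A", "DISH B", "DISH C"]
prices = ["$0.00", "$5.00", "$4.50", "$4.50"]


def skip_inside_loop(names, prices):
    """Answer 2's approach: drop the outlier pair while zipping."""
    return [(n, p) for n, p in zip(names, prices) if p.strip() and "$0.00" not in p]


def filter_then_zip(names, prices):
    """Remove the outlier prices first, so the remaining ones move up a slot."""
    real_prices = [p for p in prices if p.strip() and "$0.00" not in p]
    return list(zip(names, real_prices))


print(skip_inside_loop(names, prices))
# [('DISH B', '$5.00'), ('DISH C', '$4.50')]  <- DISH A lost, DISH B shows DISH A's price
print(filter_then_zip(names, prices))
# [('DISH A', '$5.00'), ('DISH B', '$4.50'), ('DISH C', '$4.50')]
```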