How to do web scraping with BeautifulSoup

Time: 2019-08-26 10:11:49

Tags: python beautifulsoup

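I am trying to get a list of the title and price of each product from this link, https://www.price.ro/preturi_notebook-1193.htm, but I cannot merge the two lists into a single one that looks like this:

"Title Price"

My code gets both lists, but I am stuck on merging these two columns: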

import requests
from bs4 import BeautifulSoup

url_link = 'https://www.price.ro/preturi_notebook-1193.htm'
page = requests.get(url_link)
soup = BeautifulSoup(page.content, 'html.parser')

title=soup.findAll('a',{'class':"titlu"})
price=soup.findAll('a',{'class':"price"})

for t in title:
    print(t.text.strip())
for p in price:
    print(p.text.strip())

Expected output:

Asus ZenBook UX430UA-GV340R                        3,579.00 lei
Asus ZenBook  ux331fal-eg006t                      3,298.99 lei
Asus UX334FL-A4005T                                8,403.98 lei
Asus UX461FA-E1035T                                3,292.95 lei
Lenovo IdeaPad S530-13IWL  81j7004grm              3,499.00 lei
Asus ZenBook 13  UX331FN-EG003T                    5,229.00 lei
Asus UX334FL-A4014R                                3,692.28 lei
Asus FX705GM-EW137                                 4,460.96 lei
Asus S330FA-EY095                                  4,174.00 lei
Asus UX333FA-A4109                                 5,794.00 lei

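4 answers:

Answer 0 (score: 1)

Find all the product containers (the div elements with class produs-lista), iterate over them, and scrape the title and price of each product from inside its own container.

For example: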

import requests
from bs4 import BeautifulSoup

url_link = 'https://www.price.ro/preturi_notebook-1193.htm'
page = requests.get(url_link)
soup = BeautifulSoup(page.content, 'html.parser')
produs_list = soup.find("div",{'class':'produse'}).find_all("div",\
               {'class':'produs-lista'})
data = []
for x in produs_list:
    title = x.find("a",{'class':'titlu'}).text.strip()
    price = x.find("a",{'class':'price'}).text.strip()
    product = dict(title=title,price=price)
    data.append(product)

print(data)

Output:

[{'title': 'Asus ZenBook UX430UA-GV340R', 'price': '3,292.95 lei'}, 
{'title': 'Asus ZenBook  ux331fal-eg006t', 'price': '3,499.00 lei'}, 
{'title': 'Asus UX334FL-A4005T', 'price': '5,229.00 lei'}, 
{'title': 'Asus UX461FA-E1035T', 'price': '3,692.28 lei'}, 
{'title': 'Lenovo IdeaPad S530-13IWL  81j7004grm', 'price': '4,460.96 lei'}, 
{'title': 'Asus ZenBook 13  UX331FN-EG003T', 'price': '4,174.00 lei'}, 
{'title': 'Asus UX334FL-A4014R', 'price': '5,794.00 lei'}, 
{'title': 'Asus FX705GM-EW137', 'price': '5,885.48 lei'}, 
{'title': 'Asus S330FA-EY095', 'price': '3,279.46 lei'}, 
{'title': 'Asus UX333FA-A4109', 'price': '4,098.99 lei'}, 
{'title': 'Apple The New MacBook Pro 13 Retina (mpxr2ze/a)', 'price': '6,040.67 lei'}, 
{'title': 'Lenovo Legion Y530 81FV003MRM', 'price': '3,098.99 lei'}, 
{'title': 'Asus UX433FA-A5046R', 'price': '3,699.00 lei'}, 
{'title': 'HP ProBook 450 G6 5TL51EA', 'price': '3,299.99 lei'},
 {'title': 'Asus X542UA-DM525', 'price': '2,424.00 lei'}, 
{'title': 'Lenovo ThinkPad X1 Carbon 6th gen 20KH006JRI', 'price': '10,202.99 lei'}, 
{'title': 'Asus VivoBook  X540UA-DM972', 'price': '1,659.00 lei'}, 
{'title': 'Asus X507UA-EJ782', 'price': '2,189.00 lei'}, 
{'title': 'Apple MacBook Air 13 (mqd32ze/a)', 'price': '3,998.00 lei'}, 
{'title': 'HP ProBook 470 G5  2rr84ea', 'price': '4,460.49 lei'}]
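To get from this list of dicts to the "Title Price" layout asked for in the question, one option is to pad every title to a common width before appending the price. The following is only a minimal sketch: format_rows and its gap parameter are hypothetical names introduced here, and the two sample entries are copied from the output above so the snippet runs without a network request.

```python
def format_rows(products, gap=4):
    """Return one 'title    price' line per product, with prices aligned in a column."""
    if not products:
        return []
    # Pad every title to the longest title plus a small gap (gap is a hypothetical knob).
    width = max(len(p["title"]) for p in products) + gap
    return [p["title"].ljust(width) + p["price"] for p in products]


# Sample entries copied from the output above; in practice pass the scraped `data` list.
sample = [
    {"title": "Asus ZenBook UX430UA-GV340R", "price": "3,292.95 lei"},
    {"title": "Asus UX334FL-A4005T", "price": "5,229.00 lei"},
]

lines = format_rows(sample)
for line in lines:
    print(line)
```

Calling format_rows(data) on the scraped list should behave the same way, since each entry has the same title and price keys.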

Answer 1 (score: 0)

You can zip the titles and prices:

for x in zip(title, price):
    print(x[0].text.strip(), x[1].text.strip())

This will be the output:

Asus ZenBook UX430UA-GV340R 7,998.99 lei
Asus ZenBook ux331fal-eg006t 7,650.85 lei
Asus UX334FL-A4005T 3,598.99 lei
Asus UX461FA-E1035T 3,292.95 lei
Lenovo IdeaPad S530-13IWL 81j7004grm 3,499.00 lei
Asus ZenBook 13 UX331FN-EG003T 5,229.00 lei
Asus UX334FL-A4014R 3,692.28 lei
Asus FX705GM-EW137 4,460.96 lei
Asus S330FA-EY095 4,174.00 lei
Asus UX333FA-A4109 5,794.00 lei
Apple The New MacBook Pro 13 Retina (mpxr2ze/a) 5,885.48 lei
Lenovo Legion Y530 81FV003MRM 3,279.46 lei
Asus UX433FA-A5046R 4,098.99 lei
HP ProBook 450 G6 5TL51EA 6,040.67 lei
Asus X542UA-DM525 3,098.99 lei
Lenovo ThinkPad X1 Carbon 6th gen 20KH006JRI 3,699.00 lei
Asus VivoBook X540UA-DM972 3,299.99 lei
Asus X507UA-EJ782 2,424.00 lei
Apple MacBook Air 13 (mqd32ze/a) 10,202.99 lei
HP ProBook 470 G5 2rr84ea 1,659.00 lei
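One thing to keep in mind with zip is that it stops at the shorter input, so if the two selectors ever return a different number of matches, the extra items are dropped without any error. The sketch below illustrates this with made-up placeholder strings (not data from the page) and shows itertools.zip_longest as one way to make a mismatch visible.

```python
from itertools import zip_longest

# Made-up placeholder values: three titles but only two prices.
titles = ["Laptop A", "Laptop B", "Laptop C"]
prices = ["1,000.00 lei", "2,000.00 lei"]

# zip() silently stops after the second pair.
pairs = list(zip(titles, prices))

# zip_longest() keeps every title and fills the gap with a marker.
padded = list(zip_longest(titles, prices, fillvalue="(missing)"))

print(pairs)
print(padded)
```

If the lengths differ on the real page, pairing by position is not reliable, and scraping each product container separately may be the safer route.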

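Answer 2 (score: 0)

You should change your for loops to this:

for index in range(len(title)):
    print("{}    {}".format(title[index].text.strip(), price[index].text.strip()))

You can do this because the number of prices and the number of titles are the same.

Let me know if it works!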

Answer 3 (score: 0)

You can use the zip function to join the lists and build a new list on the fly (using a list comprehension):

import requests
from bs4 import BeautifulSoup

url_link = 'https://www.price.ro/preturi_notebook-1193.htm'
page = requests.get(url_link)
soup = BeautifulSoup(page.content, 'html.parser')

title=list(soup.findAll('a',{'class':"titlu"}))
price=list(soup.findAll('a',{'class':"price"}))


# Merge the lists with zip() and build a new list of tuples with a list comprehension
zippedlist = [(ttl, prce) for ttl, prce in zip(title, price)]
#print(zippedlist)
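Note that the tuples built above hold the matched tag objects rather than plain strings, so you would still call .text.strip() on each element when printing, as in the question. As a small sketch of what the comprehension does, the snippet below uses made-up placeholder strings in place of the scraped tags and shows that it produces the same list of tuples as calling list(zip(...)) directly.

```python
# Made-up placeholder strings standing in for the scraped title and price text.
titles = ["Laptop A", "Laptop B"]
prices = ["1,000.00 lei", "2,000.00 lei"]

# The list comprehension from the answer...
via_comprehension = [(ttl, prce) for ttl, prce in zip(titles, prices)]

# ...builds the same list of tuples as converting the zip object directly.
via_list = list(zip(titles, prices))

print(via_comprehension)
print(via_comprehension == via_list)
```

The comprehension becomes more useful once each element is transformed, for example by stripping the text of each tag while building the tuples.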