Problem printing URLs when web scraping in Python

Date: 2020-03-14 08:31:07

Tags: python web-scraping beautifulsoup

I'm having trouble displaying the URLs in my program:

print(items[0].find(class_='product-card__link-overlay'))

linki = [item.find(class_='product-card__link-overlay').get_text() for item in items]

I don't know how to change these so that they show the HTML link (the URL) instead of the product name.

For example, I want to change

0 Nike Air Max 270 React 679,99złNike Air Max 270 React

to

0 Nike Air Max 270 React 679,99złhttps://www.nike.com/products/nikereact/id ...
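The difference comes down to what is read from the `<a>` tag: `get_text()` returns the text between the tags (here, the product name), while the URL is stored in the tag's `href` attribute. A minimal sketch on a hypothetical product-card snippet (the markup and URL are made up for illustration; the live Nike page may be structured differently):

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on one product card; not copied from the real page.
html = """
<div class="product-card">
  <a class="product-card__link-overlay" href="https://www.example.com/nike-air-max-270-react">Nike Air Max 270 React</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
link = soup.find(class_="product-card__link-overlay")

# get_text() returns the text content of the tag, i.e. the product name
name = link.get_text()
# the URL is an attribute of the tag, read with item access (or link.get("href"))
url = link["href"]

print(name)  # Nike Air Max 270 React
print(url)   # https://www.example.com/nike-air-max-270-react
```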

Code:

import pandas as pd
import requests
from bs4 import BeautifulSoup
import re

odpowiedz = requests.get("https://www.nike.com/pl/w?q=react%20270&vst=react%20270")
soup = BeautifulSoup(odpowiedz.text, 'html.parser')
nazwa = soup.find(id='Nike Air Max 270 React')
#print(nazwa)

items = soup.find_all(class_='product-card css-1pclthi ncss-col-sm-6 ncss-col-lg-4 va-sm-t product-grid__card')
#print(items)

#for link in soup.findAll('a', attrs={'href': re.compile("^https://")}):
    #print(link.get('href'))

#linki = soup.find('a', id='product-card css-1pclthi ncss-col-sm-6 ncss-col-lg-4 va-sm-t product-grid__card')
#print(linki)

print(items[0].find(class_='product-card__title').get_text())
print(items[0].find(class_='product-card__price').get_text())
print(items[0].find(class_='product-card__link-overlay'))

title = [item.find(class_='product-card__title').get_text() for item in items]
price = [item.find(class_='product-card__price').get_text() for item in items]
linki = [item.find(class_='product-card__link-overlay').get_text() for item in items]

wynik = pd.DataFrame(
    {
        'title': title,
        'price': price,
        'linki': linki,
    })
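A side note on the `find_all` call above: when `class_` is given a string containing spaces, BeautifulSoup compares it against the tag's whole `class` attribute string, so the match breaks if any single token changes. `css-1pclthi` looks like a generated class name that may change between site deployments (an assumption, not confirmed for Nike). A sketch on hypothetical markup comparing the exact multi-class string with matching on one class or with a CSS selector:

```python
from bs4 import BeautifulSoup

# Hypothetical card whose generated class (css-abc123) differs from the one
# hard-coded in the question (css-1pclthi).
html = """
<div class="product-card css-abc123 ncss-col-sm-6 product-grid__card">
  <div class="product-card__title">Nike Air Max 270 React</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A multi-class string must equal the whole class attribute, so this finds nothing.
exact = soup.find_all(class_="product-card css-1pclthi ncss-col-sm-6 product-grid__card")

# Matching a single class token still finds the card.
by_one_class = soup.find_all(class_="product-card")

# A CSS selector can require several classes, in any order.
by_selector = soup.select("div.product-card.product-grid__card")

print(len(exact), len(by_one_class), len(by_selector))  # 0 1 1
```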

1 answer:

Answer 0 (score: 0)

Replace:

linki = [item.find(class_='product-card__link-overlay').get_text() for item in items]

with the following, which keeps the link text and appends the value of the tag's href attribute:

linki = [f"{item.find(class_='product-card__link-overlay').get_text()} {item.find(class_='product-card__link-overlay').attrs['href']}" for item in items]
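If you would rather have the URL in its own column instead of joined to the name, one alternative sketch reads `href` with `.get()` (which returns `None` instead of raising `KeyError` when the attribute is missing) and builds one row per card. It runs on a hypothetical stand-in for `odpowiedz.text`; the relative `href` and the `base_url` used with `urljoin` are assumptions for illustration, since the live page may already use absolute URLs (in which case `urljoin` leaves them unchanged).

```python
from urllib.parse import urljoin

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for odpowiedz.text; the live Nike markup may differ.
html = """
<div class="product-card">
  <div class="product-card__title">Nike Air Max 270 React</div>
  <div class="product-card__price">679,99zł</div>
  <a class="product-card__link-overlay" href="/pl/t/example-product">Nike Air Max 270 React</a>
</div>
"""
base_url = "https://www.nike.com"  # assumption: only matters if href is relative

soup = BeautifulSoup(html, "html.parser")
items = soup.find_all(class_="product-card")

rows = []
for item in items:
    link = item.find(class_="product-card__link-overlay")
    # .get() returns None rather than raising KeyError if href is absent
    href = link.get("href") if link is not None else None
    rows.append({
        "title": item.find(class_="product-card__title").get_text(strip=True),
        "price": item.find(class_="product-card__price").get_text(strip=True),
        # urljoin resolves relative paths and leaves absolute URLs unchanged
        "linki": urljoin(base_url, href) if href else None,
    })

wynik = pd.DataFrame(rows)
print(wynik)
```

Building a list of dicts keeps each title, price and link on the same row, which avoids the three parallel lists drifting out of step if one card is missing an element.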