Question

I need to extract text from several snippets (in the given case, "325" and "550"). How can I do this with Python 3.6.0, bs4, and urllib? I will add the extracted data to a CSV file.

<div class="a-row a-spacing-none">
    <a class="a-link-normal a-text-normal" href="https://www.amazon.in/Game-Thrones-Song-Ice-Fire/dp/0007428545">
        <span class="a-size-small a-color-secondary">
        </span>

        <span class="a-size-base a-color-price s-price a-text-bold">

            <span class="currencyINR">  
            </span>
        325
        </span>

    </a>
    <span class="a-letter-space">
    </span>

    <span aria-label='Suggested Retail Price: &lt;span class="currencyINR"&gt;&amp;nbsp;&amp;nbsp;&lt;/span&gt;550' class="a-size-small a-color-secondary a-text-strike">
        <span class="currencyINR"> 
        </span>
    550
    </span>

 </div>

I tried the following code, but then I could not get rid of the enclosing span tags:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup


my_url = 'https://www.amazon.in/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=a+song+of+ice+and+fire'
# opening up connection, grabbing the page

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()


# html parsing
page_soup = soup(page_html, "html.parser")


# grabs each product
containers = page_soup.findAll("div", {"class":"s-item-container"})
contain = containers[0]
price = contain.findAll("span", {"class":"a-size-base a-color-price s-price a-text-bold"})
current_price = price[0].text.strip()

Answer 1

For starters, you can select the span element with class currencyINR and read the text node that follows it.

currency = contain.find('span', attrs={"class":"currencyINR"})

price = currency.next_sibling.strip()
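
To check this against the markup from the question, here is a minimal sketch. The HTML is a trimmed copy of the question's snippet, and the helper name `prices_after_currency` is made up for illustration. It assumes each price is the text node directly after a `currencyINR` span, which holds for the snippet shown but may not hold for every page layout.

```python
from bs4 import BeautifulSoup, NavigableString

# Trimmed copy of the markup from the question.
html = """
<div class="a-row a-spacing-none">
    <span class="a-size-base a-color-price s-price a-text-bold">
        <span class="currencyINR">  </span>
    325
    </span>
    <span class="a-size-small a-color-secondary a-text-strike">
        <span class="currencyINR"> </span>
    550
    </span>
</div>
"""


def prices_after_currency(markup):
    """Return the text following each currencyINR span (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    prices = []
    for currency in soup.find_all("span", class_="currencyINR"):
        sibling = currency.next_sibling
        # Only plain text nodes are of interest; skip tags and missing siblings.
        if isinstance(sibling, NavigableString):
            text = sibling.strip()
            if text:
                prices.append(text)
    return prices


print(prices_after_currency(html))
```

Because the helper walks every `currencyINR` span, it returns both the current and the suggested price in document order.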

Answer 2

I solved this later. Apparently the navigation was not as hard as I had made it out to be. Anyway, here is the working solution.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup


my_url = "https://www.amazon.in/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=a+song+of+ice+and+fire"


# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()


# html parsing
page_soup = soup(page_html, "html.parser")


# grabs each product
containers = page_soup.findAll("div", {"class":"s-item-container"})


# Creates New File:
fileName = r"H:\WEBSCRAPER\Result\Products.csv"  # raw string so the backslashes are not read as escapes
headers = "Product Name, Current Price, Original Price\n"

f = open(fileName, "w")
f.write(headers)


errorMsg = "Error! Not Found"
# obtains the data
for contain in containers:
    try:
        title = contain.h2.text
    except AttributeError:  # contain.h2 is None when the container has no <h2>
        title =  errorMsg
    try:
        priceCurrent = contain.findAll("span", {"class":"a-size-base a-color-price s-price a-text-bold"})
        CurrentSP = priceCurrent[0].text.strip()
    except IndexError:
        CurrentSP =  errorMsg
    try:
        priceSuggested = contain.findAll("span", {"class":"a-size-small a-color-secondary a-text-strike"})
        SuggestedSP = priceSuggested[0].text.strip()
    except IndexError:
        SuggestedSP =  errorMsg


    print("title: " + title)
    print("CurrentSP: " + CurrentSP)
    print("SuggestedSP: " + SuggestedSP)

    f.write(title.replace(",", "|") + "," + CurrentSP.replace(",", "") + "," + SuggestedSP.replace(",", "") + "\n")

f.close()
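
Replacing commas by hand works, but the standard library's `csv` module quotes fields for you, so a title can keep its commas. Below is a minimal sketch of that alternative. The `write_rows` helper and the sample row are made up for illustration, and an in-memory buffer stands in for the real output file.

```python
import csv
import io


def write_rows(fileobj, rows):
    """Write a header line plus (title, current, suggested) rows as CSV."""
    writer = csv.writer(fileobj)
    writer.writerow(["Product Name", "Current Price", "Original Price"])
    writer.writerows(rows)


# Sample row for illustration only; it was not scraped from the page.
rows = [("A Game of Thrones, Book 1", "325", "550")]

buffer = io.StringIO()  # stands in for open(fileName, "w", newline="")
write_rows(buffer, rows)
print(buffer.getvalue())
```

When writing to a real file, opening it with `newline=""` lets the `csv` module control line endings itself, which avoids blank lines between rows on Windows.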
