Web scraping with Python and bs4

Time: 2017-03-25 14:39:19

Tags: python css web-scraping bs4

I am trying to scrape data from the markup below, which comes from a Bloomberg page.

I want to get the text of the following labels and their corresponding values:

1. open
2. previous close
3. ytd return
4. market_cap
5. day range
6. 52wk_range
7. current_per_ratio
8. shares_outstanding
9. volume
10. one_year_return
11. earnings_per_share
12. price_sales
13. dividend_indicated_gross_yield

I tried but failed, and I don't know the right way to do this with bs4 in Python.

Please guide me toward getting the output I want.

<div class="data-table data-table_detailed">
    <!-- no spaces --><div class="cell cell__mobile-basic cell__visible__even"> <div class="cell__label"> Open </div> <div class="cell__value cell__value_"> 1,040.40 </div> </div><!-- no spaces -->
    <!-- no spaces --><div class="cell cell__mobile-basic"> <div class="cell__label"> Day Range </div> <div class="cell__value cell__value_"> 1,026.00 - 1,044.00 </div> </div><!-- no spaces -->
    <!-- no spaces --><div class="cell cell__mobile-basic cell__visible__even"> <div class="cell__label"> Volume </div> <div class="cell__value cell__value_"> 2,580,677 </div> </div><!-- no spaces -->
    <!-- no spaces --><div class="cell cell__mobile-basic"> <div class="cell__label"> Previous Close </div> <div class="cell__value cell__value_"> 1,040.45 </div> </div><!-- no spaces -->
    <!-- no spaces --><div class="cell cell__mobile-basic cell__visible__even"> <div class="cell__label"> 52Wk Range </div> <div class="cell__value cell__value_"> 900.30 - 1,279.30 </div> </div><!-- no spaces -->
    <!-- no spaces --><div class="cell cell__mobile-basic"> <div class="cell__label"> 1 Yr Return </div> <div class="cell__value cell__value_down"> -12.66% </div> </div><!-- no spaces -->
    <!-- no spaces --><div class="cell cell__mobile-basic cell__visible__even"> <div class="cell__label"> YTD Return </div> <div class="cell__value cell__value_up"> 2.06% </div> </div><!-- no spaces -->
    <!-- no spaces --><div class="cell "> <div class="cell__label"> Current P/E Ratio (TTM) </div> <div class="cell__value cell__value_"> 16.43 </div> </div><!-- no spaces -->
    <!-- no spaces --><div class="cell  cell__visible__even"> <div class="cell__label"> Earnings per Share (INR) (TTM) </div> <div class="cell__value cell__value_"> 62.76 </div> </div><!-- no spaces -->
    <!-- no spaces --><div class="cell "> <div class="cell__label"> Market Cap (t INR) </div> <div class="cell__value cell__value_"> 2.369 </div> </div><!-- no spaces -->
    <!-- no spaces --><div class="cell  cell__visible__even"> <div class="cell__label"> Shares Outstanding  (b) </div> <div class="cell__value cell__value_"> 2.297 </div> </div><!-- no spaces -->
    <!-- no spaces --><div class="cell "> <div class="cell__label"> Price/Sales (TTM) </div> <div class="cell__value cell__value_"> 3.47 </div> </div><!-- no spaces -->
    <!-- no spaces --><div class="cell  cell__visible__even"> <div class="cell__label"> Dividend Indicated Gross Yield </div> <div class="cell__value cell__value_"> 2.45% </div> </div><!-- no spaces -->
</div>

for cell in soup.find_all("div", class_='data-table data-table_detailed'):
    name = ""
    namecell = cell.find("div", class_="cell__label", text=True)
    if namecell is not None:
         name = namecell.get_text(strip=True)
    price_chage = cell.find("div", class_="cell__value cell__value").get_text(strip=True)
    data+=( "%s: Price Change:  %s," % (name, price_chage))

1 answer:

Answer 0 (score: 2)

To all the geeks passing by and downvoting: here is the code, and I wrote it myself. Congratulate me! If you can't help anyone, then don't downvote.

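# soup is the BeautifulSoup object for the page, created earlier.
tbl_data = []  # collected "label:value" strings
for cell in soup.find_all("div", class_="cell"):
    # Each cell holds one label div and one value div.
    namecell = cell.find("div", class_="cell__label", text=True)
    # The value div carries a direction suffix: _down, _up, or none.
    if cell.find("div", class_="cell__value cell__value_down", text=True):
        classText = "cell__value cell__value_down"
    elif cell.find("div", class_="cell__value cell__value_up", text=True):
        classText = "cell__value cell__value_up"
    else:
        classText = "cell__value cell__value_"

    value = cell.find("div", class_=classText, text=True)
    # Skip cells where either div is missing instead of raising.
    if namecell is not None and value is not None:
        tbl_data.append(namecell.get_text(strip=True) + ":" + value.get_text(strip=True))
print(tbl_data)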

The output is:

[u'Open:1,040.40', u'Day Range:1,026.00 - 1,044.00', u'Volume:2,580,677', u'Previous Close:1,040.45', u'52Wk Range:900.30 - 1,279.30', u'1 Yr Return:-12.66%', u'Current P/E Ratio (TTM):16.43', u'Earnings per Share (INR) (TTM):62.76', u'Market Cap (t INR):2.369', u'Shares Outstanding  (b):2.297', u'Price/Sales (TTM):3.47', u'Dividend Indicated Gross Yield:2.45%']
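For anyone reading this later, the same label/value pairing can probably be done without branching on the `_up` / `_down` suffix, by using CSS selectors. The snippet below is only a sketch under some assumptions: the `html` string is a trimmed copy of three cells from the markup in the question (not the live page), `parse_cells` is a hypothetical helper name, and the page is assumed to keep the `cell__label` / `cell__value` class names shown above.

```python
from bs4 import BeautifulSoup

# Assumption: a trimmed copy of the markup from the question (three cells only).
html = """
<div class="data-table data-table_detailed">
  <div class="cell cell__mobile-basic"> <div class="cell__label"> Open </div> <div class="cell__value cell__value_"> 1,040.40 </div> </div>
  <div class="cell cell__mobile-basic"> <div class="cell__label"> 1 Yr Return </div> <div class="cell__value cell__value_down"> -12.66% </div> </div>
  <div class="cell "> <div class="cell__label"> YTD Return </div> <div class="cell__value cell__value_up"> 2.06% </div> </div>
</div>
"""


def parse_cells(markup):
    """Return {label: value} for every cell inside the detailed data table.

    parse_cells is a hypothetical helper written for this sketch.
    """
    soup = BeautifulSoup(markup, "html.parser")
    result = {}
    # "div.cell" matches any div whose class list contains "cell",
    # regardless of the other classes or stray whitespace in the attribute.
    for cell in soup.select("div.data-table_detailed div.cell"):
        label = cell.select_one("div.cell__label")
        # "div.cell__value" matches the value div whatever its suffix class is.
        value = cell.select_one("div.cell__value")
        if label is None or value is None:
            continue  # skip incomplete cells instead of raising
        result[label.get_text(strip=True)] = value.get_text(strip=True)
    return result


print(parse_cells(html))
```

Returning a dict keyed by the label text makes it straightforward to map the labels onto your own field names afterwards (for example `data["Open"]` for `open`). Whether the live page still serves this markup is something you would need to check yourself.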