Scraping data with BeautifulSoup and Python

Posted: 2019-03-11 21:16:52

Tags: python html beautifulsoup

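I'm just starting out with BeautifulSoup in Python, and I want to scrape the Android Play Store page below for the package name and price of each app listed on it.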

To get the package names, I use the following code:

from bs4 import BeautifulSoup
from requests import get

url = "https://play.google.com/store/apps/category/NEWS_AND_MAGAZINES/collection/topselling_paid"
response = get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')
# matches <div> tags whose class attribute is exactly this string
app_container = html_soup.find_all('div', class_="card no-rationale square-cover apps small")

Source code of the page: https://play.google.com/store/apps/category/NEWS_AND_MAGAZINES/collection/topselling_paid
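The code above only collects the app cards; it does not yet pull out a package name. As a sketch, assuming each card links to a details page of the form /store/apps/details?id=<package> (the usual Play Store app URL pattern), the package name can be read from the link's id query parameter. The card markup, the package name com.example.newsreader and the helper package_names are hypothetical; the live Play Store HTML differs and has changed since 2019.

```python
from urllib.parse import urlparse, parse_qs

from bs4 import BeautifulSoup

# Hypothetical, simplified card markup; the real Play Store HTML differs.
SAMPLE_HTML = """
<div class="card no-rationale square-cover apps small">
  <a class="title" href="/store/apps/details?id=com.example.newsreader" title="Example Reader">Example Reader</a>
  <span class="display-price">$0.99</span>
</div>
"""


def package_names(html):
    """Hypothetical helper: return the package name from each card's details link."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for link in soup.select("div.card a.title[href]"):
        # Assumption: the package name is carried in the "id" query parameter.
        query = parse_qs(urlparse(link["href"]).query)
        if "id" in query:
            names.append(query["id"][0])
    return names


print(package_names(SAMPLE_HTML))  # → ['com.example.newsreader']
```

Links without an id parameter are simply skipped, so a change in the URL scheme yields an empty list rather than an exception.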

2 answers:

Answer 0 (score: 4)

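# html_soup is the BeautifulSoup object built in the question's code
for app in html_soup.select('.card.no-rationale.square-cover.apps.small'):
    title = app.select('.title')[0].text  # first element with class "title"
    price = app.select('.price')[0].text  # first element with class "price"
    print(title, price)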
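One difference between this answer and the question's code: per the Beautiful Soup documentation, find_all with class_ set to a space-separated string compares against the whole class attribute value, so the classes must appear in exactly that order, while a compound CSS selector such as .card.no-rationale only requires each class to be present. A small offline check, using hypothetical markup with the same classes in a different order:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same classes as the Play Store card, different order.
html = '<div class="apps small card no-rationale square-cover"><a class="title">Demo</a></div>'
soup = BeautifulSoup(html, "html.parser")

# class_ with a multi-class string must equal the full attribute value.
by_string = soup.find_all("div", class_="card no-rationale square-cover apps small")

# A compound CSS selector matches as long as every class is present.
by_selector = soup.select("div.card.no-rationale.square-cover.apps.small")

print(len(by_string), len(by_selector))  # → 0 1
```

This makes the select form somewhat more tolerant of markup changes that only reorder classes.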

Answer 1 (score: 1)

This is just one option.

from bs4 import BeautifulSoup
import requests
import pandas as pd

url = "https://play.google.com/store/apps/category/NEWS_AND_MAGAZINES/collection/topselling_paid"
response = requests.get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')
# one <div> per app card
app_container = html_soup.find_all('div', class_="card no-rationale square-cover apps small")
apptitle = []
appprice = []

for app in app_container:
    # the app name is stored in the title attribute of the <a class="title"> link
    title = app.find('a', class_='title')
    title_text = title['title']
    apptitle.append(title_text)
    # the displayed price, e.g. "$3.99"
    price_text = app.find('span', class_="display-price").text
    appprice.append(price_text)

# collect the two lists into a table
df = pd.DataFrame({"App_Title": apptitle, "App_Price": appprice})
print(df)

Output:

  App_Price                                          App_Title
0      $3.99                                       Pocket Casts
1      $2.99                    Broadcastify Police Scanner Pro
2      $3.99                              Sync for reddit (Pro)
3      $2.99         reddit is fun golden platinum (unofficial)
4      $2.99                             Relay for reddit (Pro)
5      $2.99                         DoggCatcher Podcast Player
6      $1.99                     BaconReader Premium for Reddit
7      $0.99                                The Drudge View Pro
8      $3.99                              Sync for reddit (Dev)
9      $1.49                              Conservative News Pro
10     $4.99                                      News+ Premium
11     $0.99        Mega Millions + Powerball Lotto Games in US
12     $2.99                              VR Browser for Reddit
13     $3.99                             Tiny Tiny RSS Unlocker
14     $3.49                                     Push to Kindle
15     $0.99                                    The Black Vault
16     $1.69                       No Agendroid - No Agenda App
17     $4.99                                     Police Scanner
18     $0.99  1 Radio News Pro: More Features and Shows, No Ads
19     $0.99        Lotto Results Premium - Lottery Games in US
20     $4.99                                    JREPro - No Ads
21    $10.99                          NHK News Donation Version
22     $0.99                                           U.S. 270
23     $1.49                      Pure news widget (scrollable)
24     $0.99                             Lake Okeechobee Levels
25     $0.99                         National Catholic Register
26     $0.99                      The One America News View Pro
27     $1.49                                     RSS Reader Pro
28     $3.99                                           YSN Live
29     $1.99                        Ultimate Conspiracy Premium
30     $0.99                                    News Reader Pro
31     $0.99                                      Tenno Watcher
32    $13.99                                The Aviation Herald
33     $2.96                                   Metro Reader Pro
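In the loop above, find returns None when a card lacks the expected element, so .text or ['title'] would raise an error on such a card (for example, a free app with no price span). A minimal sketch that tolerates missing elements, assuming the same card classes; the sample markup and the helper extract_apps are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second card has no price element.
SAMPLE_HTML = """
<div class="card no-rationale square-cover apps small">
  <a class="title" title="Example Reader Pro">Example Reader Pro</a>
  <span class="display-price">$0.99</span>
</div>
<div class="card no-rationale square-cover apps small">
  <a class="title" title="Example Scanner">Example Scanner</a>
</div>
"""


def extract_apps(html):
    """Hypothetical helper: return (title, price) pairs; price is None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.card.apps"):
        title_tag = card.select_one("a.title")
        price_tag = card.select_one("span.display-price")
        if title_tag is None:
            continue  # skip cards without a title link
        # Prefer the title attribute (as in the answer above), else the link text.
        title = title_tag.get("title") or title_tag.get_text(strip=True)
        price = price_tag.get_text(strip=True) if price_tag is not None else None
        rows.append((title, price))
    return rows


print(extract_apps(SAMPLE_HTML))
# → [('Example Reader Pro', '$0.99'), ('Example Scanner', None)]
```

The resulting list of tuples can still be turned into a table with pd.DataFrame(rows, columns=["App_Title", "App_Price"]).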