How to scrape certain tags using BeautifulSoup in Python

Time: 2015-11-21 07:09:15

Tags: python-2.7 web-scraping beautifulsoup

URL I am trying to scrape: https://play.google.com/store/apps/details?id=com.wsandroid.suite


import urllib2
from bs4 import BeautifulSoup

pkg = "com.wsandroid.suite"
url = "https://play.google.com/store/apps/details?id=" + pkg
html = urllib2.urlopen(url).read()

soup = BeautifulSoup(html, 'html.parser')
appTitle = soup.find("div", {"class": "document-title"}).text
date = soup.find("div", {"itemprop", "datePublished"})
print appTitle
print date  #THIS PRINTS NOTHING

Output

mine-MBP:learningpython neilnidhi$ python playstorescraper.py
https://play.google.com/store/apps/details?id=com.wsandroid.suite
 Security & Power Booster -free 
None //**NOTHING IS GETTING PRINTED HERE**

1 answer:

Answer 0 (score: 0)

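A typo is causing your problem. Compare how the appTitle and date variables are assigned: the filter for date uses a comma where a colon is needed, so {"itemprop", "datePublished"} is a set rather than a dict, and the find() call never filters on the itemprop attribute. The date line is also missing the trailing .text.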
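To see why a single character matters here, it can help to check what each literal actually is in plain Python. The sketch below needs no BeautifulSoup; the variable names `wrong` and `right` are only illustrative.

```python
# The two literals from the question differ only by "," versus ":".
wrong = {"itemprop", "datePublished"}   # comma: a set containing two strings
right = {"itemprop": "datePublished"}   # colon: a dict with one key/value pair

print(type(wrong).__name__)   # set
print(type(right).__name__)   # dict

# Only the dict maps an attribute name to the value it should have.
print(right["itemprop"])      # datePublished
print(hasattr(wrong, "keys")) # False: a set has no keys to look up
```

Because the set carries no name-to-value mapping, it cannot express "the attribute itemprop must equal datePublished", which is what the search needs.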

Change

date = soup.find("div", {"itemprop", "datePublished"})

to

date = soup.find("div", {"itemprop": "datePublished"}).text
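One caveat with calling .text directly: if the page layout changes and find() returns None, the attribute access raises AttributeError. The following is a minimal sketch of a guarded lookup, not the answerer's code. It runs against a small inline HTML string rather than the live page; the markup, the title "Example App", the date "January 1, 2000", and the helper name `text_or_none` are all placeholders invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the values are placeholders.
SAMPLE_HTML = """
<div class="document-title"> Example App </div>
<div itemprop="datePublished">January 1, 2000</div>
"""


def text_or_none(soup, tag, attrs):
    """Return the stripped text of the first matching tag, or None if absent."""
    node = soup.find(tag, attrs)
    return node.get_text(strip=True) if node is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Both filters are dicts: attribute name mapped to the expected value.
title = text_or_none(soup, "div", {"class": "document-title"})
date = text_or_none(soup, "div", {"itemprop": "datePublished"})
missing = text_or_none(soup, "div", {"itemprop": "notPresent"})

print(title)    # Example App
print(date)     # January 1, 2000
print(missing)  # None
```

The print calls use the parenthesized form, which works for a single argument on both Python 2.7 and Python 3.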