I'm new to Python and I'm really keen to learn more. I'm working on an assignment for a course I'm currently taking... The script needs to fetch an app's icon, title, description and screenshots, and it should be run as python app_fetcher.py <app_id>. The metadata should then be stored in a folder in the current directory (e.g. ./<app_id>). I've made a start on this, but I'm not sure how to actually do the web-scraping part of the script. Can anyone offer advice? I don't know which libraries or functions to call. I've looked online, but everything I find involves installing other packages. This is what I have so far; any help would be greatly appreciated!!! ...
import os
import sys
import urllib2


# Function to crawl the Google Play Store and obtain data
def web_crawl(app_id):
    try:
        # Build the URL for the app's Play Store page
        url = "https://play.google.com/store/apps/details?id=" + app_id
        # Open the URL for reading (the page is fetched but never parsed --
        # this is the scraping step I'm stuck on)
        response = urllib2.urlopen(url)
        # Get the directory of this .py file to store the txt file locally
        fpath = os.path.dirname(os.path.realpath(sys.argv[0]))
        # Open a file to store the app metadata
        with open(os.path.join(fpath, "web_crawl.txt"), "w") as f:
            f.write("Google Play Store Web Crawler \n")
            f.write("Metadata for " + app_id + "\n")
            f.write("*************************************** \n")
            f.write("Icon: " + "\n")
            f.write("Title: " + "\n")
            f.write("Description: " + "\n")
            f.write("Screenshots: " + "\n")
            # Added subtitle
            f.write("Subtitle: " + "\n")
        # The with-block closes the file automatically, so no f.close() is needed
    except urllib2.HTTPError as e:
        print("HTTP Error: ")
        print(e.code)
    except urllib2.URLError as e:
        print("URL Error: ")
        print(e.args)


# Call the web_crawl function
web_crawl("com.cmplay.tiles2")
Answer 0 (score: 1)

I'd suggest you use BeautifulSoup. To begin with, use this code:
import requests
from bs4 import BeautifulSoup

r = requests.get("url")  # replace "url" with the Play Store page URL
# optionally check the status code here, e.g. r.raise_for_status()
soup = BeautifulSoup(r.text, "html.parser")
Using the soup object, you can then use selectors to extract elements from the page.
Read more here: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
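For example, a rough sketch of pulling the title, icon and description out of the page; this assumes the Play Store page exposes Open Graph <meta> tags (og:title, og:image, og:description), which may change over time, so inspect the live page and adjust the selectors as needed:

import requests
from bs4 import BeautifulSoup

app_id = "com.cmplay.tiles2"
url = "https://play.google.com/store/apps/details?id=" + app_id

r = requests.get(url)
r.raise_for_status()  # stop early on HTTP errors

soup = BeautifulSoup(r.text, "html.parser")

# These selectors assume Open Graph metadata is present in the page;
# the real markup changes, so verify them against the live page.
title_tag = soup.select_one('meta[property="og:title"]')
icon_tag = soup.select_one('meta[property="og:image"]')
desc_tag = soup.select_one('meta[property="og:description"]')

title = title_tag["content"] if title_tag else ""
icon_url = icon_tag["content"] if icon_tag else ""
description = desc_tag["content"] if desc_tag else ""

print("Title: " + title)
print("Icon: " + icon_url)
print("Description: " + description)

# Screenshots are usually plain <img> elements; finding them means
# inspecting the page's current markup for a suitable selector.

The icon and screenshot URLs found this way could then be downloaded with requests.get and written into the ./<app_id> folder described in the question.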