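New user here. I'm starting to get the hang of Python syntax, but I keep getting thrown off by for loops. I understand each scenario I've come across so far (and my own earlier examples), but I can't seem to work one out for my current scenario.
I'm using BeautifulSoup to extract features from app stores as an exercise.
I created a list of Google Play and iTunes URLs to work with.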
list = {"https://play.google.com/store/apps/details?id=com.tov.google.ben10Xenodromeplus&hl=en",
"https://play.google.com/store/apps/details?id=com.doraemon.doraemonRepairShopSeasons&hl=en",
"https://play.google.com/store/apps/details?id=com.KnowledgeAdventure.SchoolOfDragons&hl=en",
"https://play.google.com/store/apps/details?id=com.turner.stevenrpg&hl=en",
"https://play.google.com/store/apps/details?id=com.indigokids.mimdoctor&hl=en",
"https://play.google.com/store/apps/details?id=com.rovio.gold&hl=en",
"https://itunes.apple.com/us/app/angry-birds/id343200656?mt=8",
"https://itunes.apple.com/us/app/doodle-jump/id307727765?mt=8",
"https://itunes.apple.com/us/app/tiny-wings/id417817520?mt=8",
"https://itunes.apple.com/us/app/flick-home-run-!/id454086751?mt=8",
"https://itunes.apple.com/us/app/bike-race-pro/id510461370?mt=8"}
To test BeautifulSoup (bs in my code), I used one app from each store:
gptest = bs(urllib.urlopen("https://play.google.com/store/apps/details?id=com.rovio.gold&hl=en"))
ios = bs(urllib.urlopen("https://itunes.apple.com/us/app/doodle-jump/id307727765?mt=8"))
I found the app's category on iTunes:
print ios.find(itemprop="applicationCategory").get_text()
...and on Google Play:
print gptest.find(itemprop="genre").get_text()
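With this newfound confidence, I wanted to try iterating over the whole list and printing these values, but then I realized that I suck at for loops...
Here is my attempt: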
def opensite():
    for item in list:
        bs(urllib.urlopen())

for item in list:
    try:
        if "itunes.apple.com" in row:
            print "Category:", opensite.find(itemprop="applicationCategory").get_text()
        else if "play.google.com" in row:
            print "Category", opensite.find(itemprop="genre").get_text()
    except:
        pass
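Note: ideally I would be passing in a csv (called "sample", with one column "URL"), so I believe my loop would start with
for row in sample.URL:
but I thought it would be more helpful to show you a list rather than deal with a data frame.
Thanks in advance!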
Answer 0 (score: 1)
from __future__ import print_function   #
try:                                    #
    from urllib import urlopen          # Support Python 2 and 3
except ImportError:                     #
    from urllib.request import urlopen  #
from bs4 import BeautifulSoup as bs

for line in open('urls.dat'):                    # Read urls from file line by line
    doc = bs(urlopen(line.strip()), 'html5lib')  # Strip \n from url, open it and parse
    if 'apple.com' in line:
        prop = 'applicationCategory'
    elif 'google.com' in line:
        prop = 'genre'
    else:
        continue
    print(doc.find(itemprop=prop).get_text())
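As a possible variation on the loop above, the sketch below takes the URLs from a CSV column, as the question's note suggests, and picks the itemprop from a lookup table instead of an if/elif chain. It is written for Python 3 only and makes some assumptions that are not in the original: the names `ITEMPROP_BY_HOST`, `itemprop_for` and `read_urls` are made up for illustration, the file name `sample.csv` is hypothetical, and the host is matched exactly rather than by substring as in the answer. It does not download anything; it only shows which itemprop each URL would be looked up with.

```python
import csv
import io
from urllib.parse import urlparse

# Assumed mapping from store host to the itemprop used in this thread.
ITEMPROP_BY_HOST = {
    "itunes.apple.com": "applicationCategory",
    "play.google.com": "genre",
}


def itemprop_for(url):
    """Return the itemprop name for a store URL, or None for other hosts."""
    return ITEMPROP_BY_HOST.get(urlparse(url.strip()).hostname)


def read_urls(csv_file, column="URL"):
    """Yield the non-empty values of one column from an open CSV file."""
    for row in csv.DictReader(csv_file):
        url = (row.get(column) or "").strip()
        if url:
            yield url


# Stand-in for the CSV file; a real run might pass
# open("sample.csv", newline="") instead (hypothetical file name).
sample = io.StringIO(
    "URL\n"
    "https://play.google.com/store/apps/details?id=com.rovio.gold&hl=en\n"
    "https://itunes.apple.com/us/app/doodle-jump/id307727765?mt=8\n"
    "https://example.com/not-a-store\n"
)

for url in read_urls(sample):
    print(url, "->", itemprop_for(url))
```

In the answer's loop, `prop = itemprop_for(line)` followed by `if prop is None: continue` would take the place of the if/elif/else block.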
Answer 1 (score: 1)
Try reading the URLs from the list:
from bs4 import BeautifulSoup as bs
import urllib2
import requests
list = {"https://play.google.com/store/apps/details?id=com.tov.google.ben10Xenodromeplus&hl=en",
"https://play.google.com/store/apps/details?id=com.doraemon.doraemonRepairShopSeasons&hl=en",
"https://play.google.com/store/apps/details?id=com.KnowledgeAdventure.SchoolOfDragons&hl=en",
"https://play.google.com/store/apps/details?id=com.turner.stevenrpg&hl=en",
"https://play.google.com/store/apps/details?id=com.indigokids.mimdoctor&hl=en",
"https://play.google.com/store/apps/details?id=com.rovio.gold&hl=en",
"https://itunes.apple.com/us/app/angry-birds/id343200656?mt=8",
"https://itunes.apple.com/us/app/doodle-jump/id307727765?mt=8",
"https://itunes.apple.com/us/app/tiny-wings/id417817520?mt=8",
"https://itunes.apple.com/us/app/flick-home-run-!/id454086751?mt=8",
"https://itunes.apple.com/us/app/bike-race-pro/id510461370?mt=8"}
def opensite():
    for item in list:
        # Download the page once and parse it
        source = requests.get(item)
        text_new = source.text
        soup = bs(text_new, "html.parser")
        try:
            if "itunes.apple.com" in item:
                print item,"Category:",soup.find('span',{'itemprop':'applicationCategory'}).text
            elif "play.google.com" in item:
                print item,"Category:", soup.find('span',{'itemprop':'genre'}).text
        except:
            pass

opensite()
It prints something like:
https://itunes.apple.com/us/app/doodle-jump/id307727765?mt=8 Category: Games
https://play.google.com/store/apps/details?id=com.KnowledgeAdventure.SchoolOfDragons&hl=en Category: Role Playing
https://play.google.com/store/apps/details?id=com.tov.google.ben10Xenodromeplus&hl=en Category: Role Playing
https://itunes.apple.com/us/app/tiny-wings/id417817520?mt=8 Category: Games
https://play.google.com/store/apps/details?id=com.doraemon.doraemonRepairShopSeasons&hl=en Category: Role Playing
https://itunes.apple.com/us/app/angry-birds/id343200656?mt=8 Category: Games
https://play.google.com/store/apps/details?id=com.indigokids.mimdoctor&hl=en Category: Role Playing
https://itunes.apple.com/us/app/bike-race-pro/id510461370?mt=8 Category: Games
https://play.google.com/store/apps/details?id=com.rovio.gold&hl=en Category: Role Playing
https://play.google.com/store/apps/details?id=com.turner.stevenrpg&hl=en Category: Role Playing
https://itunes.apple.com/us/app/flick-home-run-!/id454086751?mt=8 Category: Games
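One thing to watch in the code above: `soup.find` returns `None` when no tag matches, so `.text` raises `AttributeError`, and the bare `except: pass` swallows it. A URL whose page lacks the tag then simply goes missing from the output. The sketch below shows one way to make that case explicit. It assumes Python 3 with `bs4` installed; the helper name `category_text` and both HTML snippets are made up for illustration and are not real store pages.

```python
from bs4 import BeautifulSoup


def category_text(html, prop):
    """Return the text of the first tag with the given itemprop, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(attrs={"itemprop": prop})
    if tag is None:
        # No matching tag: report it instead of raising AttributeError.
        return None
    return tag.get_text(strip=True)


# Made-up snippets standing in for downloaded pages.
with_genre = '<div><span itemprop="genre"> Arcade </span></div>'
without_genre = '<div><span class="title">Some app</span></div>'

print(category_text(with_genre, "genre"))     # Arcade
print(category_text(without_genre, "genre"))  # None
```

Returning `None` lets the caller decide whether to skip the URL or print a placeholder, so a missing category is no longer hidden.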