urlopen for loop with BeautifulSoup

Date: 2016-08-17 03:00:57

Tags: python for-loop beautifulsoup urlopen

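New user here. I'm starting to get the hang of Python syntax but keep getting thrown off by for loops. I understand each example I've come across so far (and my own earlier ones), but can't seem to come up with one for my current scenario.

I am using BeautifulSoup to extract features from app stores as an exercise.

I created a list of Google Play and iTunes URLs to work with.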

 list = {"https://play.google.com/store/apps/details?id=com.tov.google.ben10Xenodromeplus&hl=en",
"https://play.google.com/store/apps/details?id=com.doraemon.doraemonRepairShopSeasons&hl=en",
"https://play.google.com/store/apps/details?id=com.KnowledgeAdventure.SchoolOfDragons&hl=en",
"https://play.google.com/store/apps/details?id=com.turner.stevenrpg&hl=en",
"https://play.google.com/store/apps/details?id=com.indigokids.mimdoctor&hl=en",
"https://play.google.com/store/apps/details?id=com.rovio.gold&hl=en",
"https://itunes.apple.com/us/app/angry-birds/id343200656?mt=8",
"https://itunes.apple.com/us/app/doodle-jump/id307727765?mt=8",
"https://itunes.apple.com/us/app/tiny-wings/id417817520?mt=8",
"https://itunes.apple.com/us/app/flick-home-run-!/id454086751?mt=8",
"https://itunes.apple.com/us/app/bike-race-pro/id510461370?mt=8"}

To test BeautifulSoup (bs in my code), I used one app from each store:

gptest = bs(urllib.urlopen("https://play.google.com/store/apps/details?id=com.rovio.gold&hl=en"))

ios = bs(urllib.urlopen("https://itunes.apple.com/us/app/doodle-jump/id307727765?mt=8"))

I found an app's category on iTunes:

print ios.find(itemprop="applicationCategory").get_text()

...and on Google Play:

print gptest.find(itemprop="genre").get_text()

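With this newfound confidence, I wanted to try iterating through the entire list and printing these values, but then I realized I'm bad at for loops...

Here is my attempt: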

def opensite():
    for item in list:
        bs(urllib.urlopen())

for item in list:
    try:
        if "itunes.apple.com" in row:
            print "Category:", opensite.find(itemprop="applicationCategory").get_text()
        else if "play.google.com" in row:
            print "Category", opensite.find(itemprop="genre").get_text()
    except:
        pass

Note: ideally I would be passing in a csv (called "sample", with one column "URL"), so I believe my loop would start with

for row in sample.URL:

But I thought it would be more helpful to show you the list rather than deal with data frames.

Thanks in advance!

2 answers:

Answer 0: (score: 1)

from __future__ import print_function   #
try:                                    #
    from urllib import urlopen          # Support Python 2 and 3
except ImportError:                     #
    from urllib.request import urlopen  #

from bs4 import BeautifulSoup as bs

for line in open('urls.dat'): # Read urls from file line by line
    doc = bs(urlopen(line.strip()), 'html5lib') # Strip \n from url, open it and parse
    if 'apple.com' in line:
        prop = 'applicationCategory'
    elif 'google.com' in line:
        prop = 'genre'
    else:
        continue
    print(doc.find(itemprop=prop).get_text())
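
If the URLs live in a CSV column named URL, as the question describes, the same loop could take its input from `csv.DictReader` instead of a plain text file. What follows is a minimal Python 3 sketch and not part of the original answer: `ITEMPROP_BY_HOST` and `iter_targets` are hypothetical names, the column name `URL` comes from the question, and matching on the parsed hostname is an assumed alternative to the substring test above.

```python
import csv
import io
from urllib.parse import urlparse

# Hypothetical lookup table: hostname -> itemprop attribute holding the category.
# The two attribute names are the ones used in the question.
ITEMPROP_BY_HOST = {
    "itunes.apple.com": "applicationCategory",
    "play.google.com": "genre",
}


def iter_targets(csv_file):
    """Yield (url, itemprop) pairs for every recognised URL in the CSV."""
    for row in csv.DictReader(csv_file):
        url = (row.get("URL") or "").strip()
        host = urlparse(url).hostname  # None for blank or malformed entries
        prop = ITEMPROP_BY_HOST.get(host)
        if prop is not None:  # skip hosts that are not in the table
            yield url, prop


# In-memory stand-in for a real file such as open("sample.csv", newline="")
sample = io.StringIO(
    "URL\n"
    "https://play.google.com/store/apps/details?id=com.rovio.gold&hl=en\n"
    "https://itunes.apple.com/us/app/doodle-jump/id307727765?mt=8\n"
    "https://example.com/not-a-store\n"
)
targets = list(iter_targets(sample))
for url, prop in targets:
    print(prop, url)
```

Each pair can then be fed to the fetch-and-find step shown in the loop above, with `prop` passed as the `itemprop` argument.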

Answer 1: (score: 1)

Try reading the URLs from the list:

from bs4 import BeautifulSoup as bs
import urllib2
import requests

list = {"https://play.google.com/store/apps/details?id=com.tov.google.ben10Xenodromeplus&hl=en",
"https://play.google.com/store/apps/details?id=com.doraemon.doraemonRepairShopSeasons&hl=en",
"https://play.google.com/store/apps/details?id=com.KnowledgeAdventure.SchoolOfDragons&hl=en",
"https://play.google.com/store/apps/details?id=com.turner.stevenrpg&hl=en",
"https://play.google.com/store/apps/details?id=com.indigokids.mimdoctor&hl=en",
"https://play.google.com/store/apps/details?id=com.rovio.gold&hl=en",
"https://itunes.apple.com/us/app/angry-birds/id343200656?mt=8",
"https://itunes.apple.com/us/app/doodle-jump/id307727765?mt=8",
"https://itunes.apple.com/us/app/tiny-wings/id417817520?mt=8",
"https://itunes.apple.com/us/app/flick-home-run-!/id454086751?mt=8",
"https://itunes.apple.com/us/app/bike-race-pro/id510461370?mt=8"}

def opensite():
    for item in list:
        # Fetch each page once with requests and parse the returned HTML
        source = requests.get(item)
        soup = bs(source.text, "html.parser")

        try:
            if "itunes.apple.com" in item:
                print item,"Category:",soup.find('span',{'itemprop':'applicationCategory'}).text
            elif "play.google.com" in item:
                print item,"Category:", soup.find('span',{'itemprop':'genre'}).text
        except AttributeError:
            # find() returned None: no matching tag on this page
            pass

opensite()

It prints output like:

https://itunes.apple.com/us/app/doodle-jump/id307727765?mt=8 Category: Games
https://play.google.com/store/apps/details?id=com.KnowledgeAdventure.SchoolOfDragons&hl=en Category: Role Playing
https://play.google.com/store/apps/details?id=com.tov.google.ben10Xenodromeplus&hl=en Category: Role Playing
https://itunes.apple.com/us/app/tiny-wings/id417817520?mt=8 Category: Games
https://play.google.com/store/apps/details?id=com.doraemon.doraemonRepairShopSeasons&hl=en Category: Role Playing
https://itunes.apple.com/us/app/angry-birds/id343200656?mt=8 Category: Games
https://play.google.com/store/apps/details?id=com.indigokids.mimdoctor&hl=en Category: Role Playing
https://itunes.apple.com/us/app/bike-race-pro/id510461370?mt=8 Category: Games
https://play.google.com/store/apps/details?id=com.rovio.gold&hl=en Category: Role Playing
https://play.google.com/store/apps/details?id=com.turner.stevenrpg&hl=en Category: Role Playing
https://itunes.apple.com/us/app/flick-home-run-!/id454086751?mt=8 Category: Games
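
The output above does not follow the order in which the URLs were written because the braces make `list` a set, and sets are unordered. Separately, when a page has no matching tag, `find` returns None, reading `.text` fails, and the `try` block swallows the failure silently. Below is a minimal Python 3 sketch of one way to handle both, not part of the original answer: `itemprop_for`, `report_categories` and `fetch_soup` are hypothetical names, and `fetch_soup` stands for any callable that returns a parsed page, for example `lambda url: bs(requests.get(url).text, "html.parser")`. The stub classes exist only so the sketch runs offline.

```python
def itemprop_for(url):
    """Return the itemprop name used by the store in `url`, or None."""
    if "itunes.apple.com" in url:
        return "applicationCategory"
    if "play.google.com" in url:
        return "genre"
    return None


def report_categories(urls, fetch_soup):
    """Return (url, category) pairs in input order; category is None if not found."""
    results = []
    for url in urls:
        prop = itemprop_for(url)
        if prop is None:
            results.append((url, None))
            continue
        tag = fetch_soup(url).find("span", {"itemprop": prop})
        # find() returns None when nothing matches, so test before reading .text
        results.append((url, tag.text.strip() if tag is not None else None))
    return results


# Square brackets build a real list, so iteration follows the written order.
urls = [
    "https://play.google.com/store/apps/details?id=com.rovio.gold&hl=en",
    "https://itunes.apple.com/us/app/doodle-jump/id307727765?mt=8",
]


class _StubTag:
    text = " Games "


class _StubSoup:
    """Offline stand-in for a parsed page: only the iTunes property matches."""

    def find(self, name, attrs):
        return _StubTag() if attrs.get("itemprop") == "applicationCategory" else None


results = report_categories(urls, lambda url: _StubSoup())
for url, category in results:
    print(url, "Category:", category if category is not None else "(not found)")
```

Passing the fetcher in as an argument keeps the loop itself free of network calls, so a missing category shows up in the results instead of disappearing.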