Python: nested loops to retrieve CSS tag values

Date: 2015-04-27 18:06:20

Tags: python

The tags on the web page look like this:

<div class="lg_col MT5">
    <p>
        <span class="sp starGryB">4.4</span>
    </p>
    <p class="MT5 UC">
        <span class="gd10gb">141 Ratings</span>
    </p>
</div>

I am trying to retrieve the values "4.4" and "141 Ratings" for every div with class "lg_col MT5".

The nested for loops I'm using don't work as expected. They don't seem to take the tag hierarchy into account.

import requests
import sys
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0"}

def test_function():
    url = "http://www.burrp.com/chennai/search.html?q=buffet"
    source_code = requests.get(url, headers=HEADERS) 
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    for tag in soup.select('div.lg_col.MT5'):
        for tag1 in soup.select('span.sp.starGryB'): 
            try:
                print(tag1.string)
            except KeyError:
                pass
        for tag2 in soup.select('span.gd10gb'):
            try:
                print(tag2.string)
            except KeyError:
                pass

test_function()


The expected output is 4.4 followed by 141 Ratings, for each div tag on the page.

Instead, the output is all of the starGryB values followed by all of the gd10gb values, and this repeats over and over.
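A minimal offline reproduction of that symptom, as a sketch: it parses the question's markup (with a second, hypothetical div whose values "3.9" and "52 Ratings" are made up for illustration) using the "html.parser" backend, an assumed choice. Because the inner searches call soup.select, every pass over a div collects every match in the whole document.

```python
from bs4 import BeautifulSoup

# The question's markup plus a second, hypothetical listing (values made up)
HTML = """
<div class="lg_col MT5">
    <p><span class="sp starGryB">4.4</span></p>
    <p class="MT5 UC"><span class="gd10gb">141 Ratings</span></p>
</div>
<div class="lg_col MT5">
    <p><span class="sp starGryB">3.9</span></p>
    <p class="MT5 UC"><span class="gd10gb">52 Ratings</span></p>
</div>
"""

def buggy_order(html):
    soup = BeautifulSoup(html, "html.parser")
    out = []
    for tag in soup.select("div.lg_col.MT5"):
        # Bug reproduced: these searches use `soup`, not `tag`,
        # so each pass collects every match in the entire document
        out += [s.get_text() for s in soup.select("span.sp.starGryB")]
        out += [s.get_text() for s in soup.select("span.gd10gb")]
    return out

print(buggy_order(HTML))
```

The printed list is all scores, then all ratings, once per div, which is the pattern described above.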

2 answers:

Answer 0 (score: 1)

If you only want to search inside tag rather than the whole document, use tag instead of soup: call tag.select(...) rather than soup.select(...).
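A sketch of that fix applied to the question's loop, run against the sample markup rather than the live site. The second div and its values are hypothetical, the "html.parser" backend is an assumed choice, and the helper name extract_pairs is made up for illustration. Each search is scoped to the current div, so score and ratings come out paired.

```python
from bs4 import BeautifulSoup

# The question's markup plus a second, hypothetical listing (values made up)
HTML = """
<div class="lg_col MT5">
    <p><span class="sp starGryB">4.4</span></p>
    <p class="MT5 UC"><span class="gd10gb">141 Ratings</span></p>
</div>
<div class="lg_col MT5">
    <p><span class="sp starGryB">3.9</span></p>
    <p class="MT5 UC"><span class="gd10gb">52 Ratings</span></p>
</div>
"""

def extract_pairs(html):
    """Return one (score, ratings) tuple per div.lg_col.MT5 (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for tag in soup.select("div.lg_col.MT5"):
        # tag.select searches only the descendants of this div
        score = tag.select("span.sp.starGryB")
        ratings = tag.select("span.gd10gb")
        pairs.append((
            score[0].get_text(strip=True) if score else None,
            ratings[0].get_text(strip=True) if ratings else None,
        ))
    return pairs

for score, ratings in extract_pairs(HTML):
    print(score, ratings)
```

This prints "4.4 141 Ratings" and then "3.9 52 Ratings". Returning None for a missing span keeps listings without ratings from raising an IndexError.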

Answer 1 (score: 0)

Not for points.

Here is another approach that avoids having to deal with nested loops.

import requests
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0"}

url = "http://www.burrp.com/chennai/search.html?q=buffet"
source_code = requests.get(url, headers=HEADERS) 
plain_text = source_code.text
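soup = BeautifulSoup(plain_text, "html.parser")

# Every score span; span -> p -> div.lg_col.MT5
tags_1 = soup.find_all('span', class_='sp starGryB')
# Two levels up is the listing's div, which holds the ratings span
tags_2 = [tag.parent.parent.select('span.gd10gb') for tag in tags_1]
# One level above the div, presumably where the name link lives
tags_3 = [tag.parent.parent.parent.select('a.gr24mb.UC') for tag in tags_1]

scores = [score.get_text() for score in tags_1]
# Fall back to 'NA' when a listing has no ratings span
ratings = [rating[0].get_text() if len(rating) > 0 else 'NA' for rating in tags_2]
names = [name[0].get_text().strip() for name in tags_3]

# Pair name, score and ratings for each listing
for a, b, c in zip(names, scores, ratings):
    print(a, b, c)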

Result:

Wild Amazon 2.9 27 Ratings
European Buffet NA NA
Flamingo 2.3 17 Ratings
The Holy Smoke 2.9 13 Ratings
Snow Park 2.6 14 Ratings
Dhabba Express 2.7 11 Ratings
The Yellow Chilli 2.7 6 Ratings
The Piano, The Savera Hotel 2.5 6 Ratings
Roasts & Grills, Green Park Hotel 2.3 6 Ratings
[Finished in 0.9s]
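A variant of this parent-walking idea, sketched offline against the question's sample markup. It swaps in BeautifulSoup's find_parent("div", class_="lg_col") in place of counting .parent hops, which is my substitution rather than the answerer's code, so the lookup does not break if the nesting depth changes. The second div (with no ratings span) and the helper name scores_with_ratings are hypothetical.

```python
from bs4 import BeautifulSoup

# The question's markup plus a hypothetical listing without a ratings span
HTML = """
<div class="lg_col MT5">
    <p><span class="sp starGryB">4.4</span></p>
    <p class="MT5 UC"><span class="gd10gb">141 Ratings</span></p>
</div>
<div class="lg_col MT5">
    <p><span class="sp starGryB">NA</span></p>
</div>
"""

def scores_with_ratings(html):
    """Pair each score with its listing's ratings (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for score_span in soup.find_all("span", class_="starGryB"):
        # Climb to the nearest enclosing listing div, however deep the span is
        container = score_span.find_parent("div", class_="lg_col")
        rating_span = container.find("span", class_="gd10gb") if container else None
        results.append((
            score_span.get_text(strip=True),
            # Same 'NA' fallback as the answer uses for missing ratings
            rating_span.get_text(strip=True) if rating_span else "NA",
        ))
    return results

for score, ratings in scores_with_ratings(HTML):
    print(score, ratings)
```

This prints "4.4 141 Ratings" and then "NA NA".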