Why doesn't `findAll(text="City:")` find the relevant string using BeautifulSoup in Python?

Time: 2013-12-02 15:40:48

Tags: python beautifulsoup

I am using the following code:

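import requests
from bs4 import BeautifulSoup

zipCode = str(11021)
url = "http://www.city-data.com/zips/" + zipCode + ".html"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data)
# Look for text nodes whose entire content is exactly "City:"
main_body = soup.findAll(text="City:")
print(main_body)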

to find the following HTML snippet:

<b>City:</b>
 <a href="/city/New-York-New-York.html">New York, NY</a>

When I use "City:", I get [].

When I use "City", I get [u'City', u'City'], but neither of those is the string I am looking for.

Why doesn't "City:" work? Is the colon causing the problem?

2 Answers:

Answer 0: (score: 1)

You could change your approach and look for anchors whose href points to /city/ instead:

import requests
from bs4 import BeautifulSoup
import re

soup = BeautifulSoup(requests.get('http://www.city-data.com/zips/11021.html').text)
for anchor in soup.find_all('a', href=re.compile(r'/city/')):
    print(anchor.string)

#Great Neck Estates, NY
#Thomaston, NY
#Great Neck Plaza, NY
#Kensington, NY
#University Gardens, NY
# etc...

For 10001.html, it returns:

New York, NY
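
The same lookup can be tried without a network request. The following is a minimal sketch, assuming a hand-written HTML fragment modelled on the snippet in the question; the sample markup, the `find_city_names` helper, and the choice of the `html.parser` backend are illustrative assumptions and are not taken from the site itself:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the snippet in the question.
SAMPLE_HTML = """
<div>
<b>City:</b>
 <a href="/city/New-York-New-York.html">New York, NY</a>
<a href="/zips/10002.html">10002</a>
</div>
"""

def find_city_names(html):
    """Return the text of every anchor whose href contains /city/."""
    soup = BeautifulSoup(html, "html.parser")
    anchors = soup.find_all("a", href=re.compile(r"/city/"))
    return [a.get_text(strip=True) for a in anchors]

print(find_city_names(SAMPLE_HTML))
```

Because the filter is on the link target, it does not depend on the exact wording of the label in front of the link.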

Answer 1: (score: 0)

This code works for me:

import requests
from bs4 import BeautifulSoup

zipCode = "07928"
url = "http://www.city-data.com/zips/" + zipCode + ".html"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data)
# Pages that list several cities use the label "Cities:" instead of "City:",
# so fall back to the plural label when the singular one is not found.
cityNeeded = soup.findAll(text="City:") or soup.findAll(text="Cities:")
for t in cityNeeded:
    print(t.find_next('a').string)
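
A plain string passed as `text=` is compared against the whole text node, so "City:" cannot match a node that reads "Cities:". That is consistent with the fallback above, although the exact markup of any given page is not confirmed here. As a rough sketch, both labels can be handled in one pass with a regular expression. The sample fragments, the hrefs in the plural fragment, the `LABEL` pattern, and the `first_city_after_label` helper are hypothetical, and `string=` is the newer name for the `text=` argument in Beautiful Soup 4.4.0 and later:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical fragments: one with the singular label, one with the plural.
SINGLE = '<b>City:</b> <a href="/city/New-York-New-York.html">New York, NY</a>'
MULTI = ('<b>Cities:</b> '
         '<a href="/city/Thomaston-New-York.html">Thomaston, NY</a>, '
         '<a href="/city/Kensington-New-York.html">Kensington, NY</a>')

# Matches "City:" or "Cities:", tolerating surrounding whitespace.
LABEL = re.compile(r"^\s*(City|Cities):\s*$")

def first_city_after_label(html):
    """Return the text of the first link after the label, or None."""
    soup = BeautifulSoup(html, "html.parser")
    label = soup.find(string=LABEL)
    if label is None:
        return None
    link = label.find_next("a")
    return link.get_text(strip=True) if link is not None else None

print(first_city_after_label(SINGLE))
print(first_city_after_label(MULTI))
```

Allowing whitespace in the pattern also covers the case where the label node carries leading or trailing spaces, which an exact string comparison would reject.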