Question

I am using the following code:

import requests
from bs4 import BeautifulSoup

zipCode = str(11021)
url = "http://www.city-data.com/zips/" + zipCode + ".html"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html.parser")
main_body = soup.findAll(text="City:")
print(main_body)

to find the following HTML snippet:

<b>City:</b>
 <a href="/city/New-York-New-York.html">New York, NY</a>

When I use "City:", I get [].

When I use "City", I get [u'City', u'City'], but neither of those is the string I'm looking for.

Why doesn't "City:" work? Is the colon causing the problem?
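The behaviour can be reproduced offline. A plausible explanation (hedged, not confirmed against the live page): a plain string passed as `text=` (named `string=` in newer Beautiful Soup releases) must equal the entire text node, and the colon has no special meaning. If the page for 11021 labels its links "Cities:" rather than "City:" (Answer 1's output lists several cities for that ZIP, and Answer 2 checks for both labels), an exact "City:" search finds nothing. The sketch below uses two hypothetical snippets modelled on the HTML above; `find_labels` is an illustrative helper, not part of Beautiful Soup.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical offline snippets modelled on the markup in the question:
# one page with a single city, one page that lists several cities.
SINGLE_CITY = (
    '<b>City:</b>\n'
    '<a href="/city/New-York-New-York.html">New York, NY</a>'
)
MULTI_CITY = (
    '<b>Cities:</b>\n'
    '<a href="/city/Great-Neck-Estates-New-York.html">Great Neck Estates, NY</a>'
)


def find_labels(html, pattern):
    """Return (as plain str) every text node in `html` matching `pattern`.

    A plain string must equal the whole text node; a compiled regex is
    applied with re.search, so it can also match part of a node.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [str(s) for s in soup.find_all(string=pattern)]


print(find_labels(SINGLE_CITY, "City:"))  # ['City:']
print(find_labels(MULTI_CITY, "City:"))   # [] because the node is "Cities:"
print(find_labels(SINGLE_CITY, "City"))   # [] because "City" != "City:"
# An anchored regex accepts either label.
print(find_labels(MULTI_CITY, re.compile(r"^Cit(?:y|ies):$")))  # ['Cities:']
```

So "City" in the question presumably matched other text nodes on the page that are exactly "City", not the label.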

Answer 1

You could change your approach and look for the anchors whose href points to /city/:

import requests
from bs4 import BeautifulSoup
import re

soup = BeautifulSoup(requests.get('http://www.city-data.com/zips/11021.html').text, 'html.parser')
# Every link to a city page has an href containing /city/
for anchor in soup.find_all('a', href=re.compile(r'/city/')):
    print(anchor.string)

#Great Neck Estates, NY
#Thomaston, NY
#Great Neck Plaza, NY
#Kensington, NY
#University Gardens, NY
# etc...

For 10001.html, it returns:

New York, NY
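Answer 1's filter can be checked without network access. A minimal sketch, assuming hypothetical markup modelled on its output; `city_links` is an illustrative helper name. A compiled regular expression passed as an attribute filter is applied with `re.search`, so `/city/` matches anywhere inside the `href` value, and links elsewhere on the page are skipped.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: two city links plus one unrelated link.
HTML = """
<b>Cities:</b>
<a href="/city/Great-Neck-Estates-New-York.html">Great Neck Estates, NY</a>,
<a href="/city/Thomaston-New-York.html">Thomaston, NY</a>
<a href="/zips/11020.html">11020</a>
"""


def city_links(html):
    """Return the text of every <a> whose href contains "/city/"."""
    soup = BeautifulSoup(html, "html.parser")
    # The regex is matched with re.search against each href attribute.
    anchors = soup.find_all("a", href=re.compile(r"/city/"))
    return [a.get_text() for a in anchors]


for name in city_links(HTML):
    print(name)
```

`get_text()` is used here instead of `.string` because `.string` returns `None` when a tag has more than one child.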

Answer 2

Here is the code that works for me:

import requests
from bs4 import BeautifulSoup

zipCode = "07928"
url = "http://www.city-data.com/zips/" + zipCode + ".html"
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
# Some ZIP pages label the link "City:", others "Cities:"; try both.
labels = soup.findAll(text="City:")
if not labels:
    labels = soup.findAll(text="Cities:")
for t in labels:
    # The city link is the first <a> that follows the label text.
    print(t.find_next('a').string)
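Answer 2's fallback (try "City:", then "Cities:") can be folded into a helper that tries each label in order. A minimal sketch on hypothetical offline snippets; `first_city` and its `labels` parameter are illustrative names. Like Answer 2, it returns only the first link after the label, so on a multi-city page the remaining cities are not collected (Answer 1's href filter covers that case).

```python
from bs4 import BeautifulSoup

# Hypothetical offline snippets for the two label variants.
SINGLE_CITY = (
    '<b>City:</b>\n'
    '<a href="/city/New-York-New-York.html">New York, NY</a>'
)
MULTI_CITY = (
    '<b>Cities:</b>\n'
    '<a href="/city/Great-Neck-Estates-New-York.html">Great Neck Estates, NY</a>'
)


def first_city(html, labels=("City:", "Cities:")):
    """Return the text of the first link after the first label found, else None."""
    soup = BeautifulSoup(html, "html.parser")
    for label in labels:
        # Exact match against the whole text node, as in the question.
        node = soup.find(string=label)
        if node is not None:
            # find_next("a") walks forward in document order from the label.
            link = node.find_next("a")
            return link.get_text() if link is not None else None
    return None


print(first_city(SINGLE_CITY))  # New York, NY
print(first_city(MULTI_CITY))   # Great Neck Estates, NY
```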

Why doesn't `findAll(text="City:")` find the relevant string with BeautifulSoup in Python?
