I am using the following code:
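import requests
from bs4 import BeautifulSoup

zipCode = str(11021)
url = "http://www.city-data.com/zips/" + zipCode + ".html"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data)
# Search for text nodes that are exactly equal to "City:"
main_body = soup.findAll(text="City:")
print(main_body)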
I want it to find the following HTML snippet:
<b>City:</b>
<a href="/city/New-York-New-York.html">New York, NY</a>
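When I search for "City:", I get []. When I search for "City", I get [u'City', u'City'], but neither of those is the string I am looking for. Why doesn't "City:" work? Is the colon (:) causing the problem?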
Answer 0 (score: 1)
You could change your approach and look for the anchors whose href points to /city/ instead:
import re

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get('http://www.city-data.com/zips/11021.html').text, 'html.parser')
# Select every anchor whose href contains /city/
for anchor in soup.find_all('a', href=re.compile(r'/city/')):
    print(anchor.string)
# Great Neck Estates, NY
# Thomaston, NY
# Great Neck Plaza, NY
# Kensington, NY
# University Gardens, NY
# etc...
For 10001.html, it returns:
New York, NY
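The same idea can be tried offline against the snippet from the question. This is only a sketch: the second anchor (/forum/city/...) is a hypothetical link added for illustration, and the ^ in the pattern is an extra tightening, not part of the answer above. Beautiful Soup applies a compiled pattern with search(), so an unanchored r'/city/' would also match an href that merely contains /city/ somewhere in the middle.

```python
import re

from bs4 import BeautifulSoup

# The snippet from the question plus one hypothetical link that should NOT match.
html = (
    '<b>City:</b>\n'
    '<a href="/city/New-York-New-York.html">New York, NY</a>\n'
    '<a href="/forum/city/general.html">Forum</a>'  # hypothetical, for illustration
)

soup = BeautifulSoup(html, "html.parser")

# Anchor the pattern with ^ so only hrefs that start with /city/ are selected.
city_names = [a.get_text() for a in soup.find_all("a", href=re.compile(r"^/city/"))]
print(city_names)
```

Whether the real page contains links that would trip up the unanchored pattern is not something the thread confirms, so treat the ^ as a precaution.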
Answer 1 (score: 0)
This code works for me:
import requests
from bs4 import BeautifulSoup

zipCode = "07928"
url = "http://www.city-data.com/zips/" + zipCode + ".html"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html.parser")
# Some pages use the plural label "Cities:" instead of "City:"
if soup.findAll(text="City:") == []:
    cityNeeded = soup.findAll(text="Cities:")
    for t in cityNeeded:
        print(t.find_next('a').string)
else:
    cityNeeded = soup.findAll(text="City:")
    for t in cityNeeded:
        print(t.find_next('a').string)
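A plain string passed as the text filter has to equal the whole text node, so any extra whitespace around the label makes the search return []. Whether that is what happens on the real page is an assumption; the thread does not confirm the cause. The sketch below uses made-up HTML (the trailing space after "City:" and the Great Neck link are hypothetical) to show the effect, and a regular expression that accepts both labels in one pass:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: note the trailing space inside the first <b> tag.
html = """
<b>City: </b>
<a href="/city/New-York-New-York.html">New York, NY</a>
<b>Cities:</b>
<a href="/city/Great-Neck-New-York.html">Great Neck, NY</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Exact match: "City: " (with a space) is not equal to "City:", so nothing is found.
exact = soup.find_all(string="City:")

# Regex match: tolerate surrounding whitespace and accept either label.
label_re = re.compile(r"^\s*(City|Cities):\s*$")
labels = soup.find_all(string=label_re)

# For each label, take the text of the first <a> that follows it in the document.
names = [label.find_next("a").get_text() for label in labels]
print(exact)
print(names)
```

The string argument is the newer name for text in Beautiful Soup 4; with an older release, text=label_re should behave the same way.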