I am searching for the text City: that precedes the tag I want, which holds the city and state string. Here is the HTML:
<b>City:</b>
<a href="/city/New-York-New-York.html">New York, NY</a>
Here is the code:
import requests
from bs4 import BeautifulSoup

zipCode = str(11021)
url = "http://www.city-data.com/zips/" + zipCode + ".html"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data)
main_body = soup.findAll(text="City:")
print main_body
However, all I get is an empty list ([]). How can I search for the City: text and then get the string of the next tag?
Answer 0 (score: 0)
You can iterate over next_elements from the matching text node until you reach an <a> tag, then extract its text:
from bs4 import BeautifulSoup
import sys

soup = BeautifulSoup(open(sys.argv[1], 'r'), 'html')
for t in soup.find_all(text="City:"):
    print(t)
    for e in t.next_elements:
        if e.name == 'a':
            print(e.string)
            break
Run it (assuming htmlfile contains the test data from the question):
python3 script.py htmlfile
Output:
City:
New York, NY
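For anyone who wants to try this without saving a file first, here is a minimal sketch of the same next_elements walk run against the HTML snippet from the question as an inline string. The helper name link_text_after and the HTML constant are made up for illustration, and the sketch assumes a bs4 version that accepts the string= argument (older versions use text=).

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# The HTML fragment from the question, inlined for a self-contained run.
HTML = '<b>City:</b>\n<a href="/city/New-York-New-York.html">New York, NY</a>'


def link_text_after(html, label):
    """Return the text of the first <a> tag that follows the text node `label`.

    Hypothetical helper: returns None when the label or a following link
    is not present.
    """
    soup = BeautifulSoup(html, "html.parser")
    node = soup.find(string=label)  # exact match on the whole text node
    if node is None:
        return None
    # next_elements yields both text nodes and tags, in document order.
    for element in node.next_elements:
        if isinstance(element, Tag) and element.name == "a":
            return element.get_text()
    return None


print(link_text_after(HTML, "City:"))
```

The isinstance check just makes it explicit that text nodes are skipped; the match on "City:" is still exact, so a label with different wording or surrounding whitespace will not be found.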
Answer 1 (score: 0)
import requests
from bs4 import BeautifulSoup

zipCode = str("07928")
url = "http://www.city-data.com/zips/" + zipCode + ".html"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data)
# Fall back to the "Cities:" label when no "City:" text node is found.
if soup.findAll(text="City:") == []:
    cityNeeded = soup.findAll(text="Cities:")
    for t in cityNeeded:
        # find_next('a') returns the first <a> tag after the label.
        print(t.find_next('a').string)
else:
    cityNeeded = soup.findAll(text="City:")
    for t in cityNeeded:
        print(t.find_next('a').string)
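The two branches above differ only in the label they look for, so one possible simplification is to match both labels with a single regular expression and tolerate stray whitespace around the text. The sketch below is only an illustration: the name city_links, the LABEL pattern, and the sample HTML strings are assumptions, not taken from the actual city-data.com markup.

```python
import re
from bs4 import BeautifulSoup

# Matches a text node that is "City:" or "Cities:", ignoring surrounding whitespace.
LABEL = re.compile(r"^\s*Cit(?:y|ies):\s*$")


def city_links(html):
    """Return the text of the first <a> after each City:/Cities: label.

    Hypothetical helper; returns an empty list when no label is found.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for node in soup.find_all(string=LABEL):
        link = node.find_next("a")  # first <a> tag after the label
        if link is not None:
            results.append(link.get_text(strip=True))
    return results


# Made-up fragments modelled on the snippet in the question.
print(city_links('<b>City:</b> <a href="/city/New-York-New-York.html">New York, NY</a>'))
print(city_links('<b> Cities: </b><a href="/city/New-York-New-York.html">New York, NY</a>'))
```

Passing a compiled pattern to string= makes BeautifulSoup test each text node against the regex, which avoids running the same search twice and then comparing the result with [].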