I need to retrieve the hrefs that contain /questions/20702626/javac1-8-class-not-found, but the output I get from my code is //stackoverflow.com:
from bs4 import BeautifulSoup
import urllib2

url = "http://stackoverflow.com/search?q=incorrect+operator"
content = urllib2.urlopen(url).read()
soup = BeautifulSoup(content)

for tag in soup.find_all('div'):
    if tag.get("class")==['summary']:
        for tag in soup.find_all('div'):
            if tag.get("class")==['result-link']:
                for link in soup.find_all('a'):
                    print link.get('href')
                    break;
Answer 0 (score: 1)
Instead of nesting loops, write a CSS selector:
for link in soup.select('div.summary div.result-link a'):
    print link.get('href')
This is not only more readable, it also solves your problem. It prints:
/questions/11977228/incorrect-answer-in-operator-overloading
/questions/8347592/sizeof-operator-returns-incorrect-size
/questions/23984762/c-incorrect-signature-for-assignment-operator
...
/questions/24896659/incorrect-count-when-using-comparison-operator
/questions/7035598/patter-checking-check-of-incorrect-number-of-operators-and-brackets
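As for why the original loops printed //stackoverflow.com: assuming the loops were nested the way the code reads, every inner loop searches `soup` (the whole document) instead of the element matched by the outer loop, so the first `a` on the page is printed each time. Here is a minimal sketch of that difference on a made-up HTML fragment. It is written for Python 3, unlike the Python 2 code in the question, and the fragment and function names are hypothetical, not taken from the real page:

```python
from bs4 import BeautifulSoup

# Made-up fragment that only mimics the rough shape of a search results page.
HTML = """
<a href="//stackoverflow.com">logo</a>
<div class="summary">
  <div class="result-link"><a href="/questions/1/first">first</a></div>
</div>
<div class="summary">
  <div class="result-link"><a href="/questions/2/second">second</a></div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")


def first_link_whole_document(soup):
    # Mirrors the question's innermost loop: it searches `soup`,
    # not the matched div, and stops at the first anchor it sees.
    for link in soup.find_all("a"):
        return link.get("href")


def links_via_selector(soup):
    # The CSS selector restricts the search to anchors nested inside
    # div.result-link inside div.summary.
    return [a.get("href") for a in soup.select("div.summary div.result-link a")]


def links_via_scoped_loops(soup):
    # Equivalent nested loops: each level searches inside the previous match.
    hrefs = []
    for summary in soup.find_all("div", class_="summary"):
        for result in summary.find_all("div", class_="result-link"):
            for a in result.find_all("a"):
                hrefs.append(a.get("href"))
    return hrefs
```

Both `links_via_selector(soup)` and `links_via_scoped_loops(soup)` skip the logo link, while `first_link_whole_document(soup)` returns it.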
As a side note: you may want to use the StackExchange API instead of the current web-scraping / HTML-parsing approach.
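A rough sketch of what the API route could look like in Python 3, using only the standard library. It assumes the `/2.3/search` endpoint with its `intitle` and `site` parameters; `intitle` matches question titles only, so results may differ from the site's search page. The helper names `build_search_url` and `fetch_question_links` are hypothetical, and this is an illustration rather than a tested client:

```python
import gzip
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.stackexchange.com/2.3"


def build_search_url(title_query, site="stackoverflow"):
    # `intitle` restricts the match to question titles (assumption: this is
    # close enough to the site search used in the question).
    params = {
        "order": "desc",
        "sort": "relevance",
        "intitle": title_query,
        "site": site,
    }
    return API_ROOT + "/search?" + urlencode(params)


def fetch_question_links(title_query):
    # Performs a network request; the API normally returns compressed JSON.
    with urlopen(build_search_url(title_query)) as resp:
        raw = resp.read()
    if raw[:2] == b"\x1f\x8b":  # gzip magic bytes
        raw = gzip.decompress(raw)
    data = json.loads(raw.decode("utf-8"))
    # Each item is expected to carry a full "link" to the question.
    return [item["link"] for item in data.get("items", [])]
```

Calling `fetch_question_links("incorrect operator")` would return full question URLs without any HTML parsing, subject to the API's rate limits.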