Question

I am trying to parse a web page, and this is my code:

from bs4 import BeautifulSoup
import urllib2

openurl = urllib2.urlopen("http://pastebin.com/archive/Python")
read = BeautifulSoup(openurl.read())
soup = BeautifulSoup(openurl)
x = soup.find('ul', {"class": "i_p0"})
sp = soup.findAll('a href')
for x in sp:
    print x

I could probably be more specific, but as the title says, it gives me no response at all. No errors, nothing.

Answer 1

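First of all, omit the line read = BeautifulSoup(openurl.read()). That call consumes the response body, so the later BeautifulSoup(openurl) has nothing left to parse.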
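Why that line matters can be seen without any network access. In the minimal sketch below, `io.BytesIO` is only a stand-in for the object returned by `urlopen()` (an assumption for illustration; both are file-like objects that are read from a moving position). Once `read()` has consumed the body, a second read returns nothing, so a parser handed the same object afterwards sees an empty document.

```python
import io

# Stand-in for the HTTP response; the markup is made up for illustration.
response = io.BytesIO(b"<a href='/abc'>link</a>")

first = response.read()   # consumes the whole body
second = response.read()  # nothing left to read

print(first)
print(second)  # b''
```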

Also, the line x = soup.find('ul', {"class": "i_p0"}) doesn't actually make any difference, because you reuse the x variable in the loop.

Also, soup.findAll('a href') won't find anything.
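The first argument to `find_all()` / `findAll()` is matched against tag names, and no tag is named `a href`; `href` is an attribute of `<a>`. A small sketch on an inline, made-up HTML snippet (the Python 3 syntax and the explicit `"html.parser"` argument are assumptions, not part of the original code):

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = """
<ul>
  <li><a href="/abc">first</a></li>
  <li><a href="/def">second</a></li>
  <li><a name="top">anchor without href</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

no_match = soup.find_all("a href")          # looks for a tag literally named "a href"
all_anchors = soup.find_all("a")            # every <a> tag
with_href = soup.find_all("a", href=True)   # only <a> tags that carry an href attribute

print(len(no_match), len(all_anchors), len(with_href))
```

Passing `href=True` is one way to restrict the result to anchors that actually have a link target.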

Also, BeautifulSoup4 has find_all(), which replaces the outdated findAll().

Here is the code with several changes:

from bs4 import BeautifulSoup
import urllib2

openurl = urllib2.urlopen("http://pastebin.com/archive/Python")
soup = BeautifulSoup(openurl)
sp = soup.find_all('a')
for x in sp:
    print x['href']

This prints the href attribute value of every link on the page.
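Note that `urllib2` exists only in Python 2. For readers on Python 3, the same idea might look roughly like the sketch below. `extract_hrefs` and `fetch_hrefs` are hypothetical helper names, and the explicit `"html.parser"` argument is an assumption made to avoid depending on an optional parser.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def extract_hrefs(html):
    """Return the href value of every <a> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


def fetch_hrefs(url):
    """Download a page and return its link targets (needs network access)."""
    with urlopen(url) as response:
        return extract_hrefs(response.read())


# Offline check on a made-up snippet:
sample = '<p><a href="/one">1</a> <a>no target</a> <a href="/two">2</a></p>'
print(extract_hrefs(sample))

# Live usage would be: fetch_hrefs("http://pastebin.com/archive/Python")
```

Keeping the parsing step separate from the download makes it possible to try the link extraction on a local string before touching the network.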

Hope that helps.

Answer 2

I changed a few lines in your code and I do get a response, though I'm not sure it is what you want.

Here:

from bs4 import BeautifulSoup
import urllib2

openurl = urllib2.urlopen("http://pastebin.com/archive/Python")
soup = BeautifulSoup(openurl.read())  # Parse the response body once; select elements from this object
# soup = BeautifulSoup(openurl)  # This is not needed
# x = soup.find('ul', {"class": "i_p0"})  # You don't seem to be making use of this either
sp = soup.findAll('a')
for x in sp:
    print x.get('href')  # Get the href attribute value
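
One difference between the two answers is worth noting: `x['href']` raises `KeyError` for an `<a>` tag that has no `href`, while `x.get('href')` returns `None`. A small sketch on a made-up snippet (the Python 3 syntax and `"html.parser"` are assumptions):

```python
from bs4 import BeautifulSoup

# Made-up markup: one anchor with an href, one without.
soup = BeautifulSoup('<a href="/abc">ok</a><a name="top">no href</a>', "html.parser")
first, second = soup.find_all("a")

print(first.get("href"))   # /abc
print(second.get("href"))  # None

try:
    second["href"]         # subscript access on a missing attribute
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```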

Hope this helps.

HTML parsing gives no response

2 answers: