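Python / parsing: BeautifulSoup error "'module' object is not callable" on results from Mechanize

Time: 2012-06-09 20:00:36

Tags: python beautifulsoup mechanize

UPDATE: Wow, you were all right! For reasons I don't yet understand, I needed "from BeautifulSoup import BeautifulSoup", and I added these lines: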

response = br.submit()
print type(response) #new line
raw = br.response().read()#new line
print type(raw)#new line
print type(br.response().read())#new line
cooked = (br.response().read())#new line
soup = BeautifulSoup(cooked)

/UPDATE

Well, BeautifulSoup and I aren't getting along over the result of br.response().read(). I have imported BeautifulSoup ...

#snippet:
# Select the first (index zero) form
br.select_form(nr=0)
br.form.set_all_readonly(False)
br['__EVENTTARGET'] = list_of_dates[0]
br['__EVENTARGUMENT'] = 'calMain'
br['__VIEWSTATE'] = viewstate
br['__EVENTVALIDATION'] = eventvalidation

response = br.submit()
print br.response().read() #*#this prints the html I'm expecting*

soup = BeautifulSoup(br.response().read()) #*#but this throws 
#TypeError: 'module' object is not callable.  
#Yet if I call soup = BeautifulSoup("http://page.com"), it's cool.*

selecttable = soup.find('table',{'id':"tblItems"})
#/snippet

...and so on.

So I know I have the wrong kind of "object", but man, what kind of "object" do you think BeautifulSoup wants?

Cheers and thanks!!

4 answers:

Answer 0 (score: 7)

Use

from BeautifulSoup import BeautifulSoup

instead of

import BeautifulSoup

Otherwise I think you're doing the right thing!
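The reason this import matters can be sketched with a standard-library module that, like the BeautifulSoup 3 package, contains a class with the same name as the module. This is only an analogy: `datetime` stands in for `BeautifulSoup`, and `call_name` is a hypothetical helper that is not part of the original question.

```python
import datetime  # binds the name "datetime" to the module, not to the class inside it


def call_name(obj):
    """Try to call obj with date arguments; return the result or the TypeError text."""
    try:
        return obj(2012, 6, 9)
    except TypeError as exc:
        return str(exc)


# Calling the module itself fails, just like BeautifulSoup(...) after "import BeautifulSoup".
module_result = call_name(datetime)

# Calling the class inside the module works, like "from BeautifulSoup import BeautifulSoup".
class_result = call_name(datetime.datetime)

print(module_result)
print(class_result)
```

By the same logic, keeping "import BeautifulSoup" and calling BeautifulSoup.BeautifulSoup(...) should also work, since the class lives inside the module of the same name.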

Answer 1 (score: 1)

You wrote:

response = br.submit()
print br.response().read() #*#this prints the html I'm expecting*

soup = BeautifulSoup(br.response().read())

Why don't you try:

response = br.submit()
soup = BeautifulSoup(response.read())

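I suspect this has to do with the fact that you're calling .read() directly on br.response(). In my experience with mechanize, I've always saved the response to a variable and called .read() from there. I don't know that it will work, and it doesn't explain why print br.response().read() works, but give it a try.

Alternatively, BeautifulSoup's HTML parser may not like what mechanize is feeding it. You could try using a different parser.

Answer 2 (score: 0)

Please confirm that your import looks like this: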

from BeautifulSoup import BeautifulSoup

or, for BeautifulSoup4:

from bs4 import BeautifulSoup
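If it is unclear which of the two packages is installed, one option is to try both imports in order. The sketch below is only one way to do that; `first_available` is a hypothetical helper, not something provided by either library, and it assumes the class is named BeautifulSoup in whichever package is found.

```python
import importlib


def first_available(module_names, attr):
    """Return attr from the first module that imports and defines it, else None."""
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # this package is not installed; try the next one
        if hasattr(module, attr):
            return getattr(module, attr)
    return None


# Prefer BeautifulSoup 4 (bs4), then fall back to the BeautifulSoup 3 package.
BeautifulSoup = first_available(["bs4", "BeautifulSoup"], "BeautifulSoup")

if BeautifulSoup is None:
    print("No BeautifulSoup package is installed")
else:
    print("Using", BeautifulSoup.__module__)
```

Either way, the name BeautifulSoup ends up bound to the class rather than to a module, which is what the call BeautifulSoup(...) needs.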

Answer 3 (score: 0)

Have you tried reading the object only once and saving the result?

For example:

raw = br.response().read()

soup = BeautifulSoup(raw)

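With file objects, you can read them once, and then you have to reopen (or rewind) them to read them again. It looks like you're reading yours twice. Another thing you should do is print the type of br.response() before and after reading it.

To make debugging easier, try printing the types: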

print type(response) # see the type of response from above
raw = br.response().read()
print type(raw)
print type(br.response().read()) # see what happens the second time :P
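The read-once behaviour of file objects described in this answer can be reproduced with an in-memory stream. This is a minimal sketch that uses io.BytesIO as a stand-in for a file-like response body; it does not show how a mechanize response itself behaves, and the HTML content is made up for illustration.

```python
import io

# A BytesIO object stands in for a file-like HTTP response body (illustrative content).
body = io.BytesIO(b"<table id='tblItems'></table>")

first = body.read()    # consumes the stream up to the end
second = body.read()   # nothing is left, so this is empty
body.seek(0)           # rewind to the start of the stream
third = body.read()    # the full content is available again

print(len(first), len(second), len(third))
```

Saving the first read in a variable, as in the example above with raw, avoids depending on whether the object can be read a second time.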

Also, it would help if you posted the stack trace.