On this page there is a series of tables, and I am trying to pull specific data out of unnamed tables and unnamed cells. I am using Copy Selector from Chrome's Inspect Element to find the CSS selector. When I ask Python to print what that CSS selector matches, I get 'NoneType' object is not callable.
Specifically, on this page I am trying to get the number "198" from the table at #general-info > article:nth-child(4) > table:nth-child(2). Inspect Element shows the CSS selector path as:
"html body div#program-details section#general-info article.grid-50 table tbody tr td"
and Copy Selector gives:
#general-info > article:nth-child(4) > table:nth-child(2) > tbody > tr > td:nth-child(2)
Most of the code just logs in to the site and gets past the EULA; please skip to the bottom for the part I am having trouble with.
import mechanize
import requests
import urllib2
import urllib
import csv
from BeautifulSoup import BeautifulSoup
br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [("User-agent","Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6.13")]
sign_in = br.open('https://login.ama-assn.org/account/login') #the login url
br.select_form(name = "go") #select the login form by its name attribute
br["username"] = "wasabinoodlz" #the key "username" is the variable that takes the username/email value
br["password"] = "Bongshop10" #the key "password" is the variable that takes the password value
logged_in = br.submit() #submitting the login credentials
logincheck = logged_in.read() #reading the page body that is redirected after successful login
#print (logincheck) #printing the body of the redirected url after login
# EULA agreement stuff
cont = br.open('https://freida.ama-assn.org/Freida/eula.do').read()
cont1 = br.open('https://freida.ama-assn.org/Freida/eulaSubmit.do').read()
# Begin request for page data
req = br.open('https://freida.ama-assn.org/Freida/user/programDetails.do?pgmNumber=1205712369').read()
#Da Soups!
soup = BeautifulSoup(req)
#print soup.prettify() # use this to read html.prettify()
for score in soup.select('#general-info > article:nth-child(4) > table:nth-child(2) > tbody > tr > td:nth-child(2)'):
print score.string
Answer 0 (score: 0)
You need to initialize BeautifulSoup with the html5lib parser:
soup = BeautifulSoup(req, 'html5lib')
This also means using BeautifulSoup 4 (from bs4 import BeautifulSoup). The BeautifulSoup 3 module imported in the question has no select() method; attribute access on a BS3 soup falls back to find(), so soup.select evaluates to None and calling it raises the 'NoneType' object is not callable error you are seeing. Additionally, BeautifulSoup's select() only implements the nth-of-type pseudo-selector, not nth-child, so the copied selector has to be rewritten:
data = soup.select(
    '#general-info > '
    'article:nth-of-type(4) > '
    'table:nth-of-type(2) > '
    'tbody > '
    'tr > '
    'td:nth-of-type(2)'
)
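Putting it together, here is a minimal sketch of the parsing step, assuming the login and EULA code above has already run and req holds the page HTML (the selector path is taken from the question and is only as reliable as that page's markup):

from bs4 import BeautifulSoup  # BeautifulSoup 4, not the old BeautifulSoup 3 module

# req is the HTML string returned by br.open(...).read() above
soup = BeautifulSoup(req, 'html5lib')  # html5lib builds the tree the way a browser does

# use nth-of-type throughout, since nth-child is not supported here
cells = soup.select(
    '#general-info > '
    'article:nth-of-type(4) > '
    'table:nth-of-type(2) > '
    'tbody > '
    'tr > '
    'td:nth-of-type(2)'
)
for cell in cells:
    print cell.get_text(strip=True)  # expected to print "198" per the question

Using html5lib matters here because, like a browser, it inserts the implied tbody element that Chrome's copied selector includes; a stricter parser may not, and the selector would then match nothing.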