I'm trying to scrape a table from this website:
Specifically, I'm trying to scrape the "Run Line" column of the "Westgate" row for each game listed in the table.
I'm not sure what I'm doing wrong, since I'm just trying to drill down to the text in the table. With my limited understanding of web scraping, I believe the cell I want is in the second nested table inside the "oddrow" row I've selected.
I've tried searching for my problem, but I'm having trouble applying any of the suggested solutions to my particular case.
Thanks for your help.
Here is my code so far:
from bs4 import BeautifulSoup
from selenium import webdriver
import time

url = 'http://www.espn.com/mlb/lines'
driver = webdriver.Chrome()
driver.get(url)
time.sleep(5)
content = driver.page_source
soup = BeautifulSoup(content, 'lxml')
driver.quit()
table = soup.find('table', {'class': 'tablehead'})
table_row = table.find_all('tr', {'class': 'oddrow'})
table_data = table_row.find_all('table', {'class': 'tablehead'})[1]  # trying to
# scrape just the second table within this row, i.e. the Westgate and Run Line table
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-397-fea09cb40cb2> in <module>()
----> 1 table_data=table_row.find_all('table',{'class':'tablehead'})
~\Anaconda3\lib\site-packages\bs4\element.py in __getattr__(self, key)
1805 def __getattr__(self, key):
1806 raise AttributeError(
-> 1807 "ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
1808 )
AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
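The error message itself points at the fix: `find_all()` returns a `ResultSet` (essentially a list of tags), so you must call `find()`/`find_all()` on an individual element of it, not on the list itself. A minimal sketch of the correct pattern, using hypothetical stand-in markup rather than the real ESPN page:

```python
from bs4 import BeautifulSoup

# Hypothetical simplified HTML mimicking the page's "oddrow" rows.
html = """
<table class="tablehead">
  <tr class="oddrow"><td>A</td></tr>
  <tr class="oddrow"><td>B</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet (a list of tags) ...
rows = soup.find_all("tr", {"class": "oddrow"})

# ... so iterate over it and call find()/find_all() on each tag individually.
texts = [row.find("td").text for row in rows]
print(texts)  # ['A', 'B']
```

In the original code, `table_row` is such a `ResultSet`, which is why `table_row.find_all(...)` raises the `AttributeError` above.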
Answer 0 (score: 1)
I believe the following produces the output you want. There may be a better way to do this, but I used a nested loop: the inner counter `i` runs up to 3 because the Run Line value is the fourth cell of each row, and the odd-row index is incremented in the outer loop so the "Run Line" column is returned from each Westgate row:
from bs4 import BeautifulSoup
from selenium import webdriver

url = 'http://www.espn.com/mlb/lines'
driver = webdriver.Chrome()
driver.get(url)
content = driver.page_source
soup = BeautifulSoup(content, 'lxml')
driver.quit()

oddrowindex = 0
while oddrowindex < 70:
    i = 0
    table_row = soup.find_all('tr', {'class': 'oddrow'})[oddrowindex]
    for td in table_row:
        if i == 3:  # the fourth cell holds the Run Line value
            print(td.text)
        i = i + 1
    oddrowindex = oddrowindex + 1
Sample output:
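As a variation on the loop above, the manual counters can be replaced by indexing each row's cells directly. This is only a sketch against hypothetical simplified markup (the real page's cell layout and the hard-coded column index 3 are assumptions carried over from the answer):

```python
from bs4 import BeautifulSoup

# Hypothetical simplified markup mirroring two odd rows of the lines table.
html = """
<table class="tablehead">
  <tr class="oddrow"><td>Team A</td><td>ML</td><td>Total</td><td>-1.5 (+120)</td></tr>
  <tr class="oddrow"><td>Team B</td><td>ML</td><td>Total</td><td>+1.5 (-140)</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Take the fourth <td> (index 3, assumed to be the Run Line column)
# from every odd row in one list comprehension.
run_lines = [row.find_all("td")[3].text
             for row in soup.find_all("tr", {"class": "oddrow"})]
print(run_lines)  # ['-1.5 (+120)', '+1.5 (-140)']
```

This also avoids the hard-coded `while oddrowindex < 70` bound, since the comprehension simply visits every matching row.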