Difficulty scraping a table (Python, BeautifulSoup)

Time: 2018-07-14 20:15:22

Tags: python python-3.x beautifulsoup

I am struggling to scrape a table from this website:

http://www.espn.com/mlb/lines

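Specifically, I am trying to scrape the "Run Line" column of the "Westgate" row for every game listed in the table.

I am not sure what I am doing wrong. I am only trying to drill down to the text in the table, and with my limited knowledge of web scraping I think that text sits in the second table inside the "oddrow" rows I select.

I have searched for my problem, but I had trouble applying any of the suggested solutions to my particular case.

Thank you for your help.

Here is my code so far: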

from bs4 import BeautifulSoup
from selenium import webdriver
import time

url='http://www.espn.com/mlb/lines'
driver = webdriver.Chrome()
driver.get(url)
time.sleep(5)  # give the page time to load
content=driver.page_source

soup=BeautifulSoup(content,'lxml')

driver.quit()

table=soup.find('table',{'class':'tablehead'})
table_row=table.find_all('tr',{'class':'oddrow'})
# Trying to scrape only the second table within this row,
# i.e. the Westgate and Run Line table
table_data=table_row.find_all('table',{'class':'tablehead'})[1]

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-397-fea09cb40cb2> in <module>()
----> 1 table_data=table_row.find_all('table',{'class':'tablehead'})

~\Anaconda3\lib\site-packages\bs4\element.py in __getattr__(self, key)
   1805     def __getattr__(self, key):
   1806         raise AttributeError(
-> 1807             "ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
   1808         )

AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

1 answer:

Answer 0: (score: 1)
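
For context on the traceback in the question: `find_all()` returns a `ResultSet`, which behaves like a list of tags, so it has no `find_all()` of its own and each row has to be searched individually. Below is a minimal sketch of that idea. The HTML string is a made-up miniature table used only for illustration, not the real ESPN markup, and it is parsed with the built-in `html.parser` so the snippet does not depend on `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature markup, invented for this example only.
html = """
<table class="tablehead">
  <tr class="oddrow"><td>Westgate</td><td>-1.5</td></tr>
  <tr class="evenrow"><td>Other book</td><td>+1.5</td></tr>
  <tr class="oddrow"><td>Westgate</td><td>+1.5</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "tablehead"})
rows = table.find_all("tr", {"class": "oddrow"})  # a ResultSet (list-like)

# Calling find_all() on the ResultSet itself fails, as in the traceback.
try:
    rows.find_all("td")
    raised = False
except AttributeError:
    raised = True

# Searching inside each row works, because each element is a Tag.
cells_per_row = [
    [td.get_text(strip=True) for td in row.find_all("td")]
    for row in rows
]
print(raised)
print(cells_per_row)
```

The same pattern applies to the question's code: loop over `table_row` (or index into it) before calling `find_all()` again.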

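I believe the following gives the output you want. There are probably better ways to do this, but I used a nested loop that increments i up to 3, since you want the 3rd table in the soup each time, and I then increment the oddrow index inside the loop to return the "Run Line" column from the Westgate row: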

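from bs4 import BeautifulSoup
from selenium import webdriver

url = 'http://www.espn.com/mlb/lines'
driver = webdriver.Chrome()
driver.get(url)
content = driver.page_source

soup = BeautifulSoup(content, 'lxml')

# Collect the "oddrow" rows once instead of re-querying on every pass.
odd_rows = soup.find_all('tr', {'class': 'oddrow'})

oddrowindex = 0
# 70 is the hard-coded upper bound; also stop at the last available row
# so that indexing cannot raise an IndexError.
while oddrowindex < min(70, len(odd_rows)):
    table_row = odd_rows[oddrowindex]
    # enumerate() takes the place of a manual counter i.
    for i, td in enumerate(table_row):
        if i == 3:
            print(td.text)  # child at index 3 of the row
        # The row index advances once per child of the current row.
        oddrowindex = oddrowindex + 1

driver.quit()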
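
As a possible alternative to counting positions, the cell could be picked by name: find the header cell labelled "Run Line", remember its position, and read that position from every row whose first cell is "Westgate". The sketch below is only an illustration under stated assumptions. The helper `run_lines` is hypothetical, and the HTML is invented sample markup (the header names and row layout are guesses, not the actual structure of the ESPN page), parsed with the built-in `html.parser`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page may be structured differently.
SAMPLE_HTML = """
<table class="tablehead">
  <tr class="colhead"><td>Book</td><td>Money Line</td><td>Total</td><td>Run Line</td></tr>
  <tr class="oddrow"><td>Westgate</td><td>-120</td><td>8.5</td><td>-1.5</td></tr>
  <tr class="evenrow"><td>Other book</td><td>-118</td><td>8.5</td><td>-1.5</td></tr>
  <tr class="oddrow"><td>Westgate</td><td>+135</td><td>9.0</td><td>+1.5</td></tr>
</table>
"""


def run_lines(html, book="Westgate", column="Run Line"):
    """Return the text of `column` for every row whose first cell is `book`."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    col = None  # position of the wanted column, learned from a header row
    for row in soup.find_all("tr"):
        # Only direct <td> children, so cells of nested tables are not mixed in.
        cells = [td.get_text(strip=True) for td in row.find_all("td", recursive=False)]
        if column in cells:
            col = cells.index(column)
        elif col is not None and cells and cells[0] == book and col < len(cells):
            results.append(cells[col])
    return results


westgate_run_lines = run_lines(SAMPLE_HTML)
print(westgate_run_lines)
```

With a live page, `driver.page_source` would be passed in place of `SAMPLE_HTML`, and the `book` and `column` labels would need to match the text that actually appears there.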
