Question

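I'm struggling to scrape a table from this website:

http://www.espn.com/mlb/lines

Specifically, I'm trying to scrape the "Run Line" column of the "Westgate" row for every game listed in the table.

I'm not sure what I'm doing wrong, since I'm only trying to drill down to the text in the table. With my limited knowledge of web scraping, that should be the second table inside each "oddrow" row I've selected.

I've searched for my problem, but I'm having trouble applying any of the suggested solutions to my specific case.

Thanks for your help.

Here is my code so far, followed by the error it raises: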

import time

from bs4 import BeautifulSoup
from selenium import webdriver

url='http://www.espn.com/mlb/lines'
driver = webdriver.Chrome() 
driver.get(url)
time.sleep(5)
content=driver.page_source

soup=BeautifulSoup(content,'lxml')

driver.quit()

table=soup.find('table',{'class':'tablehead'})
table_row=table.find_all('tr',{'class':'oddrow'})
# trying to scrape only the second table within this row, i.e. the Westgate and Run Line table
table_data=table_row.find_all('table',{'class':'tablehead'})[1]

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-397-fea09cb40cb2> in <module>()
----> 1 table_data=table_row.find_all('table',{'class':'tablehead'})

~\Anaconda3\lib\site-packages\bs4\element.py in __getattr__(self, key)
   1805     def __getattr__(self, key):
   1806         raise AttributeError(
-> 1807             "ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
   1808         )

AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

Answer 1

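I believe the following gives you the output you want. There may be a better way to do this, but I used a nested loop that increments i up to 3, because you want the 3rd table in the soup each time. I then increment the oddrow index so that each pass of the loop returns the "Run Line" column from the Westgate row: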

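from bs4 import BeautifulSoup
from selenium import webdriver

url = 'http://www.espn.com/mlb/lines'
driver = webdriver.Chrome()
driver.get(url)
content = driver.page_source

oddrowindex = 0
soup = BeautifulSoup(content, 'lxml')

# 70 is hard-coded; this assumes the page has at least that many 'oddrow' rows
while oddrowindex < 70:
    i = 0
    # find_all() returns a list of rows, so index it to get a single row
    table_row = soup.find_all('tr', {'class': 'oddrow'})[oddrowindex]
    for td in table_row:
        # the child at index 3 of the row holds the "Run Line" value
        if i == 3:
            print(td.text)
        i = i + 1
    # move on to the next row once the current row has been scanned
    oddrowindex = oddrowindex + 1

driver.quit()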
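
For what it's worth, the error in the question comes from calling find_all() on the ResultSet that an earlier find_all() returned, rather than on a single row. Below is a minimal offline sketch of looping over the rows instead. The markup in SAMPLE_HTML, the "OtherBook" row, and the helper name fourth_cells are made up for illustration; the real ESPN page may be structured differently, so treat the cell index as an assumption to check against the live HTML.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only; the real page may be laid out differently.
SAMPLE_HTML = """
<table class="tablehead">
  <tr class="oddrow"><td>Westgate</td><td>-110</td><td>8.5</td><td>-1.5</td></tr>
  <tr class="evenrow"><td>OtherBook</td><td>-105</td><td>8.5</td><td>-1.5</td></tr>
  <tr class="oddrow"><td>Westgate</td><td>+120</td><td>9</td><td>+1.5</td></tr>
</table>
"""


def fourth_cells(html, row_class="oddrow", cell_index=3):
    """Hypothetical helper: return the text of one cell from each matching row."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # find_all() returns a ResultSet (a list of tags), so loop over it
    # instead of calling find_all() on the list itself.
    for row in soup.find_all("tr", {"class": row_class}):
        # Only look at the row's own cells, not cells of any nested table.
        cells = row.find_all("td", recursive=False)
        if len(cells) > cell_index:
            results.append(cells[cell_index].get_text(strip=True))
    return results


run_lines = fourth_cells(SAMPLE_HTML)
print(run_lines)
```

Looping over the ResultSet also avoids the hard-coded row count, since the loop simply stops when there are no more matching rows.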

