Question

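I'm new to Python and web scraping. I've been trying to target the highlighted part of the markup so that I can retrieve the numbers 1.16, 7.50 and 14.67, but I've had no joy using td, class and table-matches__odds with pageSoup.find_all. Does anyone know what I'm missing here?

I'm using BeautifulSoup 4.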

Answer 1

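Awkward.

First, I found the column of 'ratio' items (odds?) to use as a reference point within the rows we want to plunder, and put them in a list called ratio.

Then I looked at the next siblings of a typical element of ratio, namely the first one.

You are only interested in the first row of the table, so I picked ratio[0] and asked for its next siblings, which are all td elements.

Then I extracted what you want from each of them according to its internal structure. The only complicated one is the first, where the value sits inside nested span elements. I used the descendants iterator to get its descendants, picked the innermost one, and then read its data-odd attribute.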

>>> import bs4
>>> import requests
>>> page = requests.get('http://www.betexplorer.com/soccer/scotland/premiership-2016-2017/results/').text
>>> soup = bs4.BeautifulSoup(page, 'lxml')
>>> # Reference cells: one per row, each followed by that row's odds cells.
>>> ratio = soup.find_all('td', attrs={'class': 'h-text-center'})
>>> ratio[0].find_next_siblings()
[<td class="table-matches__odds colored"><span><span><span data-odd="1.16"></span></span></span></td>, <td class="table-matches__odds" data-odd="7.50"></td>, <td class="table-matches__odds" data-odd="14.67"></td>, <td class="h-text-right h-text-no-wrap">21.05.2017</td>]
>>> len(ratio)
15
>>> zeroth_ratio_sibs = ratio[0].find_next_siblings()
>>> # First cell: the value is on the innermost of three nested spans.
>>> first_item = list(zeroth_ratio_sibs[0].descendants)[2].attrs['data-odd']
>>> first_item
'1.16'
>>> # Remaining cells: the value is an attribute of the td itself.
>>> second_item = zeroth_ratio_sibs[1].attrs['data-odd']
>>> second_item
'7.50'
>>> third_item = zeroth_ratio_sibs[2].attrs['data-odd']
>>> third_item
'14.67'

Python Web Scraping: span inside a td class