Web scraping with Beautiful Soup

Time: 2017-10-14 13:14:44

Tags: python web-scraping beautifulsoup

I have the following code to extract the latest MS Office version for Mac:

import urllib2
from bs4 import BeautifulSoup

quote_page = 'https://support.office.com/en-us/article/Update-history-for-Office-2016-for-Mac-700cab62-0d67-4f23-947b-3686cb1a8eb7#bkmk_current'
page = urllib2.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
name_box = soup.find('p', attrs={'class': 'x-hidden-focus'})
print name_box

I am trying to scrape the version listed for "Office 2016 for Mac (all applications)":

15.39.0

The output I get is None.

Any help is appreciated. Thanks.

2 Answers:

Answer 0: (score: 0)

This works; the explanation is given in the comments.

import requests
import bs4

url = 'https://support.office.com/en-us/article/Update-history-for-Office-2016-for-Mac-700cab62-0d67-4f23-947b-3686cb1a8eb7#bkmk_current'

table_id = 'tblID0EAGAAA'
resp = requests.get(url)

soup = bs4.BeautifulSoup(resp.text, 'lxml')

# find table that contains data of interest
table = soup.find('table', {'id' : table_id})

# get the second row in that table
second_row = table.find_all('tr')[1]

# get the second column in that row
second_column = second_row.find_all('td')[1]

# get the content in this cell
version = second_column.find('p').text

print(version)
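
The row-and-column navigation above can be tried offline against a small inline table. What follows is only a minimal sketch, not the live page: the HTML sample, the helper name `version_from_table`, and the None checks are assumptions added for illustration. Only the table id and the version string are taken from the post.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page. Only the table id and the
# version string come from the post; the markup itself is assumed.
SAMPLE_HTML = """
<table id="tblID0EAGAAA">
  <tr><th><p>Application</p></th><th><p>Version</p></th></tr>
  <tr><td><p>Office 2016 for Mac (all applications)</p></td><td><p>15.39.0</p></td></tr>
</table>
"""


def version_from_table(html, table_id):
    """Return the text of row 2, column 2 of the table, or None if absent."""
    soup = BeautifulSoup(html, 'html.parser')

    # find the table that contains the data of interest
    table = soup.find('table', {'id': table_id})
    if table is None:
        return None

    # the second row holds the first data entry (the first row is the header)
    rows = table.find_all('tr')
    if len(rows) < 2:
        return None

    # the second cell in that row holds the version
    cells = rows[1].find_all('td')
    if len(cells) < 2:
        return None

    return cells[1].get_text(strip=True)


print(version_from_table(SAMPLE_HTML, 'tblID0EAGAAA'))
```

Returning None when the table, row, or cell is missing avoids an AttributeError or IndexError if the page layout changes, at the cost of the caller having to check the result.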

Answer 1: (score: 0)

A solution that does not depend on the table id (which may change after each release) or on the ordering of the rows:

from bs4 import BeautifulSoup
import requests
import re

page = requests.get('https://support.office.com/en-us/article/Update-history-for-Office-2016-for-Mac-700cab62-0d67-4f23-947b-3686cb1a8eb7#bkmk_current')
pattern = re.compile(r'^Office.+Mac.*')

version = BeautifulSoup(page.content, 'html.parser') \
            .select_one('section.ocpSection table tbody') \
            .find('p', text=pattern) \
            .parent \
            .find_next_sibling('td') \
            .select_one('p') \
            .text
print(version)
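
The label-based lookup above can also be checked without a network call. The sketch below is a rough illustration under stated assumptions: the HTML sample (including the made-up first row), the helper name `version_by_label`, and the None checks are hypothetical. It uses the `string=` argument, which newer Beautiful Soup releases accept in place of `text=`.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample. The first row is invented so that the target row
# is not in a fixed position; only the second row mirrors the post.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr><td><p>Some other row</p></td><td><p>0.0.0</p></td></tr>
    <tr><td><p>Office 2016 for Mac (all applications)</p></td><td><p>15.39.0</p></td></tr>
  </tbody>
</table>
"""

PATTERN = re.compile(r'^Office.+Mac.*')


def version_by_label(html, pattern):
    """Find the <p> whose text matches pattern and return the next cell's text."""
    soup = BeautifulSoup(html, 'html.parser')

    # locate the label paragraph by its text, not by id or row position
    label = soup.find('p', string=pattern)
    if label is None:
        return None

    # climb to the enclosing cell, then step to the cell to its right
    cell = label.find_parent('td')
    if cell is None:
        return None
    value_cell = cell.find_next_sibling('td')
    if value_cell is None:
        return None

    return value_cell.get_text(strip=True)


print(version_by_label(SAMPLE_HTML, PATTERN))
```

For the sample above this should print 15.39.0 even though the matching row is not first, which is the point of matching on the label text.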