I have the following code to extract the latest MS Office version for Mac:
import urllib2
from bs4 import BeautifulSoup
quote_page = 'https://support.office.com/en-us/article/Update-history-for-Office-2016-for-Mac-700cab62-0d67-4f23-947b-3686cb1a8eb7#bkmk_current'
page = urllib2.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
name_box = soup.find('p', attrs={'class': 'x-hidden-focus'})
print name_box
I am trying to scrape the entry "Office 2016 for Mac (all applications)" and its version, 15.39.0.
The output I get is None.
Any help is appreciated. Thanks.
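One plausible reason for the None (an assumption, not confirmed in this thread): the x-hidden-focus class seen in a browser's element inspector may be added by page scripts at runtime, so it never appears in the HTML that urllib2 downloads, and find() simply returns None when nothing matches. The sketch below uses a small, hypothetical HTML snippet (not the real page) to show that behavior and to list the <p> texts that are actually available to target. It is written for Python 3 with the bs4 package.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML the server returns. The class seen in the
# browser ("x-hidden-focus") is assumed to be absent from this raw markup.
raw_html = """
<table>
  <tr><td><p>Office 2016 for Mac (all applications)</p></td><td><p>15.39.0</p></td></tr>
</table>
"""

soup = BeautifulSoup(raw_html, "html.parser")

# Quick diagnostic: is the class present in the downloaded markup at all?
print("x-hidden-focus" in raw_html)  # False

# find() returns None (it does not raise) when no tag matches the filter.
name_box = soup.find("p", attrs={"class": "x-hidden-focus"})
print(name_box)  # None

# Listing the <p> texts that do exist shows what can be targeted instead.
print([p.get_text(strip=True) for p in soup.find_all("p")])
# ['Office 2016 for Mac (all applications)', '15.39.0']
```

Running the same "class in html" check against the real response text is a quick way to tell whether a selector copied from the browser can work at all.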
Answer 0 (score: 0)
This works; the explanation is given in the comments.
import requests
import bs4
url = 'https://support.office.com/en-us/article/Update-history-for-Office-2016-for-Mac-700cab62-0d67-4f23-947b-3686cb1a8eb7#bkmk_current'
table_id = 'tblID0EAGAAA'
resp = requests.get(url)
resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
# 'lxml' needs the lxml package installed; 'html.parser' works without it
soup = bs4.BeautifulSoup(resp.text, 'lxml')
# find table that contains data of interest
table = soup.find('table', {'id': table_id})
# get the second row in that table
second_row = table.find_all('tr')[1]
# get the second column in that row
second_column = second_row.find_all('td')[1]
# get the content in this cell
version = second_column.find('p').text
print(version)
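The same row/column lookup can be tried offline against a small, hypothetical table that mimics the layout this answer assumes (the markup and the helper name version_from_table are illustrative, not taken from the real page). The sketch adds guards so that a changed table id or a missing cell returns None instead of raising AttributeError or IndexError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the layout this answer relies on: a header row,
# then a data row whose second cell holds the version.
SAMPLE_HTML = """
<table id="tblID0EAGAAA">
  <tr><th>Product</th><th>Version</th></tr>
  <tr><td><p>Office 2016 for Mac (all applications)</p></td><td><p>15.39.0</p></td></tr>
</table>
"""


def version_from_table(html, table_id):
    """Return the <p> text in row 2, column 2 of the table with this id, or None."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:  # the id may change between page revisions
        return None
    rows = table.find_all("tr")
    if len(rows) < 2:
        return None
    cells = rows[1].find_all("td")
    if len(cells) < 2 or cells[1].p is None:
        return None
    return cells[1].p.get_text(strip=True)


print(version_from_table(SAMPLE_HTML, "tblID0EAGAAA"))  # 15.39.0
print(version_from_table(SAMPLE_HTML, "someOtherId"))   # None
```

Returning None keeps the caller in control; raising a custom exception would be an equally reasonable choice if a missing table should stop the script.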
Answer 1 (score: 0)
A solution that does not depend on the table id (which can change with every release) or on the order of the rows:
from bs4 import BeautifulSoup
import requests
import re
page = requests.get('https://support.office.com/en-us/article/Update-history-for-Office-2016-for-Mac-700cab62-0d67-4f23-947b-3686cb1a8eb7#bkmk_current')
pattern = re.compile(r'^Office.+Mac.*')
version = BeautifulSoup(page.content, 'html.parser') \
.select_one('section.ocpSection table tbody') \
.find('p', string=pattern) \
.parent \
.find_next_sibling('td') \
.select_one('p') \
.text
print(version)
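A sketch of the same label-matching idea, run against hypothetical markup with no table id and with the Mac row placed last, shows why it survives both kinds of change. The helper name version_by_label and the sample rows are illustrative only. string= is the current name for the text= argument in BeautifulSoup 4.4 and later, and select_one relies on the soupsieve package that recent bs4 releases install as a dependency.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: no table id, and the Mac row is not the second row.
SAMPLE_HTML = """
<section class="ocpSection">
  <table>
    <tbody>
      <tr><td><p>Product</p></td><td><p>Version</p></td></tr>
      <tr><td><p>Some other product</p></td><td><p>0.0.0</p></td></tr>
      <tr><td><p>Office 2016 for Mac (all applications)</p></td><td><p>15.39.0</p></td></tr>
    </tbody>
  </table>
</section>
"""

PATTERN = re.compile(r"^Office.+Mac.*")


def version_by_label(html, pattern=PATTERN):
    """Find the <p> whose text matches `pattern` and return the next cell's <p> text."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.select_one("section.ocpSection table tbody")
    if tbody is None:
        return None
    label = tbody.find("p", string=pattern)  # regex is applied with .search()
    if label is None:
        return None
    value_cell = label.parent.find_next_sibling("td")  # label.parent is the <td>
    if value_cell is None or value_cell.p is None:
        return None
    return value_cell.p.get_text(strip=True)


print(version_by_label(SAMPLE_HTML))  # 15.39.0
```

Because the lookup is keyed on the visible label text, it still breaks if Microsoft rewords that label, so the None checks are worth keeping.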