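Hi, I'm working on my second school project, which is to build a Python scraper with the help of BeautifulSoup. The assignment reads as follows: I should put together an application that scrapes data from Wikipedia and reports the total viewership across all seasons of GoT. As an extra, the application could also list the views for each episode, give the total for each season, and finish with the grand total across all seasons.
Like this:
S01E1: 2.22 Millions
S02E2: 2.20 Millions
...
Total season 1 views: xy
Total: 398.7 million
Somehow I only managed to get the grand total.
If anyone has done something similar, please help :) Many thanks: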
import re
import urllib
from BeautifulSoup import BeautifulSoup

wiki_url = 'https://en.wikipedia.org/wiki/Game_of_Thrones'
wiki_html = urllib.urlopen(wiki_url).read()
wiki_content = BeautifulSoup(wiki_html)
seasons_table = wiki_content.find('table', attrs={'class': 'wikitable'})
seasons = seasons_table.findAll('a', attrs={'href': re.compile(r'/wiki/Game_of_Thrones_\(season_?[0-9]+\)')})
views = 0
for season in seasons:
    season_url = 'https://en.wikipedia.org' + season['href']
    season_html = urllib.urlopen(season_url).read()
    season_content = BeautifulSoup(season_html)
    episodes_table = season_content.find('table', attrs={'class': 'wikitable plainrowheaders wikiepisodetable'})
    if episodes_table:
        episode_rows = episodes_table.findAll('tr', attrs={'class': 'vevent'})
        if episode_rows:
            for episode_row in episode_rows:
                # The last cell of each row holds the viewer count in millions.
                episode_views = episode_row.findAll('td')[-1]
                # Strip footnote markers such as [12] from the cell text, then convert to float.
                views += float(re.sub(r'\[?[0-9]+\]', '', episode_views.text))
print 'The total number of views is ' + str(views) + ' millions'
Answer 0 (score: 0)
The parsing needs no changes. All that is left is printing the results in the format you want, which is mostly string manipulation.
Code:
import re
import urllib
from bs4 import BeautifulSoup

wiki_url = 'https://en.wikipedia.org/wiki/Game_of_Thrones'
wiki_html = urllib.urlopen(wiki_url).read()
wiki_content = BeautifulSoup(wiki_html, 'html.parser')
seasons_table = wiki_content.find('table', attrs={'class': 'wikitable'})
seasons = seasons_table.findAll('a', attrs={'href': re.compile(r'/wiki/Game_of_Thrones_\(season_?[0-9]+\)')})
views = 0
total = 0
season_num = 1
for season in seasons:
    season_url = 'https://en.wikipedia.org' + season['href']
    season_html = urllib.urlopen(season_url).read()
    season_content = BeautifulSoup(season_html, 'html.parser')
    episodes_table = season_content.find('table', attrs={'class': 'wikitable plainrowheaders wikiepisodetable'})
    if episodes_table:
        episode_rows = episodes_table.findAll('tr', attrs={'class': 'vevent'})
        if episode_rows:
            episode_num = 1
            for episode_row in episode_rows:
                # The last cell of each row holds the viewer count in millions.
                episode_views = episode_row.findAll('td')[-1]
                # Strip footnote markers such as [12] from the cell text, then convert to float.
                views = float(re.sub(r'\[?[0-9]+\]', '', episode_views.text))
                total += views
                print 'S' + str(season_num) + "E" + str(episode_num) + " : " + str(views) + " Millions"
                episode_num += 1
    season_num += 1
print 'The total number of views is ' + str(total) + ' millions'
Output:
S1E1 : 2.22 Millions
S1E2 : 2.2 Millions
S1E3 : 2.44 Millions
S1E4 : 2.45 Millions
S1E5 : 2.58 Millions
S1E6 : 2.44 Millions
S1E7 : 2.4 Millions
S1E8 : 2.72 Millions
S1E9 : 2.66 Millions
S1E10 : 3.04 Millions
S2E1 : 3.86 Millions
S2E2 : 3.76 Millions
S2E3 : 3.77 Millions
S2E4 : 3.65 Millions
S2E5 : 3.9 Millions
S2E6 : 3.88 Millions
S2E7 : 3.69 Millions
S2E8 : 3.86 Millions
S2E9 : 3.38 Millions
S2E10 : 4.2 Millions
S3E1 : 4.37 Millions
S3E2 : 4.27 Millions
S3E3 : 4.72 Millions
S3E4 : 4.87 Millions
S3E5 : 5.35 Millions
S3E6 : 5.5 Millions
S3E7 : 4.84 Millions
S3E8 : 5.13 Millions
S3E9 : 5.22 Millions
S3E10 : 5.39 Millions
S4E1 : 6.64 Millions
S4E2 : 6.31 Millions
S4E3 : 6.59 Millions
S4E4 : 6.95 Millions
S4E5 : 7.16 Millions
S4E6 : 6.4 Millions
S4E7 : 7.2 Millions
S4E8 : 7.17 Millions
S4E9 : 6.95 Millions
S4E10 : 7.09 Millions
S5E1 : 8.0 Millions
S5E2 : 6.81 Millions
S5E3 : 6.71 Millions
S5E4 : 6.82 Millions
S5E5 : 6.56 Millions
S5E6 : 6.24 Millions
S5E7 : 5.4 Millions
S5E8 : 7.01 Millions
S5E9 : 7.14 Millions
S5E10 : 8.11 Millions
S6E1 : 7.94 Millions
S6E2 : 7.29 Millions
S6E3 : 7.28 Millions
S6E4 : 7.82 Millions
S6E5 : 7.89 Millions
S6E6 : 6.71 Millions
S6E7 : 7.8 Millions
S6E8 : 7.6 Millions
S6E9 : 7.66 Millions
S6E10 : 8.89 Millions
S7E1 : 10.11 Millions
S7E2 : 9.27 Millions
S7E3 : 9.25 Millions
S7E4 : 10.17 Millions
S7E5 : 10.72 Millions
S7E6 : 10.24 Millions
S7E7 : 12.07 Millions
The total number of views is 398.73 millions
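The float conversion in the code above only works once Wikipedia's footnote markers have been removed from the cell text. Here is a minimal sketch of that step as a standalone helper. It is written for Python 3 (unlike the Python 2 code above), `parse_views` and `FOOTNOTE` are hypothetical names, and the sample strings are made up for illustration rather than copied from the live page.

```python
import re

# Matches bracketed footnote markers such as "[12]" or "[a]" (assumed cell format).
FOOTNOTE = re.compile(r'\[[^\]]*\]')

def parse_views(cell_text):
    """Return the viewer count in millions as a float, or None if the cell has no number."""
    cleaned = FOOTNOTE.sub('', cell_text).strip()
    try:
        return float(cleaned)
    except ValueError:
        # Cells such as "N/A" or "TBD" cannot be converted; signal that to the caller.
        return None

print(parse_views('2.22[21]'))
print(parse_views('N/A'))
```

Returning None instead of raising lets the loop skip episodes whose ratings have not been published yet, which is one possible way to keep a single odd cell from aborting the whole run.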
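Answer 1 (score: 0)
You can do it the way Ali showed, except that instead of only adding each episode to the grand total you also print it and add it to a separate per-season variable, which in my case is totalViewsPerSeason.
Working solution: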
import re
import urllib
from BeautifulSoup import BeautifulSoup

wiki_url = 'https://en.wikipedia.org/wiki/Game_of_Thrones'
wiki_html = urllib.urlopen(wiki_url).read()
wiki_content = BeautifulSoup(wiki_html)
seasons_table = wiki_content.find('table', attrs={'class': 'wikitable'})
seasons = seasons_table.findAll('a', attrs={'href': re.compile(r'/wiki/Game_of_Thrones_\(season_?[0-9]+\)')})
views = 0
grandTotalViews = 0
season_num = 1
for season in seasons:
    season_url = 'https://en.wikipedia.org' + season['href']
    season_html = urllib.urlopen(season_url).read()
    season_content = BeautifulSoup(season_html)
    episodes_table = season_content.find('table', attrs={'class': 'wikitable plainrowheaders wikiepisodetable'})
    if episodes_table:
        episode_rows = episodes_table.findAll('tr', attrs={'class': 'vevent'})
        if episode_rows:
            episode_num = 1
            # Reset the per-season sum at the start of every season.
            totalViewsPerSeason = 0
            for episode_row in episode_rows:
                # The last cell of each row holds the viewer count in millions.
                episode_views = episode_row.findAll('td')[-1]
                # Strip footnote markers such as [12] from the cell text, then convert to float.
                views = float(re.sub(r'\[?[0-9]+\]', '', episode_views.text))
                grandTotalViews += views
                totalViewsPerSeason += views
                print 'S' + str(season_num) + "E" + str(episode_num) + " : " + str(views) + " Millions"
                episode_num += 1
            print "Total season " + str(season_num) + " views: " + str(totalViewsPerSeason) + " Millions\n"
    season_num += 1
print 'The total number of views is ' + str(grandTotalViews) + ' millions'
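The per-season bookkeeping can also be separated from the scraping, which makes it easy to check without hitting Wikipedia. What follows is only a sketch under a few assumptions: it targets Python 3, `format_report` is a hypothetical helper, and it expects the viewer counts to have been collected already as one list of floats per season. The three sample values are the S1E1, S1E2 and S2E1 figures from the output shown earlier in this thread.

```python
def format_report(seasons):
    """Build report lines from a list of seasons, each a list of viewer counts in millions."""
    lines = []
    grand_total = 0.0
    for season_num, episodes in enumerate(seasons, start=1):
        for episode_num, views in enumerate(episodes, start=1):
            # Zero-padded labels such as S01E02, with two decimals for the count.
            lines.append('S{:02d}E{:02d}: {:.2f} million'.format(season_num, episode_num, views))
        season_total = sum(episodes)
        grand_total += season_total
        lines.append('Season {} total: {:.2f} million'.format(season_num, season_total))
    lines.append('Grand total: {:.2f} million'.format(grand_total))
    return lines

# Illustrative subset only, not the full data set.
sample = [[2.22, 2.20], [3.86]]
print('\n'.join(format_report(sample)))
```

Formatting with `{:.2f}` rather than `str()` keeps floating-point rounding noise out of the printed sums and gives every line the same number of decimals.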