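I am trying to scrape data from the following web page: https://www.cricbuzz.com/live-cricket-scorecard/10711/aus-vs-ind-1st-test-india-in-australia-test-series-2011-12 I need the scorecard in tabular form. Can anyone help me? I am using Python 3. I am new to web scraping and not very familiar with the internal structure of web pages. Thanks in advance!
I tried using BeautifulSoup together with urllib2 and so on, but did not get anywhere.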
Answer 0 (score: 0)
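You can use pandas' read_html(). It returns a list of DataFrames, one per table it finds on the page. What you do with them from there is up to you. You will probably need to tidy the data; I just dumped everything into one big table to show you what comes back.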
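import pandas as pd

# Mobile version of the scorecard page
url = 'https://m.cricbuzz.com/live-cricket-scorecard/10711/aus-vs-ind-1st-test-india-in-australia-test-series-2011-12'
# read_html() returns one DataFrame per <table> found on the page
dfs = pd.read_html(url)
# Stack all tables into one DataFrame
result = pd.concat(dfs)
print(result.to_string())
Output: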
                      0                     1                     2                     3                     4
0               Batting                     R                     B                    4s                    6s
0              Ed Cowan                    68                   177                     7                     0
1  c M Dhoni b R Ashwin  c M Dhoni b R Ashwin  c M Dhoni b R Ashwin  c M Dhoni b R Ashwin  c M Dhoni b R Ashwin
0          David Warner                    37                    49                     4                     1
1   c M Dhoni b U Yadav   c M Dhoni b U Yadav   c M Dhoni b U Yadav   c M Dhoni b U Yadav   c M Dhoni b U Yadav
0           Shaun Marsh                     0                     6                     0                     0
1   c V Kohli b U Yadav   c V Kohli b U Yadav   c V Kohli b U Yadav   c V Kohli b U Yadav   c V Kohli b U Yadav
0         Ricky Ponting                    62                    94                     6                     0
1  c V Laxman b U Yadav  c V Laxman b U Yadav  c V Laxman b U Yadav  c V Laxman b U Yadav  c V Laxman b U Yadav
0        Michael Clarke                    31                    68                     5                     0
1              b Z Khan              b Z Khan              b Z Khan              b Z Khan              b Z Khan
0        Michael Hussey                     0                     1                     0                     0
1    c M Dhoni b Z Khan    c M Dhoni b Z Khan    c M Dhoni b Z Khan    c M Dhoni b Z Khan    c M Dhoni b Z Khan
0           Brad Haddin                    27                    69                     1                     0
1   c V Sehwag b Z Khan   c V Sehwag b Z Khan   c V Sehwag b Z Khan   c V Sehwag b Z Khan   c V Sehwag b Z Khan
0          Peter Siddle                    41                   100                     4                     0
1    c M Dhoni b Z Khan    c M Dhoni b Z Khan    c M Dhoni b Z Khan    c M Dhoni b Z Khan    c M Dhoni b Z Khan
0       James Pattinson                    18                    54                     2                     0
1               not out               not out               not out               not out               not out
0        Ben Hilfenhaus                    19                    32                     3                     0
1  c V Kohli b R Ashwin  c V Kohli b R Ashwin  c V Kohli b R Ashwin  c V Kohli b R Ashwin  c V Kohli b R Ashwin
0           Nathan Lyon                     6                    11                     1                     0
1            b R Ashwin            b R Ashwin            b R Ashwin            b R Ashwin            b R Ashwin
0                Bowler                     O                     M                     R                     W
1           Zaheer Khan                    31                     6                    77                     4
2         Ishant Sharma                    24                     7                    48                     0
3           Umesh Yadav                    26                     5                   106                     3
4   Ravichandran Ashwin                    29                     3                    81                     3
0                  Home           Live Scores                   NaN                   NaN                   NaN
1              Schedule                  News                   NaN                   NaN                   NaN
2            Editorials                Photos                   NaN                   NaN                   NaN
3              Archives               Players                   NaN                   NaN                   NaN
4              Rankings                Series                   NaN                   NaN                   NaN
5                  Poll                Videos                   NaN                   NaN                   NaN
6          Points Table            Contact Us                   NaN                   NaN                   NaN
7       Cricbuzz TV Ads    Careers @ Cricbuzz                   NaN                   NaN                   NaN
8           Mobile Apps    This day that year                   NaN                   NaN                   NaN
9          Wickets Zone                   NaN                   NaN                   NaN                   NaN
0           Mobile Apps       Social Channels                   NaN                   NaN                   NaN
1                iPhone              facebook                   NaN                   NaN                   NaN
2               Android               twitter                   NaN                   NaN                   NaN
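If you want something closer to a real scorecard than the big dump above, here is a rough sketch of how the tidying could go. It leans on what the output above suggests: each batsman seems to arrive as his own two-row table (stats on the first row, the dismissal text repeated across the second), and the bowling figures arrive as one table whose first row is the header. The function names tidy_batting and tidy_bowling are my own, and the shape checks are assumptions based on that output, so they may need adjusting if the page layout is different.

```python
import pandas as pd


def tidy_batting(dfs):
    """Collapse the two-row per-batsman tables into one row per batsman.

    Assumption (from the output above): a batsman table has exactly
    2 rows and 5 columns, and its second row repeats the dismissal text
    in every cell.
    """
    columns = ["Batsman", "Dismissal", "R", "B", "4s", "6s"]
    rows = []
    for df in dfs:
        if df.shape != (2, 5):
            continue  # header, bowling and navigation tables
        stats, dismissal = df.iloc[0], df.iloc[1]
        if dismissal.nunique(dropna=False) != 1:
            continue  # second row is not one repeated value
        rows.append({
            "Batsman": stats.iloc[0],
            "Dismissal": dismissal.iloc[0],
            "R": stats.iloc[1],
            "B": stats.iloc[2],
            "4s": stats.iloc[3],
            "6s": stats.iloc[4],
        })
    batting = pd.DataFrame(rows, columns=columns)
    numeric = ["R", "B", "4s", "6s"]
    batting[numeric] = batting[numeric].apply(pd.to_numeric, errors="coerce")
    return batting


def tidy_bowling(dfs):
    """Return the bowling table with its first row promoted to the header.

    Assumption: the bowling table is the 5-column table whose top-left
    cell is the text 'Bowler'. Returns None if no such table is found.
    """
    for df in dfs:
        if df.shape[1] == 5 and len(df) > 1 and str(df.iloc[0, 0]) == "Bowler":
            header = df.iloc[0].tolist()
            bowling = df.iloc[1:].reset_index(drop=True)
            bowling.columns = header
            bowling[header[1:]] = bowling[header[1:]].apply(
                pd.to_numeric, errors="coerce"
            )
            return bowling
    return None
```

With the dfs list from the code above you would call batting = tidy_batting(dfs) and bowling = tidy_bowling(dfs). The navigation tables drop out on their own because they do not have five columns.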