I'm new to web scraping in R. I'm using rvest.
I can get the match records for a single year by going to each year manually, as shown below:
## The URL
http://stats.espncricinfo.com/ci/engine/records/index.html
## structure
RECORDS / ONE-DAY INTERNATIONALS / TEAM RECORDS / LIST OF MATCH RESULTS (BY YEAR)
library(rvest)

# ODI match results for one year (id=2000 selects the year)
cricket_record <- read_html('http://stats.espncricinfo.com/ci/engine/records/team/match_results.html?class=2;id=2000;type=year')
cricket_record %>%
  html_nodes("table") %>%
  .[[1]] %>%
  html_table()
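Because `read_html()` accepts literal HTML text as well as a URL, the same pipeline can be tried offline. A minimal sketch on a made-up one-row table that only imitates the first columns of the results page:

```r
library(rvest)

# Made-up HTML imitating the first three columns of the results table
page <- '<table>
  <tr><th>Team 1</th><th>Team 2</th><th>Winner</th></tr>
  <tr><td>New Zealand</td><td>West Indies</td><td>New Zealand</td></tr>
</table>'

tbl <- read_html(page) %>%  # literal HTML instead of a URL
  html_nodes("table") %>%   # every <table> in the page, in page order
  .[[1]] %>%                # keep the first one
  html_table()              # a header row of <th> cells becomes the column names

tbl$Winner
```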
Team 1 Team 2 Winner Margin Ground Match Date Scorecard
1 New Zealand West Indies New Zealand 3 wickets Auckland Jan 2, 2000 ODI # 1532
2 New Zealand West Indies New Zealand 7 wickets Taupo Jan 4, 2000 ODI # 1533
3 New Zealand West Indies New Zealand 4 wickets Napier Jan 6, 2000 ODI # 1534
4 New Zealand West Indies New Zealand 8 wickets Wellington Jan 8-9, 2000 ODI # 1535
5 Australia Pakistan Pakistan 45 runs Brisbane Jan 9, 2000 ODI # 1536
6 India Pakistan Pakistan 2 wickets Brisbane Jan 10, 2000 ODI # 1537
7 New Zealand West Indies New Zealand 20 runs Christchurch Jan 11, 2000 ODI # 1538
8 Australia India Australia 28 runs Melbourne Jan 12, 2000 ODI # 1539
9 Australia India Australia 5 wickets Sydney Jan 14, 2000 ODI # 1540
10 Australia Pakistan Australia 6 wickets Melbourne Jan 16, 2000 ODI # 1541
11 Australia Pakistan Australia 81 runs Sydney Jan 19, 2000 ODI # 1542
12 India Pakistan Pakistan 32 runs Hobart Jan 21, 2000 ODI # 1543
13 South Africa Zimbabwe South Africa 6 wickets Johannesburg Jan 21, 2000 ODI # 1544
14 Australia Pakistan Australia 15 runs Melbourne Jan 23, 2000 ODI # 1545
15 South Africa England England 9 wickets Bloemfontein Jan 23, 2000 ODI # 1546
16 India Pakistan India 48 runs Adelaide Jan 25, 2000 ODI # 1547
17 Australia India Australia 152 runs Adelaide Jan 26, 2000 ODI # 1548
18 South Africa England South Africa 1 run Cape Town Jan 26, 2000 ODI # 1549
19 India Pakistan Pakistan 104 runs Perth Jan 28, 2000 ODI # 1550
20 England Zimbabwe Zimbabwe 104 runs Cape Town Jan 28, 2000 ODI # 1551
21 Australia India Australia 4 wickets Perth Jan 30, 2000 ODI # 1552
22 England Zimbabwe England 8 wickets Kimberley Jan 30, 2000 ODI # 1553
23 Australia Pakistan Australia 6 wickets Melbourne Feb 2, 2000 ODI # 1554
24 South Africa Zimbabwe Zimbabwe 2 wickets Durban Feb 2, 2000 ODI # 1555
25 Australia Pakistan Australia 152 runs Sydney Feb 4, 2000 ODI # 1556
26 South Africa England South Africa 2 wickets East London Feb 4, 2000 ODI # 1557
27 South Africa Zimbabwe South Africa 53 runs Port Elizabeth Feb 6, 2000 ODI # 1558
28 Pakistan Sri Lanka Sri Lanka 29 runs Karachi Feb 13, 2000 ODI # 1559
29 South Africa England South Africa 38 runs Johannesburg Feb 13, 2000 ODI # 1560
30 Pakistan Sri Lanka Sri Lanka 34 runs Gujranwala Feb 16, 2000 ODI # 1561
31 Zimbabwe England England 5 wickets Bulawayo Feb 16, 2000 ODI # 1562
32 New Zealand Australia no result Wellington Feb 17, 2000 ODI # 1563
33 Zimbabwe England England 1 wicket Bulawayo Feb 18, 2000 ODI # 1564
34 New Zealand Australia Australia 5 wickets Auckland Feb 19, 2000 ODI # 1565
35 Pakistan Sri Lanka Sri Lanka 104 runs Lahore Feb 19, 2000 ODI # 1566
36 Zimbabwe England England 85 runs Harare Feb 20, 2000 ODI # 1567
37 New Zealand Australia Australia 50 runs Dunedin Feb 23, 2000 ODI # 1568
38 New Zealand Australia Australia 48 runs Christchurch Feb 26, 2000 ODI # 1569
39 New Zealand Australia Australia 5 wickets Napier Mar 1, 2000 ODI # 1570
40 New Zealand Australia New Zealand 7 wickets Auckland Mar 3, 2000 ODI # 1571
41 India South Africa India 3 wickets Kochi Mar 9, 2000 ODI # 1572
42 India South Africa India 6 wickets Jamshedpur Mar 12, 2000 ODI # 1573
43 India South Africa South Africa 2 wickets Faridabad Mar 15, 2000 ODI # 1574
44 India South Africa India 4 wickets Vadodara Mar 17, 2000 ODI # 1575
45 India South Africa South Africa 10 runs Nagpur Mar 19, 2000 ODI # 1576
46 India South Africa South Africa 10 wickets Sharjah Mar 22, 2000 ODI # 1577
47 India Pakistan India 5 wickets Sharjah Mar 23, 2000 ODI # 1578
48 Pakistan South Africa South Africa 3 wickets Sharjah Mar 24, 2000 ODI # 1579
49 India Pakistan Pakistan 98 runs Sharjah Mar 26, 2000 ODI # 1580
50 India South Africa South Africa 6 wickets Sharjah Mar 27, 2000 ODI # 1581
51 Pakistan South Africa Pakistan 67 runs Sharjah Mar 28, 2000 ODI # 1582
52 Pakistan South Africa Pakistan 16 runs Sharjah Mar 31, 2000 ODI # 1583
53 West Indies Zimbabwe West Indies 87 runs Kingston Apr 1, 2000 ODI # 1584
54 West Indies Zimbabwe West Indies 41 runs Kingston Apr 2, 2000 ODI # 1585
55 Pakistan Zimbabwe Pakistan 5 wickets St John's Apr 5, 2000 ODI # 1586
56 South Africa Australia South Africa 6 wickets Durban Apr 12, 2000 ODI # 1587
57 West Indies Pakistan West Indies 96 runs Kingstown Apr 12, 2000 ODI # 1588
58 South Africa Australia Australia 5 wickets Cape Town Apr 14, 2000 ODI # 1589
59 Pakistan Zimbabwe Pakistan 6 wickets St George's Apr 15, 2000 ODI # 1590
60 South Africa Australia South Africa 4 wickets Johannesburg Apr 16, 2000 ODI # 1591
61 West Indies Pakistan West Indies 17 runs St George's Apr 16, 2000 ODI # 1592
62 West Indies Pakistan Pakistan 17 runs Bridgetown Apr 19, 2000 ODI # 1593
63 West Indies Pakistan West Indies 60 runs Port of Spain Apr 22, 2000 ODI # 1594
64 West Indies Pakistan Pakistan 4 wickets Port of Spain Apr 23, 2000 ODI # 1595
65 Bangladesh Sri Lanka Sri Lanka 9 wickets Dhaka May 29, 2000 ODI # 1596
66 Bangladesh India India 8 wickets Dhaka May 30-31, 2000 ODI # 1597
67 India Sri Lanka Sri Lanka 71 runs Dhaka Jun 1, 2000 ODI # 1598
68 Bangladesh Pakistan Pakistan 233 runs Dhaka Jun 2, 2000 ODI # 1599
69 India Pakistan Pakistan 44 runs Dhaka Jun 3, 2000 ODI # 1600
70 Pakistan Sri Lanka Pakistan 7 wickets Dhaka Jun 5, 2000 ODI # 1601
71 Pakistan Sri Lanka Pakistan 39 runs Dhaka Jun 7, 2000 ODI # 1602
72 Sri Lanka Pakistan Sri Lanka 5 wickets Galle Jul 5, 2000 ODI # 1603
73 Sri Lanka South Africa Sri Lanka 37 runs Galle Jul 6, 2000 ODI # 1604
74 West Indies Zimbabwe Zimbabwe 6 wickets Bristol Jul 6, 2000 ODI # 1605
75 Pakistan South Africa South Africa 18 runs Colombo (RPS) Jul 8, 2000 ODI # 1606
76 England Zimbabwe Zimbabwe 5 wickets The Oval Jul 8, 2000 ODI # 1607
77 Sri Lanka Pakistan Sri Lanka 6 wickets Colombo (RPS) Jul 9, 2000 ODI # 1608
78 England West Indies no result Lord's Jul 9, 2000 ODI # 1609
79 Sri Lanka South Africa Sri Lanka 8 wickets Colombo (SSC) Jul 11, 2000 ODI # 1610
80 West Indies Zimbabwe Zimbabwe 70 runs Canterbury Jul 11, 2000 ODI # 1611
81 Pakistan South Africa South Africa 7 wickets Colombo (SSC) Jul 12, 2000 ODI # 1612
82 England Zimbabwe England 8 wickets Manchester Jul 13, 2000 ODI # 1613
83 Sri Lanka South Africa Sri Lanka 30 runs Colombo (RPS) Jul 14, 2000 ODI # 1614
84 England West Indies England 10 wickets Chester-le-Street Jul 15, 2000 ODI # 1615
85 West Indies Zimbabwe Zimbabwe 6 wickets Chester-le-Street Jul 16, 2000 ODI # 1616
86 England Zimbabwe England 52 runs Birmingham Jul 18, 2000 ODI # 1617
87 England West Indies West Indies 3 runs Nottingham Jul 20, 2000 ODI # 1618
88 England Zimbabwe England 6 wickets Lord's Jul 22, 2000 ODI # 1619
89 Australia South Africa Australia 94 runs Melbourne (Docklands) Aug 16, 2000 ODI # 1620
90 Australia South Africa tied Melbourne (Docklands) Aug 18, 2000 ODI # 1621
91 Australia South Africa South Africa 8 runs Melbourne (Docklands) Aug 20, 2000 ODI # 1622
92 New Zealand Pakistan Pakistan 12 runs Singapore Aug 20, 2000 ODI # 1623
93 Pakistan South Africa Pakistan 28 runs Singapore Aug 23, 2000 ODI # 1624
94 New Zealand South Africa South Africa 8 wickets Singapore Aug 25, 2000 ODI # 1625
95 Pakistan South Africa South Africa 93 runs Singapore Aug 27, 2000 ODI # 1626
96 Zimbabwe New Zealand New Zealand 7 wickets Harare Sep 27, 2000 ODI # 1627
97 Zimbabwe New Zealand Zimbabwe 21 runs Bulawayo Sep 30, 2000 ODI # 1628
98 Zimbabwe New Zealand Zimbabwe 6 wickets Bulawayo Oct 1, 2000 ODI # 1629
99 Kenya India India 8 wickets Nairobi (Gym) Oct 3, 2000 ODI # 1630
100 Sri Lanka West Indies Sri Lanka 108 runs Nairobi (Gym) Oct 4, 2000 ODI # 1631
101 Bangladesh England England 8 wickets Nairobi (Gym) Oct 5, 2000 ODI # 1632
102 Australia India India 20 runs Nairobi (Gym) Oct 7, 2000 ODI # 1633
103 Pakistan Sri Lanka Pakistan 9 wickets Nairobi (Gym) Oct 8, 2000 ODI # 1634
104 New Zealand Zimbabwe New Zealand 64 runs Nairobi (Gym) Oct 9, 2000 ODI # 1635
105 England South Africa South Africa 8 wickets Nairobi (Gym) Oct 10, 2000 ODI # 1636
106 New Zealand Pakistan New Zealand 4 wickets Nairobi (Gym) Oct 11, 2000 ODI # 1637
107 India South Africa India 95 runs Nairobi (Gym) Oct 13, 2000 ODI # 1638
108 India New Zealand New Zealand 4 wickets Nairobi (Gym) Oct 15, 2000 ODI # 1639
109 India Sri Lanka Sri Lanka 5 wickets Sharjah Oct 20, 2000 ODI # 1640
110 South Africa New Zealand no result Potchefstroom Oct 20, 2000 ODI # 1641
111 Sri Lanka Zimbabwe Sri Lanka 7 wickets Sharjah Oct 21, 2000 ODI # 1642
112 South Africa New Zealand South Africa 6 wickets Benoni Oct 22, 2000 ODI # 1643
113 India Zimbabwe India 13 runs Sharjah Oct 22, 2000 ODI # 1644
114 Pakistan England England 5 wickets Karachi Oct 24, 2000 ODI # 1645
115 Sri Lanka Zimbabwe Sri Lanka 123 runs Sharjah Oct 25, 2000 ODI # 1646
116 South Africa New Zealand South Africa 115 runs Centurion Oct 25, 2000 ODI # 1647
117 India Zimbabwe India 3 wickets Sharjah Oct 26, 2000 ODI # 1648
118 Pakistan England Pakistan 8 wickets Lahore Oct 27, 2000 ODI # 1649
119 India Sri Lanka Sri Lanka 68 runs Sharjah Oct 27, 2000 ODI # 1650
120 South Africa New Zealand South Africa 5 wickets Kimberley Oct 28, 2000 ODI # 1651
121 India Sri Lanka Sri Lanka 245 runs Sharjah Oct 29, 2000 ODI # 1652
122 Pakistan England Pakistan 6 wickets Rawalpindi Oct 30, 2000 ODI # 1653
123 South Africa New Zealand South Africa 6 wickets Durban Nov 1, 2000 ODI # 1654
124 South Africa New Zealand South Africa 3 wickets Cape Town Nov 4, 2000 ODI # 1655
125 India Zimbabwe India 3 wickets Cuttack Dec 2, 2000 ODI # 1656
126 India Zimbabwe India 61 runs Ahmedabad Dec 5, 2000 ODI # 1657
127 India Zimbabwe Zimbabwe 1 wicket Jodhpur Dec 8, 2000 ODI # 1658
128 India Zimbabwe India 9 wickets Kanpur Dec 11, 2000 ODI # 1659
129 India Zimbabwe India 39 runs Rajkot Dec 14, 2000 ODI # 1660
130 South Africa Sri Lanka South Africa 4 wickets Port Elizabeth Dec 15, 2000 ODI # 1661
131 South Africa Sri Lanka South Africa 95 runs East London Dec 17, 2000 ODI # 1662
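Second, I also need to scrape some information from the "Scorecard" column of the table, e.g. ODI # 1532, which links to the scorecard and match summary. Again, I can get this for one match at a time by passing each match link as input: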
cricket_score_odi <- read_html('http://www.espncricinfo.com/series/15743/scorecard/64640/new-zealand-vs-west-indies-1st-odi-west-indies-tour-of-new-zealand-1999-00')
cricket_score_odi %>%
  html_nodes('.cscore_info-overview , .match-detail--item:nth-child(3) h4 , .match-detail--item:nth-child(3) span , .cscore_name--long , #main-container .cscore_score') %>%
  html_text(trim = TRUE)
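The selector above is a CSS selector group (comma-separated). As far as I know, `html_nodes()` returns the union of all matches in document order rather than in the order the selectors are written, which would explain why the Toss items come last and why a block the page renders twice shows up twice. A small offline sketch with made-up HTML:

```r
library(rvest)

# Made-up HTML: the .b element comes before the .a element in the page
page <- '<div><span class="b">appears first</span><h4 class="a">appears second</h4></div>'

# ".a" is listed first in the selector, but the results follow page order
res <- read_html(page) %>%
  html_nodes(".a, .b") %>%
  html_text(trim = TRUE)
res
```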
[1] "1st ODI, West Indies tour of New Zealand at Auckland, Jan 2 2000"
[2] "West Indies"
[3] "268/7"
[4] "New Zealand"
[5] "250/7 (45.1/46 ov, target 250)"
[6] "1st ODI, West Indies tour of New Zealand at Auckland, Jan 2 2000"
[7] "West Indies"
[8] "268/7"
[9] "New Zealand"
[10] "250/7 (45.1/46 ov, target 250)"
[11] "Toss"
[12] "West Indies , elected to bat first"
Thanks a lot!
Answer 0 (score: 0)
Following @ulfelder's suggestion, I'd propose a purrr solution to both of your questions.
1. Preparation
I create a data frame with one URL per year, so the scraping can be mapped over it.
library(progress)
library(rvest)
library(tidyverse)

(df_url <- tibble(year = 2000:2001) %>%
    mutate(url = str_c("http://stats.espncricinfo.com/ci/engine/records/team/match_results.html?class=2;id=", year, ";type=year", sep = "")))
# A tibble: 2 x 2
year url
<int> <chr>
1 2000 http://stats.espncricinfo.com/ci/engine/records/team/match_results.html?class=2;id=2000;type=year
2 2001 http://stats.espncricinfo.com/ci/engine/records/team/match_results.html?class=2;id=2001;type=year
2. Scrape the team records
Map the rvest functions over the year-URL data frame.
(df_records <- df_url %>%
    mutate(record = map(url, ~ {
      read_html(.x) %>%
        html_nodes("table") %>%
        purrr::pluck(1) %>%
        html_table()
    })) %>%
    unnest(cols = record))  # recent tidyr versions expect `cols` to be named
# A tibble: 251 x 9
year url `Team 1` `Team 2` Winner Margin Ground `Match Date` Scorecard
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2000 http://stats.espncricinfo.com/ci/engine/records/team/match_results.html~ New Zealand West Indies New Zeala~ 3 wicke~ Auckland Jan 2, 2000 ODI # 15~
2 2000 http://stats.espncricinfo.com/ci/engine/records/team/match_results.html~ New Zealand West Indies New Zeala~ 7 wicke~ Taupo Jan 4, 2000 ODI # 15~
3 2000 http://stats.espncricinfo.com/ci/engine/records/team/match_results.html~ New Zealand West Indies New Zeala~ 4 wicke~ Napier Jan 6, 2000 ODI # 15~
4 2000 http://stats.espncricinfo.com/ci/engine/records/team/match_results.html~ New Zealand West Indies New Zeala~ 8 wicke~ Wellington Jan 8-9, 20~ ODI # 15~
5 2000 http://stats.espncricinfo.com/ci/engine/records/team/match_results.html~ Australia Pakistan Pakistan 45 runs Brisbane Jan 9, 2000 ODI # 15~
# ... with 246 more rows
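When mapping over many URLs, one failed request stops the whole `map()`, and rapid-fire requests are hard on the server. A sketch of a hypothetical helper `read_page_safely()` (not part of rvest) that pauses before each request and returns `NULL` on failure, using `purrr::possibly()`:

```r
library(rvest)
library(purrr)

# Hypothetical helper: wait `delay` seconds, then read the page;
# return NULL instead of throwing an error if reading fails
read_page_safely <- function(url, delay = 1) {
  Sys.sleep(delay)
  possibly(read_html, otherwise = NULL)(url)
}

# Offline checks: a missing file fails quietly, literal HTML parses
read_page_safely("no-such-file.html", delay = 0)
read_page_safely("<p>hello</p>", delay = 0)
```

Inside the `map()` above, `read_page_safely(.x)` could replace `read_html(.x)`, returning an empty result whenever the page comes back `NULL`.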
3. Extract the scorecard URLs
Pull the scorecard URLs from the href attributes of the links.
(df_url_card <- df_url %>%
    mutate(url_card = map(url, ~ {
      read_html(.x) %>%
        html_nodes("td:nth-child(7) .data-link") %>%  # links in the Scorecard column
        html_attr("href")
    })) %>%
    unnest(cols = url_card) %>%
    mutate(url_card = str_c("http://stats.espncricinfo.com", url_card, sep = "")))  # hrefs are relative
# A tibble: 251 x 3
year url url_card
<int> <chr> <chr>
1 2000 http://stats.espncricinfo.com/ci/engine/records/team/match_results.html?class=2;id=2000;type=y~ http://stats.espncricinfo.com/ci/engine/match/64640.h~
2 2000 http://stats.espncricinfo.com/ci/engine/records/team/match_results.html?class=2;id=2000;type=y~ http://stats.espncricinfo.com/ci/engine/match/64641.h~
3 2000 http://stats.espncricinfo.com/ci/engine/records/team/match_results.html?class=2;id=2000;type=y~ http://stats.espncricinfo.com/ci/engine/match/64642.h~
4 2000 http://stats.espncricinfo.com/ci/engine/records/team/match_results.html?class=2;id=2000;type=y~ http://stats.espncricinfo.com/ci/engine/match/64643.h~
5 2000 http://stats.espncricinfo.com/ci/engine/records/team/match_results.html?class=2;id=2000;type=y~ http://stats.espncricinfo.com/ci/engine/match/65587.h~
# ... with 246 more rows
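Steps 2 and 3 each download every year page. A sketch of a hypothetical `scrape_year_page()` that parses each page once, attaches the scorecard links to the table rows, and uses `xml2::url_absolute()` instead of `str_c()` so links that are already absolute are left alone. It assumes, as the selector above does, that every row has exactly one `.data-link` anchor in its 7th cell; if a row lacked one, the lengths would not match and the assignment would fail. The test HTML is made up:

```r
library(rvest)
library(xml2)

# Hypothetical helper: parse one results page (URL or HTML text) a single time
scrape_year_page <- function(page, base = "http://stats.espncricinfo.com/") {
  doc   <- read_html(page)
  tbl   <- html_table(html_nodes(doc, "table")[[1]])
  hrefs <- html_attr(html_nodes(doc, "td:nth-child(7) .data-link"), "href")
  tbl$url_card <- url_absolute(hrefs, base)  # resolve relative links against base
  tbl
}

# Made-up one-row page with the same seven columns as the real table
page <- '<table>
<tr><th>Team 1</th><th>Team 2</th><th>Winner</th><th>Margin</th><th>Ground</th><th>Match Date</th><th>Scorecard</th></tr>
<tr><td>New Zealand</td><td>West Indies</td><td>New Zealand</td><td>3 wickets</td><td>Auckland</td><td>Jan 2, 2000</td><td><a class="data-link" href="/ci/engine/match/64640.html">ODI # 1532</a></td></tr>
</table>'

res <- scrape_year_page(page)
res$url_card
```

With the real pages, something like `df_url %>% mutate(record = map(url, scrape_year_page)) %>% unnest(cols = record)` would then give records and scorecard URLs in one table.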
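4. Scrape the scorecards
I map the rvest functions over the scorecard URLs. Since there can be a large number of URLs, I recommend using a progress bar.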
pb <- progress_bar$new(format = " downloading [:bar] :percent eta: :eta", total = nrow(df_url_card))
(df_scorecard <- df_url_card %>%
    mutate(scorecard = map(url_card, ~ {
      pb$tick()  # advance the progress bar once per URL
      read_html(.x) %>%
        html_nodes('.cscore_info-overview , .match-detail--item:nth-child(3) h4 , .match-detail--item:nth-child(3) span , .cscore_name--long , #main-container .cscore_score') %>%
        html_text(trim = TRUE)
    })))
# A tibble: 251 x 4
year url url_card scorecard
<int> <chr> <chr> <list>
1 2000 http://stats.espncricinfo.com/ci/engine/records/team/match_results.html?class=2;id=2000~ http://stats.espncricinfo.com/ci/engine/match/6464~ <chr [12~
2 2000 http://stats.espncricinfo.com/ci/engine/records/team/match_results.html?class=2;id=2000~ http://stats.espncricinfo.com/ci/engine/match/6464~ <chr [12~
3 2000 http://stats.espncricinfo.com/ci/engine/records/team/match_results.html?class=2;id=2000~ http://stats.espncricinfo.com/ci/engine/match/6464~ <chr [12~
4 2000 http://stats.espncricinfo.com/ci/engine/records/team/match_results.html?class=2;id=2000~ http://stats.espncricinfo.com/ci/engine/match/6464~ <chr [12~
5 2000 http://stats.espncricinfo.com/ci/engine/records/team/match_results.html?class=2;id=2000~ http://stats.espncricinfo.com/ci/engine/match/6558~ <chr [12~
# ... with 246 more rows
df_scorecard$scorecard[[1]]
[1] "1st ODI, West Indies tour of New Zealand at Auckland, Jan 2 2000" "West Indies"
[3] "268/7" "New Zealand"
[5] "250/7 (45.1/46 ov, target 250)" "1st ODI, West Indies tour of New Zealand at Auckland, Jan 2 2000"
[7] "West Indies" "268/7"
[9] "New Zealand" "250/7 (45.1/46 ov, target 250)"
[11] "Toss" "West Indies , elected to bat first"
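The scorecard vector still needs flattening before it can sit in a data frame. A sketch of a hypothetical base-R `parse_scorecard()` that picks fields by position; the positions are read off the single example above (ODI # 1532), so they are an assumption and may not hold for every match, for instance no-result games:

```r
# Hypothetical helper: turn one scraped scorecard vector into a one-row data frame
parse_scorecard <- function(x) {
  toss_at <- match("Toss", x)  # position of the "Toss" label, NA if missing
  data.frame(
    description = x[1],  # e.g. "1st ODI, West Indies tour of New Zealand ..."
    team_1      = x[2],  # teams and scores in the order they appear on the page
    score_1     = x[3],
    team_2      = x[4],
    score_2     = x[5],
    toss        = if (is.na(toss_at)) NA_character_ else x[toss_at + 1],
    stringsAsFactors = FALSE
  )
}

example <- c("1st ODI, West Indies tour of New Zealand at Auckland, Jan 2 2000",
             "West Indies", "268/7", "New Zealand", "250/7 (45.1/46 ov, target 250)",
             "1st ODI, West Indies tour of New Zealand at Auckland, Jan 2 2000",
             "West Indies", "268/7", "New Zealand", "250/7 (45.1/46 ov, target 250)",
             "Toss", "West Indies , elected to bat first")
parse_scorecard(example)
```

`do.call(rbind, lapply(df_scorecard$scorecard, parse_scorecard))` then gives one row per match, in the same order as `df_scorecard` (an empty vector yields a row of `NA`s).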
Using url and url_card (after some processing), you can join the scorecards back to the match records.
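One way to do that processing, sketched under the assumption that `df_records` and `df_url_card` list the matches of each year page in the same order and that no row was dropped: number the rows within each year and merge on that number. The helper `add_row_in_year()` and the toy data are hypothetical:

```r
# Hypothetical helper: number rows 1, 2, ... within each year
add_row_in_year <- function(df) {
  df$row_in_year <- ave(seq_len(nrow(df)), df$year, FUN = seq_along)
  df
}

# Toy stand-ins for df_records and df_url_card
records <- data.frame(year = c(2000, 2000, 2001),
                      Winner = c("w1", "w2", "w3"),
                      stringsAsFactors = FALSE)
cards <- data.frame(year = c(2000, 2000, 2001),
                    url_card = c("card_a", "card_b", "card_c"),
                    stringsAsFactors = FALSE)

joined <- merge(add_row_in_year(records), add_row_in_year(cards),
                by = c("year", "row_in_year"))
joined
```

With the real tables, `url` can go into `by` as well, and the parsed scorecards can then be attached by `url_card`.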