Scraping a Zacks web page with R

Date: 2017-09-17 22:53:35

Tags: r web-scraping

I want to get the data on this page: https://www.zacks.com/stock/research/STNG/earnings-announcements

I tried doing this with rvest, but maybe I have to use RSelenium? I'm not sure how to go about it. Can anyone point me in the right direction?

test <- specific_stocks_earnings %>%
  html_nodes("#earnings_announcements_tabs , .sorting_1") %>% 
  html_text()

test

1 answer:

Answer 0 (score: 4)

(a) I'm quite surprised that Zacks doesn't prohibit or discourage scraping, but nothing in the pages of legal mumbo-jumbo I scanned says it isn't allowed.

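(b) The data is there, just not in nicely rendered form. It is stuck inside a <script> tag that renders it dynamically. With a little elbow grease and the V8 package, though, we can get at it: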

library(rvest)
library(stringi)
library(V8)

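# JavaScript context in which the page's inline script will be evaluated
ctx <- v8()

pg <- read_html("https://www.zacks.com/stock/research/STNG/earnings-announcements")

# Find the <script> that mentions obj_data, strip the 'document.' references
# (a bare V8 context has no browser `document` object), and evaluate the rest
html_nodes(pg, xpath=".//script[contains(., 'obj_data')]") %>% 
  html_text() %>% 
  stri_replace_all_fixed('document.', '') %>% 
  ctx$eval() -> ignore_the_blank_return_value

# Pull the resulting JavaScript object back into R as a list
dat <- ctx$get("obj_data")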

str(dat)
## List of 6
##  $ earnings_announcements_earnings_table : chr [1:28, 1:7] "9/18/2017" "4/26/2017" "2/13/2017" "11/14/2016" ...
##  $ earnings_announcements_webcasts_table : chr [1:13, 1:5] "2/13/2017" "11/14/2016" "7/28/2016" "4/27/2016" ...
##  $ earnings_announcements_revisions_table: chr [1:4, 1:6] "7/21/2017" "7/21/2017" "7/21/2017" "7/21/2017" ...
##  $ earnings_announcements_splits_table   : list()
##  $ earnings_announcements_dividends_table: chr [1:17, 1:4] "9/29/2017" "6/14/2017" "3/30/2017" "12/22/2016" ...
##  $ earnings_announcements_guidance_table : chr [1:4, 1:3] "1/21/2015" "10/14/2014" "6/12/2014" "1/28/2013" ...

dat
## $earnings_announcements_earnings_table
##       [,1]         [,2]      [,3]     [,4]     [,5]                                                                    
##  [1,] "9/18/2017"  "6/2017"  "-$0.05" "--"     "--"                                                                    
##  [2,] "4/26/2017"  "3/2017"  "-$0.08" "-$0.07" "<div class=\"right pos positive pos_icon showinline up\">+0.01</div>"  
##  [3,] "2/13/2017"  "12/2016" "-$0.2"  "-$0.18" "<div class=\"right pos positive pos_icon showinline up\">+0.02</div>"  
##  [4,] "11/14/2016" "9/2016"  "-$0.11" "-$0.11" "<div class=\"right pos_na showinline\">0.00</div>"                     
##  [5,] "7/28/2016"  "6/2016"  "$0.03"  "$0.04"  "<div class=\"right pos positive pos_icon showinline up\">+0.01</div>"  
##  [6,] "4/27/2016"  "3/2016"  "$0.18"  "$0.18"  "<div class=\"right pos_na showinline\">0.00</div>"                     
##  [7,] "2/29/2016"  "12/2015" "$0.19"  "$0.21"  "<div class=\"right pos positive pos_icon showinline up\">+0.02</div>"  
##  [8,] "11/4/2015"  "9/2015"  "$0.43"  "$0.46"  "<div class=\"right pos positive pos_icon showinline up\">+0.03</div>"  
##  [9,] "7/29/2015"  "6/2015"  "$0.30"  "$0.32"  "<div class=\"right pos positive pos_icon showinline up\">+0.02</div>"  
## [10,] "4/27/2015"  "3/2015"  "$0.26"  "$0.24"  "<div class=\"right neg negative neg_icon showinline down\">-0.02</div>"
## [11,] "3/2/2015"   "12/2014" "$0.13"  "$0.12"  "<div class=\"right neg negative neg_icon showinline down\">-0.01</div>"
## [12,] "10/27/2014" "9/2014"  "$0.02"  "-$0.01" "<div class=\"right neg negative neg_icon showinline down\">-0.03</div>"
## [13,] "7/28/2014"  "6/2014"  "-$0.06" "-$0.06" "<div class=\"right pos_na showinline\">0.00</div>"                     
## [14,] "4/28/2014"  "3/2014"  "$0.04"  "$0.01"  "<div class=\"right neg negative neg_icon showinline down\">-0.03</div>"
## [15,] "2/24/2014"  "12/2013" "$0.01"  "-$0.08" "<div class=\"right neg negative neg_icon showinline down\">-0.09</div>"
## [16,] "10/28/2013" "9/2013"  "$0.02"  "$0.00"  "<div class=\"right neg negative neg_icon showinline down\">-0.02</div>"
## [17,] "7/29/2013"  "6/2013"  "$0.05"  "$0.03"  "<div class=\"right neg negative neg_icon showinline down\">-0.02</div>"
## [18,] "4/29/2013"  "3/2013"  "$0.05"  "$0.08"  "<div class=\"right pos positive pos_icon showinline up\">+0.03</div>"  
## [19,] "2/25/2013"  "12/2012" "-$0.05" "-$0.08" "<div class=\"right neg negative neg_icon showinline down\">-0.03</div>"
## [20,] "10/29/2012" "9/2012"  "-$0.15" "-$0.09" "<div class=\"right pos positive pos_icon showinline up\">+0.06</div>"  
## [21,] "7/31/2012"  "6/2012"  "--"     "--"     "--"                                                                    
## [22,] "5/3/2012"   "3/2012"  "--"     "--"     "--"                                                                    
## [23,] "2/23/2012"  "12/2011" "--"     "-$2.21" "--"                                                                    
## [24,] "11/14/2011" "9/2011"  "--"     "--"     "--"                                                                    
## [25,] "8/16/2011"  "6/2011"  "--"     "--"     "--"                                                                    
## [26,] "5/10/2011"  "3/2011"  "--"     "--"     "--"                                                                    
## [27,] "3/17/2011"  "12/2010" "--"     "--"     "--"                                                                    
## [28,] "11/15/2010" "9/2010"  "--"     "--"     "--"                                                                    
##       [,6]                                                                        [,7]         
##  [1,] "--"                                                                        "Before Open"
##  [2,] "<div class=\"right pos positive pos_icon showinline up\">+12.50%</div>"    "Before Open"
##  [3,] "<div class=\"right pos positive pos_icon showinline up\">+10.00%</div>"    "Before Open"
##  [4,] "<div class=\"right pos_na showinline\">0.00%</div>"                        "Before Open"
##  [5,] "<div class=\"right pos positive pos_icon showinline up\">+33.33%</div>"    "Before Open"
##  [6,] "<div class=\"right pos_na showinline\">0.00%</div>"                        "Before Open"
##  [7,] "<div class=\"right pos positive pos_icon showinline up\">+10.53%</div>"    "Before Open"
##  [8,] "<div class=\"right pos positive pos_icon showinline up\">+6.98%</div>"     "Before Open"
##  [9,] "<div class=\"right pos positive pos_icon showinline up\">+6.67%</div>"     "Before Open"
## [10,] "<div class=\"right neg negative neg_icon showinline down\">-7.69%</div>"   "Before Open"
## [11,] "<div class=\"right neg negative neg_icon showinline down\">-7.69%</div>"   "Before Open"
## [12,] "<div class=\"right neg negative neg_icon showinline down\">-150.00%</div>" "Before Open"
## [13,] "<div class=\"right pos_na showinline\">0.00%</div>"                        "--"         
## [14,] "<div class=\"right neg negative neg_icon showinline down\">-75.00%</div>"  "Before Open"
## [15,] "<div class=\"right neg negative neg_icon showinline down\">-900.00%</div>" "Before Open"
## [16,] "<div class=\"right neg negative neg_icon showinline down\">-100.00%</div>" "Before Open"
## [17,] "<div class=\"right neg negative neg_icon showinline down\">-40.00%</div>"  "Before Open"
## [18,] "<div class=\"right pos positive pos_icon showinline up\">+60.00%</div>"    "Before Open"
## [19,] "<div class=\"right neg negative neg_icon showinline down\">-60.00%</div>"  "After Close"
## [20,] "<div class=\"right pos positive pos_icon showinline up\">+40.00%</div>"    "Before Open"
## [21,] "--"                                                                        "Before Open"
## [22,] "--"                                                                        "Before Open"
## [23,] "--"                                                                        "--"         
## [24,] "--"                                                                        "Before Open"
## [25,] "--"                                                                        "--"         
## [26,] "--"                                                                        "--"         
## [27,] "--"                                                                        "--"         
## [28,] "--"                                                                        "--"         
## 
## $earnings_announcements_webcasts_table
##       [,1]         [,2]                    [,3]
##  [1,] "2/13/2017"  "Q4 2016 Earnings Call" "--"
##  [2,] "11/14/2016" "Q3 2016 Earnings Call" "--"
##  [3,] "7/28/2016"  "Q2 2016 Earnings Call" "--"
##  [4,] "4/27/2016"  "Q1 2016 Earnings Call" "--"
##  [5,] "2/29/2016"  "Q4 2015 Earnings Call" "--"
##  [6,] "11/4/2015"  "Q3 2015 Earnings Call" "--"
##  [7,] "7/29/2015"  "Q2 2015 Earnings Call" "--"
##  [8,] "4/27/2015"  "Q1 2015 Earnings Call" "--"
##  [9,] "3/2/2015"   "Q4 2014 Earnings Call" "--"
## [10,] "4/28/2014"  "Q1 2014 Earnings Call" "--"
## [11,] "2/24/2014"  "Q4 2013 Earnings Call" "--"
## [12,] "10/28/2013" "Q3 2013 Earnings Call" "--"
## [13,] "7/29/2013"  "Q2 2013 Earnings Call" "--"
##       [,4]                                                                                                                                                                                                                                                                                                      
##  [1,] "<a href=\"http://seekingalpha.com/article/4045508-scorpio-tankers-stng-ceo-emanuele-lauro-q4-2016-results-earnings-call-transcript?source=feed_tag_transcripts\" target = \"_blank\" ><img height=\"15\" width=\"15\" src=\"https://staticx.zacks.com/images/icons/general/transcripts.png\">Open</a>"   
##  [2,] "<a href=\"http://seekingalpha.com/article/4023325-scorpio-tankers-stng-ceo-emanuele-lauro-q3-2016-results-earnings-call-transcript?source=feed_tag_transcripts\" target = \"_blank\" ><img height=\"15\" width=\"15\" src=\"https://staticx.zacks.com/images/icons/general/transcripts.png\">Open</a>"   
##  [3,] "<a href=\"http://seekingalpha.com/article/3993429-scorpio-tankers-stng-management-q2-2016-results-earnings-call-transcript?source=feed_tag_transcripts\" target = \"_blank\" ><img height=\"15\" width=\"15\" src=\"https://staticx.zacks.com/images/icons/general/transcripts.png\">Open</a>"           
##  [4,] "<a href=\"http://seekingalpha.com/article/3968728-scorpio-tankers-stng-ceo-emanuele-lauro-q1-2016-results-earnings-call-transcript?source=feed_tag_transcripts\" target = \"_blank\" ><img height=\"15\" width=\"15\" src=\"https://staticx.zacks.com/images/icons/general/transcripts.png\">Open</a>"   
##  [5,] "<a href=\"http://seekingalpha.com/article/3941526-scorpio-tankers-stng-ceo-emanuele-lauro-q4-2015-results-earnings-call-transcript?source=feed_tag_transcripts\" target = \"_blank\" ><img height=\"15\" width=\"15\" src=\"https://staticx.zacks.com/images/icons/general/transcripts.png\">Open</a>"   
##  [6,] "<a href=\"http://seekingalpha.com/article/3646266-scorpio-tankers-stng-ceo-emanuele-lauro-on-q3-2015-results-earnings-call-transcript?source=feed_tag_transcripts\" target = \"_blank\" ><img height=\"15\" width=\"15\" src=\"https://staticx.zacks.com/images/icons/general/transcripts.png\">Open</a>"
##  [7,] "<a href=\"http://seekingalpha.com/article/3387875-scorpio-tankers-stng-ceo-emanuele-lauro-on-q2-2015-results-earnings-call-transcript?source=feed_tag_transcripts\" target = \"_blank\" ><img height=\"15\" width=\"15\" src=\"https://staticx.zacks.com/images/icons/general/transcripts.png\">Open</a>"
##  [8,] "<a href=\"http://seekingalpha.com/article/3106706-scorpio-tankers-stng-ceo-emanuele-lauro-on-q1-2015-results-earnings-call-transcript?source=feed_tag_transcripts\" target = \"_blank\" ><img height=\"15\" width=\"15\" src=\"https://staticx.zacks.com/images/icons/general/transcripts.png\">Open</a>"
##  [9,] "<a href=\"http://seekingalpha.com/article/2966166-scorpio-tankers-stng-ceo-emanuele-lauro-on-q4-2014-results-earnings-call-transcript?source=feed_tag_transcripts\" target = \"_blank\" ><img height=\"15\" width=\"15\" src=\"https://staticx.zacks.com/images/icons/general/transcripts.png\">Open</a>"
## [10,] "<a href=\"http://seekingalpha.com/article/2169963-scorpio-tankers-ceo-discusses-q1-2014-results-earnings-call-transcript?source=feed\" target = \"_blank\" ><img height=\"15\" width=\"15\" src=\"https://staticx.zacks.com/images/icons/general/transcripts.png\">Open</a>"                             
## [11,] "<a href=\"http://seekingalpha.com/article/2044333-scorpio-tankers-management-discusses-q4-2013-results-earnings-call-transcript?source=feed\" target = \"_blank\" ><img height=\"15\" width=\"15\" src=\"https://staticx.zacks.com/images/icons/general/transcripts.png\">Open</a>"                      
## [12,] "<a href=\"http://seekingalpha.com/article/1779812-scorpio-tankers-ceo-discusses-q3-2013-results-earnings-call-transcript?source=feed\" target = \"_blank\" ><img height=\"15\" width=\"15\" src=\"https://staticx.zacks.com/images/icons/general/transcripts.png\">Open</a>"                             
## [13,] "<a href=\"http://seekingalpha.com/article/1582052-scorpio-tankers-ceo-discusses-q2-2013-results-earnings-call-transcript?source=feed\" target = \"_blank\" ><img height=\"15\" width=\"15\" src=\"https://staticx.zacks.com/images/icons/general/transcripts.png\">Open</a>"                             
##       [,5]      
##  [1,] "9:00 AM" 
##  [2,] "10:30 AM"
##  [3,] "11:00 AM"
##  [4,] "10:30 AM"
##  [5,] "10:30 AM"
##  [6,] "10:00 AM"
##  [7,] "10:00 AM"
##  [8,] "9:30 AM" 
##  [9,] "10:00 AM"
## [10,] "11:00 AM"
## [11,] "11:00 AM"
## [12,] "12:00 PM"
## [13,] "2:30 PM" 
## 
## $earnings_announcements_revisions_table
##      [,1]        [,2]                                            [,3]     [,4]                              
## [1,] "7/21/2017" "<span class=\"hotspot\">Dec 2017 (Q) </span>"  "$0.25"  "<div class=\"down\">$0.03</div>" 
## [2,] "7/21/2017" "<span class=\"hotspot\">Dec 2017 (FY) </span>" "$0.36"  "<div class=\"down\">-$0.24</div>"
## [3,] "7/21/2017" "<span class=\"hotspot\">Sep 2017 (Q) </span>"  "--"     "<div>-$0.12</div>"               
## [4,] "7/21/2017" "<span class=\"hotspot\">Jun 2017 (Q) </span>"  "-$0.05" "<div class=\"down\">-$0.09</div>"
##      [,5]                                      [,6]                                                           
## [1,] "<span class=\"hotspot\">Mavrinac</span>" "<span title='Jefferies & Company' >Jefferies & Company</span>"
## [2,] "<span class=\"hotspot\">Mavrinac</span>" "<span title='Jefferies & Company' >Jefferies & Company</span>"
## [3,] "<span class=\"hotspot\">Mavrinac</span>" "<span title='Jefferies & Company' >Jefferies & Company</span>"
## [4,] "<span class=\"hotspot\">Mavrinac</span>" "<span title='Jefferies & Company' >Jefferies & Company</span>"
## 
## $earnings_announcements_splits_table
## list()
## 
## $earnings_announcements_dividends_table
##       [,1]         [,2]    [,3]         [,4]        
##  [1,] "9/29/2017"  "$0.01" "9/14/2017"  "9/22/2017" 
##  [2,] "6/14/2017"  "$0.01" "4/27/2017"  "5/9/2017"  
##  [3,] "3/30/2017"  "$0.01" "2/14/2017"  "2/21/2017" 
##  [4,] "12/22/2016" "$0.13" "11/14/2016" "11/22/2016"
##  [5,] "9/29/2016"  "$0.13" "7/28/2016"  "9/13/2016" 
##  [6,] "3/30/2016"  "$0.13" "2/29/2016"  "3/8/2016"  
##  [7,] "12/11/2015" "$0.13" "11/4/2015"  "11/20/2015"
##  [8,] "9/4/2015"   "$0.13" "7/29/2015"  "8/12/2015" 
##  [9,] "6/10/2015"  "$0.13" "4/27/2015"  "5/19/2015" 
## [10,] "3/30/2015"  "$0.12" "3/2/2015"   "3/11/2015" 
## [11,] "12/12/2014" "$0.12" "11/11/2014" "11/21/2014"
## [12,] "9/10/2014"  "$0.10" "7/28/2014"  "8/20/2014" 
## [13,] "6/12/2014"  "$0.09" "4/28/2014"  "5/22/2014" 
## [14,] "3/26/2014"  "$0.08" "2/24/2014"  "3/7/2014"  
## [15,] "12/18/2013" "$0.07" "10/28/2013" "11/29/2013"
## [16,] "9/25/2013"  "$0.04" "7/29/2013"  "9/6/2013"  
## [17,] "6/25/2013"  "$0.03" "4/15/2013"  "6/7/2013"  
## 
## $earnings_announcements_guidance_table
##      [,1]         [,2]     [,3]            
## [1,] "1/21/2015"  "$0.21"  "$0.11 - $0.31" 
## [2,] "10/14/2014" "$0.03"  "-$0.03 - $0.07"
## [3,] "6/12/2014"  "$0.01"  "-$0.04 - $0.08"
## [4,] "1/28/2013"  "-$0.02" "-$0.06 - $0.05"

You'll need to clean it up a bit, but it doesn't require Selenium or Splash.
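As a rough idea of what that cleanup might look like, here is a minimal base-R sketch for the earnings table. It runs on a two-row toy matrix copied from the output above, so it does not need the live page. The helper names (`strip_tags`, `to_number`, `clean_earnings`) and the column names are my own assumptions, not something the page data provides, so adjust them to whatever the columns actually mean.

```r
# Toy stand-in for dat$earnings_announcements_earnings_table
# (two rows copied from the output above)
earnings <- matrix(c(
  "9/18/2017", "6/2017", "-$0.05", "--", "--", "--", "Before Open",
  "4/26/2017", "3/2017", "-$0.08", "-$0.07",
  "<div class=\"right pos positive pos_icon showinline up\">+0.01</div>",
  "<div class=\"right pos positive pos_icon showinline up\">+12.50%</div>",
  "Before Open"
), ncol = 7, byrow = TRUE)

# Hypothetical helper: drop anything that looks like an HTML tag
strip_tags <- function(x) gsub("<[^>]+>", "", x)

# Hypothetical helper: "-$0.05" -> -0.05, "+12.50%" -> 12.5, "--" -> NA
to_number <- function(x) {
  x <- gsub("[$%+,]", "", strip_tags(x))
  x[x == "--"] <- NA
  as.numeric(x)
}

# Column names below are assumed labels, not taken from the page data
clean_earnings <- function(m) {
  data.frame(
    date          = as.Date(m[, 1], format = "%m/%d/%Y"),
    period_ending = m[, 2],
    estimate      = to_number(m[, 3]),
    reported      = to_number(m[, 4]),
    surprise      = to_number(m[, 5]),
    surprise_pct  = to_number(m[, 6]),
    time          = ifelse(m[, 7] == "--", NA, m[, 7]),
    stringsAsFactors = FALSE
  )
}

earnings_df <- clean_earnings(earnings)
str(earnings_df)
```

With the real data you would call `clean_earnings(dat$earnings_announcements_earnings_table)`. A regex is enough here because each cell holds at most one small tag; for messier markup, parsing the cell with `read_html()` and `html_text()` would be the more robust route.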