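I used read_html to read this site (a Korean portal for the stock market): "https://finance.naver.com/sise/etf.nhn"
When I inspect the page's elements, I can see a table and, under it, tags such as tbody, tr, td, and a.
However, read_html does not pick up these tags. I checked with xml_structure: there is only a tbody with an id, and nothing else under it.
I don't know any way to read a website other than read_html.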
library(rvest)  # read_html() and %>%
library(xml2)   # xml_find_all(), as_list(), xml_structure()
url <- "https://finance.naver.com/sise/etf.nhn"
temp <- url %>%
  read_html(encoding = "iso-8859-1") %>%
  xml_find_all("//td[@class = 'ctg']") %>%
  as_list()
This returned nothing, so I double-checked the page with xml_structure.
xml_structure(read_html(url, encoding="iso-8859-1"))
It gave the following. There should be many tr and td elements under tbody, but there are none.
<table [summary, class, cellspacing, cellpadding]>
<caption>
{text}
{text}
<colgroup>
<col [width]>
<col [width]>
<col [width]>
<col [width]>
<col [width]>
<col [width]>
<col [width]>
<col [width]>
<col [width]>
<tbody [id]>
<p [class]>
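There are no tags under the tbody tag.
By the way, what I ultimately want is the href under each td, so that I can scrape the 6-digit stock codes.
Thank you very much.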
Answer 0 (score: 0)
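I suggest using RSelenium to scrape JavaScript-driven websites; it works well for them. The table rows on this page appear to be filled in by JavaScript after the page loads, which would explain why read_html sees an empty tbody.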
library(RSelenium)
library(rvest)  # read_html(), html_table() and %>%
url <- "https://finance.naver.com/sise/etf.nhn"
# Start a ChromeDriver server on port 4445. If you run into a ChromeDriver problem, please comment on this answer.
ch <- wdman::chrome(port = 4445L)
remDr <- remoteDriver(port = 4445L, browserName = "chrome")
remDr$open()         # Open the remote browser.
remDr$navigate(url)  # Point the browser at the site.
page_source <- remDr$getPageSource()[[1]]  # HTML after the browser has run the page's JavaScript
main <- read_html(page_source)
dt <- main %>% html_table(fill = TRUE)
dt
[[1]]
종목명 현재가 전일비 등락률 NAV 3개월수익률 거래량 거래대금(백만) 시가총액(억) NA
1 NA NA
2 KODEX 200 25,540 175 +0.69% 25,557 -5.86% 3,033,694 77,550 57,810 NA
3 TIGER 200 25,530 170 +0.67% 25,555 -5.82% 1,854,944 47,440 26,947 NA
4 KODEX 레버리지 10,690 150 +1.42% 10,701 -12.45% 19,195,363 205,260 23,497 NA
5 KODEX 단기채권 101,235 5 0.00% 101,231 +0.57% 35,383 3,581 13,738 NA
6 KBSTAR 200 25,525 170 +0.67% 25,554 -5.88% 204,864 5,239 11,537 NA
7 NA NA
8 NA NA
9 NA NA
10 KODEX 코스닥150 레버리지 6,125 40 -0.65% 6,160 -46.37% 33,056,841 203,228 11,276 NA
11 KODEX 종합채권(AA-이상)액티브 109,320 125 -0.11% 109,301 +3.30% 17,299 1,892 10,505 NA
12 KODEX 삼성그룹 6,000 85 +1.44% 5,995 -4.38% 81,509 488 9,900 NA
13 KODEX 200TR 8,015 60 +0.75% 8,014 -5.76% 611,115 4,900 9,245 NA
14 KODEX 단기채권PLUS 101,300 10 +0.01% 101,295 +0.60% 7,626 772 8,866 NA
15 NA NA
16 NA NA
17 NA NA
18 TIGER 단기통안채 101,085 5 0.00% 101,083 +0.43% 43,335 4,380 8,364 NA
19 HANARO 200 25,550 180 +0.71% 25,566 -5.77% 63,381 1,621 7,780 NA
20 KODEX MSCI Korea TR 7,890 100 +1.28% 7,895 -5.45% 274,458 2,166 7,685 NA
21 ARIRANG 200 25,645 175 +0.69% 25,659 -5.65% 311,488 8,002 7,411 NA
22 KODEX Top5PlusTR 12,100 110 +0.92% 12,104 -4.12% 1,118 13 6,534 NA
23 NA NA
24 NA NA
25 NA NA
26 KOSEF 200 25,765 185 +0.72% 25,770 -5.86% 162,394 4,190 6,531 NA
27 KINDEX 200 25,670 230 +0.90% 25,651 -5.73% 282,744 7,260 6,302 NA
28 KODEX 코스닥 150 8,530 25 -0.29% 8,562 -25.47% 4,096,335 34,976 6,278 NA
29 TIGER MSCI Korea TR 9,820 0 0.00% 9,903 -5.82% 0 0 6,177 NA
30 KODEX 코스피 19,495 245 +1.27% 19,486 -7.45% 39,984 779 5,166 NA
31 NA NA
32 NA NA
33 NA NA
34 KODEX 인버스 7,450 55 -0.73% 7,456 +6.50% 10,460,449 77,916 4,924 NA
35 TIGER TOP10 7,930 115 +1.47% 7,904 -2.34% 3,182 25 4,865 NA
36 SMART 200TR 8,340 70 +0.85% 8,331 -5.60% 18,004 150 4,566 NA
37 TIGER 단기채권액티브 50,590 0 0.00% 50,592 +0.47% 3,894 197 4,404 NA
38 KODEX 200선물인버스2X 8,515 115 -1.33% 8,521 +12.63% 17,984,754 152,854 4,283 NA
39 NA NA
40 NA NA
41 NA NA
42 KODEX 선진국MSCI World 14,075 115 +0.82% N/A +2.55% 3,666 51 3,857 NA
43 ARIRANG 고배당주 11,040 0 0.00% 11,030 -8.00% 145,379 1,607 3,580 NA
44 TIGER 코스닥150 8,585 15 -0.17% 8,599 -25.33% 610,728 5,246 3,216 NA
45 KOSEF 200TR 29,060 220 +0.76% 29,076 -5.91% 8,055 234 2,644 NA
46 KBSTAR 코스피 19,520 200 +1.04% 19,483 -7.66% 767 14 2,372 NA
47 NA NA
48 NA NA
49 NA NA
50 KBSTAR 단기통안채 104,700 10 -0.01% 104,705 +0.54% 318 33 2,368 NA
51 TIGER 차이나CSI300레버리지(합.. 17,010 440 -2.52% N/A -1.96% 101,746 1,761 2,347 NA
52 TIGER 차이나CSI300 8,325 115 -1.36% N/A +0.48% 99,383 834 2,264 NA
53 KBSTAR 대형고배당10TR 9,970 75 +0.76% 9,971 -4.04% 740 7 2,178 NA
54 KINDEX 베트남VN30(합성) 13,530 85 +0.63% N/A +3.56% 107,419 1,451 2,165 NA
55 NA NA
56 NA NA
57 NA NA
58 TIGER 200 IT 19,505 320 +1.67% 19,495 -3.46% 32,063 625 2,146 NA
59 KINDEX 중국본토CSI300 21,905 70 -0.32% N/A +1.25% 94,858 2,072 2,037 NA
60 KODEX 코스닥150선물인버스 9,555 40 +0.42% 9,567 +31.52% 29,789,385 284,731 1,901 NA
61 KINDEX 단기통안채 101,220 0 0.00% 101,220 +0.48% 13,250 1,341 1,843 NA
62 TIGER 코스피 19,495 180 +0.93% 19,490 -7.49% 6,102 119 1,794 NA
63 NA NA
64 NA NA
65 NA NA
66 TIGER 중국소비테마 6,510 0 0.00% 6,515 -17.80% 7,005 45 1,791 NA
67 TIGER 글로벌4차산업혁신기술(합.. 11,600 185 +1.62% N/A +1.89% 43,939 509 1,775 NA
68 KBSTAR 코스닥150 8,445 25 -0.30% 8,464 -25.66% 12,069 101 1,579 NA
69 KODEX 배당가치 9,540 60 +0.63% 9,538 N/A 329 3 1,498 NA
70 TIGER 경기방어 9,195 70 +0.77% 9,193 -15.60% 19,365 178 1,416 NA
71 NA NA
72 NA NA
73 NA NA
74 KODEX 단기변동금리부채권액티브 101,085 15 +0.01% 101,083 +0.41% 1,010 102 1,350 NA
75 파워 200 25,970 175 +0.68% 25,984 -5.90% 50 1 1,273 NA
76 TIGER 200선물레버리지 8,040 115 +1.45% 8,039 -12.08% 273,474 2,200 1,118 NA
77 HANARO 단기통안채 101,550 5 0.00% 101,550 +0.48% 3 0 1,080 NA
78 TIGER 미국S&P500선물(H) 34,445 360 +1.06% N/A +1.82% 141,474 4,865 1,076 NA
79 NA NA
80 NA NA
81 NA NA
82 KODEX KRX300 11,630 80 +0.69% 11,643 -7.70% 21,162 246 1,070 NA
83 TIGER 200선물인버스2X 8,625 165 -1.88% 8,660 +12.01% 983,282 8,500 1,052 NA
84 KBSTAR 코스닥150선물레버리지 5,965 45 -0.75% 5,963 -45.90% 156,885 939 1,041 NA
85 KODEX 골드선물(H) 10,820 15 +0.14% N/A +16.47% 312,156 3,376 1,006 NA
86 KBSTAR 단기국공채액티브 101,615 5 0.00% 101,611 +0.80% 115,631 11,750 966 NA
87 NA NA
88 NA NA
89 NA NA
90 KODEX 국고채3년 57,155 60 -0.10% 57,151 +1.77% 2,832 161 960 NA
91 TIGER 200TR 13,060 110 +0.85% 13,046 -6.21% 1 0 953 NA
92 ARIRANG 코스피50 17,320 195 +1.14% 17,323 -4.36% 51,575 894 909 NA
93 KOSEF 코스닥150 4,230 5 +0.12% 4,231 -25.98% 367 1 874 NA
94 TIGER 헬스케어 23,610 390 +1.68% 23,700 -28.30% 46,091 1,085 869 NA
95 NA NA
96 NA NA
97 NA NA
98 KOSEF 통안채1년 102,430 10 -0.01% 102,432 +0.86% 40,879 4,187 850 NA
99 KBSTAR 중기우량회사채 105,455 65 -0.06% 105,428 +1.70% 2,136 225 844 NA
100 TIGER 코스닥150 레버리지 6,435 30 -0.46% 6,444 -46.42% 1,054,878 6,802 827 NA
[ reached 'max' / getOption("max.print") -- omitted 596 rows ]
As you can see, the table on this site contains repeated blank rows, so you will need to clean the data.
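A minimal sketch of that cleaning step, plus the 6-digit codes the question asks for, might look like the following. It does not touch the live site: the small data frame and the HTML string are made-up stand-ins for dt[[1]] and for the rendered page source, and the href pattern, the tbody id, and the sample codes are assumptions for illustration only.

```r
library(xml2)

# Hypothetical stand-in for dt[[1]]: blank rows show up as NA or "".
dt1 <- data.frame(
  name  = c(NA, "KODEX 200", "TIGER 200", "", NA),
  price = c(NA, "25,540", "25,530", "", NA),
  stringsAsFactors = FALSE
)

# Drop rows whose first column is missing or empty.
blank <- is.na(dt1[[1]]) | trimws(dt1[[1]]) == ""
clean <- dt1[!blank, , drop = FALSE]

# Turn "25,540" into a number by removing the thousands separator.
clean$price <- as.numeric(gsub(",", "", clean$price))

# Hypothetical stand-in for remDr$getPageSource()[[1]].
rendered <- '
<table>
  <tbody id="etfItemTable">
    <tr><td class="ctg"><a href="/item/main.nhn?code=069500">KODEX 200</a></td></tr>
    <tr><td colspan="2"></td></tr>
    <tr><td class="ctg"><a href="/item/main.nhn?code=102110">TIGER 200</a></td></tr>
  </tbody>
</table>'

doc   <- read_html(rendered)
links <- xml_find_all(doc, "//td[@class = 'ctg']/a")
hrefs <- xml_attr(links, "href")

# Pull the first run of six digits out of each href; NA when there is none.
m <- regexpr("[0-9]{6}", hrefs)
codes <- rep(NA_character_, length(hrefs))
codes[m > 0] <- regmatches(hrefs, m)

etf <- data.frame(name = xml_text(links), code = codes,
                  stringsAsFactors = FALSE)
print(clean)
print(etf)
```

With the real page, the same XPath from the question should work once it is run on the source returned by the browser rather than on the raw download, since the rows only exist after the JavaScript has run.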