I am trying to fetch text from anchor tag, which is embedded in div tag. Following is the link of website `http://mmb.moneycontrol.com/forum-topics/stocks-1.html
The text I want to extract is Mawana Sugars
<a href="/forum-topics/stocks/mawana-sugars-245010.html" class="op_bld16 anch_pb7">Mawana Sugars</a>
So I want to extract all the stocks names listed on this website and description of it.
Here is my attempt to do it in R
doc <- htmlParse("http://mmb.moneycontrol.com/forum-topics/stocks-1.html")
xpathSApply(doc,"//div[@class='clearfix PR PB5']//text()",xmlValue)
But, it does not return anything. How can I do it in R?
答案 0 :(得分:2)
我的答案基本上与我刚才给出的here相同。
数据是动态加载的,无法直接从html中检索。但是,看看&#34; Network&#34;例如,在Chrome DevTools中,我们可以在http://mmb.moneycontrol.com/index.php?q=topic/ajax_call§ion=get_messages&offset=&lmid=&isp=0&gmt=cat_lm&catid=1&pgno=1
找到格式正确的JSON为了帮助您入门:
library(jsonlite)
dat <- fromJSON("http://mmb.moneycontrol.com/index.php?q=topic/ajax_call§ion=get_messages&offset=&lmid=&isp=0&gmt=cat_lm&catid=1&pgno=1")
输出如下:
dat[1:3, c("msg_id", "user_id", "topic", "heading", "flag", "price", "message")]
# msg_id user_id topic heading flag
# 1 47730730 liontrade NMDC Stocks APR
# 2 47730726 agrawalknath Glenmark Glenmark APR
# 3 47730725 bissy91 Infosys Stocks APR
# price
# 1 Price when posted : BSE: Rs. 127.90 NSE: Rs. 128.15
# 2 Price when posted : NSE: Rs. 714.10
# 3 Price when posted : BSE: Rs. 956.50 NSE: Rs. 955.00
# message
# 1 There is no mention of dividend in the announcement.
# 2 Eagerly Waiting for 670 to 675 to BUY second phase of Buying in Cash Delivery. Already Holding @ 800.
# 3 6 ✂ ✂--Don t Pay High Brokerage While Trading. Take Delivery Free & Rs 20 to trade in any size - Join Today . goo.gl/hDqLnm