How to extract text from an anchor tag inside a div class in R

Time: 2017-05-16 09:37:33

Tags: r

I am trying to fetch the text from an anchor tag that is embedded in a div tag. The website is http://mmb.moneycontrol.com/forum-topics/stocks-1.html

The text I want to extract is Mawana Sugars

<a href="/forum-topics/stocks/mawana-sugars-245010.html" class="op_bld16 anch_pb7">Mawana Sugars</a>

So I want to extract all the stock names listed on this website, along with their descriptions.

Here is my attempt to do it in R

library(XML)  # provides htmlParse() and xpathSApply()
doc <- htmlParse("http://mmb.moneycontrol.com/forum-topics/stocks-1.html")
xpathSApply(doc, "//div[@class='clearfix PR PB5']//text()", xmlValue)

But it does not return anything. How can I do this in R?


1 answer:

Answer 0 (score: 2)

My answer is essentially the same as the one I just gave here.

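The data is loaded dynamically, so it cannot be retrieved directly from the HTML. However, looking at the "Network" tab in, for example, Chrome DevTools, we can find nicely formatted JSON at http://mmb.moneycontrol.com/index.php?q=topic/ajax_call&section=get_messages&offset=&lmid=&isp=0&gmt=cat_lm&catid=1&pgno=1

To get you started: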

library(jsonlite)
dat <- fromJSON("http://mmb.moneycontrol.com/index.php?q=topic/ajax_call&section=get_messages&offset=&lmid=&isp=0&gmt=cat_lm&catid=1&pgno=1")

The output looks like this:

dat[1:3, c("msg_id", "user_id", "topic", "heading", "flag", "price", "message")]
#     msg_id      user_id    topic  heading flag
# 1 47730730    liontrade     NMDC   Stocks  APR
# 2 47730726 agrawalknath Glenmark Glenmark  APR
# 3 47730725      bissy91  Infosys   Stocks  APR
#                                                  price
# 1 Price when posted :  BSE: Rs. 127.90 NSE: Rs. 128.15
# 2                 Price when posted :  NSE: Rs. 714.10
# 3 Price when posted :  BSE: Rs. 956.50 NSE: Rs. 955.00
#                                                                                                                        message
# 1                                                                         There is no mention of dividend in the announcement.
# 2                        Eagerly Waiting for 670 to 675 to BUY second phase of Buying in Cash Delivery. Already Holding @ 800.
# 3 6 ✂ ✂--Don t Pay High Brokerage While Trading. Take Delivery Free & Rs 20 to trade in any size - Join Today . goo.gl/hDqLnm
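To get from that data frame to the stock names and descriptions the question asks for, something along these lines might work. It is only a sketch: `build_url()` and `extract_topics()` are hypothetical helper names, and it assumes that `pgno` is the page counter and that `topic` and `message` hold the stock name and its text, as the output above suggests. The inline JSON sample reuses two rows from that output so the helpers can be tried without network access.

```r
library(jsonlite)

# Hypothetical helper: rebuild the endpoint URL shown above for a given page.
# Assumption: only `pgno` needs to change to move through the pages.
build_url <- function(page, catid = 1) {
  paste0(
    "http://mmb.moneycontrol.com/index.php?q=topic/ajax_call",
    "&section=get_messages&offset=&lmid=&isp=0&gmt=cat_lm",
    "&catid=", catid, "&pgno=", page
  )
}

# Hypothetical helper: keep only the stock name and the message text,
# dropping exact duplicate rows.
extract_topics <- function(dat) {
  unique(dat[, c("topic", "message")])
}

# Offline check on a small sample that mimics the columns seen above.
sample_json <- '[
  {"msg_id": "47730730", "topic": "NMDC",
   "message": "There is no mention of dividend in the announcement."},
  {"msg_id": "47730726", "topic": "Glenmark",
   "message": "Eagerly Waiting for 670 to 675 to BUY second phase of Buying in Cash Delivery. Already Holding @ 800."}
]'
topics <- extract_topics(fromJSON(sample_json))
print(topics$topic)

# Against the live endpoint (needs network access; the site may have changed):
# pages <- lapply(1:3, function(p) extract_topics(fromJSON(build_url(p))))
# all_topics <- do.call(rbind, pages)
```

Subsetting each page before binding keeps the combined result to two plain character columns, which avoids trouble if some pages return extra or nested fields.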