Searching and scraping data from a website using R

Time: 2018-03-20 17:43:13

Tags: r

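I have 1,000 records, each containing an email address and all the other address information. I would like to retrieve the information for each record from this website: https://www.melissadata.com/lookups/businesscoder.asp. Is there a way to automate this process?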

1 answer:

Answer 0: (score: 0)

Here is an example of how to extract every link from a web page:

# R library for making HTTP requests
library(httr)
# R library for parsing XML and HTML
library(XML)

# perform a GET request to the website
response <- GET("https://www.melissadata.com/lookups/index.htm")
# extract the response body as text, decoding it as UTF-8
html_text <- content(response, as = "text", encoding = "UTF-8")
# parse the text as HTML in order to run XPath queries
parsedoc <- htmlParse(html_text, asText = TRUE)
# run an XPath query on the parsed document: the href attribute of every <a> element
links <- xpathSApply(parsedoc, "//a", xmlGetAttr, "href")
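
To repeat these steps for many pages (for example, one request per record), they can be wrapped in functions and applied over a vector of URLs. The following is only a rough sketch: the helper names `links_from_html`, `links_from_url`, and `scrape_links` and the `delay` parameter are made up for illustration, and how to build one URL per record depends on the lookup page's actual form fields, which are not shown here.

```r
library(httr)
library(XML)

# Hypothetical helper: return the href of every <a> element in an HTML string
links_from_html <- function(html_text) {
  parsedoc <- htmlParse(html_text, asText = TRUE)
  links <- xpathSApply(parsedoc, "//a[@href]", xmlGetAttr, "href")
  unique(as.character(links))
}

# Hypothetical helper: download one page and extract its links
links_from_url <- function(url) {
  response <- GET(url)
  stop_for_status(response)  # raise an R error on HTTP 4xx/5xx
  links_from_html(content(response, as = "text", encoding = "UTF-8"))
}

# Hypothetical helper: process many pages, pausing between requests
scrape_links <- function(urls, delay = 1) {
  results <- lapply(urls, function(u) {
    Sys.sleep(delay)
    # a failed page yields NA instead of aborting the whole run
    tryCatch(links_from_url(u), error = function(e) NA_character_)
  })
  names(results) <- urls
  results
}
```

Calling `scrape_links()` with a character vector of page addresses should return a list with one element per address. The pause between requests is a courtesy to the server; it is also worth checking the site's terms of use before automating lookups.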

To scrape web pages, you should understand XPath queries: https://www.w3schools.com/xml/xpath_intro.asp
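
As a small illustration of what XPath queries look like in practice, here is a sketch that runs a few of them against an inline HTML string, so it works without a network connection. The HTML snippet, its `id` and `class` values, and the variable names are invented for this example and do not reflect the structure of the real site.

```r
library(XML)

# Made-up HTML used only for demonstration
html_text <- '<html><body>
  <div id="nav"><a href="/lookups/index.htm">Lookups</a></div>
  <table class="result">
    <tr><td>Name</td><td>Example Co</td></tr>
    <tr><td>City</td><td>Springfield</td></tr>
  </table>
  <a href="https://example.com/about">About</a>
</body></html>'

doc <- htmlParse(html_text, asText = TRUE)

# href of every <a> element in the document
all_links <- xpathSApply(doc, "//a", xmlGetAttr, "href")

# only the links inside the element with id="nav"
nav_links <- xpathSApply(doc, "//div[@id='nav']//a", xmlGetAttr, "href")

# text of the second cell in each row of the table with class="result"
values <- xpathSApply(doc, "//table[@class='result']//tr/td[2]", xmlValue)

# links whose href starts with "https"
abs_links <- xpathSApply(doc, "//a[starts-with(@href, 'https')]", xmlGetAttr, "href")
```

On a real page, the element names and attributes in each query would need to be adapted after inspecting the page source in a browser.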