Question

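I have 1000 records, each containing an emailaddress and all the other address information. I want to fetch information for each record from this website: https://www.melissadata.com/lookups/businesscoder.asp. Is there a way to automate this process?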

Answer 1

Here is an example of how to extract every link from a web page:

# R library for making HTTP requests
library(httr)
# R library for parsing XML and HTML
library(XML)

# perform a GET request to the website
response <- GET("https://www.melissadata.com/lookups/index.htm")
# extract the response body as UTF-8 text and parse it as HTML for XPath queries
parsedoc <- htmlParse(content(response, as = "text", encoding = "UTF-8"), asText = TRUE)
# run an XPath query on the parsed document and collect the href of every <a> element
links <- xpathSApply(parsedoc, "//a", xmlGetAttr, "href")
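
The same request-and-parse pattern could be repeated for each of the 1000 records. The following is only a rough sketch under several assumptions: that the lookup page accepts GET query parameters at all, that the parameter names `email`, `address` and `zip` match the real search form (they are placeholders), and that the records sit in a data frame with columns `emailaddress`, `address` and `zip` (the last two are hypothetical). The helper names `build_query`, `fetch_record` and `fetch_all` are made up for illustration.

```r
library(httr)
library(XML)

# Lookup page from the question; whether it accepts GET query parameters
# in this form is an assumption and has not been verified.
lookup_url <- "https://www.melissadata.com/lookups/businesscoder.asp"

# Build the query for one record. The parameter names are placeholders:
# replace them with the field names used by the real search form.
build_query <- function(record) {
  list(
    email   = as.character(record$emailaddress),
    address = as.character(record$address),
    zip     = as.character(record$zip)
  )
}

# Request and parse the result page for one record; returns NULL on failure
fetch_record <- function(record, url = lookup_url) {
  tryCatch({
    response <- GET(url, query = build_query(record))
    stop_for_status(response)
    htmlParse(content(response, as = "text", encoding = "UTF-8"), asText = TRUE)
  }, error = function(e) {
    message("lookup failed: ", conditionMessage(e))
    NULL
  })
}

# Run the lookup for every row of a data frame, pausing between requests
fetch_all <- function(records, delay = 1) {
  lapply(seq_len(nrow(records)), function(i) {
    Sys.sleep(delay)
    fetch_record(records[i, ])
  })
}
```

Calling `fetch_all(records)` would return a list of parsed documents (or NULL for failed lookups), each of which can then be searched with `xpathSApply` as above. The pause between requests is a design choice to avoid sending 1000 requests in a burst.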

To search within a web page, you should be familiar with XPath queries: https://www.w3schools.com/xml/xpath_intro.asp
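
As a small illustration of what such queries look like, the sketch below runs three XPath expressions against an inline HTML string, so it works without a network connection. The HTML content is invented for the example and does not reflect the structure of the actual site.

```r
library(XML)

# Small inline HTML document (made up for illustration) so the queries run offline
html <- '<html><body>
  <div id="nav"><a href="/lookups/index.htm">Lookups</a></div>
  <table class="results">
    <tr><td>Name</td><td>Acme Corp</td></tr>
    <tr><td>City</td><td>Springfield</td></tr>
  </table>
  <a href="https://example.com/about">About</a>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# "//a/@href" selects the href attribute of every <a> element in the document
hrefs <- unname(xpathSApply(doc, "//a/@href"))

# a predicate in [...] restricts the match, here to links inside the div with id "nav"
nav_hrefs <- unname(xpathSApply(doc, "//div[@id='nav']/a/@href"))

# td[2] picks the second cell of each row; xmlValue returns its text content
second_cells <- xpathSApply(doc, "//table[@class='results']//tr/td[2]", xmlValue)
```

On a real result page, the expressions would need to be adapted to whatever element names, ids and classes that page uses.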

Searching and scraping data from a website using R
