Question

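I want to build a web crawler in R for the website "https://www.latlong.net/convert-address-to-lat-long.html". It should visit the site with an address as the parameter and then fetch the generated latitude and longitude from the page. This would be repeated for every row of the dataset I have.

Since I am new to web scraping, I would appreciate some guidance.

Thanks in advance.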

Answer 1

In the past, I have used an API called IP stack (ipstack.com).

Example: a data frame 'd' that contains a column named 'ipAddress' holding IP addresses.

api_key <- "INSERT_YOUR_API_KEY_HERE"  # placeholder: replace with your own ipstack key
d$ipCountry <- NA_character_           # create the result column up front

for (i in seq_len(nrow(d))) {
  # Get data from the API and save the text to the variable 'str'
  lookupPath <- paste0("http://api.ipstack.com/", d$ipAddress[i],
                       "?access_key=", api_key, "&format=1")
  str <- readLines(lookupPath, warn = FALSE)

  # Save all the data to a file named after the row index
  writeLines(str, paste0(i, ".txt"))

  # Save data to the main data frame 'd' as well
  d$ipCountry[i] <- str[7]
  print(paste("Successfully saved ip #:", i))
}

In this example, I was specifically after the country of each IP, which appears on line 7 of the data returned by the API (hence str[7]).
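Relying on a fixed line number breaks if the API ever changes the order of its fields. As a rough sketch, the value could be looked up by field name instead. The helper name `extract_field`, the field name `country_name`, and the sample lines below are assumptions for illustration, not taken from the answer. A JSON parser would be more robust than this base-R pattern match.

```r
# Hypothetical helper: pull one string field out of the pretty-printed
# JSON lines that readLines() returns, matching on the field name.
extract_field <- function(lines, key) {
  pattern <- paste0('^[[:space:]]*"', key,
                    '"[[:space:]]*:[[:space:]]*"([^"]*)".*$')
  hit <- grep(pattern, lines, value = TRUE)
  if (length(hit) == 0) return(NA_character_)  # field not present
  sub(pattern, "\\1", hit[1])                  # keep only the quoted value
}

# Made-up sample lines shaped like a formatted JSON response
sample_lines <- c(
  "{",
  '  "ip": "192.0.2.1",',
  '  "country_code": "US",',
  '  "country_name": "United States"',
  "}"
)

country <- extract_field(sample_lines, "country_name")
```

Inside the loop, `d$ipCountry[i] <- extract_field(str, "country_name")` would then take the place of `str[7]`, assuming the response really contains a field with that name.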

This API lets you look up 10,000 addresses per month for free, which was enough for me.
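Since the free quota is limited, it may be worth not spending lookups twice when a script is rerun. Here is a minimal sketch that reuses a response file if it already exists. The helper name `fetch_cached` and its `fetch` argument are hypothetical and only there for illustration.

```r
# Hypothetical helper: return the cached lines if the file already exists,
# otherwise call `fetch` (any function that takes a URL and returns a
# character vector), write the result to the cache file, and return it.
fetch_cached <- function(url, cache_file,
                         fetch = function(u) readLines(u, warn = FALSE)) {
  if (file.exists(cache_file)) {
    return(readLines(cache_file, warn = FALSE))
  }
  lines <- fetch(url)
  writeLines(lines, cache_file)
  lines
}
```

In the loop this could be called as `fetch_cached(lookupPath, paste0(i, ".txt"))`, so rows that already have a saved file cost no further API requests.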
