I'm trying to build a data frame from a series of API calls. Each call returns some JSON like this:
{"ip":"83.108.241.206","country_code":"NO","country_name":"Norway","region_code":"15","region_name":"Sogn og Fjordane","city":"Øvre Årdal","zipcode":"6884","latitude":61.3167,"longitude":7.8,"metro_code":"","area_code":""}
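I'd like to compile a bunch of these calls into a single data frame with columns "ip", "country_code", and so on, but I'm having trouble efficiently getting each result into a form I can call rbind on.
I'm making the API calls from a vector of URLs, like this: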
> urls <- c("http://freegeoip.net/json/83.108.241.206", "http://freegeoip.net/json/129.118.15.107","http://freegeoip.net/json/189.144.59.71", "http://freegeoip.net/json/24.106.181.190", "http://freegeoip.net/json/213.226.181.3", "http://freegeoip.net/json/84.1.204.89")
> urls
[1] "http://freegeoip.net/json/83.108.241.206"
[2] "http://freegeoip.net/json/129.118.15.107"
[3] "http://freegeoip.net/json/189.144.59.71"
[4] "http://freegeoip.net/json/24.106.181.190"
[5] "http://freegeoip.net/json/213.226.181.3"
[6] "http://freegeoip.net/json/84.1.204.89"
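What's the best way to go from URLs to JSON to a data frame?
Answer 0 (score: 1)
I'm pasting a "transcript" so you can see the intermediate values and some of the mistakes I made. It isn't difficult with a few tools: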
> require(RJSONIO) # Used version 1.3-0
> require(downloader) # version 0.3
# probably not necessary but has wider range of url-types it can handle
Loading required package: downloader
> urls <- c("http://freegeoip.net/json/83.108.241.206",
"http://freegeoip.net/json/129.118.15.107",
"http://freegeoip.net/json/189.144.59.71",
"http://freegeoip.net/json/24.106.181.190",
"http://freegeoip.net/json/213.226.181.3",
"http://freegeoip.net/json/84.1.204.89")
>
> download(urls[1], "temp")
100 225 100 225 0 0 1301 0 --:--:-- --:--:-- --:--:-- 2710 0 --:--:-- --:--:-- --:--:-- 0
# Experience tells me to use `quiet=TRUE`
# to prevent bad interactions with my GUI console display
> df <- fromJSON(file("temp")) #### See below for improved strategy ###
> str(df)
List of 11
$ ip : chr "83.108.241.206"
$ country_code: chr "NO"
$ country_name: chr "Norway"
$ region_code : chr "15"
$ region_name : chr "Sogn og Fjordane"
$ city : chr "Øvre Årdal"
$ zipcode : chr "6884"
$ latitude : num 61.3
$ longitude : num 7.8
$ metro_code : chr ""
$ area_code : chr ""
> str(as.data.frame(df))
'data.frame': 1 obs. of 11 variables:
$ ip : Factor w/ 1 level "83.108.241.206": 1
$ country_code: Factor w/ 1 level "NO": 1
$ country_name: Factor w/ 1 level "Norway": 1
$ region_code : Factor w/ 1 level "15": 1
$ region_name : Factor w/ 1 level "Sogn og Fjordane": 1
$ city : Factor w/ 1 level "Øvre Årdal": 1
$ zipcode : Factor w/ 1 level "6884": 1
$ latitude : num 61.3
$ longitude : num 7.8
$ metro_code : Factor w/ 1 level "": 1
$ area_code : Factor w/ 1 level "": 1
> str(as.data.frame(df, stringsAsFactors=FALSE))
'data.frame': 1 obs. of 11 variables:
$ ip : chr "83.108.241.206"
$ country_code: chr "NO"
$ country_name: chr "Norway"
$ region_code : chr "15"
$ region_name : chr "Sogn og Fjordane"
$ city : chr "Øvre Årdal"
$ zipcode : chr "6884"
$ latitude : num 61.3
$ longitude : num 7.8
$ metro_code : chr ""
$ area_code : chr ""
So that's the preparation. If you leave those columns as factors, the first rbind call gets messed up:
df <- as.data.frame( fromJSON(file("temp")) , stringsAsFactors=FALSE)
# Fetch each remaining URL and append its parsed record as a new row
for ( i in 2:length(urls) ) {
  download(urls[i], "temp", quiet=TRUE)
  df <- rbind( df, fromJSON( file("temp") ) )
}
> df
ip country_code country_name region_code region_name
df "83.108.241.206" "NO" "Norway" "15" "Sogn og Fjordane"
"129.118.15.107" "US" "United States" "TX" "Texas"
"189.144.59.71" "MX" "Mexico" "09" "Distrito Federal"
"24.106.181.190" "US" "United States" "NC" "North Carolina"
"213.226.181.3" "LT" "Lithuania" "57" "Kauno Apskritis"
"84.1.204.89" "HU" "Hungary" "12" "Komárom-Esztergom"
city zipcode latitude longitude metro_code area_code
df "Øvre Årdal" "6884" 61.3167 7.8 "" ""
"Lubbock" "79409" 33.61 -101.8213 "651" "806"
"Mexico" "" 19.4342 -99.1386 "" ""
"Raleigh" "27604" 35.8181 -78.5636 "560" "919"
"Kaunas" "" 54.9 23.9 "" ""
"Környe" "" 47.5467 18.3208 "" ""
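Using stringsAsFactors=FALSE and coercing to the data frame class with as.data.frame prevents the rbind() operation from creating a matrix of lists, and avoids problems when rbinding rows that contain factors.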
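To make that last point concrete, here is a minimal base-R sketch. It is not part of the original transcript: `rec1` and `rec2` are hand-written stand-ins for two parsed responses (only a few fields, values copied from the output above), and `records_to_df` is a hypothetical helper name. It first shows what rbind() does with plain lists, then one possible loop-free way to bind the records after coercing each to a data frame.

```r
# Stand-ins for two parsed responses (a few fields only); in the real
# workflow these lists would come from fromJSON().
rec1 <- list(ip = "83.108.241.206", country_code = "NO",
             latitude = 61.3167, longitude = 7.8)
rec2 <- list(ip = "129.118.15.107", country_code = "US",
             latitude = 33.61, longitude = -101.8213)

# rbind() on plain lists builds a matrix whose cells are list elements.
m <- rbind(rec1, rec2)
class(m)   # a matrix, not a data frame
typeof(m)  # "list"

# Hypothetical helper: coerce every record to a one-row data frame first,
# then bind all rows in one call instead of growing df inside a loop.
records_to_df <- function(records) {
  rows <- lapply(records, as.data.frame, stringsAsFactors = FALSE)
  do.call(rbind, rows)
}

geo <- records_to_df(list(rec1, rec2))
str(geo)
```

With the real service, the list of records could presumably be built by running the download() and fromJSON() steps from the transcript over `urls` with lapply(), assuming the endpoint is still reachable. Note that from R 4.0.0 onward stringsAsFactors already defaults to FALSE, so passing it explicitly mostly matters on older versions.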