我尝试使用键控JSON名称中包含的信息来为嵌套矩阵中包含的数据添加上下文。矩阵具有不同的行数,并且缺少一些矩阵(列表元素为NULL)。我能够使用purrr包中的map和at_depth从层次结构中提取相关数据并保留信息作为列表名称,但我找不到一种干净的方法将其转换为单个data.frame。
我试图使用purrr ::: transpose作为示例here,我尝试使用tidyr ::: unfst如图所示here,但我认为他们想要的结果和输入与我的不同之处在于它们不适用。不同的行名称和/或缺少的矩阵似乎有太多问题。我也是purrr包的新手,所以我可能会在这里找到一些简单的东西。
这是我自己的尝试,它产生了几乎所需的结果,我想我可以稍微修改一下以删除for循环,然后再使用另一层' apply'功能,但我怀疑有更好的方法来解决这个问题。
#Download data
json <- getURL("http://maps2.dnr.state.mn.us/cgi-bin/lakefinder/detail.cgi?type=lake_survey&id=69070100")
#Surveys are the relevant data
data.listed <- fromJSON(json, simplifyDataFrame=F)
surveys <- data.listed$result$surveys
#Get list of lists of matrices - fish size count data
fcounts <- map(surveys, "lengths") %>%
at_depth(2, "fishCount") %>%
at_depth(2, data.frame) # side note: is this a good way to the inner matrices to data.frames?
#top-level - list - surveys
#2nd-level - list - species in each survey
#3rd-level - data.frame - X1: measured_size, X2: counts
#use survey IDs as names for top level list
#just as species are used as names for 2nd level lists
names(fcounts) <- sapply(surveys, function(s) {return(s$surveyID)})
#This produces nearly the correct result
for (i in 1:length(fcounts)){
surv.id <- names(fcounts)[[i]]
if (length(fcounts[[i]]) > 0) {
listed.withSpecies <- lapply(names(fcounts[[i]]), function(species) cbind(fcounts[[i]][[species]], species))
surv.fishCounts <- do.call(rbind, listed.withSpecies)
colnames(surv.fishCounts) <- c("size", "count", "species")
surv.fishCounts$survey.ID <- surv.id
print(surv.fishCounts)
}
}
答案 0 :(得分:2)
这是将长度计数的嵌套数据帧放入大数据帧的一种方法:
library(httr)
library(tidyverse)
res <- GET("http://maps2.dnr.state.mn.us/cgi-bin/lakefinder/detail.cgi",
query = list(type="lake_survey", id="69070100"))
content(res, as="text") %>%
jsonlite::fromJSON(simplifyDataFrame = FALSE, flatten=FALSE) -> x
x$result$surveys %>%
map_df(~{
tmp_df <- flatten_df(.x[c("surveyDate", "surveyID", "surveyType", "surveySubType")])
lens <- .x$lengths
if (length(lens) > 0) {
fish <- names(lens)
data_frame(fish,
max_length = map_dbl(lens, "maximum_length"),
min_length = map_dbl(lens, "minimum_length"),
lens = map(lens, "fishCount") %>%
map(~set_names(as_data_frame(.), c("catch_len", "ct")))) %>%
mutate(surveyDate = tmp_df$surveyDate,
surveyType = tmp_df$surveyType,
surveySubType = tmp_df$surveySubType,
surveyID = tmp_df$surveyID) -> tmp_df
}
tmp_df
}) -> lengths_df
glimpse(lengths_df)
## Observations: 21
## Variables: 8
## $ surveyDate <chr> "1988-07-19", "1995-07-17", "1995-07-17", "1995-07-17", "1995-07-17", "1995-07-17", "1995-07-...
## $ surveyID <chr> "107278", "107539", "107539", "107539", "107539", "107539", "107539", "107539", "107539", "10...
## $ surveyType <chr> "Standard Survey", "Standard Survey", "Standard Survey", "Standard Survey", "Standard Survey"...
## $ surveySubType <chr> "Population Assessment", "Re-Survey", "Re-Survey", "Re-Survey", "Re-Survey", "Re-Survey", "Re...
## $ fish <chr> NA, "PMK", "BLB", "LMB", "YEP", "BLG", "WTS", "WAE", "NOP", "GSF", "BLC", NA, "HSF", "PMK", "...
## $ max_length <dbl> NA, 6, 12, 16, 6, 7, 18, 18, 36, 4, 10, NA, 8, 7, 12, 12, 6, 8, 23, 38, 12
## $ min_length <dbl> NA, 3, 10, 1, 3, 3, 16, 16, 6, 4, 4, NA, 7, 4, 10, 12, 5, 3, 12, 9, 7
## $ lens <list> [NULL, <c("3", "6"), c("1", "3")>, <c("10", "11", "12"), c("1", "1", "4")>, <c("1", "16", "2...
print(lengths_df, n=nrow(lengths_df))
## # A tibble: 21 × 8
## surveyDate surveyID surveyType surveySubType fish max_length min_length lens
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <list>
## 1 1988-07-19 107278 Standard Survey Population Assessment <NA> NA NA <NULL>
## 2 1995-07-17 107539 Standard Survey Re-Survey PMK 6 3 <tibble [2 × 2]>
## 3 1995-07-17 107539 Standard Survey Re-Survey BLB 12 10 <tibble [3 × 2]>
## 4 1995-07-17 107539 Standard Survey Re-Survey LMB 16 1 <tibble [6 × 2]>
## 5 1995-07-17 107539 Standard Survey Re-Survey YEP 6 3 <tibble [3 × 2]>
## 6 1995-07-17 107539 Standard Survey Re-Survey BLG 7 3 <tibble [5 × 2]>
## 7 1995-07-17 107539 Standard Survey Re-Survey WTS 18 16 <tibble [3 × 2]>
## 8 1995-07-17 107539 Standard Survey Re-Survey WAE 18 16 <tibble [2 × 2]>
## 9 1995-07-17 107539 Standard Survey Re-Survey NOP 36 6 <tibble [17 × 2]>
## 10 1995-07-17 107539 Standard Survey Re-Survey GSF 4 4 <tibble [1 × 2]>
## 11 1995-07-17 107539 Standard Survey Re-Survey BLC 10 4 <tibble [6 × 2]>
## 12 1992-07-24 107587 Standard Survey Re-Survey <NA> NA NA <NULL>
## 13 2005-07-11 107906 Standard Survey Population Assessment HSF 8 7 <tibble [2 × 2]>
## 14 2005-07-11 107906 Standard Survey Population Assessment PMK 7 4 <tibble [4 × 2]>
## 15 2005-07-11 107906 Standard Survey Population Assessment BLB 12 10 <tibble [3 × 2]>
## 16 2005-07-11 107906 Standard Survey Population Assessment LMB 12 12 <tibble [1 × 2]>
## 17 2005-07-11 107906 Standard Survey Population Assessment YEP 6 5 <tibble [2 × 2]>
## 18 2005-07-11 107906 Standard Survey Population Assessment BLG 8 3 <tibble [6 × 2]>
## 19 2005-07-11 107906 Standard Survey Population Assessment WAE 23 12 <tibble [8 × 2]>
## 20 2005-07-11 107906 Standard Survey Population Assessment NOP 38 9 <tibble [20 × 2]>
## 21 2005-07-11 107906 Standard Survey Population Assessment BLC 12 7 <tibble [4 × 2]>
您可以通过以下方式扩展嵌套捕获观察:
filter(lengths_df, !map_lgl(lens, is.null)) %>%
unnest(lens)
## # A tibble: 98 × 9
## surveyDate surveyID surveyType surveySubType fish max_length min_length catch_len ct
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <int> <int>
## 1 1995-07-17 107539 Standard Survey Re-Survey PMK 6 3 3 1
## 2 1995-07-17 107539 Standard Survey Re-Survey PMK 6 3 6 3
## 3 1995-07-17 107539 Standard Survey Re-Survey BLB 12 10 10 1
## 4 1995-07-17 107539 Standard Survey Re-Survey BLB 12 10 11 1
## 5 1995-07-17 107539 Standard Survey Re-Survey BLB 12 10 12 4
## 6 1995-07-17 107539 Standard Survey Re-Survey LMB 16 1 1 1
## 7 1995-07-17 107539 Standard Survey Re-Survey LMB 16 1 16 1
## 8 1995-07-17 107539 Standard Survey Re-Survey LMB 16 1 2 6
## 9 1995-07-17 107539 Standard Survey Re-Survey LMB 16 1 4 4
## 10 1995-07-17 107539 Standard Survey Re-Survey LMB 16 1 5 2
## # ... with 88 more rows