我有一个问题,我已经工作了几天,但找不到合适的答案。
我有一个需要放入mongo数据库的列表。它看起来像这样:
listtest = list(
list(section_id = NULL, name = "Name1", slug = "slug1"),
list(section_id = NULL, name = 'Name2', slug = 'slug2'),
list(section_id = NULL, name = 'Name3', slug = 'slug3', categories =
list(
list(section_id = NULL, name = 'Name31', slug = 'slug31'),
list(section_id = NULL, name = 'Name32', slug = 'slug32', categories =
list(
list(section_id = NULL, name = 'Name321', slug = 'slug321'),
list(section_id = NULL, name = 'Name322', slug = 'slug322'),
list(section_id = NULL, name = 'Name323', slug = 'slug323')
)),
list(section_id = NULL, name = 'Name33', slug = 'slug33', categories =
list(
list(section_id = NULL, name = 'Name331', slug = 'slug331'),
list(section_id = NULL, name = 'Name332', slug = 'slug332'),
list(section_id = NULL, name = 'Name333', slug = 'slug333'),
list(section_id = NULL, name = 'Name334', slug = 'slug334'),
list(section_id = NULL, name = 'Name335', slug = 'slug335')
)),
list(section_id = NULL, name = 'Name34', slug = 'slug34'),
list(section_id = NULL, name = 'Name35', slug = 'slug35', categories =
list(
list(section_id = NULL, name = 'Name351', slug = 'slug351', categories =
list(
list(section_id = NULL, name = 'Name3511', slug = 'slug3511'),
list(section_id = NULL, name = 'Name3512', slug = 'slug3512'),
list(section_id = NULL, name = 'Name3513', slug = 'slug3513')
)
)
)
)
)
)
)
问题是我有一个带有section_ids的数据框,我希望根据名称或slug将其放入嵌套列表中。我设法做到了这一点,但是当数据帧中没有secion_id时,它仍然留下一些等于字符(0)的section_ids。如何删除section_id等于字符(0)的列表?我也可以将section_id更改为mongoDB中的object_id吗?或者只能在JSON中完成,而不是在列表中完成?
干杯。
答案 0 :(得分:2)
我不得不重新创建一个关键表以供参考。
unique(unlist(listtest, use.names = FALSE)) %>% {
data.frame(name_var = .[c(T,F)], slug_var = .[c(F,T)]) %>%
mutate(section_id = sample(4678:92456,length(name_var))) %>%
select(section_id, name_var, slug_var)
} -> key_table
供参考:
> head(key_table)
section_id name_var slug_var
1 78002 Name1 slug1
2 48508 Name2 slug2
3 16510 Name3 slug3
4 89004 Name31 slug31
5 55853 Name32 slug32
6 65886 Name321 slug321
因此,从密钥表中,第一个函数匹配list元素中的name变量,并对密钥表进行索引并返回section id。
fn <- function(l, pat = NULL){
key_table[l[['name']] == key_table[['name_var']],'section_id']
}
然后递归,遍历列表并在每次迭代时重复fn
调用,用关键表中匹配的id替换section_id名称槽。在现实世界中,我猜你需要在没有明显匹配的情况下应用NA
变量。但是这应该指向你,你可以根据需要进行调整:
递归函数:
L
是列表元素,因此在这种情况下listtest
f <- function(L){
if("name" %in% names(L)){
L[['section_id']] <- fn(l = L)
} else {
L <- L
}
if(is.list(L)){
lapply(L, f)
}else {
L
}
}
输出:
glimpse(toJSON(f(listtest), auto_unbox = TRUE, pretty = F))
Class 'json' chr "[{\"section_id\":16667,\"name\":\"Name1\",\"slug\":\"slug1\"},{\"section_id\":76003,\"name\":\"Name2\",\"slug\""| __truncated__
前言:我从上面对数据集进行了采样并创建了一个较小的key_tbl
来复制不匹配的id或名称模式的条件
key_tbl <- sample_frac(key_table, 0.3)
#' Handles the matching from the key table, and substituting of the indexed match from the table, or replaces the `NULL` with an `NA`
fn <- function(l, pat = NULL){
check <- match(l[['name']], key_tbl$name_var)
if(is.na(check)){
NA
}else {
key_tbl[check,'section_id']
}
}
递归迭代的最大问题是处理check if a condition is met, if it is, for each sub-iteration, check if the first condition is met, and evaluate additional conditions; if any are not met, do this..else..do this.. ; then step back through and find which slots didn't meet certain conditions, and drop those
以下是嵌套列表的条件:
#' Determine if a list and not a data.frame which is an array of lists
is_list <- function(x){
(!is.data.frame(x) && inherits(x, "list"))
}
#' Forcefull NULL detection
is_null <- function(x){
identical(x, eval(parse(text = typeof(NULL))))
}
#' Forecful NA detection
is_na <- function(x){
if(identical(is.na(x),logical(0)) || is.na(x)){
TRUE
}else {
FALSE
}
}
长度检查
#' Trick for determining empty objects that are still class-or-object oriented
#' and can throw an empty return
is.empty <- function(x){
if(is.list(x)){
chk <- length(x)
}else if(is.character(x)){
chk <- nchar(x)
}else if(is.data.frame(x)){
chk <- nrow(x)
}else {
chk <- 1
}
if(chk == 0){
return(TRUE)
}else{
return(FALSE)
}
}
#' Checks most/all the above
is.invalid <- function(x){
if(is_null(x)){
return(TRUE)
}else if(is_na(x)){
return(TRUE)
}else if(!length(x)){
return(TRUE)
}else if(is.empty(x)){
return(TRUE)
}else {
return(FALSE)
}
}
以上所有的包装:
#' Vectorized function to remove any items within a list that meet any of the above tests.
drop.invalids <- function(x){
x[!mapply(is.invalid,x)]
}
f <- function(L){
#' Since we're in a loop, we need to ensure that there is a 'name' variable
#' to match against
if('name' %in% names(L)){
#' We've now substituted the indexed id, or an NA
L[['section_id']] <- fn(l = L)
}else {
#' We didn't do a thing, since we're not in a nested iteration yet.
L <- L
}
#' Since this is nested, and each list is a named list..
#' Check if ANY of those items, in each sub-item, is NA.
if(any(mapply(is.na, L[!mapply(is.list, L)]))){
#' For each sub item, make sure that it has children, and
#' if so, get only those kids, since we're dropping the array
#' that has an NA value in it.
if(any(mapply(is.list, L))){
L <- L[!mapply(function(x)any(is_na(x)), L)]
}else {
L <- NA
}
}else {
L <- L
}
#' If it's a list, recursively do all the above,
#' but only return the good-stuff
if(is.list(L)){
drop.invalids(lapply(L,f))
}else {
drop.invalids(L)
}
}
> toJSON(f(listtest), pretty= TRUE, auto_unbox = TRUE)
[{"name":"Name3","slug":"slug3","categories":[{"name":"Name32","slug":"slug32","categories":[{"section_id":89623,"name":"Name322","slug":"slug322"}]},{"name":"Name33","slug":"slug33","categories":[{"section_id":55220,"name":"Name331","slug":"slug331"},{"section_id":44116,"name":"Name334","slug":"slug334"}]},{"section_id":42746,"name":"Name34","slug":"slug34"},{"section_id":60862,"name":"Name35","slug":"slug35","categories":[{"section_id":37357,"name":"Name351","slug":"slug351"}]}]}]