Comparing a nested list with a column in a data frame

Time: 2018-07-02 12:53:17

Tags: r

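I want to compare the maximum value in a nested list (the value is extracted from the text of each list element) with the number in another, un-nested column, and then gsub elements of the nested list based on that comparison.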

Input:

structure(list(ExtentNumber = list("3", 1, "2", 
    "4", "1"), BiopsyType = list("2--Biopsy site: Stomach Number of biopsies: 2", 
    c("4--Biopsy site: D2 - 2nd part of duodenum Number of biopsies: 7", 
    "2--Biopsy site: Stomach Number of biopsies: 9", "Biopsy site: None", 
    "3--Biopsy site: Duodenal bulb Number of biopsies: 1"), c("1--Biopsy site: Oesophagus Number of biopsies: 10", 
    "2--Biopsy site: Stomach Number of biopsies: 6"), "3--Biopsy site: Duodenal bulb Number of biopsies: 4", 
    c("1--Biopsy site: Oesophagus Number of biopsies: 6", "4--Biopsy site: D2 - 2nd part of duodenum Number of biopsies: 9"
    ))), .Names = c("ExtentNumber", "BiopsyType"), row.names = c(NA, 
5L), class = "data.frame")

I initially tried:

lapply(OGDProcedureDf$BiopsyType, function(p)
  ifelse(max(as.numeric(str_match(p,"^(\\d)--")),na.rm=T)>OGDProcedureDf$ExtentNumber,gsub("*.","",p),p)
  )

But I realised I was comparing against every number in ExtentNumber. I then tried wrapping it in an apply function, like this:

apply(OGDProcedureDf,1,function(x)  lapply(OGDProcedureDf$BiopsyType, function(p)
  ifelse(max(as.numeric(str_match(p,"^(\\d)--")),na.rm=T)>OGDProcedureDf$ExtentNumber,gsub("*.","",p),p)
  ))

But I get the error:

Error in match.fun(FUN) : argument "FUN" is missing, with no default

So basically, how do I find and replace elements in a nested list based on the value in an un-nested column?

Expected result:

structure(list(ExtentNumber = list("3", 1, "2", "4", "1"), BiopsyType = list("2--Biopsy site: Stomach Number of biopsies: 2", 
                                                                c("", "", ""), c("1--Biopsy site: Oesophagus Number of biopsies: 10","")
                                                                , "3--Biopsy site: Duodenal bulb Number of biopsies: 4", 
                                                                c("1--Biopsy site: Oesophagus Number of biopsies: 6", ""
                                                                ))), .Names = c("ExtentNumber", "BiopsyType"), row.names = c(NA, 5L), class = "data.frame")

2 answers:

Answer 0: (score: 1)

This may not be the most efficient way, but as a follow-up to my comment:

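# For each row, flag the biopsy entries whose leading number is below ExtentNumber;
# entries without a leading number give NA and are treated as FALSE.
l1 <- Map(function(x, y) replace(x > y, is.na(x > y), FALSE),
          df$ExtentNumber,
          lapply(df$BiopsyType, function(i)
            as.numeric(gsub('^([0-9]+)--.*$', '\\1', i))))

# Keep only the flagged entries and collapse each row into a single string.
mapply(function(x, y) paste0(x[y], collapse = ', '),
       lapply(df$BiopsyType, function(i) unlist(strsplit(i, ', '))), l1)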

#[1] "2--Biopsy site: Stomach Number of biopsies: 2"   ""   "1--Biopsy site: Oesophagus Number of biopsies: 10"   "3--Biopsy site: Duodenal bulb Number of biopsies: 4"
#[5] ""
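
For comparison, here is a self-contained sketch of the same idea that keeps each row as a vector and blanks the non-matching entries in place, which is closer in shape to the expected result in the question than a collapsed string. It uses base R only. The data frame `df` below is a hypothetical two-row cut-down of the input, and `blank_above_extent` is an illustrative name rather than anything from the answer above. The strict `>` comparison is an assumption carried over from that answer.

```r
# Hypothetical cut-down version of the input data (two rows only).
df <- data.frame(id = 1:2)
df$ExtentNumber <- list("2", 1)
df$BiopsyType <- list(
  c("1--Biopsy site: Oesophagus Number of biopsies: 10",
    "2--Biopsy site: Stomach Number of biopsies: 6"),
  c("4--Biopsy site: D2 - 2nd part of duodenum Number of biopsies: 7",
    "Biopsy site: None")
)

# Illustrative helper: keep an entry only if its leading number is
# strictly below the row's extent; everything else becomes "".
blank_above_extent <- function(extent, biopsies) {
  # Entries without a leading "<digits>--" prefix are left unchanged by sub()
  # and therefore convert to NA (the coercion warning is suppressed).
  lead <- suppressWarnings(as.numeric(sub("^([0-9]+)--.*$", "\\1", biopsies)))
  keep <- !is.na(lead) & as.numeric(extent) > lead
  ifelse(keep, biopsies, "")
}

# Pair each ExtentNumber with its own BiopsyType vector, row by row.
res <- Map(blank_above_extent, df$ExtentNumber, df$BiopsyType)
```

Because `Map` walks both columns in parallel, each extent is compared only with the biopsy entries from its own row, which avoids the all-against-all comparison described in the question.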

Answer 1: (score: 1)

Map(function(x, y) y[as.numeric(x) >= as.numeric(sub("^(\\d+).*$|.*", "\\1", y))],
    dat$ExtentNumber, dat$BiopsyType)
[[1]]
[1] "2--Biopsy site: Stomach Number of biopsies: 2"

[[2]]
[1] NA

[[3]]
[1] "1--Biopsy site: Oesophagus Number of biopsies: 10" "2--Biopsy site: Stomach Number of biopsies: 6"    

[[4]]
[1] "3--Biopsy site: Duodenal bulb Number of biopsies: 4"

[[5]]
[1] "1--Biopsy site: Oesophagus Number of biopsies: 6"
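
Note that row 2 comes back as `NA` rather than empty. The "Biopsy site: None" entry has no leading number, so its comparison evaluates to `NA`, and indexing a vector with a logical `NA` returns `NA`. If dropping such entries is the desired behaviour (an assumption, since the question does not say), one possible sketch is to wrap the condition in `which()`. The lists `extent` and `biopsy` below are hypothetical stand-ins for `dat$ExtentNumber` and `dat$BiopsyType`, and the extraction pattern is a simplified one, not the pattern used in the answer above.

```r
# Hypothetical stand-ins for dat$ExtentNumber and dat$BiopsyType.
extent <- list(1, "2")
biopsy <- list(
  c("4--Biopsy site: D2 - 2nd part of duodenum Number of biopsies: 7",
    "Biopsy site: None"),
  c("1--Biopsy site: Oesophagus Number of biopsies: 10",
    "2--Biopsy site: Stomach Number of biopsies: 6")
)

res <- Map(function(x, y) {
  # Leading number of each entry; entries without one become NA.
  lead <- suppressWarnings(as.numeric(sub("^([0-9]+)--.*$", "\\1", y)))
  # which() returns only the positions where the condition is TRUE,
  # so NA comparisons are dropped instead of producing NA elements.
  y[which(as.numeric(x) >= lead)]
}, extent, biopsy)
```

With this variant a row in which nothing qualifies yields `character(0)` instead of `NA`.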