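Iterating over two lists using apply functions

Date: 2017-09-25 08:38:01

Tags: r list lapply mapply

I have a list of data frames in which every column has a name in its first row and x's at some positions further down the column. If a row has an x in a column, the name in that column's first row counts as selected for that row. In the real-world problem, I read an xlsx file with many worksheets, where each worksheet contains one large matrix: each column has a name in the first row, followed by many x's in a somewhat sparse matrix. Each worksheet becomes one data frame in the list of data frames. The row names contain an identifier that is relevant for lookups but not for the problem described here.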

data1 <- data.frame(Col1 = c("Mark", "x", "", "x", "", ""),
                    Col2 = c("Paul", "", "", "", "x", ""),
                    Col3 = c("Jane", "", "", "", "", ""),
                    Col4 = c("Mary", "x", "x", "x", "", ""),
                    Col5 = c("Peter", "x", "x", "x", "", ""),
                    stringsAsFactors = FALSE)

data2 <- data.frame(Col1 = c("Mark", "x", "x", "", "", ""),
                    Col2 = c("Paul", "", "", "", "", ""),
                    Col3 = c("Jane", "", "", "", "", ""),
                    Col4 = c("Mary", "x", "", "x", "", ""),
                    Col5 = c("Peter", "x", "x", "", "", ""),
                    stringsAsFactors = FALSE)

data <- list(data1 = data1, data2 = data2)

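Each data frame in the list has the following structure (shown as a matrix for convenience), and the names are the same in every data frame in the list. Only the x's differ: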

> as.matrix(data1)
     Col1   Col2   Col3   Col4   Col5   
[1,] "Mark" "Paul" "Jane" "Mary" "Peter"
[2,] "x"    ""     ""     "x"    "x"    
[3,] ""     ""     ""     "x"    "x"    
[4,] "x"    ""     ""     "x"    "x"    
[5,] ""     "x"    ""     ""     ""     
[6,] ""     ""     ""     ""     ""  

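I would like to add a column ("Approvers") to each data frame in the list that, for every row, concatenates the first-row names of the columns that contain an "x" in that row, as follows: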

     Col1   Col2   Col3   Col4   Col5    Approvers          
[1,] "Mark" "Paul" "Jane" "Mary" "Peter" ""                 
[2,] "x"    ""     ""     "x"    "x"     "Mark; Mary; Peter"
[3,] ""     ""     ""     "x"    "x"     "Mary; Peter"      
[4,] "x"    ""     ""     "x"    "x"     "Mark; Mary; Peter"
[5,] ""     "x"    ""     ""     ""      "Paul"             
[6,] ""     ""     ""     ""     ""      ""   
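
To make the rule concrete, here is a minimal sketch for a single row of `data1`. The helper names `header`, `row2`, and `approvers_row2` are illustrative only and not part of the original code.

```r
# Sample data frame from the question
data1 <- data.frame(Col1 = c("Mark", "x", "", "x", "", ""),
                    Col2 = c("Paul", "", "", "", "x", ""),
                    Col3 = c("Jane", "", "", "", "", ""),
                    Col4 = c("Mary", "x", "x", "x", "", ""),
                    Col5 = c("Peter", "x", "x", "x", "", ""),
                    stringsAsFactors = FALSE)

header <- unlist(data1[1, ])   # names stored in the first row
row2   <- unlist(data1[2, ])   # marks in the second row

# Keep the names whose column holds an "x" in this row, then join them
approvers_row2 <- paste(header[row2 == "x"], collapse = "; ")
approvers_row2
# [1] "Mark; Mary; Peter"
```

A row without any x yields `paste(character(0), collapse = "; ")`, which is the empty string, so no special case is needed for empty rows.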

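Currently I solve this in two steps:

  1. I create another list that holds the column positions of each x.
  2. In a nested for loop, I look up all the names in the first row and concatenate them.

The code is as follows: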

    position <- lapply(data, function(x) apply(x, 1, function(y) which(y %in% "x")))
    position <- lapply(position, function(x) lapply(x, function(y) {if (length(y) == 0L) return(0) else return(y)})) # remove int(0) and replace with 0
    position <- lapply(position, function(x) lapply(x, function(x) paste(x, collapse = ","))) # flatten second level list into string
    
    
    for (i in 1:length(data)) {
      for (j in 1:nrow(data[[i]])) {
        if (as.numeric(unlist(strsplit(position[[i]][[j]], ",")))[[1]] == 0) {
          data[[i]][j, "Approvers"] <- ""
        } else {
          data[[i]][j, "Approvers"] <- paste(data[[i]][1, as.numeric(unlist(strsplit(position[[i]][[j]], ",")))], collapse = "; ")
        }
      }
    }
    

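To me this is clumsy, and I would like to do it with lapply and mapply by looping over both lists simultaneously, but I cannot figure out how. Creating the position object, collapsing the column indices of the x's into strings, and splitting them apart again inside the loop is also overly complicated.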
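
For reference, the parallel iteration described here could look roughly like the following. This is only a sketch: `add_approvers` is a hypothetical helper, and `Map` (which is `mapply` with `SIMPLIFY = FALSE`) is used so that the result stays a list of data frames. The positions are kept as integer vectors, so no string collapsing or splitting is involved.

```r
# Sample data from the question
data1 <- data.frame(Col1 = c("Mark", "x", "", "x", "", ""),
                    Col2 = c("Paul", "", "", "", "x", ""),
                    Col3 = c("Jane", "", "", "", "", ""),
                    Col4 = c("Mary", "x", "x", "x", "", ""),
                    Col5 = c("Peter", "x", "x", "x", "", ""),
                    stringsAsFactors = FALSE)
data2 <- data.frame(Col1 = c("Mark", "x", "x", "", "", ""),
                    Col2 = c("Paul", "", "", "", "", ""),
                    Col3 = c("Jane", "", "", "", "", ""),
                    Col4 = c("Mary", "x", "", "x", "", ""),
                    Col5 = c("Peter", "x", "x", "", "", ""),
                    stringsAsFactors = FALSE)
data <- list(data1 = data1, data2 = data2)

# For each data frame: a list with one integer vector per row,
# holding the column positions of "x" (integer(0) if there is none)
position <- lapply(data, function(df) {
  lapply(seq_len(nrow(df)), function(i) which(unlist(df[i, ]) == "x"))
})

# Hypothetical helper: receives one data frame and its matching positions
add_approvers <- function(df, pos) {
  header <- unlist(df[1, ])  # names in the first row
  df$Approvers <- vapply(pos,
                         function(p) paste(header[p], collapse = "; "),
                         character(1))
  df
}

# Map walks `data` and `position` in parallel, element by element
res <- Map(add_approvers, data, position)
res$data1$Approvers
# [1] ""  "Mark; Mary; Peter"  "Mary; Peter"  "Mark; Mary; Peter"  "Paul"  ""
```

Because `paste()` on an empty vector with `collapse` returns `""`, the replacement of `integer(0)` by `0` from the original code is not needed in this variant.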

2 answers:

Answer 0: (score: 1)

We can loop through the list with lapply, then loop over the rows with apply and paste together the elements of the first row whose corresponding value is "x":

res <- lapply(data, function(x) {
  x$Approvers <- apply(x, 1, FUN = function(y) paste(x[1,][y == "x"], collapse = ";"))
  x
})
res
#$data1
#  Col1 Col2 Col3 Col4  Col5       Approvers
#1 Mark Paul Jane Mary Peter
#2    x              x     x Mark;Mary;Peter
#3                   x     x      Mary;Peter
#4    x              x     x Mark;Mary;Peter
#5         x                            Paul
#6
#$data2
#  Col1 Col2 Col3 Col4  Col5       Approvers
#1 Mark Paul Jane Mary Peter
#2    x              x     x Mark;Mary;Peter
#3    x                    x      Mark;Peter
#4                   x                  Mary
#5
#6

Note: It seems that the column names of the datasets should be 'Mark', 'Paul', ... instead of 'Col1', 'Col2', ...
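
The point about column names can be sketched as follows, assuming it is acceptable to drop the first row once its values have become the column names. `fix_header` is a hypothetical helper introduced only for this illustration.

```r
# Sample data from the question
data1 <- data.frame(Col1 = c("Mark", "x", "", "x", "", ""),
                    Col2 = c("Paul", "", "", "", "x", ""),
                    Col3 = c("Jane", "", "", "", "", ""),
                    Col4 = c("Mary", "x", "x", "x", "", ""),
                    Col5 = c("Peter", "x", "x", "x", "", ""),
                    stringsAsFactors = FALSE)
data2 <- data.frame(Col1 = c("Mark", "x", "x", "", "", ""),
                    Col2 = c("Paul", "", "", "", "", ""),
                    Col3 = c("Jane", "", "", "", "", ""),
                    Col4 = c("Mary", "x", "", "x", "", ""),
                    Col5 = c("Peter", "x", "x", "", "", ""),
                    stringsAsFactors = FALSE)
data <- list(data1 = data1, data2 = data2)

# Hypothetical helper: promote the first row to column names and drop it
fix_header <- function(df) {
  out <- df[-1, , drop = FALSE]
  names(out) <- unlist(df[1, ])
  rownames(out) <- NULL
  out
}

res <- lapply(data, function(df) {
  df <- fix_header(df)
  marks <- as.matrix(df) == "x"   # logical matrix, column names are the people
  df$Approvers <- apply(marks, 1, function(hit) {
    paste(colnames(marks)[hit], collapse = ";")
  })
  df
})
names(res$data1)
# [1] "Mark"  "Paul"  "Jane"  "Mary"  "Peter"  "Approvers"
```

With the names in the header, each remaining row holds only marks, so the row indices are shifted down by one compared with the original data frames.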

Answer 1: (score: 0)

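As an alternative, it may be worth tidying this data so that it is easier to manipulate and reason about. Also, your desired output may not always be what you want if it returns rows that are entirely NA. The code here restructures your data frames so that the column names become the people's names. It then reshapes the data into two columns: name, and the row index in the original data frame (row_ix) at which an "x" appears in that name's column. I then drop the NAs, group by row_ix, and paste the names together, which returns a tidier data frame.

I appreciate that this is not exactly the output you asked for, but storing the data in a tidier way could save you problems in the long run.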

library(dplyr)
library(purrr)
library(purrrlyr) # dmap() lives here in current releases; it was moved out of purrr
library(tidyr)
library(magrittr)

data %>% 
  map(function(x) # map function to all dataframes in list
    x %>% 
      set_colnames(.[1, ]) %>% # set column names equal to first row values
      dmap(~ifelse(. == "x", seq_along(.), NA)) %>% # check for "x" in all rows of all columns
      gather(name, row_ix) %>% # reshape from wide to long, call new columns name and row_ix
      drop_na() %>% # drop NAs in the dataframe
      group_by(row_ix) %>% # group by row index
      summarise(approvers = paste0(name, collapse = ";")) # concatenate names from each group
  )

$data1
# A tibble: 4 × 2
  row_ix       approvers
   <int>           <chr>
1      2 Mark;Mary;Peter
2      3      Mary;Peter
3      4 Mark;Mary;Peter
4      5            Paul

$data2
# A tibble: 3 × 2
  row_ix       approvers
   <int>           <chr>
1      2 Mark;Mary;Peter
2      3      Mark;Peter
3      4            Mary
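
For readers without the tidyverse packages, the same long-format result can be approximated in base R. This is a sketch, not the answer's method: `tidy_approvers` is a hypothetical helper, it returns plain data frames rather than tibbles, and it assumes every data frame contains at least one "x".

```r
# Sample data from the question
data1 <- data.frame(Col1 = c("Mark", "x", "", "x", "", ""),
                    Col2 = c("Paul", "", "", "", "x", ""),
                    Col3 = c("Jane", "", "", "", "", ""),
                    Col4 = c("Mary", "x", "x", "x", "", ""),
                    Col5 = c("Peter", "x", "x", "x", "", ""),
                    stringsAsFactors = FALSE)
data2 <- data.frame(Col1 = c("Mark", "x", "x", "", "", ""),
                    Col2 = c("Paul", "", "", "", "", ""),
                    Col3 = c("Jane", "", "", "", "", ""),
                    Col4 = c("Mary", "x", "", "x", "", ""),
                    Col5 = c("Peter", "x", "x", "", "", ""),
                    stringsAsFactors = FALSE)
data <- list(data1 = data1, data2 = data2)

# Hypothetical helper: one output row per original row that contains an "x"
tidy_approvers <- function(df) {
  m <- as.matrix(df)
  hits <- which(m == "x", arr.ind = TRUE)   # columns "row" and "col"
  people <- unname(m[1, hits[, "col"]])     # first-row name for each hit
  approvers <- tapply(people, hits[, "row"], paste, collapse = ";")
  data.frame(row_ix = as.integer(names(approvers)),
             approvers = as.vector(approvers),
             stringsAsFactors = FALSE)
}

res <- lapply(data, tidy_approvers)
res$data2
#   row_ix       approvers
# 1      2 Mark;Mary;Peter
# 2      3      Mark;Peter
# 3      4            Mary
```

`which(..., arr.ind = TRUE)` reports hits in column order, so the names within each row come out in the same left-to-right order as the columns.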