Question

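I'm writing a small piece of code that takes two different data frames and compares the contents of each. How do I access the individual rows of a data frame when using lapply?

I tried accessing the indices with nested for loops. However, the dataset is very large and execution takes a long time.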

for (i in 1:20) {
  for (j in 1:nrow(keyFile)) {
    consolFile[i, 46] <- ifelse(
      str_detect(toString(consolFile[i, 47]), toString(keyFile[j, 1])),
      append(toString(consolFile[i, 46]), paste(";", toString(keyFile[j, 1]))),
      append(toString(consolFile[i, 46]), "")
    )
  }
}

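Basically, I'm trying to append a semicolon and the next matching element after whatever has already matched. I've heard that lapply / apply is a faster way to do this, but I can't access the individual rows to append the data.

If data frame 1 contains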

1.abc
2.def
3.bdc

and data frame 2 contains

1.a
2.b

then the output should be

1.a;b
2.
3.b

Answer 1

I'm not sure this fully solves the problem of appending the results, but here is what I came up with:

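library(tidyverse)

df1 <- data.frame(words = c("abc", "def", "bdc"), stringsAsFactors = FALSE)
df2 <- data.frame(var1 = c("a", "b"), stringsAsFactors = FALSE)

# One str_extract() pass per key: each pass returns the key where it matched, NA elsewhere
map(1:nrow(df2), function(x) str_extract(df1[, 1], df2[x, 1])) %>%
  pmap(paste, sep = ";") %>%                # join the per-key results row by row
  map(str_remove_all, "NA;|;NA|NA") %>%     # drop the placeholders left by non-matches
  do.call("rbind", .) %>%
  cbind(df1, "matches" = .)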

  words matches
1   abc     a;b
2   def        
3   bdc       b

purrr::map() is almost the same as lapply(), so in this case you can swap one for the other.
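
For comparison, here is a minimal base R sketch of the same idea using lapply, loosely following the example data from the question. It loops over the keys rather than over the rows, so each grepl() call checks every row at once. The name hits is only illustrative, and fixed = TRUE is an assumption that the keys should be matched as literal text rather than as regular expressions.

```r
# Example data mirroring the question
df1 <- data.frame(words = c("abc", "def", "bdc"), stringsAsFactors = FALSE)
df2 <- data.frame(var1 = c("a", "b"), stringsAsFactors = FALSE)

# For each key, test all rows of df1 in one vectorised call;
# keep the key where it was found and NA where it was not
hits <- lapply(df2$var1, function(key) {
  ifelse(grepl(key, df1$words, fixed = TRUE), key, NA_character_)
})

# Bind the per-key results into a matrix (rows = df1 rows, columns = keys),
# then join the non-missing keys of each row with ";"
df1$matches <- apply(do.call(cbind, hits), 1, function(row) {
  paste(row[!is.na(row)], collapse = ";")
})

df1
```

Because the outer loop runs once per key instead of once per row-and-key pair, this should scale better than the nested for loops when the first data frame is large, although that has not been measured here.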

Answer 2

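This uses str_extract_all together with the fact that you can collapse the search vector with | (i.e. search for 'a|b'). I've left in some extra columns, but you can easily get the output you need.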

# https://stackoverflow.com/questions/56251378/how-to-recursively-add-text-to-a-data-frame-output-from-lapply

library(dplyr)
library(stringr)

tib <- tibble(x = c('abc', 'def', 'bdc'))

match_vector <- c('a', 'b')
# collapse the search terms into a single regex alternation: 'a|b'
pattern <- paste(match_vector, collapse = '|')

tib %>%
  rowwise() %>%
  mutate(matches = str_extract_all(x, pattern),          # list column: every hit in the row
         matches2 = paste(matches, collapse = ';')) %>%  # hits joined with ';'
  ungroup()

# A tibble: 3 x 3
  x     matches   matches2
  <chr> <list>    <chr>   
1 abc   <chr [2]> a;b     
2 def   <chr [0]> ""      
3 bdc   <chr [1]> b
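
Since str_extract_all() is already vectorised over its input, the same result can probably be had without rowwise(). The sketch below works on a plain character vector; the names x, pattern and matches2 simply mirror the example above. Note that it assumes the search terms contain no regex metacharacters, and that hits come back in the order they occur in each string, with repeats kept.

```r
library(stringr)

x <- c("abc", "def", "bdc")
match_vector <- c("a", "b")

# Collapse the search terms into one regex alternation: "a|b"
pattern <- paste(match_vector, collapse = "|")

# One character vector of hits per element of x (empty when nothing matches)
matches <- str_extract_all(x, pattern)

# Join each element's hits with ";"; an empty vector collapses to ""
matches2 <- vapply(matches, paste, character(1), collapse = ";")

matches2
```

The result can be assigned straight to a column, for example tib$matches2 <- matches2, which avoids evaluating the expression one row at a time.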
