Question

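I want to use R to merge certain files based on the values of some of their parameters. I am working with the following csv file: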

cat table.csv
filename,type,rep,category,param
file1,1,4,A,1
file2,3,1,B,1
file3,2,1,A,2
file4,1,1,C,3
file5,1,1,C,2
file6,2,2,D,1
file7,3,1,C,2
file8,3,1,B,3
file9,3,1,B,3
file10,1,4,A,1
file11,1,1,B,1

ta <- readr::read_csv("table.csv")
Parsed with column specification:
cols(
  filename = col_character(),
  type = col_integer(),
  rep = col_integer(),
  category = col_character(),
  param = col_integer()
)

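I want to merge the files that share the same values of ta$type, ta$rep and ta$category (ta$param does not matter).

So I would merge: file1 and file10 [1,4,A]; file2, file8 and file9 [3,1,B]; file4 and file5 [1,1,C]. file3 [2,1,A], file6 [2,2,D], file7 [3,1,C] and file11 [1,1,B] would not be merged with any other file.

Does anyone have an idea how to do this? Thanks!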

Answer 1

One option is to use dplyr::group_by on the columns (type, rep, category) and then summarise using paste0 with the argument collapse="+". The solution looks like this:

library(readr)
library(dplyr)

ta <- readr::read_csv("table.csv")

# Group by the three key columns, then join the file names of each group with "+"
ta %>% group_by(type, rep, category) %>%
  summarise(file = paste0(filename, collapse="+"))


# # A tibble: 7 x 4
# # Groups: type, rep [?]
#    type   rep category file             
#   <int> <int> <chr>    <chr>            
# 1     1     1 B        file11           
# 2     1     1 C        file4+file5      
# 3     1     4 A        file1+file10     
# 4     2     1 A        file3            
# 5     2     2 D        file6            
# 6     3     1 B        file2+file8+file9
# 7     3     1 C        file7
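
For comparison, the same grouping can be sketched in base R, without readr or dplyr. This is only an illustrative alternative, not part of the answer above: the table is inlined through read.csv(text = ...) so that the snippet runs on its own, and the names merged and groups are made up for this example. aggregate() collapses the file names per group, while split() keeps them as separate vectors, which may be more convenient if the files themselves need to be read and combined afterwards.

```r
# Inline copy of table.csv from the question, so the sketch is self-contained
ta <- read.csv(text = "filename,type,rep,category,param
file1,1,4,A,1
file2,3,1,B,1
file3,2,1,A,2
file4,1,1,C,3
file5,1,1,C,2
file6,2,2,D,1
file7,3,1,C,2
file8,3,1,B,3
file9,3,1,B,3
file10,1,4,A,1
file11,1,1,B,1
", stringsAsFactors = FALSE)

# One row per (type, rep, category) group, file names joined with "+"
merged <- aggregate(filename ~ type + rep + category, data = ta,
                    FUN = paste, collapse = "+")

# Alternative: a named list with one character vector of file names per group;
# drop = TRUE discards combinations that never occur in the data
groups <- split(ta$filename, list(ta$type, ta$rep, ta$category), drop = TRUE)

print(merged)
print(groups)
```

The list returned by split() is named by joining the key values with a dot, so groups[["1.4.A"]] holds the file names for type 1, rep 4, category A.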
