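R - Using column indexes in the gather function

Time: 2019-12-25 03:50:26

Tags: r tidyr

I am trying to use the gather function to combine multiple columns and reshape my wide data into long format. Sample data below: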

User ID    Book 1    Book 1_YN   Book 2   Book 2_YN   Book 3   Book 3_YN   Book 4   Book 4_YN
1            ABC          Y      XYZ          N         LMN        Y 
2            XYZ          Y      DEF          Y      
3            ABC          N      XYZ          Y         TUV        N       HIJ     Y 

Ideally, I would like the data to look like the table below, so that I can summarize the information about the books:

User ID    Book_Num    Book   Book_YN
1           Book 1     ABC      Y
1           Book 2     XYZ      N
1           Book 3     LMN      Y
2           Book 1     XYZ      Y
2           Book 2     DEF      Y
3           Book 1     ABC      N
3           Book 2     XYZ      Y
3           Book 3     TUV      N
3           Book 4     HIJ      Y

When I try to use column indexes in the gather function...

data_clean <- gather(data, Book_Num, Book, data[c(2,4,6,8)])

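I get the following error: "Error: data[c(2,4,6,8)] must evaluate to column positions or names, not a list"

Does anyone know what this error means, and/or whether there is a better way to approach this task?

*Edited to replace the image with a table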

1 Answer:

Answer 0: (score: 1)

One option is melt from data.table

library(data.table)
melt(setDT(df1), measure = patterns("^Book \\d+$", "^Book \\d+_YN$"), na.rm = TRUE,
     value.name = c("Book", "Book_YN"), variable.name = "Book_Num")[, 
      Book_Num := paste("Book", Book_Num)][order(`User ID`)]
#   User ID Book_Num Book Book_YN
#1:       1   Book 1  ABC       Y
#2:       1   Book 2  XYZ       N
#3:       1   Book 3  LMN       Y
#4:       2   Book 1  XYZ       Y
#5:       2   Book 2  DEF       Y
#6:       3   Book 1  ABC       N
#7:       3   Book 2  XYZ       Y
#8:       3   Book 3  TUV       N
#9:       3   Book 4  HIJ       Y

Or use pivot_longer

library(dplyr)
library(tidyr)
library(stringr)
df1 %>%
    rename_at(-1, ~ str_replace(., ' (\\d+)_YN', '_YN \\1')) %>%
    pivot_longer(cols = -`User ID`, names_to = c(".value", "Book_Num"),
      names_sep=" ", values_drop_na = TRUE) %>% 
    mutate(Book_Num = str_c('Book ', Book_Num))
# A tibble: 9 x 4
#  `User ID` Book_Num Book  Book_YN
#      <int> <chr>    <chr> <chr>  
#1         1 Book 1   ABC   Y      
#2         1 Book 2   XYZ   N      
#3         1 Book 3   LMN   Y      
#4         2 Book 1   XYZ   Y      
#5         2 Book 2   DEF   Y      
#6         3 Book 1   ABC   N      
#7         3 Book 2   XYZ   Y      
#8         3 Book 3   TUV   N      
#9         3 Book 4   HIJ   Y      
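
As for the error in the question itself: gather() expects its column selection as bare names or positions, whereas data[c(2,4,6,8)] is itself a data frame (a list of columns), which is what the message complains about. A minimal sketch, assuming tidyr is installed and using a rebuilt copy of the sample data, passes the positions directly. Note that this only stacks the four title columns; the _YN columns stay wide, which is why the two approaches above are a better fit for the desired output. The name titles_long is just an illustrative choice.

```r
library(tidyr)

# Copy of the sample data so the snippet runs on its own
df1 <- data.frame(
  `User ID`   = 1:3,
  `Book 1`    = c("ABC", "XYZ", "ABC"),
  `Book 1_YN` = c("Y", "Y", "N"),
  `Book 2`    = c("XYZ", "DEF", "XYZ"),
  `Book 2_YN` = c("N", "Y", "Y"),
  `Book 3`    = c("LMN", NA, "TUV"),
  `Book 3_YN` = c("Y", NA, "N"),
  `Book 4`    = c(NA, NA, "HIJ"),
  `Book 4_YN` = c(NA, NA, "Y"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Fails: df1[c(2, 4, 6, 8)] is a data frame, not a column selection
# gather(df1, Book_Num, Book, df1[c(2, 4, 6, 8)])

# Works: pass the column positions themselves
titles_long <- gather(df1, Book_Num, Book, c(2, 4, 6, 8), na.rm = TRUE)
```

Here Book_Num holds the original column names ("Book 1" to "Book 4") and Book holds the titles, while each row still carries all four _YN columns.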

Data

df1 <- structure(list(`User ID` = 1:3, `Book 1` = c("ABC", "XYZ", "ABC"
), `Book 1_YN` = c("Y", "Y", "N"), `Book 2` = c("XYZ", "DEF", 
"XYZ"), `Book 2_YN` = c("N", "Y", "Y"), `Book 3` = c("LMN", NA, 
"TUV"), `Book 3_YN` = c("Y", NA, "N"), `Book 4` = c(NA, NA, "HIJ"
), `Book 4_YN` = c(NA, NA, "Y")), class = "data.frame", row.names = c(NA, 
-3L))
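
If adding packages is not an option, the same reshaping can probably be done with reshape() from base R. This is only a sketch under that assumption, not part of the original answer; the helper names book_cols, yn_cols and long are made up for illustration, and the data is rebuilt so the snippet runs on its own.

```r
# Same sample data as df1 above, rebuilt so the snippet is self-contained
df1 <- data.frame(
  `User ID`   = 1:3,
  `Book 1`    = c("ABC", "XYZ", "ABC"),
  `Book 1_YN` = c("Y", "Y", "N"),
  `Book 2`    = c("XYZ", "DEF", "XYZ"),
  `Book 2_YN` = c("N", "Y", "Y"),
  `Book 3`    = c("LMN", NA, "TUV"),
  `Book 3_YN` = c("Y", NA, "N"),
  `Book 4`    = c(NA, NA, "HIJ"),
  `Book 4_YN` = c(NA, NA, "Y"),
  check.names = FALSE, stringsAsFactors = FALSE
)

book_cols <- paste("Book", 1:4)        # "Book 1" ... "Book 4"
yn_cols   <- paste0(book_cols, "_YN")  # "Book 1_YN" ... "Book 4_YN"

# Stack the title columns into Book and the flag columns into Book_YN,
# recording which book slot each row came from in Book_Num
long <- reshape(
  df1,
  direction = "long",
  varying   = list(book_cols, yn_cols),
  v.names   = c("Book", "Book_YN"),
  timevar   = "Book_Num",
  times     = book_cols,
  idvar     = "User ID"
)

# Drop the empty slots and sort by user, then by book slot
long <- long[!is.na(long$Book), ]
long <- long[order(long[["User ID"]], long$Book_Num), ]
rownames(long) <- NULL
```

The two vectors in varying must list the columns in matching order, since reshape() pairs them position by position.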