I am trying to use the gather function to combine multiple columns and lengthen my wide data. Sample data below:
User ID  Book 1  Book 1_YN  Book 2  Book 2_YN  Book 3  Book 3_YN  Book 4  Book 4_YN
1        ABC     Y          XYZ     N          LMN     Y
2        XYZ     Y          DEF     Y
3        ABC     N          XYZ     Y          TUV     N          HIJ     Y
Ideally, I would like the data to look like the table below, so that I can summarize information about the books:
User ID  Book_Num  Book  Book_YN
1        Book 1    ABC   Y
1        Book 2    XYZ   N
1        Book 3    LMN   Y
2        Book 1    XYZ   Y
2        Book 2    DEF   Y
3        Book 1    ABC   N
3        Book 2    XYZ   Y
3        Book 3    TUV   N
3        Book 4    HIJ   Y
When I try to use column indices in the gather function...
data_clean <- gather(data, Book_Num, Book, data[c(2,4,6,8)])
I get the following error:
"Error: `data[c(2,4,6,8)]` must evaluate to column positions or names, not a list"
Does anyone know what this error means and/or whether there is a better way to approach this task?
*Edited to change the image to a table
Answer 0 (score: 1)
One option is melt from data.table, which can melt several groups of columns at once:
library(data.table)
# Each pattern defines one group of measure columns: titles, then Y/N flags
melt(setDT(df1), measure.vars = patterns("^Book \\d+$", "^Book \\d+_YN$"),
     na.rm = TRUE, value.name = c("Book", "Book_YN"),
     variable.name = "Book_Num")[,
       # Book_Num comes back as 1, 2, ...; turn it into "Book 1", "Book 2", ...
       Book_Num := paste("Book", Book_Num)][order(`User ID`)]
#   User ID Book_Num Book Book_YN
#1:       1   Book 1  ABC       Y
#2:       1   Book 2  XYZ       N
#3:       1   Book 3  LMN       Y
#4:       2   Book 1  XYZ       Y
#5:       2   Book 2  DEF       Y
#6:       3   Book 1  ABC       N
#7:       3   Book 2  XYZ       Y
#8:       3   Book 3  TUV       N
#9:       3   Book 4  HIJ       Y
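To see which columns each regular expression passed to patterns() picks up, a quick check with base R grep() may help. This is only an illustrative sketch: the cols vector is typed in by hand from the sample data (an assumption, not part of the original answer).

```r
# Column names of the sample data, typed in by hand for illustration
cols <- c("User ID", "Book 1", "Book 1_YN", "Book 2", "Book 2_YN",
          "Book 3", "Book 3_YN", "Book 4", "Book 4_YN")

# First pattern: title columns only (the "$" anchor excludes the "_YN" suffix)
book_cols <- grep("^Book \\d+$", cols, value = TRUE)

# Second pattern: the corresponding Y/N columns
yn_cols <- grep("^Book \\d+_YN$", cols, value = TRUE)

print(book_cols)
print(yn_cols)
```

melt() pairs the two groups by position, so the first title column goes with the first Y/N column, and so on. The resulting Book_Num is that position index, which is why "Book" is pasted in front of it afterwards.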
Or use pivot_longer from tidyr:
library(dplyr)
library(tidyr)
library(stringr)
df1 %>%
  # Move the book number to the end: "Book 1_YN" becomes "Book_YN 1"
  rename_at(-1, ~ str_replace(., ' (\\d+)_YN', '_YN \\1')) %>%
  # Split each name at the space: first part names the value column, second is Book_Num
  pivot_longer(cols = -`User ID`, names_to = c(".value", "Book_Num"),
               names_sep = " ", values_drop_na = TRUE) %>%
  mutate(Book_Num = str_c('Book ', Book_Num))
# A tibble: 9 x 4
#  `User ID` Book_Num Book  Book_YN
#      <int> <chr>    <chr> <chr>
#1         1 Book 1   ABC   Y
#2         1 Book 2   XYZ   N
#3         1 Book 3   LMN   Y
#4         2 Book 1   XYZ   Y
#5         2 Book 2   DEF   Y
#6         3 Book 1   ABC   N
#7         3 Book 2   XYZ   Y
#8         3 Book 3   TUV   N
#9         3 Book 4   HIJ   Y
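The rename step is probably the least obvious part of the pivot_longer approach, so here is a small sketch of what it does to the column names. It uses base R sub() in place of stringr::str_replace() (a swap made only to keep the example dependency-free; the regular expression is the same), and the old_names vector is a hand-picked subset of the sample columns.

```r
# A few of the original column names (hand-picked for illustration)
old_names <- c("Book 1", "Book 1_YN", "Book 4", "Book 4_YN")

# Move the number behind the "_YN" suffix so every name reads "<value column> <number>"
new_names <- sub(" (\\d+)_YN", "_YN \\1", old_names)
print(new_names)

# Splitting at the space is what names_sep = " " does inside pivot_longer():
# the first piece becomes the output column (.value), the second becomes Book_Num
parts <- strsplit(new_names, " ")
print(parts)
```

After the rename, every name has exactly one space, so a single separator is enough to tell pivot_longer which output column a value belongs to and which book number it carries.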
Data used in the examples above:
df1 <- structure(list(`User ID` = 1:3,
    `Book 1` = c("ABC", "XYZ", "ABC"), `Book 1_YN` = c("Y", "Y", "N"),
    `Book 2` = c("XYZ", "DEF", "XYZ"), `Book 2_YN` = c("N", "Y", "Y"),
    `Book 3` = c("LMN", NA, "TUV"), `Book 3_YN` = c("Y", NA, "N"),
    `Book 4` = c(NA, NA, "HIJ"), `Book 4_YN` = c(NA, NA, "Y")),
    class = "data.frame", row.names = c(NA, -3L))
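As for the error in the question: gather() expects a column selection (positions or names), while data[c(2,4,6,8)] is itself a data frame, i.e. a list of columns. The sketch below, which assumes tidyr is installed and rebuilds the sample data under the question's name data, shows that passing the positions directly is accepted. The exact wording of the error may differ between tidyr versions.

```r
library(tidyr)

# Sample data rebuilt under the name used in the question
data <- data.frame(
  `User ID`   = 1:3,
  `Book 1`    = c("ABC", "XYZ", "ABC"),
  `Book 1_YN` = c("Y", "Y", "N"),
  `Book 2`    = c("XYZ", "DEF", "XYZ"),
  `Book 2_YN` = c("N", "Y", "Y"),
  `Book 3`    = c("LMN", NA, "TUV"),
  `Book 3_YN` = c("Y", NA, "N"),
  `Book 4`    = c(NA, NA, "HIJ"),
  `Book 4_YN` = c(NA, NA, "Y"),
  check.names = FALSE
)

# Subsetting with [ ] returns a data frame, which is a list: not a valid selection
print(is.list(data[c(2, 4, 6, 8)]))

# Passing the positions themselves is a valid selection
titles_long <- gather(data, Book_Num, Book, c(2, 4, 6, 8), na.rm = TRUE)
print(names(titles_long))
```

Note that this only lengthens the title columns; the four _YN columns remain wide, which is why the melt and pivot_longer approaches above, which handle both groups of columns at once, are a better fit for this task.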