I have a dataset containing names, codes, and rooms, as shown below:
Name1  Code1  R1
A A    12     1
A B    13     2
A C    15     5
A B    8      4
A C    13     2
A D    17     1
A B    16     7
I want to generate additional columns for the repeated names, like this:
Name1  Code1  R1   Name2  Code2  R2   Name3  Code3  R3
A A    12     1
A B    13     2
A C    15     5
A B    8      4    A B    8      4
A C    13     2    A C    13     2
A D    17     1
A B    16     7                       A B    16     7
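I have searched Google for a solution, but either could not find one or missed something. Can you help me? Some names (Name1) are repeated as many as 5 times, although I have not included those in the example. So I would have Name2, Code2, R2; Name3, Code3, R3; and so on.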
Answer 0 (score: 1)
Sample data:
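library(dplyr)
library(tidyr)

# The header has one field fewer than the data rows, so read.table()
# treats the leading 1..7 as row names.
df <- read.table(stringsAsFactors = FALSE, header = TRUE, text = "
  Name1a Name1b Code1 R1
1 A A 12 1
2 A B 13 2
3 A C 15 5
4 A B 8 4
5 A C 13 2
6 A D 17 1
7 A B 16 7") %>%
  # Combine the two name parts into a single Name1 column (e.g. A_A)
  unite(Name1, Name1a, Name1b)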
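Edit: The original answer produced a packed format, but the OP wants the first set of columns repeated for every row, with the second and third occurrences shown in the rows where they originally appear.
Here is an approach using dplyr and tidyr.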
# Keep track of original rows, label repeats, and make it long format
df_order <- df %>%
  mutate(orig_row = row_number()) %>%
  group_by(Name1) %>%
  mutate(repeat_no = row_number()) %>%
  ungroup() %>%
  gather(col_type, value, Code1:R1)

# Make one copy of all the rows to keep in the first set of columns
df_ones <- df_order %>%
  mutate(repeat_no = 1) %>%
  unite(col_rpt, repeat_no, col_type)

# Get the repeated rows to add on
df_repeats <- df_order %>%
  filter(repeat_no > 1) %>%
  unite(col_rpt, repeat_no, col_type)

# Combine the two and spread out
output <- df_ones %>%
  bind_rows(df_repeats) %>%
  spread(col_rpt, value) %>%
  arrange(orig_row) %>%
  select(-orig_row)
Output:
> output
# A tibble: 7 x 7
  Name1 `1_Code1` `1_R1` `2_Code1` `2_R1` `3_Code1` `3_R1`
  <chr>     <int>  <int>     <int>  <int>     <int>  <int>
1 A_A          12      1        NA     NA        NA     NA
2 A_B          13      2        NA     NA        NA     NA
3 A_C          15      5        NA     NA        NA     NA
4 A_B           8      4         8      4        NA     NA
5 A_C          13      2        13      2        NA     NA
6 A_D          17      1        NA     NA        NA     NA
7 A_B          16      7        NA     NA        16      7
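In more recent versions of tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a minimal sketch of the same idea using those functions, assuming tidyr 1.0.0 or later; it is not part of the original answer. The names df_long and output2 are made up for illustration, and the sample data is rebuilt directly with Name1 already united. It should yield the same seven columns as the output above, and because the column sets come from repeat_no, it should extend to names that repeat 5 times without changes.

```r
library(dplyr)
library(tidyr)

# Same sample data as above, with Name1 already united (e.g. A_A)
df <- tibble(
  Name1 = c("A_A", "A_B", "A_C", "A_B", "A_C", "A_D", "A_B"),
  Code1 = c(12L, 13L, 15L, 8L, 13L, 17L, 16L),
  R1    = c(1L, 2L, 5L, 4L, 2L, 1L, 7L)
)

# Remember the original row order, number each occurrence of a name,
# and reshape to long format (one row per original row and column)
df_long <- df %>%
  mutate(orig_row = row_number()) %>%
  group_by(Name1) %>%
  mutate(repeat_no = row_number()) %>%
  ungroup() %>%
  pivot_longer(c(Code1, R1), names_to = "col_type", values_to = "value")

output2 <- bind_rows(
  mutate(df_long, repeat_no = 1L),  # every row fills the first column set
  filter(df_long, repeat_no > 1)    # repeats also fill their own column set
) %>%
  pivot_wider(
    id_cols     = c(orig_row, Name1),
    names_from  = c(repeat_no, col_type),  # gives names like 2_Code1
    values_from = value,
    names_sep   = "_"
  ) %>%
  arrange(orig_row) %>%
  select(-orig_row)

print(output2)
```

To get headers closer to the ones in the question (Code2, R2, ...), the columns could be renamed afterwards, for example with dplyr::rename_with().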