Generate multiple columns to organize data in R

Date: 2018-10-13 20:41:11

Tags: r dplyr tidyverse

I have a database with names, codes, and rooms, like this:

Name1	Code1	R1
A A	12	1
A B	13	2
A C	15	5
A B	8	4
A C	13	2
A D	17	1
A B	16	7

I want to generate extra columns for the repeated names, like this:

Name1	Code1	R1	Name2	Code2	R2	Name3	Code3	R3
A A	12	1						
A B	13	2						
A C	15	5						
A B	8	4	A B	8	4			
A C	13	2	A C	13	2			
A D	17	1						
A B	16	7				A B	16	7

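I have searched Google for a solution but couldn't find one, or maybe I missed something. Can you help? Some names (Name1) are repeated 5 times, although I didn't include those here, so I need Name2, Code2, R2; Name3, Code3, R3; and so on.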

1 answer:

Answer 0 (score: 1)

Sample data:

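library(dplyr)
library(tidyr)

# The header has one fewer field than each data row, so read.table()
# uses the leading 1-7 as row names
df <- read.table(stringsAsFactors = FALSE, header = TRUE, text = "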
Name1a Name1b   Code1   R1
1 A A   12  1
2 A B   13  2
3 A C   15  5
4 A B   8   4
5 A C   13  2
6 A D   17  1
7 A B   16  7") %>%
  tidyr::unite(Name1, Name1a, Name1b)

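Edit: The original answer produced a compact layout, but the OP wants the first set of columns repeated for every row, with the second and third occurrences shown in the rows where they originally appear.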
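Either way, the key ingredient is a per-name occurrence counter: 1 the first time a name appears, 2 the second time, and so on. As a minimal base-R illustration (the vector `name1` is simply the united Name1 column typed out by hand), `ave()` with `FUN = seq_along` numbers each row within its name; the dplyr code below gets the same counter with `group_by()` and `row_number()`.

```r
# Name1 values from the sample data, in their original row order
name1 <- c("A_A", "A_B", "A_C", "A_B", "A_C", "A_D", "A_B")

# ave() applies seq_along() within each group of identical names and
# puts the results back in the original positions
occurrence <- ave(seq_along(name1), name1, FUN = seq_along)
print(occurrence)
# → [1] 1 1 1 2 2 1 3
```

Rows with `occurrence > 1` are the ones that also need to appear in a second or third set of columns.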

Here's an approach using dplyr and tidyr:

# Keep track of original rows, label repeats, and make it long format
df_order <- df %>% 
  mutate(orig_row = row_number()) %>%
  group_by(Name1) %>% mutate(repeat_no = row_number()) %>% ungroup() %>%
  gather(col_type, value, Code1:R1)

# Make one copy of all the rows to keep in first column
df_ones <- df_order %>%
  mutate(repeat_no = 1) %>%
  unite(col_rpt, repeat_no, col_type)

# Get the repeated rows to add on
df_repeats <- df_order %>%
  filter(repeat_no > 1) %>%
  unite(col_rpt, repeat_no, col_type)

# Combine the two and spread out
output <- df_ones %>%
  bind_rows(df_repeats) %>%
  spread(col_rpt, value) %>%
  arrange(orig_row) %>%
  select(-orig_row)
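In tidyr 1.0.0 and later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. A sketch of the same steps with the newer functions might look like this; it assumes tidyr 1.0.0 or later and types the sample data in directly with `tibble::tibble()` instead of going through `read.table()` and `unite()`.

```r
library(dplyr)
library(tidyr)

# Sample data with Name1 already united (as unite() produces above)
df <- tibble::tibble(
  Name1 = c("A_A", "A_B", "A_C", "A_B", "A_C", "A_D", "A_B"),
  Code1 = c(12L, 13L, 15L, 8L, 13L, 17L, 16L),
  R1    = c(1L, 2L, 5L, 4L, 2L, 1L, 7L)
)

# Long format, keeping the original row and the per-name occurrence number
df_long <- df %>%
  mutate(orig_row = row_number()) %>%
  group_by(Name1) %>%
  mutate(repeat_no = row_number()) %>%
  ungroup() %>%
  pivot_longer(c(Code1, R1), names_to = "col_type", values_to = "value")

# Every row gets a copy in the "1_" set; repeats also get their own set
output <- bind_rows(
  df_long %>% mutate(repeat_no = 1L),
  df_long %>% filter(repeat_no > 1)
) %>%
  pivot_wider(
    names_from = c(repeat_no, col_type),
    names_sep = "_",
    values_from = value
  ) %>%
  arrange(orig_row) %>%
  select(-orig_row)

output
```

`pivot_wider()` orders the new columns by first appearance, so the `1_` set comes first and higher occurrence numbers follow, as in the `spread()` output.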

Output:

> output
# A tibble: 7 x 7
  Name1 `1_Code1` `1_R1` `2_Code1` `2_R1` `3_Code1` `3_R1`
  <chr>     <int>  <int>     <int>  <int>     <int>  <int>
1 A_A          12      1        NA     NA        NA     NA
2 A_B          13      2        NA     NA        NA     NA
3 A_C          15      5        NA     NA        NA     NA
4 A_B           8      4         8      4        NA     NA
5 A_C          13      2        13      2        NA     NA
6 A_D          17      1        NA     NA        NA     NA
7 A_B          16      7        NA     NA        16      7
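If you want the Name2/Name3 columns from the question's target layout, or prefer to skip the long/wide round trip, a base-R sketch (an alternative, not the approach above) can loop over occurrence numbers. It assumes the names are already united as in the sample data and names each extra set `Name<k>`, `Code<k>`, `R<k>` after the question; because it loops up to the largest occurrence count, a name repeated 5 times yields sets up to `Name5`/`Code5`/`R5` with no code changes.

```r
# Sample data with Name1 already united
df <- data.frame(
  Name1 = c("A_A", "A_B", "A_C", "A_B", "A_C", "A_D", "A_B"),
  Code1 = c(12L, 13L, 15L, 8L, 13L, 17L, 16L),
  R1    = c(1L, 2L, 5L, 4L, 2L, 1L, 7L),
  stringsAsFactors = FALSE
)

# Occurrence number of each row within its name (1, 2, 3, ...)
repeat_no <- ave(seq_len(nrow(df)), df$Name1, FUN = seq_along)

out <- df
# One extra column set per occurrence number above 1; values are filled
# only on the row where that occurrence happens, NA elsewhere
for (k in seq_len(max(repeat_no))[-1]) {
  hit <- repeat_no == k
  out[[paste0("Name", k)]] <- ifelse(hit, df$Name1, NA_character_)
  out[[paste0("Code", k)]] <- ifelse(hit, df$Code1, NA_integer_)
  out[[paste0("R", k)]]    <- ifelse(hit, df$R1, NA_integer_)
}

out
```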