I have a list of phone numbers that I want to group by name and reshape from long format to wide format, with the phone numbers filled in across columns.

Name      Phone_Number
John Doe  0123456
John Doe  0123457
John Doe  0123458
Jim Doe   0123459
Jim Doe   0123450
Jane Doe  0123451
Jill Doe  0123457

Name      Phone_Number1  Phone_Number2  Phone_Number3
John Doe  0123456        0123457        0123458
Jim Doe   0123459        0123450        NA
Jane Doe  0123451        NA             NA
Jill Doe  NA             NA             NA
library(dplyr)
library(tidyr)
library(data.table)
# Input data in long format: one row per name/phone number pair
df <- data.frame(Name = c("John Doe", "John Doe", "John Doe", "Jim Doe", "Jim Doe", "Jane Doe", "Jill Doe"),
                 Phone_Number = c("0123456", "0123457", "0123458", "0123459", "0123450", "0123451", NA))
# Desired result in wide format: one row per name
df1 <- data.frame(Name = c("John Doe", "Jim Doe", "Jane Doe", "Jill Doe"),
                  Phone_Number1 = c("0123456", "0123459", "0123451", NA),
                  Phone_Number2 = c("0123457", "0123450", NA, NA),
                  Phone_Number3 = c("0123458", NA, NA, NA))
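I have tried a range of permutations, but whatever I am doing wrong just isn't clicking. My guess is that it comes down to specifying the key/value pairs correctly. The closest I have come is the following code:
tidyr::spread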
df %>%
group_by(Name) %>%
mutate(id = row_number()) %>%
spread(Name, Phone_Number) %>%
select(-id)
data.table::dcast
df%>%
dcast(Name + Phone_Number ~ Phone_Number, value.var = "Phone_Number")
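Answer 0 (score: 2)
You don't want to add the row number (an index over the entire data) but rather a group index, using the helper function n(), which gives the number of observations in each group of a grouped_df. With the group index as the key, the spread should go smoothly: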
df %>% group_by(Name) %>%
mutate(group_index = 1:n() %>% paste0("phone_", .)) %>%
spread(group_index, Phone_Number)
# A tibble: 4 x 4
# Groups: Name [4]
Name phone_1 phone_2 phone_3
<fctr> <fctr> <fctr> <fctr>
1 Jane Doe 0123451 <NA> <NA>
2 Jill Doe <NA> <NA> <NA>
3 Jim Doe 0123459 0123450 <NA>
4 John Doe 0123456 0123457 0123458
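
As a hedged aside: in tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), so the same group-index idea could be sketched as below. The object name wide and the use of stringsAsFactors = FALSE are assumptions made for this illustration, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same long-format data as in the question; stringsAsFactors = FALSE is an
# assumption here so that the columns stay character rather than factor.
df <- data.frame(
  Name = c("John Doe", "John Doe", "John Doe", "Jim Doe", "Jim Doe",
           "Jane Doe", "Jill Doe"),
  Phone_Number = c("0123456", "0123457", "0123458", "0123459", "0123450",
                   "0123451", NA),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(Name) %>%
  # Index of each phone number within its Name group (1, 2, 3, ...)
  mutate(group_index = seq_len(n())) %>%
  ungroup() %>%
  # The group index becomes the column suffix, the phone number the cell value
  pivot_wider(names_from = group_index,
              values_from = Phone_Number,
              names_prefix = "Phone_Number")

print(wide)
```

Names with fewer phone numbers than the largest group get NA in the remaining columns, which is the layout asked for in the question.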
Answer 1 (score: 2)
For the sake of completeness, the rowid() function has a prefix parameter, which allows for a concise solution:
library(data.table)
dcast(setDT(df), Name ~ rowid(Name, prefix = "Phone_Number"))
       Name Phone_Number1 Phone_Number2 Phone_Number3
1: Jane Doe       0123451          <NA>          <NA>
2: Jill Doe          <NA>          <NA>          <NA>
3:  Jim Doe       0123459       0123450          <NA>
4: John Doe       0123456       0123457       0123458
Answer 2 (score: 1)
Create a rowid by Name, and that is enough:
library(dplyr)
library(tidyr)
library(data.table)
df <- setDT(data.frame(Name = c("John Doe", "John Doe", "John Doe", "Jim Doe", "Jim Doe", "Jane Doe", "Jill Doe" ),
Phone_Number = c("0123456", "0123457","0123458", "0123459", "0123450","0123451", NA)))
df1 <- data.frame(Name = c("John Doe","Jim Doe", "Jane Doe", "Jill Doe" ),
Phone_Number1 = c("0123456", "0123459", "0123451", NA),
Phone_Number2 = c("0123457", "0123450", NA, NA),
Phone_Number3 = c("0123458", NA, NA, NA))
df[, rowid := rowid(Name)]
dcast.data.table(df, Name ~ rowid, value.var = "Phone_Number")
Name 1 2 3
1: Jane Doe 0123451 NA NA
2: Jill Doe NA NA NA
3: Jim Doe 0123459 0123450 NA
4: John Doe 0123456 0123457 0123458
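As pointed out in the comments, there is no need to create the rowid variable for this task. You can do the following instead, which is simpler and tidier code: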
df <- setDT(data.frame(Name = c("John Doe", "John Doe", "John Doe", "Jim Doe", "Jim Doe", "Jane Doe", "Jill Doe" ),
Phone_Number = c("0123456", "0123457","0123458", "0123459", "0123450","0123451", NA)))
dcast.data.table(df, Name ~ paste0("Phone_Number", rowid(Name)),
value.var = "Phone_Number")
Name Phone_Number1 Phone_Number2 Phone_Number3
1: Jane Doe 0123451 NA NA
2: Jill Doe NA NA NA
3: Jim Doe 0123459 0123450 NA
4: John Doe 0123456 0123457 0123458
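
For comparison, the same "index within each Name, then cast" idea can probably be sketched in base R alone, without data.table or tidyr. This is only an illustration under stated assumptions: the names df_long, idx and wide are hypothetical, and stringsAsFactors = FALSE is assumed so the values stay character.

```r
# Long-format data from the question (object name df_long is illustrative)
df_long <- data.frame(
  Name = c("John Doe", "John Doe", "John Doe", "Jim Doe", "Jim Doe",
           "Jane Doe", "Jill Doe"),
  Phone_Number = c("0123456", "0123457", "0123458", "0123459", "0123450",
                   "0123451", NA),
  stringsAsFactors = FALSE
)

# Within-group index: 1, 2, 3, ... restarting for each Name
df_long$idx <- ave(seq_along(df_long$Name), df_long$Name, FUN = seq_along)

# Cast to wide; sep = "" yields Phone_Number1, Phone_Number2, ...
wide <- reshape(df_long, idvar = "Name", timevar = "idx",
                direction = "wide", sep = "")
rownames(wide) <- NULL

print(wide)
```

Unlike the dcast output above, reshape() keeps the names in order of first appearance rather than sorting them.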