I have this data.frame:
V1 V2
1 RAB27A RAD21
2 RAB27A STAT1
3 ITGA4 RAD21
4 PANK3 SIX5
5 PANK3 SREBF1
6 PANK3 USF1
I want it to look like this:
V1 V2 V3 V4
1 RAB27A RAD21 STAT1
2 ITGA4 RAD21
3 PANK3 SIX5 SREBF1 USF1
I am a beginner. Please help.
Answer 0: (score: 1)
You can use a combination of aggregate, toString (or paste / paste0) and the cSplit function from the splitstackshape package to achieve this:
library(splitstackshape)
newdata <- cSplit(aggregate(V2 ~ V1, mydf, toString), 'V2', sep=',', direction='wide')
which gives:
> newdata
V1 V2_1 V2_2 V2_3
1: ITGA4 RAD21 NA NA
2: PANK3 SIX5 SREBF1 USF1
3: RAB27A RAD21 STAT1 NA
Alternatively, you can use a combination of dplyr and tidyr:
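library(dplyr)
library(tidyr)
newdf <- mydf %>%
  group_by(V1) %>%
  # collapse each group's V2 values into one string, joined by ", "
  summarise(V2 = toString(V2)) %>%
  # split on ", " (what toString inserts); fill = 'right' pads short rows with NA
  separate(V2, paste0('V2_', 1:3), sep = ', ', fill = 'right')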
This gives the following (the data used, mydf, is the data.frame from the question):
> newdf
Source: local data frame [3 x 4]
V1 V2_1 V2_2 V2_3
(fctr) (chr) (chr) (chr)
1 ITGA4 RAD21 NA NA
2 PANK3 SIX5 SREBF1 USF1
3 RAB27A RAD21 STAT1 NA
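Both approaches above work in two steps: first collapse V2 into one comma-separated string per V1 group, then split that string into separate columns. The following is a minimal base R sketch of those two steps, not part of the original answer. It assumes mydf holds the question's data as character columns; the names collapsed, pieces, width, padded and wide are only illustrative.

```r
# Sample data as given in the question (assumed to be character columns)
mydf <- data.frame(
  V1 = c("RAB27A", "RAB27A", "ITGA4", "PANK3", "PANK3", "PANK3"),
  V2 = c("RAD21", "STAT1", "RAD21", "SIX5", "SREBF1", "USF1"),
  stringsAsFactors = FALSE
)

# Step 1: collapse V2 into one ", "-separated string per V1 group
collapsed <- aggregate(V2 ~ V1, mydf, toString)
print(collapsed)

# Step 2: split each string back into pieces and pad short rows with NA
pieces <- strsplit(collapsed$V2, ", ", fixed = TRUE)
width  <- max(lengths(pieces))
padded <- do.call(rbind, lapply(pieces, function(p) {
  c(p, rep(NA_character_, width - length(p)))
}))

# Assemble the wide data.frame and name the new columns V2_1, V2_2, ...
wide <- data.frame(V1 = collapsed$V1, padded, stringsAsFactors = FALSE)
names(wide)[-1] <- paste0("V2_", seq_len(width))
print(wide)
```

Printing collapsed shows the intermediate result that cSplit and separate then split apart. Note that aggregate sorts the groups, so the rows come out in alphabetical order of V1 rather than in the order of the question.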
Answer 1: (score: 0)
Here is an option using data.table:
library(data.table)
# collapse V2 per V1, then split the string into columns V2, V3, V4
setDT(df1)[, .(V2 = toString(V2)), V1][, paste0("V", 2:4) := tstrsplit(V2, ", ")][]
# V1 V2 V3 V4
#1: RAB27A RAD21 STAT1 NA
#2: ITGA4 RAD21 NA NA
#3: PANK3 SIX5 SREBF1 USF1
Or just use dcast:
dcast(setDT(df1), V1~rowid(V1, prefix = "V"), value.var="V2")
# V1 V1 V2 V3
#1: ITGA4 RAD21 NA NA
#2: PANK3 SIX5 SREBF1 USF1
#3: RAB27A RAD21 STAT1 NA
Data:
df1 <- structure(list(V1 = c("RAB27A", "RAB27A", "ITGA4", "PANK3", "PANK3",
"PANK3"), V2 = c("RAD21", "STAT1", "RAD21", "SIX5", "SREBF1",
"USF1")), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, -6L))
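The dcast call works by numbering the rows within each V1 group (that is what rowid does) and then spreading V2 across columns by that number. The same idea can be sketched in base R with ave and reshape. This is an illustrative alternative, not the answerer's code; the column name idx and the object names long and wide are assumptions made for the example.

```r
# Same data as df1 above
df1 <- data.frame(
  V1 = c("RAB27A", "RAB27A", "ITGA4", "PANK3", "PANK3", "PANK3"),
  V2 = c("RAD21", "STAT1", "RAD21", "SIX5", "SREBF1", "USF1"),
  stringsAsFactors = FALSE
)

# Number the rows within each V1 group: 1, 2, ... (the role rowid plays)
long <- df1
long$idx <- ave(seq_along(long$V1), long$V1, FUN = seq_along)

# Spread V2 into one column per within-group index
wide <- reshape(long, idvar = "V1", timevar = "idx", direction = "wide")
print(wide)
```

Unlike the dcast version, reshape keeps the groups in order of first appearance, so RAB27A stays in the first row as in the desired output from the question. The new columns are named V2.1, V2.2 and V2.3 and can be renamed with names(wide) if V2, V3, V4 are preferred.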