I have a data frame that looks like this (it comes from MongoDB):
team_id <- c(1,2)
member <- c("15,25,35","12,22,32")
data.frame (team_id,member)
I am trying to reshape it into a data frame like this:
team_id2 <- c(1,1,1,2,2,2)
member2 <- c(15,25,35,12,22,32)
data.frame (team_id2, member2)
I tried 'unlist', but could not get 'team_id' to repeat for each member value. I would appreciate any guidance!
Answer 0: (score: 6)
Answer 1: (score: 4)
We can use cSplit from library(splitstackshape). cSplit handles this kind of problem simply and compactly. We only supply the column to split (member), the separator (,) and the direction (long).
library(splitstackshape)
# d1 is assumed to be the data frame from the question
d1 <- data.frame(team_id, member)
cSplit(d1, "member", sep=",", "long")
# team_id member
#1: 1 15
#2: 1 25
#3: 1 35
#4: 2 12
#5: 2 22
#6: 2 32
Or, using data.table, we convert the 'data.frame' to a 'data.table' (setDT(d1)), group by 'team_id', split 'member' by ',' and unlist the output.
library(data.table)
setDT(d1)[, .(member=unlist(tstrsplit(member, ","))), team_id]
# team_id member
#1: 1 15
#2: 1 25
#3: 1 35
#4: 2 12
#5: 2 22
#6: 2 32
Or, using tidyr, we can split 'member' by ',' and then unnest (from tidyr).
library(tidyr)
library(stringr)
unnest(d1[1], member= str_split(d1$member, ","))
#Source: local data frame [6 x 2]
# team_id member
# (dbl) (chr)
#1 1 15
#2 1 25
#3 1 35
#4 2 12
#5 2 22
#6 2 32
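On newer tidyr releases, a shorter route may be separate_rows(), which splits a delimited column and lengthens the data frame in one call. The sketch below is not part of the original answer. It rebuilds d1 from the question's data and assumes a tidyr version that still ships separate_rows() (recent releases mark it as superseded by separate_longer_delim()).

```r
library(tidyr)

# Rebuild the question's data frame; d1 is the name used in this answer.
d1 <- data.frame(team_id = c(1, 2),
                 member = c("15,25,35", "12,22,32"),
                 stringsAsFactors = FALSE)

# Split 'member' on commas, producing one row per piece.
# convert = TRUE asks tidyr to convert the pieces to a numeric type.
long <- separate_rows(d1, member, sep = ",", convert = TRUE)
long
```

Unlike the unnest() call above, this needs no separate string-splitting step, and the 'member' column comes back numeric rather than character.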
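Or we can use a base R solution. We use strsplit to split the 'member' column into a list, set the names of the list to 'team_id' with setNames, and use stack to convert the list to a data.frame. The trailing [2:1] swaps the two columns so that the team id comes first.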
stack(setNames(strsplit(as.character(d1$member), ","), d1$team_id))[2:1]
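To make the one-liner easier to follow, here is the same base R approach unrolled step by step. This is only a sketch: it rebuilds d1 from the question's data, and the final renaming and type conversion are additions not found in the original answer, included because stack() names its columns 'values' and 'ind' and returns 'ind' as a factor.

```r
# Question's data, rebuilt here so the sketch is self-contained.
d1 <- data.frame(team_id = c(1, 2),
                 member = c("15,25,35", "12,22,32"),
                 stringsAsFactors = FALSE)

# 1. Split each string on commas: a list with one character vector per row.
pieces <- strsplit(as.character(d1$member), ",")

# 2. Name each list element after its team_id.
named <- setNames(pieces, d1$team_id)

# 3. stack() returns columns 'values' and 'ind'; [2:1] puts 'ind' first.
res <- stack(named)[2:1]

# Added step: restore the question's column names and numeric types
# ('ind' is a factor, 'values' is character).
names(res) <- c("team_id", "member")
res$team_id <- as.numeric(as.character(res$team_id))
res$member <- as.numeric(res$member)
res
```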
Answer 2: (score: 2)
Throwing in a tidyr solution:
library(tidyr)
team_id <- c(1,2)
member <- c("15,25,35","12,22,32")
old <- data.frame (team_id, member, stringsAsFactors = FALSE)
## need to determine how many items there are at most in column 'member'
maxItems <- max(sapply(strsplit(old$member, ","), length))
## 'into' must be a character vector of new column names
old %>% separate(member, as.character(seq_len(maxItems)), ",") %>%
  gather(position, member, -team_id)
# team_id position member
# 1 1 1 15
# 2 2 1 12
# 3 1 2 25
# 4 2 2 22
# 5 1 3 35
# 6 2 3 32