Denormalizing a list in a data frame in R

Time: 2016-05-10 06:28:28

Tags: r

I have a data frame that looks like this (it comes from MongoDB):

team_id <- c(1,2)
member <- c("15,25,35","12,22,32")
data.frame (team_id,member)

I am trying to transform the data frame into this:

team_id2 <- c(1,1,1,2,2,2)
member2 <- c(15,25,35,12,22,32)
data.frame (team_id2, member2)

I tried using 'unlist', but I could not get "team_id" to repeat for each row. I would appreciate any guidance!

3 Answers:

Answer 0: (score: 6)

Here is a base R solution:

sapply(ms,length)

If you are using a recent version of R, you can replace sapply(ms, length) with lengths(ms).
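As a rough sketch of how a base R approach along these lines might look, assuming ms holds the result of splitting member on commas. The names d and res, and the conversion of member to numeric, are illustrative assumptions rather than part of the answer:

```r
# Data as defined in the question
team_id <- c(1, 2)
member <- c("15,25,35", "12,22,32")
d <- data.frame(team_id, member, stringsAsFactors = FALSE)  # `d` is a hypothetical name

# Split each comma-separated string into a character vector (one list element per row)
ms <- strsplit(as.character(d$member), ",")

# Repeat each team_id once per member of that team, then flatten the list
res <- data.frame(
  team_id = rep(d$team_id, lengths(ms)),  # sapply(ms, length) works on older R
  member  = as.numeric(unlist(ms))        # assumption: members are numeric ids
)
res
```

The key step is rep() with a vector of counts: each team_id is repeated as many times as its row has members, so the two columns stay aligned even when teams differ in size.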

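Answer 1: (score: 4)

We can use cSplit from library(splitstackshape). Handling this kind of problem with cSplit is simple and compact: we only supply the column to split (member), the separator (,) and the direction (long).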

library(splitstackshape)
cSplit(d1, "member", sep=",", "long")
#    team_id member
#1:       1     15
#2:       1     25
#3:       1     35
#4:       2     12
#5:       2     22
#6:       2     32

Or, using data.table, we convert the 'data.frame' to a 'data.table' (setDT(d1)), group by 'team_id', split 'member' by "," and unlist the output.

library(data.table)
setDT(d1)[, .(member=unlist(tstrsplit(member, ","))), team_id]
#   team_id member
#1:       1     15
#2:       1     25
#3:       1     35
#4:       2     12
#5:       2     22
#6:       2     32

Or, using tidyr, we can split 'member' by "," (with str_split from stringr) and unnest the result (unnest is from tidyr).

library(tidyr)
library(stringr)
unnest(d1[1], member= str_split(d1$member, ","))
#Source: local data frame [6 x 2]

#  team_id member
#   (dbl)  (chr)
#1       1     15
#2       1     25
#3       1     35
#4       2     12
#5       2     22
#6       2     32
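The unnest() call style above reflects tidyr as it was at the time of the answer and may not be accepted by later releases. As a hedged alternative sketch, assuming a tidyr version that provides separate_rows(), the same reshaping can be written without stringr (the name long is illustrative):

```r
library(tidyr)

# Data as defined in the question
team_id <- c(1, 2)
member <- c("15,25,35", "12,22,32")
d1 <- data.frame(team_id, member, stringsAsFactors = FALSE)

# separate_rows() splits `member` on the separator and emits one row per piece;
# convert = TRUE asks tidyr to convert the pieces from character to a suitable type
long <- separate_rows(d1, member, sep = ",", convert = TRUE)
long
```

Here team_id is repeated automatically for every piece of the split column, which is the part the question could not get from unlist alone.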

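Or we can use a base R solution. We use strsplit to split the 'member' column into a list, set the names of the list to 'team_id', and use stack to convert the list to a data.frame.

stack(setNames(strsplit(as.character(d1$member), ","), d1$team_id))[2:1]

Data

# team_id and member as defined in the question
d1 <- data.frame(team_id, member)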
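To make the strsplit / setNames / stack one-liner easier to follow, here is the same pipeline broken into steps. This is only an illustrative sketch; the intermediate names parts, named and out, and the final renaming of the columns, are assumptions added for readability:

```r
# Data as defined in the question
team_id <- c(1, 2)
member <- c("15,25,35", "12,22,32")
d1 <- data.frame(team_id, member, stringsAsFactors = FALSE)

# Step 1: split each string into a character vector (a list with one element per row)
parts <- strsplit(as.character(d1$member), ",")

# Step 2: name each list element after its team_id (names are coerced to character)
named <- setNames(parts, d1$team_id)

# Step 3: stack() returns the columns `values` and `ind`; [2:1] puts `ind` first
out <- stack(named)[2:1]
names(out) <- c("team_id", "member")  # rename for readability
out
```

Note that stack() returns the list names as a factor and leaves the split values as character, so both columns may need converting (for example with as.numeric(as.character(...))) if numeric columns are required.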

Answer 2: (score: 2)

Throwing in a tidyr solution:

library(tidyr)
team_id <- c(1,2)
member <- c("15,25,35","12,22,32")
old <- data.frame (team_id, member, stringsAsFactors = FALSE)

## need to determine how many items there are at most in column 'member'
maxItems <- max(sapply(strsplit(old$member, ","), length))

## 'into' is passed as character column names ("1", "2", ...)
old %>% separate(member, as.character(seq_len(maxItems)), ",") %>%
        gather(position, member, -team_id)
#   team_id position member
# 1       1        1     15
# 2       2        1     12
# 3       1        2     25
# 4       2        2     22
# 5       1        3     35
# 6       2        3     32