Split a column by group

Time: 2014-06-16 17:47:44

Tags: r split dataframe

I have some data that looks a bit like this:

test.frame <- read.table(text = "name   amounts   
                                JEAN  318.5,45
                             GREGORY 1518.5,67,8
                              WALTER  518.5
                               LARRY  518.5,55,1
                               HARRY  318.5,32
                         ",header = TRUE,sep = "")

I'd like it to look more like this:

name   amount
JEAN  318.5
JEAN 45
GREGORY 1518.5
GREGORY 67
GREGORY 8
WALTER  518.5
LARRY  518.5
LARRY  55
LARRY  1
HARRY  318.5
HARRY  32

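It seems like there should be a straightforward way to break up the "amounts" column, but I haven't figured it out. I'm happy to take an "RTFM page for this particular command" answer. What is the command I'm looking for?

5 answers:

Answer 0: (score: 5)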

(test.frame <- read.table(text = "name   amounts   
                                JEAN  318.5,45
                             GREGORY 1518.5,67,8
                              WALTER  518.5
                               LARRY  518.5,55,1
                               HARRY  318.5,32
                         ",header = TRUE,sep = ""))


#      name     amounts
# 1    JEAN    318.5,45
# 2 GREGORY 1518.5,67,8
# 3  WALTER       518.5
# 4   LARRY  518.5,55,1
# 5   HARRY    318.5,32

tmp <- setNames(strsplit(as.character(test.frame$amounts), 
                split = ','), test.frame$name)

data.frame(name = rep(names(tmp), sapply(tmp, length)), 
           amounts = unlist(tmp), row.names = NULL)

#       name amounts
# 1     JEAN   318.5
# 2     JEAN      45
# 3  GREGORY  1518.5
# 4  GREGORY      67
# 5  GREGORY       8
# 6   WALTER   518.5
# 7    LARRY   518.5
# 8    LARRY      55
# 9    LARRY       1
# 10   HARRY   318.5
# 11   HARRY      32
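The same idea can be wrapped in a small reusable helper. This is a minimal sketch rather than part of the original answer: the function name `split_rows` and its arguments are hypothetical, the sample data is rebuilt with `data.frame()` so the block runs on its own, and it assumes every split value is numeric so that `as.numeric()` is safe.

```r
# Hypothetical helper (not from the original answer): expand a delimited
# column into one row per value, using base R only.
split_rows <- function(df, id_col, value_col, sep = ",") {
  # Split each cell on the separator; as.character() guards against factors.
  pieces <- strsplit(as.character(df[[value_col]]), split = sep, fixed = TRUE)
  out <- data.frame(
    # Repeat each id once per value found in its row.
    id = rep(as.character(df[[id_col]]), lengths(pieces)),
    # Assumes every piece is numeric; non-numeric pieces would become NA.
    value = as.numeric(unlist(pieces)),
    stringsAsFactors = FALSE
  )
  names(out) <- c(id_col, value_col)
  out
}

# The question's data, rebuilt directly so the example is self-contained.
test.frame <- data.frame(
  name = c("JEAN", "GREGORY", "WALTER", "LARRY", "HARRY"),
  amounts = c("318.5,45", "1518.5,67,8", "518.5", "518.5,55,1", "318.5,32"),
  stringsAsFactors = FALSE
)

long <- split_rows(test.frame, "name", "amounts")
```

Unlike the output shown above, `long$amounts` here is numeric rather than character, which may or may not be what you want.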

Answer 1: (score: 5)

The fastest way (probably) is data.table:

library(data.table)
setDT(test.frame)[, lapply(.SD, function(x) unlist(strsplit(as.character(x), ','))),
                  .SDcols = "amounts", by = name]

 ##       name amounts
 ## 1:    JEAN   318.5
 ## 2:    JEAN      45
 ## 3: GREGORY  1518.5
 ## 4: GREGORY      67
 ## 5: GREGORY       8
 ## 6:  WALTER   518.5
 ## 7:   LARRY   518.5
 ## 8:   LARRY      55
 ## 9:   LARRY       1
 ## 10:  HARRY   318.5
 ## 11:  HARRY      32

Answer 2: (score: 4)

A generalization of David Arenburg's solution would be to use my cSplit function. Get it from the GitHub Gist (https://gist.github.com/mrdwab/11380733), or load it using "devtools":
# library(devtools)
# source_gist(11380733)

The "long" format would be what you're looking for:

cSplit(test.frame, "amounts", ",", "long")
#        name amounts
#  1:    JEAN   318.5
#  2:    JEAN      45
#  3: GREGORY  1518.5
#  4: GREGORY      67
#  5: GREGORY       8
#  6:  WALTER   518.5
#  7:   LARRY   518.5
#  8:   LARRY      55
#  9:   LARRY       1
# 10:   HARRY   318.5
# 11:   HARRY      32

But the function can also create a wide output format:

cSplit(test.frame, "amounts", ",", "wide")
#       name amounts_1 amounts_2 amounts_3
# 1:    JEAN     318.5        45        NA
# 2: GREGORY    1518.5        67         8
# 3:  WALTER     518.5        NA        NA
# 4:   LARRY     518.5        55         1
# 5:   HARRY     318.5        32        NA

One advantage of this function is that it can split multiple columns at once.

Answer 3: (score: 1)
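To illustrate splitting several columns in one call, here is a hedged sketch. It assumes the `cSplit()` function as distributed in the splitstackshape package on CRAN (rather than the Gist version above) and that the package is installed. The `codes` column and the two-row data frame are made up for the example.

```r
# Assumes the splitstackshape package is installed; it provides cSplit().
library(splitstackshape)

# Made-up data with two comma-delimited columns.
df <- data.frame(
  name = c("JEAN", "WALTER"),
  amounts = c("318.5,45", "518.5"),
  codes = c("a,b", "c"),
  stringsAsFactors = FALSE
)

# Pass a vector of column names to split both columns in one call.
long <- cSplit(df, c("amounts", "codes"), sep = ",", direction = "long")
```

Each row of `long` pairs the values that sat at the same position in the two original cells.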

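This isn't a super standard format, but here is one way to transform the data. First, I would use stringsAsFactors=F with read.table to make sure everything is a character variable rather than a factor. Alternatively, you can call as.character() on those columns.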
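To see why the character conversion matters: `strsplit()` only accepts character input, so a factor column has to be converted first. The sketch below uses a made-up one-element factor. Note that since R 4.0.0 `read.table()` no longer creates factors by default, so this mainly affects older R versions or data that was explicitly read as factors.

```r
# A made-up factor value, as older R versions would produce by default
# when read.table() reads a text column.
amounts_factor <- factor("318.5,45")

# strsplit() rejects non-character input, so calling it on a factor fails.
result <- tryCatch(
  strsplit(amounts_factor, ","),
  error = function(e) "error"
)

# Converting to character first makes the split work.
pieces <- strsplit(as.character(amounts_factor), ",")[[1]]
```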

Next, I split the values in amounts on commas and then combine them with the name column:
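# as.character() keeps this working even if the columns were read as factors
md <- do.call(rbind, Map(cbind, as.character(test.frame$name), 
    strsplit(as.character(test.frame$amounts), ",")))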

Then I paste everything together and send it to read.table to do the variable conversion:

read.table(text=apply(md,1,paste, collapse="\t"), 
    sep="\t", col.names=names(test.frame))

Or you can create a data.frame from the md matrix and do the class conversion yourself:

data.frame(names=md[,1], amount=as.numeric(md[,2]))

Answer 4: (score: 1)

Here is a plyr solution:

Split.Amounts <- function(x) {
  amounts <- unlist(strsplit(as.character(x$amounts), ","))
  return(data.frame(name = x$name, amounts = amounts, stringsAsFactors=FALSE))
}

library(plyr)

ddply(test.frame, .(name), Split.Amounts)

Using dplyr:

library(dplyr)

test.frame %>%
  group_by(name) %>%
  do(Split.Amounts(.))