I have some data that looks a bit like this:
test.frame <- read.table(text = "name amounts
JEAN 318.5,45
GREGORY 1518.5,67,8
WALTER 518.5
LARRY 518.5,55,1
HARRY 318.5,32
",header = TRUE,sep = "")
I would like it to look more like this:
name amount
JEAN 318.5
JEAN 45
GREGORY 1518.5
GREGORY 67
GREGORY 8
WALTER 518.5
LARRY 518.5
LARRY 55
LARRY 1
HARRY 318.5
HARRY 32
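It seems like there should be a straightforward way to break up the "amounts" column, but I haven't figured it out. I'm happy to take an "RTFM page for this specific command" answer. What is the command I'm looking for?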
Answer 0 (score: 5)
(test.frame <- read.table(text = "name amounts
JEAN 318.5,45
GREGORY 1518.5,67,8
WALTER 518.5
LARRY 518.5,55,1
HARRY 318.5,32
",header = TRUE,sep = ""))
# name amounts
# 1 JEAN 318.5,45
# 2 GREGORY 1518.5,67,8
# 3 WALTER 518.5
# 4 LARRY 518.5,55,1
# 5 HARRY 318.5,32
tmp <- setNames(strsplit(as.character(test.frame$amounts),
split = ','), test.frame$name)
data.frame(name = rep(names(tmp), sapply(tmp, length)),
amounts = unlist(tmp), row.names = NULL)
# name amounts
# 1 JEAN 318.5
# 2 JEAN 45
# 3 GREGORY 1518.5
# 4 GREGORY 67
# 5 GREGORY 8
# 6 WALTER 518.5
# 7 LARRY 518.5
# 8 LARRY 55
# 9 LARRY 1
# 10 HARRY 318.5
# 11 HARRY 32
Answer 1 (score: 5)
The fastest way (probably) is data.table:
library(data.table)
setDT(test.frame)[, lapply(.SD, function(x) unlist(strsplit(as.character(x), ','))),
.SDcols = "amounts", by = name]
## name amounts
## 1: JEAN 318.5
## 2: JEAN 45
## 3: GREGORY 1518.5
## 4: GREGORY 67
## 5: GREGORY 8
## 6: WALTER 518.5
## 7: LARRY 518.5
## 8: LARRY 55
## 9: LARRY 1
## 10: HARRY 318.5
## 11: HARRY 32
Answer 2 (score: 4)
A generalization of David Arenburg's solution would be to use my cSplit function. Get it from the GitHub Gist (https://gist.github.com/mrdwab/11380733), or load it with "devtools":
# library(devtools)
# source_gist(11380733)
The "long" format would be what you're looking for:
cSplit(test.frame, "amounts", ",", "long")
# name amounts
# 1: JEAN 318.5
# 2: JEAN 45
# 3: GREGORY 1518.5
# 4: GREGORY 67
# 5: GREGORY 8
# 6: WALTER 518.5
# 7: LARRY 518.5
# 8: LARRY 55
# 9: LARRY 1
# 10: HARRY 318.5
# 11: HARRY 32
But the function can also create a wide output format:
cSplit(test.frame, "amounts", ",", "wide")
# name amounts_1 amounts_2 amounts_3
# 1: JEAN 318.5 45 NA
# 2: GREGORY 1518.5 67 8
# 3: WALTER 518.5 NA NA
# 4: LARRY 518.5 55 1
# 5: HARRY 318.5 32 NA
One advantage of this function is that it can split multiple columns at once.
Answer 3 (score: 1)
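This is not a super standard format, but here is one way to transform the data. To begin with, I would use stringsAsFactors=F with read.table to make sure everything is a character variable rather than a factor. Alternatively, you can call as.character() on those columns.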
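A minimal sketch of why this matters, using a shortened copy of the data (the names `txt`, `as_chr`, `as_fct`, and `pieces` are made up for this illustration). Since R 4.0.0 the default for stringsAsFactors is FALSE, so the argument is spelled out here only to show both behaviours:

```r
# Shortened, hypothetical copy of the example data
txt <- "name amounts
JEAN 318.5,45
WALTER 518.5
"

# Character columns stay character (the default since R 4.0.0)
as_chr <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Character columns become factors (the default before R 4.0.0)
as_fct <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)

class(as_chr$amounts)
class(as_fct$amounts)

# strsplit() only accepts character vectors, so a factor column
# has to go through as.character() first
pieces <- strsplit(as.character(as_fct$amounts), ",", fixed = TRUE)
```

Calling strsplit() directly on the factor column would stop with a "non-character argument" error, which is why the conversion comes first.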
Next, I split the values in amounts on commas, then combine them with the name column:
md <- do.call(rbind, Map(cbind, test.frame$name,
strsplit(test.frame$amounts, ",")))
Then I paste everything together and send it to read.table to do the variable conversion:
read.table(text=apply(md,1,paste, collapse="\t"),
sep="\t", col.names=names(test.frame))
Or you can create a data.frame from the md matrix and do the class conversion yourself:
data.frame(names=md[,1], amount=as.numeric(md[,2]))
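To illustrate the conversion step in isolation, here is a small sketch on a hand-built character matrix (`md_demo` and its values are hypothetical stand-ins for md). A matrix holds a single type, so binding names and amounts together stores the numbers as text. `type.convert()` from utils is shown as one possible alternative to `as.numeric()`; it is not part of the answer above.

```r
# Hypothetical stand-in for md: a character matrix, so amounts are text
md_demo <- rbind(c("JEAN", "318.5"),
                 c("JEAN", "45"),
                 c("WALTER", "518.5"))

# Explicit conversion of the second column
amount_num <- as.numeric(md_demo[, 2])

# type.convert() picks a suitable type on its own; as.is = TRUE keeps
# any non-numeric strings as character instead of making factors
amount_auto <- type.convert(md_demo[, 2], as.is = TRUE)

result <- data.frame(name = md_demo[, 1], amount = amount_num,
                     stringsAsFactors = FALSE)
str(result)
```

Doing the conversion by hand avoids the round trip through text and read.table, at the cost of naming each column's type yourself.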
Answer 4 (score: 1)
Here is a plyr solution:
Split.Amounts <- function(x) {
amounts <- unlist(strsplit(as.character(x$amounts), ","))
return(data.frame(name = x$name, amounts = amounts, stringsAsFactors=FALSE))
}
library(plyr)
ddply(test.frame, .(name), Split.Amounts)
Using dplyr:
library(dplyr)
test.frame %>%
group_by(name) %>%
do(Split.Amounts(.))