My data is in this format (a longer, but still abbreviated, dataset can be found here):
pull_req_id user action created_at
12359 arthurnn opened 1380126837
12359 rafaelfranca discussed 1380127245
12359 arthurnn discussed 1380127676
12357 JuanitoFatas opened 1380122817
12357 JuanitoFatas opened 1380122822
12357 senny reviewed 1380171899
...
Now I would like to rearrange this data frame so that it looks like this:
12359, opened, discussed, discussed
12357, opened, opened, reviewed
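That is, all rows with the same "pull_req_id" should end up on one line, and that line should essentially be a vector that starts with the "pull_req_id", followed by the strings from "action", ordered by the integer "created_at" (seconds since the epoch): earlier actions come first (to the left), and later ones follow (to the right).
How can I accomplish this in R?
Answer 0 (score: 4)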
# read data
dat <- read.csv("http://pastebin.com/raw.php?i=VqgaLWqZ")
# create vector with 'action' values
vec <- with(dat, tapply(action, pull_req_id, FUN = paste, collapse = ", "))
# add 'pull_req_id' values
vec2 <- paste(names(vec), vec, sep = ", ")
# create one-column data frame
dat2 <- data.frame(vec2)
head(dat2, 3)
# vec2
# 1 12146, discussed
# 2 12147, opened, discussed, closed, discussed
# 3 12148, merged, opened, closed
You can also write the data to a csv file:
write.table(dat2, "filename.csv",
row.names = FALSE, col.names = FALSE, quote = FALSE)
The first four lines of the resulting file:
12146, discussed
12147, opened, discussed, closed, discussed
12148, merged, opened, closed
12149, discussed, referenced, referenced, closed
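Note that `tapply` keeps the rows of each group in the order in which they appear in the data, so the approach above only matches the requested ordering if the data is already sorted by `created_at`. Here is a minimal sketch that sorts first. It assumes the six sample rows from the question, typed in by hand and deliberately shuffled; `sample_dat`, `sorted` and `rows` are names made up for this illustration.

```r
# Hypothetical stand-in for the real data: the six sample rows from the
# question, deliberately shuffled so that the sort actually matters.
sample_dat <- data.frame(
  pull_req_id = c(12359, 12357, 12359, 12357, 12359, 12357),
  user        = c("arthurnn", "senny", "rafaelfranca",
                  "JuanitoFatas", "arthurnn", "JuanitoFatas"),
  action      = c("discussed", "reviewed", "discussed",
                  "opened", "opened", "opened"),
  created_at  = c(1380127676, 1380171899, 1380127245,
                  1380122822, 1380126837, 1380122817),
  stringsAsFactors = FALSE
)

# Sort by pull request, then by timestamp (earliest first)
sorted <- sample_dat[order(sample_dat$pull_req_id, sample_dat$created_at), ]

# Collapse the actions of each pull request into a single string
vec  <- with(sorted, tapply(action, pull_req_id, FUN = paste, collapse = ", "))
rows <- paste(names(vec), vec, sep = ", ")
rows
# [1] "12357, opened, opened, reviewed"
# [2] "12359, opened, discussed, discussed"
```

The groups come out in ascending `pull_req_id` order, because `tapply` treats the grouping column as a factor with sorted levels.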
Answer 1 (score: 3)
Similar to @SvenHohenstein's answer, but arguably simpler.
df <-read.csv(header=T, file="http://pastebin.com/download.php?i=VqgaLWqZ",stringsAsFactors=F)
df <- df[order(df$pull_req_id,df$created_at),] # correct order of actions
z <- aggregate(action~pull_req_id,df,function(x){paste(x,collapse=",")})
write.csv(z,"outfile.csv", quote=F)
head(z,4)
# pull_req_id action
# 1 12146 discussed
# 2 12147 opened,discussed,discussed,closed
# 3 12148 opened,merged,closed
# 4 12149 referenced,closed,referenced,discussed
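The spaces are not really there: that is just how R aligns the printed output. Check the csv file.
Answer 2 (score: 0)
Here is an attempt at accomplishing something similar: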
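One way to check this without opening the file by hand is to read the raw lines back in. This is a small sketch on a hypothetical two-row data frame `z_demo`, shaped like `z` above, not on the real data:

```r
# Hypothetical two-row result with the same columns as `z`
z_demo <- data.frame(
  pull_req_id = c(12146, 12147),
  action      = c("discussed", "opened,discussed,discussed,closed"),
  stringsAsFactors = FALSE
)

# Write to a temporary file, then read the raw text lines back
tmp <- tempfile(fileext = ".csv")
write.csv(z_demo, tmp, quote = FALSE)
file_lines <- readLines(tmp)

# The data lines contain no padding spaces
file_lines[-1]
# [1] "1,12146,discussed"
# [2] "2,12147,opened,discussed,discussed,closed"
```

The leading `1` and `2` are the row names; passing `row.names = FALSE` to `write.csv` leaves them out.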
dat <- read.csv("http://pastebin.com/raw.php?i=VqgaLWqZ")
data.split <- split(dat$action, dat$pull_req_id)  # the data was read into 'dat', not 'data'
data.split
list.to.df <- function(arg.list) {
max.len <- max(sapply(arg.list, length))
arg.list <- lapply(arg.list, `length<-`, max.len)
as.data.frame(arg.list)
}
df.out <- list.to.df(data.split)
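`as.data.frame` on a list produces one column per pull request, and with default settings the numeric ids are turned into syntactic column names (for example with an `X` prefix). To get one row per pull request instead, the padded vectors can be stacked with `rbind`. The following is only a sketch of that variation, assuming the first five sample rows from the question are already in `created_at` order; `sample_dat`, `action_list`, `padded` and `wide` are names chosen for illustration.

```r
# Hypothetical stand-in for the real data: the first five sample rows
# from the question (already in created_at order within each pull request)
sample_dat <- data.frame(
  pull_req_id = c(12359, 12359, 12359, 12357, 12357),
  action      = c("opened", "discussed", "discussed", "opened", "opened"),
  stringsAsFactors = FALSE
)

# One character vector of actions per pull request
action_list <- split(sample_dat$action, sample_dat$pull_req_id)

# Pad every vector with NA up to the length of the longest one
max_len <- max(lengths(action_list))
padded  <- lapply(action_list, `length<-`, max_len)

# Stack the padded vectors as rows: one row per pull request,
# with the pull_req_id as the row name
wide <- do.call(rbind, padded)
wide
```

The result is a character matrix rather than a data frame; shorter pull requests are filled with `NA` on the right.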