My CSV is formatted like this:
Section | ID | Totaltime | Item1/Word | Item1/Cat | Item1/Time...Item235/Time
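I want to reshape it so that, instead of one row per ID holding all 235 items, each item gets its own row, with the rows sorted/grouped by ID. It should look something like this: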
Section | ID0 | Totaltime | Item1/Word | Item1/Cat | Item1/Time
Item2/Word | Item2/Cat | Item2/Time
Item3/Word | Item3/Cat | Item3/Time
...Item235/Word | Item235/Cat | Item235/Time
Section | ID1 | Totaltime | Item1/Word | Item1/Cat | Item1/Time...
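I tried melting it with the ID as the id.vars argument and putting the various item columns into the measure.vars argument (selected with grepl), but that produces something like this: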
Section | ID0 | Totaltime
Section | ID0 | Item1/Word
Section | ID0 | Item1/Cat
Section | ID0 | Item1/Time
...
Section | ID0 | Item235/Word
Section | ID0 | Item235/Cat
Section | ID0 | Item235/Time
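I also tried recasting, but without much luck.
I've only been using R for a week, so I'm sure I'm missing something super obvious, but I've hit a wall with this.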
Answer 0 (score: 1)
data.table's melt can melt into several value columns at once (using @rawr's data, dd):
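library(data.table) # v1.9.5+ (needed to melt into multiple value columns)
## value-column names: strip the "ItemN/" prefix -> "Word", "Cat", "Time"
vals = unique(gsub("Item[0-9]+/", "", tail(names(dd), -3L)))
## measure.vars is a list: the column indices for each of Word, Cat and Time
melt(setDT(dd), id.vars = 1:3, measure.vars = lapply(vals, grep, names(dd)), value.name = vals)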
# Section ID0 Totaltime variable Word Cat Time
# 1: 1 10001 100 1 1/word 1/cat 1/time
# 2: 2 10002 200 1 1/word 1/cat 1/time
# 3: 3 10003 300 1 1/word 1/cat 1/time
# 4: 4 10004 400 1 1/word 1/cat 1/time
# 5: 5 10005 500 1 1/word 1/cat 1/time
# 6: 1 10001 100 2 2/word 2/cat 2/time
# 7: 2 10002 200 2 2/word 2/cat 2/time
# 8: 3 10003 300 2 2/word 2/cat 2/time
# 9: 4 10004 400 2 2/word 2/cat 2/time
# 10: 5 10005 500 2 2/word 2/cat 2/time
# 11: 1 10001 100 3 3/word 3/cat 3/time
# 12: 2 10002 200 3 3/word 3/cat 3/time
# 13: 3 10003 300 3 3/word 3/cat 3/time
# 14: 4 10004 400 3 3/word 3/cat 3/time
# 15: 5 10005 500 3 3/word 3/cat 3/time
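The output above is ordered by item rather than by ID. A minimal sketch of a variant, assuming data.table 1.9.6 or later (where patterns() can be used inside melt), anchors each regex on the "/Field" suffix so a pattern cannot accidentally match an unrelated column, and then sorts by ID so each ID's items form one contiguous block, as the question asks. The two-ID toy table and the column name item are made up for illustration.

```r
library(data.table)

# Hypothetical toy data in the question's layout: 2 IDs x 3 items
dd <- data.table(Section = 1:2, ID0 = c(10001L, 10002L), Totaltime = c(100, 200),
                 `Item1/Word` = "1/word", `Item1/Cat` = "1/cat", `Item1/Time` = "1/time",
                 `Item2/Word` = "2/word", `Item2/Cat` = "2/cat", `Item2/Time` = "2/time",
                 `Item3/Word` = "3/word", `Item3/Cat` = "3/cat", `Item3/Time` = "3/time")

# One regex per output value column; "$" anchors on the "/Field" suffix only
long <- melt(dd, id.vars = c("Section", "ID0", "Totaltime"),
             measure.vars = patterns("/Word$", "/Cat$", "/Time$"),
             variable.name = "item", value.name = c("Word", "Cat", "Time"))

# item comes back as a factor with levels "1", "2", "3"; store the item number as integer
long[, item := as.integer(item)]
# Put each ID's items together, in item order
setorder(long, ID0, item)
long
```

Converting item to an integer before sorting keeps item 10 after item 9 when there are 235 items.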
Answer 1 (score: 0)
Try this:
library(reshape2)
library(plyr)
# one row per (ID, item column) pair
df.melt <- melt(df, id.vars=c("Section", "ID0", "Totaltime"), variable.name="item.type", value.name="item.value")
# split e.g. "Item12/Word" into item.no = "Item12" and item.type = "Word"
df.mutate <- mutate(df.melt, item.no=gsub("(Item[0-9]+).*", "\\1", item.type), item.type=gsub("Item[0-9]+/", "", item.type))
# within each ID and item, spread Word/Cat/Time back out into columns
df.final <- ddply(df.mutate, .(Section, ID0, Totaltime, item.no), function(d) dcast(d, Section + ID0 + Totaltime ~ item.type, value.var="item.value", fun.aggregate=function(x) x[1]))
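Two details are worth noting, sketched below under the assumption that reshape2 is installed. First, the per-group ddply is not strictly needed: a single dcast with item.no on the left-hand side of the formula already gives one row per ID and item. Second, item.no is a string such as "Item10", which sorts before "Item2", so with 235 items it helps to extract the number as an integer before ordering. The toy data (items 1, 2 and 10) and the object names long and wide are hypothetical.

```r
library(reshape2)

# Hypothetical toy data: 2 IDs, items 1, 2 and 10, each with Word/Cat/Time
dd <- data.frame(Section = 1:2, ID0 = c(10001, 10002), Totaltime = c(100, 200))
for (i in c(1, 2, 10)) {
  for (f in c("Word", "Cat", "Time")) {
    dd[[paste0("Item", i, "/", f)]] <- paste0(i, "/", tolower(f))
  }
}

long <- melt(dd, id.vars = c("Section", "ID0", "Totaltime"),
             variable.name = "item.type", value.name = "item.value")
long$item.type <- as.character(long$item.type)
# Numeric item number, so that item 10 sorts after item 2
long$item.no <- as.integer(sub("^Item([0-9]+)/.*$", "\\1", long$item.type))
long$item.type <- sub("^Item[0-9]+/", "", long$item.type)

# One dcast call: one row per (ID, item), one column per field
wide <- dcast(long, Section + ID0 + Totaltime + item.no ~ item.type,
              value.var = "item.value")
# Group by ID, order items numerically, and restore the Word/Cat/Time column order
wide <- wide[order(wide$ID0, wide$item.no),
             c("Section", "ID0", "Totaltime", "item.no", "Word", "Cat", "Time")]
rownames(wide) <- NULL
wide
```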
Answer 2 (score: 0)
I think this gets you the format you need:
dd <- data.frame(Section = 1:5, ID0 = 10001:10005, Totaltime = 1:5 * 100,
'Item1/Word' = '1/word', 'Item1/Cat' = '1/cat',
'Item1/Time' = '1/time',
'Item2/Word' = '2/word', 'Item2/Cat' = '2/cat',
'Item2/Time' = '2/time',
'Item3/Word' = '3/word', 'Item3/Cat' = '3/cat',
'Item3/Time' = '3/time', stringsAsFactors = FALSE,
check.names = FALSE)
# Section ID0 Totaltime Item1/Word Item1/Cat Item1/Time Item2/Word Item2/Cat Item2/Time Item3/Word Item3/Cat Item3/Time
# 1 1 10001 100 1/word 1/cat 1/time 2/word 2/cat 2/time 3/word 3/cat 3/time
# 2 2 10002 200 1/word 1/cat 1/time 2/word 2/cat 2/time 3/word 3/cat 3/time
# 3 3 10003 300 1/word 1/cat 1/time 2/word 2/cat 2/time 3/word 3/cat 3/time
# 4 4 10004 400 1/word 1/cat 1/time 2/word 2/cat 2/time 3/word 3/cat 3/time
# 5 5 10005 500 1/word 1/cat 1/time 2/word 2/cat 2/time 3/word 3/cat 3/time
## define the varying columns: one vector of column indices per field
keys <- c('Word', 'Cat', 'Time')
l <- lapply(keys, function(x) grep(x, names(dd)))
rr <- reshape(dd, direction = 'long', varying = l)
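## sort so each ID's rows are together (order() is stable, so items stay 1, 2, 3)
## and drop the two extra variables `reshape` adds, time and id, which we don't want
rr <- rr[with(rr, order(Section, ID0, Totaltime)),
         -which(names(rr) %in% c('id', 'time'))]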
## blank out repeated Section/ID0/Totaltime values so each ID block shows them once
rr[, 1:3] <- lapply(rr[, 1:3], function(x) ifelse(duplicated(x), '', x))
`rownames<-`(rr, NULL)
# Section ID0 Totaltime Item1/Word Item1/Cat Item1/Time
# 1 1 10001 100 1/word 1/cat 1/time
# 2 2/word 2/cat 2/time
# 3 3/word 3/cat 3/time
# 4 2 10002 200 1/word 1/cat 1/time
# 5 2/word 2/cat 2/time
# 6 3/word 3/cat 3/time
# 7 3 10003 300 1/word 1/cat 1/time
# 8 2/word 2/cat 2/time
# 9 3/word 3/cat 3/time
# 10 4 10004 400 1/word 1/cat 1/time
# 11 2/word 2/cat 2/time
# 12 3/word 3/cat 3/time
# 13 5 10005 500 1/word 1/cat 1/time
# 14 2/word 2/cat 2/time
# 15 3/word 3/cat 3/time
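One caveat with the duplicated() step: it blanks each of the first three columns independently, so if two IDs share the same Section (or the same Totaltime), the second ID's value is blanked as well. Below is a sketch of a row-wise variant, assuming that a Section/ID0/Totaltime combination identifies one ID block. It also passes v.names, timevar and idvar to reshape so the long columns get clean names and there are no extra id/time columns to drop. The toy data, with both IDs in Section 1, and the names varying and id.cols are hypothetical.

```r
# Hypothetical toy data: two IDs in the same Section, three items each
dd <- data.frame(Section = c(1L, 1L), ID0 = c(10001, 10002), Totaltime = c(100, 200),
                 'Item1/Word' = '1/word', 'Item1/Cat' = '1/cat', 'Item1/Time' = '1/time',
                 'Item2/Word' = '2/word', 'Item2/Cat' = '2/cat', 'Item2/Time' = '2/time',
                 'Item3/Word' = '3/word', 'Item3/Cat' = '3/cat', 'Item3/Time' = '3/time',
                 stringsAsFactors = FALSE, check.names = FALSE)

keys <- c('Word', 'Cat', 'Time')
# One character vector of column names per field, matched on the "/Field" suffix
varying <- lapply(keys, function(k) grep(paste0('/', k, '$'), names(dd), value = TRUE))

# Long format: columns Section, ID0, Totaltime, item, Word, Cat, Time
rr <- reshape(dd, direction = 'long', varying = varying, v.names = keys,
              timevar = 'item', idvar = 'ID0')
rr <- rr[order(rr$ID0, rr$item), ]

# Blank the ID columns only where the whole (Section, ID0, Totaltime) row repeats
id.cols <- c('Section', 'ID0', 'Totaltime')
rr[duplicated(rr[, id.cols]), id.cols] <- ''
rownames(rr) <- NULL
rr
```

As in the original, assigning '' turns the three ID columns into character columns, which is fine for display or writing back to CSV but not for further arithmetic.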