Reshaping a CSV in R in a specific way

Time: 2015-04-07 17:58:08

Tags: r csv casting melt

My CSV has a format similar to this:

Section | ID | Totaltime | Item1/Word | Item1/Cat | Item1/Time...Item235/Time  

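I'd like to reshape it so that, instead of all 235 items sitting on a single row per ID, each item gets its own row, sorted/blocked by ID, so that it looks similar to this: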

Section | ID0 | Totaltime | Item1/Word | Item1/Cat | Item1/Time 
                            Item2/Word | Item2/Cat | Item2/Time
                            Item3/Word | Item3/Cat | Item3/Time
                           ...Item235/Word | Item235/Cat | Item235/Time
Section | ID1 | Totaltime | Item1/Word | Item1/Cat | Item1/Time...

I tried melting it with ID as the id.vars argument and the various items (selected with grepl) as the measure.vars argument, but that produces something like this:

Section | ID0 | Totaltime
Section | ID0 | Item1/Word 
Section | ID0 | Item1/Cat 
Section | ID0 | Item1/Time 
             ...
Section | ID0 | Item235/Word 
Section | ID0 | Item235/Cat 
Section | ID0 | Item235/Time

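I have also tried recasting, without much luck.

I'm new to R as of this week, so I'm sure I'm probably missing something super obvious, but I've hit a wall on this.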

3 Answers:

Answer 0: (score: 1)

From data.table v1.9.5+

melt can operate on multiple columns. (Using @rawr's data)

require(data.table) # v1.9.5+
vals = unique(gsub("Item[0-9]+/", "", tail(names(dd), -3L)))
melt(setDT(dd), id=1:3, measure=lapply(vals, grep, names(dd)), value.name=vals)
#     Section   ID0 Totaltime variable   Word   Cat   Time
#  1:       1 10001       100        1 1/word 1/cat 1/time
#  2:       2 10002       200        1 1/word 1/cat 1/time
#  3:       3 10003       300        1 1/word 1/cat 1/time
#  4:       4 10004       400        1 1/word 1/cat 1/time
#  5:       5 10005       500        1 1/word 1/cat 1/time
#  6:       1 10001       100        2 2/word 2/cat 2/time
#  7:       2 10002       200        2 2/word 2/cat 2/time
#  8:       3 10003       300        2 2/word 2/cat 2/time
#  9:       4 10004       400        2 2/word 2/cat 2/time
# 10:       5 10005       500        2 2/word 2/cat 2/time
# 11:       1 10001       100        3 3/word 3/cat 3/time
# 12:       2 10002       200        3 3/word 3/cat 3/time
# 13:       3 10003       300        3 3/word 3/cat 3/time
# 14:       4 10004       400        3 3/word 3/cat 3/time
# 15:       5 10005       500        3 3/word 3/cat 3/time
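
As a variation on the call above, here is a minimal sketch that selects the column groups with data.table's `patterns()` helper instead of `lapply(vals, grep, names(dd))`. The two-row toy table and the name `long` are made up for illustration, and the sketch assumes the item columns appear in the same order within every group, because the `item` column is the position within each group rather than a number parsed from the column name.

```r
library(data.table)

# Toy data mirroring the layout in the question (values are made up)
dd <- data.table(Section = 1:2, ID0 = c(10001L, 10002L), Totaltime = c(100, 200),
                 `Item1/Word` = "1/word", `Item1/Cat` = "1/cat", `Item1/Time` = "1/time",
                 `Item2/Word` = "2/word", `Item2/Cat` = "2/cat", `Item2/Time` = "2/time")

# One regular expression per output column; each one picks up a group of inputs
long <- melt(dd,
             id.vars = c("Section", "ID0", "Totaltime"),
             measure.vars = patterns("/Word$", "/Cat$", "/Time$"),
             variable.name = "item",
             value.name = c("Word", "Cat", "Time"))

# Block the rows by ID, then by item, as the question asks
setorder(long, ID0, item)
```

Anchoring the patterns with `/` and `$` keeps a column such as `Totaltime` from being matched by accident.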

Answer 1: (score: 0)

Try this

library(reshape2)
library(plyr)
# df is the wide data frame read from the CSV
# Step 1: melt everything except the id columns into one long item.type/item.value pair
df.melt <- melt(df, id.vars=c("Section", "ID0", "Totaltime"), variable.name="item.type", value.name="item.value")
# Step 2: split "Item12/Word" into an item number ("Item12") and a type ("Word")
df.mutate <- mutate(df.melt, item.no=gsub("(Item[0-9]+).*", "\\1", item.type), item.type=gsub("Item[0-9]+/", "", item.type))
# Step 3: for each ID/item combination, cast the types back out into columns
df.final <- ddply(df.mutate, .(Section, ID0, Totaltime, item.no), function(d) dcast(d, Section + ID0 + Totaltime ~ item.type, value.var="item.value", fun.aggregate=function(x) x[1]))
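
The same melt-then-cast idea can probably be written with a single `dcast` call by putting the item number on the left-hand side of the formula. The following is only a sketch under that assumption: the toy `df` and the name `df.wide` are invented here, and the item number is stored as an integer so that item 10 sorts after item 2.

```r
library(reshape2)

# Toy data mirroring the layout in the question (values are made up)
df <- data.frame(Section = 1:2, ID0 = c(10001, 10002), Totaltime = c(100, 200),
                 "Item1/Word" = "1/word", "Item1/Cat" = "1/cat", "Item1/Time" = "1/time",
                 "Item2/Word" = "2/word", "Item2/Cat" = "2/cat", "Item2/Time" = "2/time",
                 stringsAsFactors = FALSE, check.names = FALSE)

# Long format: one row per ID and original item column
df.melt <- melt(df, id.vars = c("Section", "ID0", "Totaltime"),
                variable.name = "item.type", value.name = "item.value")

# Split "Item2/Word" into the item number (2) and the type ("Word")
df.melt$item.no   <- as.integer(sub("^Item([0-9]+)/.*$", "\\1", df.melt$item.type))
df.melt$item.type <- sub("^Item[0-9]+/", "", df.melt$item.type)

# Cast back: one row per ID and item, one column per type
df.wide <- dcast(df.melt, Section + ID0 + Totaltime + item.no ~ item.type,
                 value.var = "item.value")
```

Each ID/item/type combination occurs exactly once in this layout, so no aggregation function is needed.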

Answer 2: (score: 0)

I think this gets you the format you need:

dd <- data.frame(Section = 1:5, ID0 = 10001:10005, Totaltime = 1:5 * 100,
                 'Item1/Word' = '1/word', 'Item1/Cat' = '1/cat',
                 'Item1/Time' = '1/time',
                 'Item2/Word' = '2/word', 'Item2/Cat' = '2/cat',
                 'Item2/Time' = '2/time',
                 'Item3/Word' = '3/word', 'Item3/Cat' = '3/cat',
                 'Item3/Time' = '3/time', stringsAsFactors = FALSE,
                 check.names = FALSE)


#   Section   ID0 Totaltime Item1/Word Item1/Cat Item1/Time Item2/Word Item2/Cat Item2/Time Item3/Word Item3/Cat Item3/Time
# 1       1 10001       100     1/word     1/cat     1/time     2/word     2/cat     2/time     3/word     3/cat     3/time
# 2       2 10002       200     1/word     1/cat     1/time     2/word     2/cat     2/time     3/word     3/cat     3/time
# 3       3 10003       300     1/word     1/cat     1/time     2/word     2/cat     2/time     3/word     3/cat     3/time
# 4       4 10004       400     1/word     1/cat     1/time     2/word     2/cat     2/time     3/word     3/cat     3/time
# 5       5 10005       500     1/word     1/cat     1/time     2/word     2/cat     2/time     3/word     3/cat     3/time

## define the varying columns: one group of column indices per key
keys <- c('Word','Cat','Time')
l <- lapply(keys, function(x) grep(x, names(dd)))

rr <- reshape(dd, direction = 'long', varying = l)
rr <- rr[with(rr, order(Section, ID0, Totaltime)),
         ## `reshape` makes two extra variables, time and id, that we don't want
         -which(names(rr) %in% c('id','time'))]
## blank out repeated id values for display (relies on each value being unique to one ID)
rr[, 1:3] <- lapply(rr[, 1:3], function(x) ifelse(duplicated(x), '', x))
## print with the row names reset
`rownames<-`(rr, NULL)

#    Section   ID0 Totaltime Item1/Word Item1/Cat Item1/Time
# 1        1 10001       100     1/word     1/cat     1/time
# 2                              2/word     2/cat     2/time
# 3                              3/word     3/cat     3/time
# 4        2 10002       200     1/word     1/cat     1/time
# 5                              2/word     2/cat     2/time
# 6                              3/word     3/cat     3/time
# 7        3 10003       300     1/word     1/cat     1/time
# 8                              2/word     2/cat     2/time
# 9                              3/word     3/cat     3/time
# 10       4 10004       400     1/word     1/cat     1/time
# 11                             2/word     2/cat     2/time
# 12                             3/word     3/cat     3/time
# 13       5 10005       500     1/word     1/cat     1/time
# 14                             2/word     2/cat     2/time
# 15                             3/word     3/cat     3/time
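
If the goal is a table to keep working with rather than one to print, a possible variation is to pass `v.names`, `timevar` and `idvar` to `reshape` so the result gets plain `Word`/`Cat`/`Time` columns plus an item index, and no extra `id` column is created. This is a sketch with a made-up two-row `dd`; the column name `item` is chosen here for illustration.

```r
# Toy data mirroring the layout in the question (values are made up)
dd <- data.frame(Section = 1:2, ID0 = c(10001, 10002), Totaltime = c(100, 200),
                 "Item1/Word" = "1/word", "Item1/Cat" = "1/cat", "Item1/Time" = "1/time",
                 "Item2/Word" = "2/word", "Item2/Cat" = "2/cat", "Item2/Time" = "2/time",
                 stringsAsFactors = FALSE, check.names = FALSE)

keys <- c("Word", "Cat", "Time")
# Anchor each key so only ".../Word", ".../Cat" and ".../Time" columns match
l <- lapply(keys, function(x) grep(paste0("/", x, "$"), names(dd)))

rr <- reshape(dd, direction = "long", varying = l, v.names = keys,
              timevar = "item", idvar = c("Section", "ID0", "Totaltime"))

# Block the rows by ID, then by item, and drop the generated row names
rr <- rr[order(rr$ID0, rr$item), ]
rownames(rr) <- NULL
```

As with the original answer, `item` is the position of the column within each group, so it only equals the real item number when the columns are ordered Item1, Item2, and so on.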