R: Problem with the reshape() function in the stats package

Time: 2014-11-01 19:55:46

Tags: r reshape

I'm confused about how to make reshape() work when a data.frame has more than one variable that needs to be melted. Here is an example:

Data <- data.frame(SampleID = rep(1:10, each = 3), 
               TimePoint = rep(LETTERS[1:3], 10))
Data$File.ESIpos <- paste("20141031 Subject", Data$SampleID, "Point",
                     Data$TimePoint, "ESIpos")

Data$Date.ESIpos <- "20141031"

Data$File.ESIneg <- paste("20141030 Subject", Data$SampleID, "Point", 
                     Data$TimePoint, "ESIneg")
Data$Date.ESIneg <- "20141030"

Data$File.APCIpos <- paste("20141029 Subject", Data$SampleID, "Point", 
                     Data$TimePoint, "APCIpos")
Data$Date.APCIpos <- "20141029"

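I would like both Date and File to be melted, so that the new data.frame has the columns "SampleID" and "TimePoint", a new column "Mode" (whose values are ESIpos, ESIneg, and APCIpos), plus "File" and "Date". This is the closest I have come with the reshape() function: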

Data.long <- reshape(Data, 
                     varying = c("File.ESIpos", "Date.ESIpos",
                                 "File.ESIneg", "Date.ESIneg", 
                                 "File.APCIpos", "Date.APCIpos"),
                     idvar = c("SampleID", "TimePoint"),
                     ids = c("ESIpos", "ESIneg", "APCIpos"),
                     v.names = c("Date", "File"),
                     sep = ".",
                     direction = "long")

The output is a data.frame with the columns "SampleID", "TimePoint", "time" (which is 1, 2, or 3 for "ESIpos", "ESIneg", or "APCIpos"), "Date", and "File".

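The first problem is that I don't see how to define the new "Mode" column. Of course I could rename the "time" column to "Mode" afterwards, but is there no way to tell reshape() that the levels should be "ESIpos", "ESIneg", and "APCIpos" rather than 1, 2, 3? I thought that was what I was doing with ids = c("ESIpos"..., but apparently not. In fact, I get the same output whether or not I include the ids = c("ESIpos"... line.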

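The second, smaller problem is that whether I write v.names = c("Date", "File") or v.names = c("File", "Date"), the columns come out swapped: I get the file names in the Date column and vice versa.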

4 answers:

Answer 0 (score: 3)

I think this is the reshape() command you are after:

reshaped <- reshape(Data, direction = "long", varying = 3:8, 
                 times = c("ESIpos", "ESIneg", "APCIpos"))
head(reshaped)
#          SampleID TimePoint   time                              File     Date id
# 1.ESIpos        1         A ESIpos 20141031 Subject 1 Point A ESIpos 20141031  1
# 2.ESIpos        1         B ESIpos 20141031 Subject 1 Point B ESIpos 20141031  2
# 3.ESIpos        1         C ESIpos 20141031 Subject 1 Point C ESIpos 20141031  3
# 4.ESIpos        2         A ESIpos 20141031 Subject 2 Point A ESIpos 20141031  4
# 5.ESIpos        2         B ESIpos 20141031 Subject 2 Point B ESIpos 20141031  5
# 6.ESIpos        2         C ESIpos 20141031 Subject 2 Point C ESIpos 20141031  6
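The call above still names the new column "time". As a minimal sketch (not part of the original answer), the question's two issues can be addressed with arguments that reshape() already has: `timevar` names the new column, `times` supplies its values (`ids` only supplies row identifiers), and passing `varying` as a named list states explicitly which columns feed "File" and which feed "Date". The loop that rebuilds `Data` and the helper vectors `modes` and `dates` are only there to make the example self-contained.

```r
# Rebuild the example data from the question (same columns, same order).
Data <- data.frame(SampleID = rep(1:10, each = 3),
                   TimePoint = rep(LETTERS[1:3], 10))
modes <- c("ESIpos", "ESIneg", "APCIpos")
dates <- c("20141031", "20141030", "20141029")
for (i in seq_along(modes)) {
  Data[[paste0("File.", modes[i])]] <- paste(dates[i], "Subject", Data$SampleID,
                                             "Point", Data$TimePoint, modes[i])
  Data[[paste0("Date.", modes[i])]] <- dates[i]
}

# One list element per output column; each element lists its source
# columns in the same order as `times`.
Data.long <- reshape(Data,
                     direction = "long",
                     varying = list(File = paste0("File.", modes),
                                    Date = paste0("Date.", modes)),
                     v.names = c("File", "Date"),
                     timevar = "Mode",   # name of the new column
                     times = modes,      # values of the new column
                     idvar = c("SampleID", "TimePoint"))

head(Data.long)
```

Spelling out the groups in a list avoids relying on how reshape() pairs an atomic `varying` vector with `v.names`, which is where the swapped columns in the question appear to come from.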

Answer 1 (score: 2)

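reshape always gives me a migraine, so I usually give up on it, but I'm always surprised when someone uses it and it works, so I'd like to see a solution that uses it. In the meantime, you can use reshape2::melt twice and combine the results: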

library(reshape2)
vars <- c('SampleID','TimePoint','Mode')
m1 <- melt(Data, id.vars = c(vars[1:2], names(Data)[grep('File', names(Data))]),
           variable.name = 'Mode', value.name = 'Date')[c(vars, 'Date')]
m2 <- melt(Data, id.vars = c(vars[1:2], names(Data)[grep('Date', names(Data))]),
           variable.name = 'Mode', value.name = 'File')[c(vars, 'File')]

# Strip the "Date." / "File." prefix, matching the dot literally
m1$Mode <- sub('^Date\\.', '', m1$Mode)
m2$Mode <- sub('^File\\.', '', m2$Mode)

identical(m1[1:3], m2[1:3])
# [1] TRUE

Data.long <- cbind(m1, m2['File'])

head(Data.long[with(Data.long, order(SampleID, TimePoint)), ])

#    SampleID TimePoint    Mode     Date                               File
# 1         1         A  ESIpos 20141031  20141031 Subject 1 Point A ESIpos
# 31        1         A  ESIneg 20141030  20141030 Subject 1 Point A ESIneg
# 61        1         A APCIpos 20141029 20141029 Subject 1 Point A APCIpos
# 2         1         B  ESIpos 20141031  20141031 Subject 1 Point B ESIpos
# 32        1         B  ESIneg 20141030  20141030 Subject 1 Point B ESIneg
# 62        1         B APCIpos 20141029 20141029 Subject 1 Point B APCIpos

Or do something similar with stats::reshape.

Answer 2 (score: 2)

Here is how I would solve the problem with tidyr:

library(tidyr)

Data %>%
  # Gather all columns except SampleID and TimePoint 
  # (since they're already variables)
  gather(key, value, -SampleID, -TimePoint) %>% 
  # Separate the key into components type and mode
  separate(key, c("type", "mode"), "\\.") %>%
  # Spread the type back into the columns
  spread(type, value)
#>    SampleID TimePoint    mode     Date                                File
#> 1         1         A APCIpos 20141029  20141029 Subject 1 Point A APCIpos
#> 2         1         A  ESIneg 20141030   20141030 Subject 1 Point A ESIneg
#> 3         1         A  ESIpos 20141031   20141031 Subject 1 Point A ESIpos
#> 4         1         B APCIpos 20141029  20141029 Subject 1 Point B APCIpos
#> 5         1         B  ESIneg 20141030   20141030 Subject 1 Point B ESIneg
#> 6         1         B  ESIpos 20141031   20141031 Subject 1 Point B ESIpos
#> 7         1         C APCIpos 20141029  20141029 Subject 1 Point C APCIpos
#...

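To figure out how to come up with these steps yourself, I recommend reading Tidy Data, which lays out a framework that should help you understand the problem better.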
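The gather/separate/spread pipeline can also be written as a single step in newer tidyr releases. The following is a sketch that assumes tidyr 1.0.0 or later, where pivot_longer() is available. The special name ".value" in `names_to` tells tidyr that the first part of each column name ("File" or "Date") becomes an output column, while the second part goes into a column that is called "Mode" here. The loop that rebuilds `Data` is only there to make the example self-contained.

```r
library(tidyr)

# Rebuild the example data from the question.
Data <- data.frame(SampleID = rep(1:10, each = 3),
                   TimePoint = rep(LETTERS[1:3], 10))
modes <- c("ESIpos", "ESIneg", "APCIpos")
dates <- c("20141031", "20141030", "20141029")
for (i in seq_along(modes)) {
  Data[[paste0("File.", modes[i])]] <- paste(dates[i], "Subject", Data$SampleID,
                                             "Point", Data$TimePoint, modes[i])
  Data[[paste0("Date.", modes[i])]] <- dates[i]
}

# Split each column name at the dot: the part before it selects the
# output column (File or Date), the part after it is stored in Mode.
Data.long <- pivot_longer(Data,
                          cols = -c(SampleID, TimePoint),
                          names_to = c(".value", "Mode"),
                          names_sep = "\\.")

head(Data.long)
```

Because File and Date are never stacked into a single value column, this version gives the same long layout without the intermediate key/value step.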

Answer 3 (score: 0)

melt.data.table in v1.9.5 can now melt into multiple columns. With this, we can do:

require(data.table) ## v1.9.5
ans = melt(setDT(Data), id=c("SampleID", "TimePoint"), 
      measure=list(c(3,5,7), c(4,6,8)), value.name=c("File", "Date"))
setattr(ans$variable, 'levels', 
        unique(gsub(".*[.]", "", names(Data)[-(1:2)])))
#   SampleID TimePoint variable                                File     Date
# 1:        1         A   ESIpos   20141031 Subject 1 Point A ESIpos 20141031
# 2:        1         B   ESIpos   20141031 Subject 1 Point B ESIpos 20141031
# 3:        1         C   ESIpos   20141031 Subject 1 Point C ESIpos 20141031
# 4:        2         A   ESIpos   20141031 Subject 2 Point A ESIpos 20141031
# 5:        2         B   ESIpos   20141031 Subject 2 Point B ESIpos 20141031
# 6:        2         C   ESIpos   20141031 Subject 2 Point C ESIpos 20141031
# ...

You can get the development version from here.
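As a variation on the call above, the measure columns can be selected by name rather than by position. This is a sketch that assumes a released data.table version providing patterns() for melt(). The column name "Mode" and the helper vectors `modes` and `dates` are choices made for this example, and the loop that rebuilds `Data` is only there to make it self-contained.

```r
library(data.table)

# Rebuild the example data from the question.
Data <- data.frame(SampleID = rep(1:10, each = 3),
                   TimePoint = rep(LETTERS[1:3], 10))
modes <- c("ESIpos", "ESIneg", "APCIpos")
dates <- c("20141031", "20141030", "20141029")
for (i in seq_along(modes)) {
  Data[[paste0("File.", modes[i])]] <- paste(dates[i], "Subject", Data$SampleID,
                                             "Point", Data$TimePoint, modes[i])
  Data[[paste0("Date.", modes[i])]] <- dates[i]
}

DT <- as.data.table(Data)  # work on a copy, leaving Data untouched

# Each pattern selects one group of measure columns; the groups are
# melted side by side into the columns named in value.name.
long <- melt(DT,
             id.vars = c("SampleID", "TimePoint"),
             measure.vars = patterns("^File\\.", "^Date\\."),
             value.name = c("File", "Date"),
             variable.name = "Mode")

# The melted Mode column only numbers the groups (1, 2, 3); relabel it
# with the suffixes of the File.* columns, taken in column order.
suffixes <- sub("^File\\.", "", grep("^File\\.", names(DT), value = TRUE))
long[, Mode := factor(Mode, labels = suffixes)]

head(long)
```

Selecting by pattern keeps the call valid if columns are later added or reordered, as long as the File.* and Date.* columns appear in the same mode order.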