I'm confused about how to make melting work when a data.frame has more than one variable that needs to be melted. Here is an example:
Data <- data.frame(SampleID = rep(1:10, each = 3),
TimePoint = rep(LETTERS[1:3], 10))
Data$File.ESIpos <- paste("20141031 Subject", Data$SampleID, "Point",
Data$TimePoint, "ESIpos")
Data$Date.ESIpos <- "20141031"
Data$File.ESIneg <- paste("20141030 Subject", Data$SampleID, "Point",
Data$TimePoint, "ESIneg")
Data$Date.ESIneg <- "20141030"
Data$File.APCIpos <- paste("20141029 Subject", Data$SampleID, "Point",
Data$TimePoint, "APCIpos")
Data$Date.APCIpos <- "20141029"
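I want both Date and File to be melted, so that the new data.frame has the columns "SampleID" and "TimePoint", a new column "Mode" (whose values are ESIpos, ESIneg, and APCIpos), plus "File" and "Date". This is the closest I have come with the reshape() function: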
Data.long <- reshape(Data,
varying = c("File.ESIpos", "Date.ESIpos",
"File.ESIneg", "Date.ESIneg",
"File.APCIpos", "Date.APCIpos"),
idvar = c("SampleID", "TimePoint"),
ids = c("ESIpos", "ESIneg", "APCIpos"),
v.names = c("Date", "File"),
sep = ".",
direction = "long")
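The output is a data.frame with the columns "SampleID", "TimePoint", "time" (which is 1, 2, or 3 for "ESIpos", "ESIneg", or "APCIpos"), "Date", and "File".
The first problem is that I don't see how to define the new "Mode" column. Of course I could rename the "time" column to "Mode", but isn't there a way to tell reshape() that the levels should be "ESIpos", "ESIneg", and "APCIpos" rather than 1, 2, 3? I thought I was doing that with the ids = c("ESIpos"... line, but apparently I'm not. In fact, I get the same output whether or not I include the ids = c("ESIpos"... line.
The second, smaller problem is that whether I write v.names = c("Date", "File") or v.names = c("File", "Date"), the columns are swapped, i.e. I get the file names in the Date column and vice versa.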
Answer 0 (score: 3)
I think this is the reshape() command you're after:
reshaped <- reshape(Data, direction = "long", varying = 3:8,
times = c("ESIpos", "ESIneg", "APCIpos"))
head(reshaped)
# SampleID TimePoint time File Date id
# 1.ESIpos 1 A ESIpos 20141031 Subject 1 Point A ESIpos 20141031 1
# 2.ESIpos 1 B ESIpos 20141031 Subject 1 Point B ESIpos 20141031 2
# 3.ESIpos 1 C ESIpos 20141031 Subject 1 Point C ESIpos 20141031 3
# 4.ESIpos 2 A ESIpos 20141031 Subject 2 Point A ESIpos 20141031 4
# 5.ESIpos 2 B ESIpos 20141031 Subject 2 Point B ESIpos 20141031 5
# 6.ESIpos 2 C ESIpos 20141031 Subject 2 Point C ESIpos 20141031 6
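To also get the column named "Mode" instead of "time", and to avoid any ambiguity about which columns end up in File and which in Date, one option is to pass varying as a list and set timevar. The following is a minimal sketch, not the only way to do it; the helper vectors modes and dates are introduced here only to rebuild the example data compactly.

```r
# Rebuild the example data (modes and dates are illustrative helpers).
Data <- data.frame(SampleID = rep(1:10, each = 3),
                   TimePoint = rep(LETTERS[1:3], 10))
modes <- c("ESIpos", "ESIneg", "APCIpos")
dates <- c("20141031", "20141030", "20141029")
for (i in seq_along(modes)) {
  Data[[paste0("File.", modes[i])]] <- paste(dates[i], "Subject", Data$SampleID,
                                             "Point", Data$TimePoint, modes[i])
  Data[[paste0("Date.", modes[i])]] <- dates[i]
}

# varying as a list: the first element holds the File columns, the second
# the Date columns, matched positionally to v.names.
# timevar names the new column; times supplies its values.
Data.long <- reshape(Data,
                     direction = "long",
                     varying = list(paste0("File.", modes),
                                    paste0("Date.", modes)),
                     v.names = c("File", "Date"),
                     timevar = "Mode",
                     times = modes,
                     idvar = c("SampleID", "TimePoint"))
head(Data.long)
```

With varying given as a list, each list element is matched to the v.names entry in the same position, so the File and Date values should land in the intended columns.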
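Answer 1 (score: 2)
I always give up on reshape because it gives me migraines, but I'm always amazed when someone uses it and it works, so I'd like to see a solution that uses it. In the meantime, you can use reshape2::melt twice and combine the results: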
library(reshape2)
vars <- c('SampleID','TimePoint','Mode')
m1 <- melt(Data, id.vars = c(vars[1:2], names(Data)[grep('File', names(Data))]),
variable.name = 'Mode', value.name = 'Date')[c(vars, 'Date')]
m2 <- melt(Data, id.vars = c(vars[1:2], names(Data)[grep('Date', names(Data))]),
variable.name = 'Mode', value.name = 'File')[c(vars, 'File')]
# Strip the literal "Date." / "File." prefix (escape the dot so it is not a regex wildcard)
m1$Mode <- sub('^Date\\.', '', m1$Mode)
m2$Mode <- sub('^File\\.', '', m2$Mode)
identical(m1[1:3], m2[1:3])
# [1] TRUE
Data.long <- cbind(m1, m2['File'])
head(Data.long[with(Data.long, order(SampleID, TimePoint)), ])
# SampleID TimePoint Mode Date File
# 1 1 A ESIpos 20141031 20141031 Subject 1 Point A ESIpos
# 31 1 A ESIneg 20141030 20141030 Subject 1 Point A ESIneg
# 61 1 A APCIpos 20141029 20141029 Subject 1 Point A APCIpos
# 2 1 B ESIpos 20141031 20141031 Subject 1 Point B ESIpos
# 32 1 B ESIneg 20141030 20141030 Subject 1 Point B ESIneg
# 62 1 B APCIpos 20141029 20141029 Subject 1 Point B APCIpos
Or use stats::reshape.
Answer 2 (score: 2)
Here is how I would solve the problem with tidyr:
library(tidyr)
Data %>%
# Gather all columns except SampleID and TimePoint
# (since they're already variables)
gather(key, value, -SampleID, -TimePoint) %>%
# Separate the key into components type and mode
separate(key, c("type", "mode"), "\\.") %>%
# Spread the type back into the columns
spread(type, value)
#> SampleID TimePoint mode Date File
#> 1 1 A APCIpos 20141029 20141029 Subject 1 Point A APCIpos
#> 2 1 A ESIneg 20141030 20141030 Subject 1 Point A ESIneg
#> 3 1 A ESIpos 20141031 20141031 Subject 1 Point A ESIpos
#> 4 1 B APCIpos 20141029 20141029 Subject 1 Point B APCIpos
#> 5 1 B ESIneg 20141030 20141030 Subject 1 Point B ESIneg
#> 6 1 B ESIpos 20141031 20141031 Subject 1 Point B ESIpos
#> 7 1 C APCIpos 20141029 20141029 Subject 1 Point C APCIpos
#...
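To work out how to come up with these steps yourself, I recommend reading Tidy Data, which lays out a framework that can help you understand the problem better.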
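In tidyr 1.0.0 and later, the same gather/separate/spread idea can be written as a single pivot_longer() call. This is a sketch under the assumption that a recent tidyr is installed; the helper vectors modes and dates only rebuild the example data, and the column name Mode is chosen to match the question.

```r
library(tidyr)

# Rebuild the example data (modes and dates are illustrative helpers).
Data <- data.frame(SampleID = rep(1:10, each = 3),
                   TimePoint = rep(LETTERS[1:3], 10))
modes <- c("ESIpos", "ESIneg", "APCIpos")
dates <- c("20141031", "20141030", "20141029")
for (i in seq_along(modes)) {
  Data[[paste0("File.", modes[i])]] <- paste(dates[i], "Subject", Data$SampleID,
                                             "Point", Data$TimePoint, modes[i])
  Data[[paste0("Date.", modes[i])]] <- dates[i]
}

# Split each column name at the dot: the part before it (".value") becomes
# the output column (File or Date), the part after it goes into Mode.
Data.long <- pivot_longer(Data,
                          cols = -c(SampleID, TimePoint),
                          names_to = c(".value", "Mode"),
                          names_sep = "\\.")
head(Data.long)
```

The special ".value" entry in names_to tells pivot_longer() to use that piece of the name as the output column name, which is what replaces the separate() and spread() steps.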
Answer 3 (score: 0)
melt.data.table in v1.9.5 can now melt into multiple columns. With this, we can do:
require(data.table) ## v1.9.5
ans = melt(setDT(Data), id=c("SampleID", "TimePoint"),
measure=list(c(3,5,7), c(4,6,8)), value.name=c("File", "Date"))
setattr(ans$variable, 'levels',
unique(gsub(".*[.]", "", names(Data)[-(1:2)])))
# SampleID TimePoint variable File Date
# 1: 1 A ESIpos 20141031 Subject 1 Point A ESIpos 20141031
# 2: 1 B ESIpos 20141031 Subject 1 Point B ESIpos 20141031
# 3: 1 C ESIpos 20141031 Subject 1 Point C ESIpos 20141031
# 4: 2 A ESIpos 20141031 Subject 2 Point A ESIpos 20141031
# 5: 2 B ESIpos 20141031 Subject 2 Point B ESIpos 20141031
# 6: 2 C ESIpos 20141031 Subject 2 Point C ESIpos 20141031
# ...
You can get the development version from here.
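As a variation, the measure columns can be selected by name with patterns() rather than by position, which may be less fragile if the column order changes. This is only a sketch, assuming a data.table version that supports multi-column melting; the helper vectors modes and dates and the object name DT are illustrative.

```r
library(data.table)

# Rebuild the example data (modes and dates are illustrative helpers).
Data <- data.frame(SampleID = rep(1:10, each = 3),
                   TimePoint = rep(LETTERS[1:3], 10))
modes <- c("ESIpos", "ESIneg", "APCIpos")
dates <- c("20141031", "20141030", "20141029")
for (i in seq_along(modes)) {
  Data[[paste0("File.", modes[i])]] <- paste(dates[i], "Subject", Data$SampleID,
                                             "Point", Data$TimePoint, modes[i])
  Data[[paste0("Date.", modes[i])]] <- dates[i]
}

# Work on a copy so the original data.frame is left untouched.
DT <- as.data.table(Data)

# Mode labels in the order the File columns appear in the data.
mode_labels <- sub("^File\\.", "", grep("^File\\.", names(DT), value = TRUE))

# Melt the File.* and Date.* columns into two value columns at once.
ans <- melt(DT,
            id.vars = c("SampleID", "TimePoint"),
            measure.vars = patterns("^File\\.", "^Date\\."),
            variable.name = "Mode",
            value.name = c("File", "Date"))

# The Mode column comes back as a factor with levels 1, 2, 3;
# map it to the actual mode names.
ans[, Mode := mode_labels[as.integer(Mode)]]
head(ans)
```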