I have a data frame with some boolean (1/0) values, as follows (sorry, I couldn't figure out how to turn this into a smart table):
Flag1.Sam Flag2.Sam Flag3.Sam Flag1.Ted Flag2.Ted Flag3.Ted
probe1 0 1 0 1 0 0
probe2 0 0 0 0 0 0
probe3 1 0 0 0 0 0
probe4 0 0 0 0 0 0
probe5 1 1 0 1 0 0
I have 64 samples (Sam / Ted ... etc.), whose names are stored in a list called files, i.e.
files <- c("Sam", "Ted", "Ann", ....)
I would like to create one column per sample that sums that sample's flag values, to produce the following:
Sam Ted
probe1.flagsum 1 1
probe2.flagsum 0 0
probe3.flagsum 1 0
probe4.flagsum 0 0
probe5.flagsum 2 1
I'm new to R and still trying to learn the basics, but I tried the following:
for(i in files) {
FLAGS$i <- cbind(sapply(i, function(y) {
#greping columns to filter for one sample
filter1 <- grep(names(filters), pattern=y)
#print out the summed values for those columns
FLAGS$y <-rowSums(filters[,(filter1)])
}
}
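The code above doesn't work, and I'm a bit lost as to how to move forward.
Can anyone help me solve this, or point me towards the right commands/tools to use?
Thanks.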
Answer 0 (score: 1)
If filters is your input matrix and FLAGS is your desired output matrix, then I would (naively) do something like this:
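FLAGS <- matrix(0, nrow = nrow(filters), ncol = length(files))
for (i in seq_along(files)) {
  # columns whose names contain the current sample name
  index <- grep(files[i], colnames(filters))
  # drop = FALSE keeps a single matching column two-dimensional for rowSums
  FLAGS[, i] <- rowSums(filters[, index, drop = FALSE])
}
colnames(FLAGS) <- files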
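For comparison, here is a minimal sketch that stays closer to the loop in the question. It assumes the input data frame is called filters and holds the five-probe example, and that files contains only the two sample names shown there. As far as I can tell, the main obstacle in the original attempt is that FLAGS$i refers to a column literally named "i", whereas FLAGS[[i]] uses the value stored in i. The ".flagsum" row names are added only to match the desired output.

```r
# Example data copied from the question (two of the 64 samples)
filters <- read.table(header = TRUE, text = "Flag1.Sam Flag2.Sam Flag3.Sam Flag1.Ted Flag2.Ted Flag3.Ted
probe1 0 1 0 1 0 0
probe2 0 0 0 0 0 0
probe3 1 0 0 0 0 0
probe4 0 0 0 0 0 0
probe5 1 1 0 1 0 0")
files <- c("Sam", "Ted")

# Empty data frame that only carries the probe names as row names
FLAGS <- data.frame(row.names = rownames(filters))

for (i in files) {
  # positions of the columns belonging to sample i
  idx <- grep(i, names(filters))
  # [[i]] creates a column named after the value of i ("Sam", "Ted", ...)
  FLAGS[[i]] <- rowSums(filters[, idx, drop = FALSE])
}

# Optional: label the rows as in the desired output
rownames(FLAGS) <- paste0(rownames(FLAGS), ".flagsum")
FLAGS
```

The result is a data frame rather than a matrix; as.matrix(FLAGS) converts it if a matrix is needed.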
Answer 1 (score: 1)
Assuming your matrix is called input:
input <- matrix(rbinom(30, 1, 0.5), ncol = 6)
colnames(input) <- c("F1.S", "F2.S", "F3.S", "F1.T", "F2.T", "F3.T")
rownames(input) <- paste("probe", 1:5, sep = "")
input <- as.data.frame(input)
library(reshape)
input$probe <- rownames(input)
Molten <- melt(input, id.vars = "probe")
Molten$ID <- gsub("^.*\\.", "", levels(Molten$variable))[Molten$variable]
cast(probe ~ ID, data = Molten, fun = "sum")
Update, using the dat data frame from mrdwab's answer:
dat = read.table(header=TRUE, text="Flag1.Sam Flag2.Sam Flag3.Sam Flag1.Ted Flag2.Ted Flag3.Ted
probe1 0 1 0 1 0 0
probe2 0 0 0 0 0 0
probe3 1 0 0 0 0 0
probe4 0 0 0 0 0 0
probe5 1 1 0 1 0 0")
library(reshape)
dat$probe <- rownames(dat)
Molten <- melt(dat, id.vars = "probe")
Molten$ID <- gsub("^.*\\.", "", levels(Molten$variable))[Molten$variable]
cast(probe ~ ID, data = Molten, fun = "sum")
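If installing a package is not an option, the same idea (strip everything up to the last dot to get the sample ID, then sum per ID) can be sketched in base R with rowsum(). This is only an illustration under the assumption that every column name ends in ".<sample>"; the names sample_id and flagsum are made up for this example.

```r
# Example data copied from the question
dat <- read.table(header = TRUE, text = "Flag1.Sam Flag2.Sam Flag3.Sam Flag1.Ted Flag2.Ted Flag3.Ted
probe1 0 1 0 1 0 0
probe2 0 0 0 0 0 0
probe3 1 0 0 0 0 0
probe4 0 0 0 0 0 0
probe5 1 1 0 1 0 0")

# Sample ID of each column: everything after the last dot
sample_id <- sub("^.*\\.", "", colnames(dat))

# rowsum() adds up rows by group, so transpose first (columns become rows),
# sum the rows that share a sample ID, then transpose back
flagsum <- t(rowsum(t(as.matrix(dat)), group = sample_id))
flagsum
```

One possible advantage of this variant is that it does not need the files vector at all, since the sample names are read off the column names.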
Answer 2 (score: 1)
This is easy enough to do with reshape in base R, though the reshape or reshape2 packages may be more intuitive.
Here is a solution in base R:
# Here's your data in its current form
dat = read.table(header=TRUE, text="Flag1.Sam Flag2.Sam Flag3.Sam Flag1.Ted Flag2.Ted Flag3.Ted
probe1 0 1 0 1 0 0
probe2 0 0 0 0 0 0
probe3 1 0 0 0 0 0
probe4 0 0 0 0 0 0
probe5 1 1 0 1 0 0")
# Generate an ID row
dat$id = row.names(dat)
# Reshape wide to long
r.dat = reshape(dat, direction="long",
timevar="probe",
varying=1:6, sep=".")
# Calculate row sums
r.dat$sum = rowSums(r.dat[3:5])
# Reshape back to wide format, dropping what you're not interested in
reshape(r.dat, direction="wide",
idvar="id", timevar="probe",
drop=3:5)
## id sum.Sam sum.Ted
## probe1.Sam probe1 1 1
## probe2.Sam probe2 0 0
## probe3.Sam probe3 1 0
## probe4.Sam probe4 0 0
## probe5.Sam probe5 2 1
You could also start with a function like this:
myFun = function(data, varnames) {
  temp = vector("list", length(varnames))
  for (i in seq_along(varnames)) {
    # use the data argument (not the global dat) and sum the matching columns per row
    temp[[i]] = rowSums(data[grep(varnames[i], names(data))])
    names(temp)[i] = varnames[i]
  }
  data.frame(temp)
}
Then use it with the vector of names you have:
files = c("Sam", "Ted")
myFun(dat, files)
## Sam Ted
## probe1 1 1
## probe2 0 0
## probe3 1 0
## probe4 0 0
## probe5 2 1
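One thing to watch with 64 samples: grep(varnames[i], ...) matches the name anywhere in the column name, so a sample called "Sam" would also pick up the columns of a sample called "Samantha". The sketch below compares the suffix exactly instead. The "Samantha" sample, the toy values, and the names dat2, files2 and sumBySample are all invented for illustration, and endsWith() needs R 3.3.0 or later.

```r
# Hypothetical data: "Samantha" is an invented sample whose name contains "Sam"
dat2 <- data.frame(
  Flag1.Sam = c(0, 1), Flag2.Sam = c(1, 1),
  Flag1.Samantha = c(1, 1), Flag2.Samantha = c(1, 0),
  row.names = c("probe1", "probe2")
)
files2 <- c("Sam", "Samantha")

sumBySample <- function(data, varnames) {
  sapply(varnames, function(s) {
    # TRUE only for columns whose name ends in ".<sample>" (plain string match, no regex)
    idx <- endsWith(names(data), paste0(".", s))
    rowSums(data[, idx, drop = FALSE])
  })
}

sumBySample(dat2, files2)
##        Sam Samantha
## probe1   1        2
## probe2   2        1
```

With a plain grep("Sam", ...) the Sam column would come out as 3 and 3 here, because the Samantha flags would be counted too.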
Enjoy!