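Suppose I have a data frame whose row names are the concatenated names of a numerator and a denominator. Its two columns hold the numerator name and the denominator name for each row.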
up<-c("up1","up2","up3","up4")
down<-c("down1","down2","down3")
singleRatio<-as.data.frame(expand.grid(up,down))
rownames(singleRatio)<-paste(singleRatio$Var1,singleRatio$Var2,sep="_")
colnames(singleRatio)<-c("numerator","denominator")
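Each numerator and denominator has a corresponding entry in a second data frame, in which the numerators and denominators are the rows and the samples are the columns.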
sample1<-c(1,2,3,4,5,1,2)
sample2<-c(5,4,5,7,2,2,3)
sample3<-c(2,3,6,5,3,2,3)
sample4<-c(5,5,5,8,1,2,3)
data<-data.frame(sample1,sample2,sample3,sample4)
rownames(data)<-c(up,down)
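I want to create a data frame filled with test results: I compute every ratio and compare it with a threshold (1 if it is above the threshold, 0 if it is below). This creates ratios.df: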
ratios.df<-data.frame(matrix(nrow = length(rownames(singleRatio)),ncol = length(colnames(data)) ))
rownames(ratios.df)<-rownames(singleRatio)
colnames(ratios.df)<-colnames(data)
ratios.df
I have a function called getRatio that computes all the ratios for one sample:
getRatio<-function(sampleData){
  sampleRatios<-rep(0,each=length(rownames(singleRatio)))
  names(sampleRatios)<-rownames(singleRatio)
  for( ratio in rownames(singleRatio)){
    sampleRatios[ratio]<-sampleData[singleRatio[ratio,1]]/(sampleData[singleRatio[ratio,1]] + sampleData[singleRatio[ratio,2]])
  }
  return(sampleRatios)
}
Here is my attempt to put everything together.
thresholds<-c(0.1,0.5,0.1,0.5,0.1,0.5,0.1,0.5,0.1,0.5,0.1,0.5)
for (sampleName in colnames(data)){
  dataline<-data[,sampleName]
  names(dataline)<-rownames(data)
  sampleRatios<-getRatio(dataline)
  ratios.df[,sampleName]<-sampleRatios
  #ratios.df[,sampleName]<-ifelse(sampleRatios > thresholds,1,0)
}
The problem is that when I look at the resulting ratios, none of them match what I expect. ratios.df ends up as:
> ratios.df
sample1 sample2 sample3 sample4
up1_down1 0.5000000 0.5000000 0.5000000 0.5000000
up2_down1 0.6666667 0.4444444 0.6000000 0.5000000
up3_down1 0.7500000 0.5000000 0.7500000 0.5000000
up4_down1 0.8000000 0.5833333 0.7142857 0.6153846
up1_down2 0.3333333 0.5555556 0.4000000 0.5000000
up2_down2 0.5000000 0.5000000 0.5000000 0.5000000
up3_down2 0.6000000 0.5555556 0.6666667 0.5000000
up4_down2 0.6666667 0.6363636 0.6250000 0.6153846
up1_down3 0.2500000 0.5000000 0.2500000 0.5000000
up2_down3 0.4000000 0.4444444 0.3333333 0.5000000
up3_down3 0.5000000 0.5000000 0.5000000 0.5000000
up4_down3 0.5714286 0.5833333 0.4545455 0.6153846
The original data is
> data
sample1 sample2 sample3 sample4
up1 1 5 2 5
up2 2 4 3 5
up3 3 5 6 5
up4 4 7 5 8
down1 5 2 3 1
down2 1 2 2 2
down3 2 3 3 3
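This means the ratio for up1_down1 in sample1 should be 1/(1+5) = 0.17, not 0.50. Long story short, I can't figure out why, or even where, things are getting swapped. Can anyone see what I'm doing wrong?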
Answer 0 (score: 1)
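The problem is that the singleRatio data.frame contains two columns of factors rather than characters, so when you perform a selection like this: sampleData[singleRatio[ratio,1]]
the factor is coerced to its integer codes instead of its string representation, and therefore the wrong values are selected.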
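To see the coercion in isolation, here is a minimal sketch. It assumes a named vector laid out like one sample column of data; the variable f is a hypothetical name introduced only for this illustration.

```r
# Named vector in the same layout as the sample1 column of `data`
sampleData <- c(up1 = 1, up2 = 2, up3 = 3, up4 = 4,
                down1 = 5, down2 = 1, down3 = 2)

# A factor whose only level is "down1", so its integer code is 1
# (`f` is a hypothetical name used only for this illustration)
f <- factor("down1")

byFactor <- sampleData[f]                # indexed by the integer code 1: picks the up1 entry
byName   <- sampleData[as.character(f)]  # indexed by name: picks the down1 entry

print(byFactor)
print(byName)
```

Indexing with the factor silently uses its position among the factor levels, which is why every denominator in the question ended up pointing at one of the first three rows.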
If you create a data.frame of characters instead (by changing the following line), everything should work:
singleRatio<-as.data.frame(expand.grid(up,down,stringsAsFactors=FALSE))
Result:
> ratios.df
sample1 sample2 sample3 sample4
up1_down1 0.1666667 0.7142857 0.4000000 0.8333333
up2_down1 0.2857143 0.6666667 0.5000000 0.8333333
up3_down1 0.3750000 0.7142857 0.6666667 0.8333333
up4_down1 0.4444444 0.7777778 0.6250000 0.8888889
up1_down2 0.5000000 0.7142857 0.5000000 0.7142857
up2_down2 0.6666667 0.6666667 0.6000000 0.7142857
up3_down2 0.7500000 0.7142857 0.7500000 0.7142857
up4_down2 0.8000000 0.7777778 0.7142857 0.8000000
up1_down3 0.3333333 0.6250000 0.4000000 0.6250000
up2_down3 0.5000000 0.5714286 0.5000000 0.6250000
up3_down3 0.6000000 0.6250000 0.6666667 0.6250000
up4_down3 0.6666667 0.7000000 0.6250000 0.7272727
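Once the name columns are characters, the loop can also be replaced by row indexing on the whole data frame. The following is only a sketch under that assumption, not the asker's method: it rebuilds the question's data, and the names num, den, ratios and flags are hypothetical. The thresholds are the ones from the question, applied with a strict greater-than comparison.

```r
up <- c("up1", "up2", "up3", "up4")
down <- c("down1", "down2", "down3")

# Character columns, so later indexing is by name rather than by factor code
singleRatio <- expand.grid(numerator = up, denominator = down,
                           stringsAsFactors = FALSE)
rownames(singleRatio) <- paste(singleRatio$numerator,
                               singleRatio$denominator, sep = "_")

data <- data.frame(
  sample1 = c(1, 2, 3, 4, 5, 1, 2),
  sample2 = c(5, 4, 5, 7, 2, 2, 3),
  sample3 = c(2, 3, 6, 5, 3, 2, 3),
  sample4 = c(5, 5, 5, 8, 1, 2, 3),
  row.names = c(up, down)
)

# Look up the numerator and denominator rows for every pair at once
num <- as.matrix(data[singleRatio$numerator, ])
den <- as.matrix(data[singleRatio$denominator, ])

ratios <- num / (num + den)
rownames(ratios) <- rownames(singleRatio)

# One threshold per ratio row; the vector is recycled down each column
thresholds <- rep(c(0.1, 0.5), times = 6)
flags <- (ratios > thresholds) * 1  # 1 if above the threshold, 0 otherwise
```

Here ratios is a numeric matrix rather than a data frame; wrapping it in as.data.frame() gives the same shape as ratios.df.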