I want to fill a new data frame column with a calculated value that is unique to each subgroup of the data. This is my exact code:
df <- read.csv('data_30_Mar2015.csv')
df$dCT <- NA
#FUNCTION
calc_dCT <- function(sample, DF){
  sample_df <- DF[ which(DF$Sample=='sample'),]
  print (sample_df)
  VIC <- sample_df[ which(sample_df$Reporter=='VIC'),]
  FAM <- sample_df[ which(sample_df$Reporter=='FAM'),]
  VIC_mean<-mean(VIC[,3])
  FAM_mean<-mean(FAM[,3])
  DCT <- FAM_mean - VIC_mean
  for (i in 1:length(sample_df)){
    sample_df[i,4] <- DCT
  }
  DF<-merge(DF, sample_df, all=TRUE)
}
#CALLS TO FUNCTION
calc_dCT('c48', df)
calc_dCT('m48', df)
calc_dCT('c72', df)
calc_dCT('m72', df)
print (df)
This is the output:
calc_dCT('c48', df)
[1] Sample Reporter CT dCT
<0 rows> (or 0-length row.names)
calc_dCT('m48', df)
[1] Sample Reporter CT dCT
<0 rows> (or 0-length row.names)
calc_dCT('c72', df)
[1] Sample Reporter CT dCT
<0 rows> (or 0-length row.names)
calc_dCT('m72', df)
[1] Sample Reporter CT dCT
<0 rows> (or 0-length row.names)
print (df)
Sample Reporter CT dCT
1 m48 VIC 27.50595 NA
2 m48 VIC 27.77835 NA
3 m48 VIC 27.62321 NA
4 m48 FAM 30.87295 NA
5 m48 FAM 30.87967 NA
6 m48 FAM 30.73427 NA
7 c48 VIC 26.56715 NA
8 c48 VIC 26.89787 NA
9 c48 VIC 26.82587 NA
10 c48 FAM 30.20642 NA
11 c48 FAM 30.43074 NA
12 c48 FAM 30.36933 NA
13 m72 VIC 29.61585 NA
14 m72 VIC 28.65742 NA
15 m72 VIC 29.40057 NA
16 m72 FAM 32.27304 NA
17 m72 FAM 32.38696 NA
18 m72 FAM 32.24386 NA
19 c72 VIC 28.22370 NA
20 c72 VIC 28.17342 NA
21 c72 VIC 28.49104 NA
22 c72 FAM 31.91751 NA
23 c72 FAM 31.67524 NA
24 c72 FAM 31.87287 NA
It doesn't seem to be subsetting the data correctly, and I'm not sure why. I'm trying to fill the 'dCT' column with the calculated DCT value.
Answer 0 (score: 2)
Here is a possible solution using data.table (assuming you don't already have a dCT column):
library(data.table)
setDT(df)[, dCT := mean(CT[Reporter=='FAM']) - mean(CT[Reporter=='VIC']), by = Sample][]
# Sample Reporter CT dCT
# 1: m48 VIC 27.50595 3.193127
# 2: m48 VIC 27.77835 3.193127
# 3: m48 VIC 27.62321 3.193127
# 4: m48 FAM 30.87295 3.193127
# 5: m48 FAM 30.87967 3.193127
# 6: m48 FAM 30.73427 3.193127
# 7: c48 VIC 26.56715 3.571867
# 8: c48 VIC 26.89787 3.571867
...
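For anyone who cannot install data.table, the same per-sample calculation can be sketched in base R. This is only a minimal sketch: it rebuilds the first two samples from the question by hand, and it assumes every sample has both FAM and VIC rows. The names `means` and `dct` are mine, not from the original code.

```r
# Toy data: the m48 and c48 rows from the question
df <- data.frame(
  Sample   = rep(c("m48", "c48"), each = 6),
  Reporter = rep(rep(c("VIC", "FAM"), each = 3), times = 2),
  CT = c(27.50595, 27.77835, 27.62321, 30.87295, 30.87967, 30.73427,
         26.56715, 26.89787, 26.82587, 30.20642, 30.43074, 30.36933)
)

# Mean CT for every Sample x Reporter combination (a small matrix)
means <- tapply(df$CT, list(df$Sample, df$Reporter), mean)

# One dCT value per sample, named by sample
dct <- means[, "FAM"] - means[, "VIC"]

# Look up each row's sample by name; as.character() guards against factors
df$dCT <- unname(dct[as.character(df$Sample)])
```

Every row of a sample ends up with that sample's dCT, which should agree with the data.table output above.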
Answer 1 (score: 0)
Obviously the same thing can be done in dplyr, so I just wanted to add another version.
df <- data.frame(Sample = c(rep("m48", 6), rep("c48", 6)), Reporter = c(rep("VIC", 3), rep("FAM", 3), rep("VIC", 3), rep("FAM", 3)), CT = c(27.50595, 27.77835, 27.62321, 30.87295, 30.87967, 30.73427, 26.56715, 26.89787, 26.82587, 30.20642, 30.43074, 30.36933))
library(dplyr)
df %>% group_by(Sample) %>%
mutate(dCT = mean(CT[Reporter == 'FAM']) - mean(CT[Reporter == 'VIC']))
# Source: local data frame [12 x 4]
# Groups: Sample
#
# Sample Reporter CT dCT
# 1 m48 VIC 27.50595 3.193127
# 2 m48 VIC 27.77835 3.193127
# 3 m48 VIC 27.62321 3.193127
# 4 m48 FAM 30.87295 3.193127
# 5 m48 FAM 30.87967 3.193127
# 6 m48 FAM 30.73427 3.193127
# 7 c48 VIC 26.56715 3.571867
# 8 c48 VIC 26.89787 3.571867
# 9 c48 VIC 26.82587 3.571867
# 10 c48 FAM 30.20642 3.571867
# 11 c48 FAM 30.43074 3.571867
# 12 c48 FAM 30.36933 3.571867
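Because I know it isn't satisfying to get replies that only say "what you're doing is bad, do this instead", here are some notes on what doesn't work in the original code. Note, though, that I would still recommend using one of the other solutions.
length(dataframe) doesn't do what you think it does: it returns the number of columns, not the number of rows. What you want is nrow(dataframe). So here is a working version of the code: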
calc_dCT <- function(sample, DF){
  sample_df <- DF[ which(DF$Sample==sample),]
  VIC <- sample_df[ which(sample_df$Reporter=='VIC'),]
  FAM <- sample_df[ which(sample_df$Reporter=='FAM'),]
  VIC_mean<-mean(VIC[,3])
  FAM_mean<-mean(FAM[,3])
  DCT <- FAM_mean - VIC_mean
  sample_df$dCT <- DCT
  sample_df
}
dfnew <- data.frame(Sample=character(), Reporter=character(), CT=numeric(), dCT=numeric())
for (sample_name in unique(df$Sample))
  dfnew <- rbind(dfnew, calc_dCT(sample_name, df))
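As far as I can tell, the working version differs from the original in three ways: `sample` is no longer quoted, `length()` is no longer used to count rows, and the function's return value is actually kept. The sketch below isolates each point on a tiny made-up data frame. The helper `add_col` and the toy values are hypothetical, purely for illustration.

```r
# Tiny made-up data frame: 4 rows, 3 columns
df <- data.frame(Sample   = c("m48", "m48", "c48", "c48"),
                 Reporter = c("VIC", "FAM", "VIC", "FAM"),
                 CT       = c(27.5, 30.9, 26.6, 30.2))

# 1. 'sample' in quotes is the literal string "sample", not the argument
sample <- "m48"
n_quoted   <- nrow(df[df$Sample == 'sample', ])  # no Sample is called "sample"
n_unquoted <- nrow(df[df$Sample == sample, ])    # matches the two m48 rows

# 2. length() of a data frame counts columns; nrow() counts rows
n_cols <- length(df)
n_rows <- nrow(df)

# 3. A function modifies its own copy of the data frame, so the
#    caller only sees the change if the returned value is assigned
add_col <- function(DF) {
  DF$dCT <- 0
  DF
}
invisible(add_col(df))                 # result thrown away
unchanged <- !("dCT" %in% names(df))   # df still has no dCT column
df2 <- add_col(df)                     # result kept
```

The first point explains the `<0 rows>` output in the question, and the third explains why `print(df)` still shows NA everywhere even after the calls.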