I have a data frame (gwas.data) that looks like this:
SNP CHR BP A1 A2 zscore P CEUmaf MAF
1 rs1000000 12 125456933 A G 1.441 0.1496 0.3729 0.2401
563090 rs10000010 4 21227772 T C 0.068 0.9455 0.575 0.4934
563091 rs10000023 4 95952929 T G 1.217 0.2236 0.5917 0.3852
563092 rs1000003 3 99825597 A G -0.306 0.7597 0.875 0.1794
563093 rs10000033 4 139819348 T C 1.050 0.2935 0.4917 0.4789
2 rs10000037 4 38600725 A G 0.072 0.9428 0.2833 0.2296
I have another one (correct.orientation) that looks like this:
CHR SNP A1 A2 MAF NCHROBS
6952148 12 rs1000000 A G 0.2401 758
2272221 4 rs10000010 C T 0.4934 758
2524810 4 rs10000023 G T 0.3852 758
1838654 3 rs1000003 G A 0.1794 758
2675630 4 rs10000033 C T 0.4789 758
2338861 4 rs10000037 A G 0.2296 758
If A1 and A2 are swapped between the two data frames, I want to replace gwas.data$MAF with (1 - MAF). I am trying to use the following code, which I borrowed from someone else:
flip <- gwas.data$A1 == correct.orientation$A2 & gwas.data$A2 == correct.orientation$A1
dont.flip <- gwas.data$A1 == correct.orientation$A1 & gwas.data$A2 == correct.orientation$A2
for (i in 1:nrow(gwas.data)) {
  if (flip[i]) {
    # alleles are swapped: adopt the reference alleles and flip the statistics
    gwas.data$A1[i] <- correct.orientation$A1[i]
    gwas.data$A2[i] <- correct.orientation$A2[i]
    gwas.data$zscore[i] <- -gwas.data$zscore[i]
    gwas.data$MAF[i] <- 1 - gwas.data$MAF[i]
  } else if (dont.flip[i]) {
    # do nothing
  } else {
    stop("Strand Issue")
  }
}
I get an error on the first line, flip <- gwas.data$A1 == correct.orientation$A2 & gwas.data$A2 == correct.orientation$A1
The error is:
Error in Ops.factor(gwas.data$A1, correct.orientation$A2) : level sets of factors are different
How can I fix this?
Answer 0 (score: 1)
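The error message comes from comparing two factors with `==`: R only allows this when both factors have the same set of levels. A minimal sketch of the failure and of the character-conversion workaround, using toy allele vectors (`a1` and `a2` are illustrative names, not the question's data):

```r
# Two factors holding allele codes, but with different level sets.
a1 <- factor(c("A", "T", "T"))   # levels: "A", "T"
a2 <- factor(c("G", "T", "G"))   # levels: "G", "T"

# Comparing the factors directly raises an error because the level sets differ.
result <- tryCatch(a1 == a2, error = function(e) e)
failed <- inherits(result, "error")
print(failed)

# Comparing the underlying character values works element by element.
same <- as.character(a1) == as.character(a2)
print(same)
```

Converting to character before comparing (or reading the data with stringsAsFactors=FALSE) avoids the level-set check entirely.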
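Consider dropping the for loop and using the base R merge() function on the two data frames. Some data management is needed first: 1) temporarily convert the factors to character (or set stringsAsFactors=FALSE in read.csv() or read.table() when loading the data), and 2) add suffixes to the duplicated column names. After recalculating MAF and zscore with ifelse(), split the merged data frame and reset the column names and data types to the original structure: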
# CONVERT FACTORS TO CHARACTER (lapply RETURNS ONE CHARACTER VECTOR PER COLUMN)
gwas.data[, c("A1","A2")] <- lapply(gwas.data[, c("A1","A2")], as.character)
# SUFFIXING COL NAMES TO IDENTIFY IN MERGED DF
names(gwas.data) <- paste0(names(gwas.data), "_A")
# CONVERT FACTORS TO CHARACTER (lapply RETURNS ONE CHARACTER VECTOR PER COLUMN)
correct.orientation[, c("A1","A2")] <- lapply(correct.orientation[, c("A1","A2")], as.character)
# SUFFIXING COL NAMES TO IDENTIFY IN MERGED DF
names(correct.orientation) <- paste0(names(correct.orientation ), "_B")
# MERGE DATA FRAMES (ASSUMING SNP IS UNIQUE IDENTIFIER); all.x=TRUE KEEPS ONLY THE ROWS OF gwas.data
comparedf <- merge(gwas.data, correct.orientation, by.x="SNP_A", by.y="SNP_B", all.x=TRUE)
# FLAG ROWS WHERE A1 AND A2 ARE SWAPPED RELATIVE TO THE REFERENCE
# (%in% TRUE TREATS NA, I.E. NO MATCHING SNP, AS "DO NOT FLIP")
flip <- ((comparedf$A1_A == comparedf$A2_B) &
         (comparedf$A2_A == comparedf$A1_B)) %in% TRUE
# CALCULATE NEW MAF AND ZSCORE
comparedf$MAF_A <- ifelse(flip, 1 - comparedf$MAF_A, comparedf$MAF_A)
comparedf$zscore_A <- ifelse(flip, -1 * comparedf$zscore_A, comparedf$zscore_A)
# SPLIT MERGE BACK TO ORIGINAL STRUCTURE
newgwas.data <- comparedf[,names(gwas.data)]
# REMOVE SUFFIX (ANCHORED SO ONLY A TRAILING "_A" IS STRIPPED)
names(newgwas.data) <- sub("_A$", "", names(newgwas.data))
# RESET FACTORS
newgwas.data$A1 <- as.factor(newgwas.data$A1)
newgwas.data$A2 <- as.factor(newgwas.data$A2)
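As an alternative to merging, the loop from the question can be vectorized directly. The following is a rough sketch under stated assumptions, not a tested drop-in: it uses a three-row toy subset with the question's column names, assumes SNP is a unique identifier, and aligns the reference rows with match() so the two data frames need not be in the same order. The helper names idx, ref.A1, ref.A2, g.A1 and g.A2 are illustrative.

```r
# Toy data modelled on the question's columns (a subset of its rows).
gwas.data <- data.frame(
  SNP    = c("rs1000000", "rs10000010", "rs10000023"),
  A1     = c("A", "T", "T"),
  A2     = c("G", "C", "G"),
  zscore = c(1.441, 0.068, 1.217),
  MAF    = c(0.2401, 0.4934, 0.3852),
  stringsAsFactors = TRUE   # mimic the factor columns that caused the error
)
correct.orientation <- data.frame(
  SNP = c("rs10000023", "rs1000000", "rs10000010"),   # deliberately shuffled
  A1  = c("G", "A", "C"),
  A2  = c("T", "G", "T"),
  stringsAsFactors = TRUE
)

# Align the reference rows to gwas.data by SNP instead of by row position.
idx <- match(as.character(gwas.data$SNP), as.character(correct.orientation$SNP))
ref.A1 <- as.character(correct.orientation$A1)[idx]
ref.A2 <- as.character(correct.orientation$A2)[idx]
g.A1 <- as.character(gwas.data$A1)
g.A2 <- as.character(gwas.data$A2)

# Logical masks; %in% TRUE turns NA (SNP missing from the reference) into FALSE.
flip      <- (g.A1 == ref.A2 & g.A2 == ref.A1) %in% TRUE
dont.flip <- (g.A1 == ref.A1 & g.A2 == ref.A2) %in% TRUE

# Rows that are neither swapped nor identical stop the run, as in the original loop.
if (any(!flip & !dont.flip)) stop("Strand Issue")

# Apply the flip to all affected rows at once.
gwas.data$zscore[flip] <- -gwas.data$zscore[flip]
gwas.data$MAF[flip]    <- 1 - gwas.data$MAF[flip]
gwas.data$A1 <- factor(ifelse(flip, ref.A1, g.A1))
gwas.data$A2 <- factor(ifelse(flip, ref.A2, g.A2))
```

Unlike the merge-based code, this version also rewrites A1 and A2 to the reference orientation and keeps gwas.data in its original row order. Note that a SNP absent from correct.orientation ends up in neither mask and therefore triggers the stop.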