Running into an R error when matching data frame columns

Asked: 2015-11-28 00:11:48

Tags: r dataframe

I have a data frame (gwas.data) that looks like this:

              SNP CHR        BP A1 A2 zscore      P CEUmaf    MAF
1       rs1000000  12 125456933  A  G  1.441 0.1496 0.3729 0.2401
563090 rs10000010   4  21227772  T  C  0.068 0.9455  0.575 0.4934
563091 rs10000023   4  95952929  T  G  1.217 0.2236 0.5917 0.3852
563092  rs1000003   3  99825597  A  G -0.306 0.7597  0.875 0.1794
563093 rs10000033   4 139819348  T  C  1.050 0.2935 0.4917 0.4789
2      rs10000037   4  38600725  A  G  0.072 0.9428 0.2833 0.2296

I have another one (correct.orientation) that looks like this:

        CHR        SNP A1 A2    MAF NCHROBS
6952148  12  rs1000000  A  G 0.2401     758
2272221   4 rs10000010  C  T 0.4934     758
2524810   4 rs10000023  G  T 0.3852     758
1838654   3  rs1000003  G  A 0.1794     758
2675630   4 rs10000033  C  T 0.4789     758
2338861   4 rs10000037  A  G 0.2296     758

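I am trying to write a procedure that replaces gwas.data$MAF with (1 - MAF) whenever A1 and A2 are swapped between the two data frames. I am trying to use the following code, which I borrowed from someone else: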

    flip <- gwas.data$A1 == correct.orientation$A2 & gwas.data$A2 == correct.orientation$A1
    dont.flip <- gwas.data$A1 == correct.orientation$A1 & gwas.data$A2 == correct.orientation$A2

    for ( i in 1 : nrow ( gwas.data ) ) {
        if ( flip [ i ] ) {
            # alleles are swapped: take the reference orientation and flip the statistics
            gwas.data$A1 [ i ] <- correct.orientation$A1 [ i ]
            gwas.data$A2 [ i ] <- correct.orientation$A2 [ i ]
            gwas.data$zscore [ i ] <- - gwas.data$zscore [ i ]
            gwas.data$MAF [ i ] <- 1 - gwas.data$MAF [ i ]
        } else if ( dont.flip [ i ] ) {
            #do nothing
        } else {
            stop ( "Strand Issue")
        }
    }

I get an error on the first line:

    flip <- gwas.data$A1 == correct.orientation$A2 & gwas.data$A2 == correct.orientation$A1

The error is:

    Error in Ops.factor(gwas.data$A1, correct.orientation$A2) : level sets of factors are different

How can I fix this?

1 Answer:

Answer 0: (score: 1)
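
For context on the error itself: R refuses to compare two factors with `==` when their level sets differ, which is likely the case here because A1 and A2 contain different alleles in each data frame. The snippet below is a minimal sketch that reproduces this on made-up toy vectors (the names `a1`, `a2`, `res` and `same` are illustrative and not from the question) and shows that comparing the values as character strings avoids it.

```r
# Two factors holding allele codes, but with different level sets
# (toy values, for illustration only).
a1 <- factor(c("A", "T", "T"))   # levels: "A", "T"
a2 <- factor(c("A", "C", "G"))   # levels: "A", "C", "G"

# Comparing the factors directly raises an error; capture its message
# instead of letting it abort the script.
res <- tryCatch(a1 == a2, error = function(e) conditionMessage(e))
print(res)

# Comparing the underlying strings works element-wise.
same <- as.character(a1) == as.character(a2)
print(same)
```

This is why converting the allele columns to character, as done in the code further down, is enough to make the comparison itself work.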

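Consider dropping the for loop and using base R's merge() function on the two data frames. Some data management is needed first, however: 1) temporarily convert the factors to character (or use stringsAsFactors=FALSE in read.csv() or read.table()), and 2) add suffixes to the duplicated column names. Once the new MAF and zscore have been computed with ifelse(), split the merged data frame and reset the column names and data types to the original structure: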

    # CONVERT FACTORS TO CHARACTER
    gwas.data[, c("A1","A2")] <- lapply(gwas.data[, c("A1","A2")], as.character)
    # SUFFIXING COL NAMES TO IDENTIFY IN MERGED DF
    names(gwas.data) <- paste0(names(gwas.data), "_A")

    # CONVERT FACTORS TO CHARACTER
    correct.orientation[, c("A1","A2")] <- lapply(correct.orientation[, c("A1","A2")], as.character)
    # SUFFIXING COL NAMES TO IDENTIFY IN MERGED DF
    names(correct.orientation) <- paste0(names(correct.orientation), "_B")

    # MERGE DATA FRAMES (ASSUMING SNP IS UNIQUE IDENTIFIER)
    comparedf <- merge(gwas.data, correct.orientation, by.x="SNP_A", by.y="SNP_B", all=TRUE)

    # FLAG ROWS WHERE A1 AND A2 ARE SWAPPED BETWEEN THE TWO DATA FRAMES
    flip <- (comparedf$A1_A == comparedf$A2_B) &
            (comparedf$A2_A == comparedf$A1_B)

    # CALCULATE NEW MAF AND ZSCORE FOR FLIPPED ROWS
    comparedf$MAF_A <- ifelse(flip,
                              (1 - comparedf$MAF_A),
                              comparedf$MAF_A)
    comparedf$zscore_A <- ifelse(flip,
                                 -1 * comparedf$zscore_A,
                                 comparedf$zscore_A)

    # SPLIT MERGE BACK TO ORIGINAL STRUCTURE
    newgwas.data <- comparedf[, names(gwas.data)]
    # REMOVE SUFFIX (ANCHORED SO ONLY THE TRAILING "_A" IS DROPPED)
    names(newgwas.data) <- sub("_A$", "", names(newgwas.data))
    # RESET FACTORS
    newgwas.data$A1 <- as.factor(newgwas.data$A1)
    newgwas.data$A2 <- as.factor(newgwas.data$A2)
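
As a possible alternative to merging, the rows can be lined up with match() and the flip applied with logical indexing, which also keeps the "Strand Issue" check from the question. This is only a sketch on made-up toy data, not the answer's method: it assumes SNP ids are unique, that no alleles are missing, and that the allele columns are stored as character; the helper names `idx`, `ref`, `g1`, `g2`, `r1` and `r2` are hypothetical.

```r
# Toy stand-ins for the two data frames (illustrative values, not real data).
gwas.data <- data.frame(
  SNP    = c("rs1", "rs2", "rs3"),
  A1     = c("A", "T", "T"),
  A2     = c("G", "C", "G"),
  zscore = c(1.5, 0.5, -1.0),
  MAF    = c(0.25, 0.5, 0.125),
  stringsAsFactors = FALSE
)
correct.orientation <- data.frame(
  SNP = c("rs3", "rs1", "rs2"),   # deliberately in a different row order
  A1  = c("G", "A", "C"),
  A2  = c("T", "G", "T"),
  stringsAsFactors = FALSE
)

# Line up the reference rows with gwas.data by SNP id.
idx <- match(gwas.data$SNP, correct.orientation$SNP)
if (anyNA(idx)) stop("SNP missing from correct.orientation")
ref <- correct.orientation[idx, ]

# as.character() guards against factor columns with different level sets.
g1 <- as.character(gwas.data$A1)
g2 <- as.character(gwas.data$A2)
r1 <- as.character(ref$A1)
r2 <- as.character(ref$A2)

flip      <- g1 == r2 & g2 == r1   # alleles swapped
dont.flip <- g1 == r1 & g2 == r2   # alleles already agree
if (!all(flip | dont.flip)) stop("Strand Issue")

# Apply the flip to all affected rows at once.
gwas.data$A1[flip]     <- r1[flip]
gwas.data$A2[flip]     <- r2[flip]
gwas.data$zscore[flip] <- -gwas.data$zscore[flip]
gwas.data$MAF[flip]    <- 1 - gwas.data$MAF[flip]

print(gwas.data)
```

Unlike merge(), this leaves gwas.data in its original row order and never adds rows, at the cost of stopping outright when a SNP has no counterpart in correct.orientation.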