I have two data frames of different lengths, as follows:
> df1
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15
chr1 1572489 1575053 - SSU72 182 464 288 437 521 646 490 213 411 535
chr1 231532572 231536297 - TAF5L 28 62 28 48 12 41 31 32 19 22
chr1 97305156 97306558 - DPYD 13 18 9 17 20 22 1 2 2 3
chr1 10380576 10386010 + KIF1B 274 324 99 183 223 153 270 250 314 190
> df2
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15
chr1 1551810 1552304 - SSU72 12 32 274 99 68 5 113 58 187 156
chr1 231526495 231528472 - TAF5L 48 29 64 50 36 40 83 26 73 51
chr1 10453959 10459017 + KIF1B 152 346 891 419 476 752 420 242 477 550
chr1 10455753 10459010 + KIF1B 152 345 887 412 470 748 420 237 473 540
chr1 222514442 222516274 + MARK1 2 6 5 6 2 10 2 6 3 14
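I want to compute ratios between the two data frames, matched by the name in V5. For example, since SSU72 appears in both data frames, I want to compute the ratios for V6 through V15 like this:
(df1$V6/df2$V6, df1$V7/df2$V7, ..., df1$V15/df2$V15)
I tried writing some loops, but I got lost.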
Answer 0 (score: 1)
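We can merge the two data frames on the "V5" column, then take the columns whose names end in .x (from df1) and .y (from df2) and divide the former by the latter: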
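# Drop the coordinate columns V1 to V4, then inner-join on the gene name in V5
dfN <- merge(df1[-(1:4)], df2[-(1:4)], by = "V5")
# Divide the df1 columns (suffix .x) by the matching df2 columns (suffix .y)
dfN1 <- cbind(dfN[1], dfN[grepl("\\.x$", names(dfN))]/dfN[grepl("\\.y$", names(dfN))])
names(dfN1) <- sub("\\.x$", "", names(dfN1))
dfN1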
#V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15
#1 KIF1B 1.8026316 0.9364162 0.1111111 0.4367542 0.4684874 0.2034574 0.6428571 1.033058 0.6582809 0.3454545
#2 KIF1B 1.8026316 0.9391304 0.1116122 0.4441748 0.4744681 0.2045455 0.6428571 1.054852 0.6638478 0.3518519
#3 SSU72 15.1666667 14.5000000 1.0510949 4.4141414 7.6617647 129.2000000 4.3362832 3.672414 2.1978610 3.4294872
#4 TAF5L 0.5833333 2.1379310 0.4375000 0.9600000 0.3333333 1.0250000 0.3734940 1.230769 0.2602740 0.4313725
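To make the merge step easier to try out, here is a minimal self-contained sketch. It rebuilds only V5, V6 and V7 from the data printed in the question (the remaining columns are left out for brevity), and the names x_cols, y_cols and ratios are just illustrative. It should show that genes found in only one data frame (DPYD, MARK1) drop out of the result, while KIF1B yields two rows because it occurs twice in df2.

```r
# Small slice of the question's data: gene name plus the first two value columns
df1 <- data.frame(V5 = c("SSU72", "TAF5L", "DPYD", "KIF1B"),
                  V6 = c(182, 28, 13, 274),
                  V7 = c(464, 62, 18, 324))
df2 <- data.frame(V5 = c("SSU72", "TAF5L", "KIF1B", "KIF1B", "MARK1"),
                  V6 = c(12, 48, 152, 152, 2),
                  V7 = c(32, 29, 346, 345, 6))

# Inner join on V5; shared column names get the suffixes .x (df1) and .y (df2)
dfN <- merge(df1, df2, by = "V5")

# Pair each .x column with its .y counterpart by name
x_cols <- grep("\\.x$", names(dfN), value = TRUE)
y_cols <- sub("\\.x$", ".y", x_cols)

# Element-wise division of the paired columns, keeping the gene name
ratios <- cbind(dfN["V5"], dfN[x_cols] / dfN[y_cols])
names(ratios) <- sub("\\.x$", "", names(ratios))
ratios
```

Pairing the columns by name rather than by two separate grepl calls is only a design choice here; it avoids relying on the .x and .y columns appearing in the same order.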
Answer 1 (score: 1)
Try:
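# Keep only the genes present in both data frames
df1 <- df1[df1$V5 %in% df2$V5,]
df2 <- df2[df2$V5 %in% df1$V5,]
# Sort both by gene name so that the rows line up
df1 <- df1[order(df1$V5),]
df2 <- df2[order(df2$V5),]
# Row-wise division; this assumes each gene occurs exactly once in both data frames.
# In the example, df2 has two KIF1B rows, so the row counts differ (3 vs 4) and the rows do not line up.
df1$V6_ <- df1$V6/df2$V6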
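Row-wise division like this only works when the rows of the two data frames correspond one-to-one, which is not the case in the question's data (KIF1B occurs twice in df2). One possible way around that, sketched here as an alternative and not as part of the original answer, is to look up the matching df1 row for each df2 row with match() and divide all value columns in one step. The names value_cols, idx, keep and ratios_match are illustrative, and only V6 and V7 are rebuilt from the question's data. Note that match() returns the first matching row, so this assumes gene names are unique in df1.

```r
# Small slice of the question's data: gene name plus the first two value columns
df1 <- data.frame(V5 = c("SSU72", "TAF5L", "DPYD", "KIF1B"),
                  V6 = c(182, 28, 13, 274),
                  V7 = c(464, 62, 18, 324))
df2 <- data.frame(V5 = c("SSU72", "TAF5L", "KIF1B", "KIF1B", "MARK1"),
                  V6 = c(12, 48, 152, 152, 2),
                  V7 = c(32, 29, 346, 345, 6))

# Columns to divide; on the full data this would be paste0("V", 6:15)
value_cols <- c("V6", "V7")

# For every df2 row, the index of the df1 row with the same gene (NA if absent)
idx <- match(df2$V5, df1$V5)
keep <- !is.na(idx)

# Divide the looked-up df1 rows by the corresponding df2 rows, column by column
ratios_match <- data.frame(V5 = df2$V5[keep],
                           df1[idx[keep], value_cols] / df2[keep, value_cols],
                           row.names = NULL)
ratios_match
```

The result keeps the row order of df2, with one row per df2 entry whose gene also exists in df1.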