How to divide the columns of two data frames based on a matching ID?

Asked: 2017-06-07 15:04:29

Tags: r dataframe

I have two data frames of different lengths, as follows:

> df1

V1         V2          V3      V4   V5      V6  V7  V8  V9  V10 V11 V12 V13 V14 V15
chr1    1572489     1575053     -   SSU72   182 464 288 437 521 646 490 213 411 535
chr1    231532572   231536297   -   TAF5L   28  62  28  48  12  41  31  32  19  22
chr1    97305156    97306558    -   DPYD    13  18  9   17  20  22  1   2   2   3
chr1    10380576    10386010    +   KIF1B   274 324 99  183 223 153 270 250 314 190


> df2

V1         V2           V3      V4  V5      V6  V7  V8  V9  V10 V11 V12 V13 V14 V15
chr1    1551810      1552304    -   SSU72   12  32  274 99  68  5   113 58  187 156
chr1    231526495   231528472   -   TAF5L   48  29  64  50  36  40  83  26  73  51
chr1    10453959    10459017    +   KIF1B   152 346 891 419 476 752 420 242 477 550
chr1    10455753    10459010    +   KIF1B   152 345 887 412 470 748 420 237 473 540
chr1    222514442   222516274   +   MARK1   2   6   5   6   2   10  2   6   3   14

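I want to compute ratios based on the names in V5. For example, since SSU72 is present in both data frames, I want the ratios for V6 through V15, like this: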

(df1$V6/df2$V6, df1$V7/df2$V7, ..., df1$V15/df2$V15)

I tried writing some loops, but I got lost.

2 Answers:

Answer 0 (score: 1)

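We can merge on the "V5" column, then select the columns whose names end in .x (from df1) and those ending in .y (from df2), and divide the former by the latter: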

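# Inner join on V5, dropping the coordinate columns V1..V4 first
dfN <- merge(df1[-(1:4)], df2[-(1:4)], by = "V5")
# Divide the df1 columns (suffix .x) by the matching df2 columns (suffix .y)
dfN1 <- cbind(dfN[1], dfN[grepl("\\.x$", names(dfN))]/dfN[grepl("\\.y$", names(dfN))])
names(dfN1) <- sub("\\.x$", "", names(dfN1))
dfN1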
#V5         V6         V7        V8        V9       V10         V11       V12      V13       V14       V15
#1 KIF1B  1.8026316  0.9364162 0.1111111 0.4367542 0.4684874   0.2034574 0.6428571 1.033058 0.6582809 0.3454545
#2 KIF1B  1.8026316  0.9391304 0.1116122 0.4441748 0.4744681   0.2045455 0.6428571 1.054852 0.6638478 0.3518519
#3 SSU72 15.1666667 14.5000000 1.0510949 4.4141414 7.6617647 129.2000000 4.3362832 3.672414 2.1978610 3.4294872
#4 TAF5L  0.5833333  2.1379310 0.4375000 0.9600000 0.3333333   1.0250000 0.3734940 1.230769 0.2602740 0.4313725
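
For readers who want to try the merge approach without the full data, here is a minimal sketch on a cut-down copy of the question's data frames (only V5, V6 and V7). The name value_cols is an assumption introduced for illustration; the sketch builds the .x/.y column names explicitly instead of pattern-matching them, which is a small variation on the answer above, not the answerer's exact code.

```r
# Cut-down copies of the question's data: gene name plus two count columns.
df1 <- data.frame(V5 = c("SSU72", "TAF5L", "DPYD", "KIF1B"),
                  V6 = c(182, 28, 13, 274),
                  V7 = c(464, 62, 18, 324),
                  stringsAsFactors = FALSE)
df2 <- data.frame(V5 = c("SSU72", "TAF5L", "KIF1B", "KIF1B", "MARK1"),
                  V6 = c(12, 48, 152, 152, 2),
                  V7 = c(32, 29, 346, 345, 6),
                  stringsAsFactors = FALSE)

# Hypothetical helper name: the columns for which ratios are wanted.
value_cols <- c("V6", "V7")

# Inner join on V5; shared column names get the default suffixes
# ".x" (from df1) and ".y" (from df2).
merged <- merge(df1, df2, by = "V5")

# Divide df1's columns by df2's columns, element by element.
ratios <- merged[paste0(value_cols, ".x")] / merged[paste0(value_cols, ".y")]
names(ratios) <- value_cols

result <- cbind(merged["V5"], ratios)
print(result)
```

Because KIF1B occurs once in df1 and twice in df2, the join yields two KIF1B rows, which matches the two KIF1B lines in the output above. DPYD and MARK1 are dropped because each occurs in only one data frame.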

Answer 1 (score: 1)

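Try the following. It assumes each V5 value occurs only once per data frame; in the sample data KIF1B appears twice in df2, so the rows would not line up unless the duplicate is removed first:

# Keep only the rows whose V5 is present in both data frames
df1 <- df1[df1$V5 %in% df2$V5,]
df2 <- df2[df2$V5 %in% df1$V5,]
# Sort both by V5 so that corresponding rows are aligned
df1 <- df1[order(df1$V5),]
df2 <- df2[order(df2$V5),]
# Row-wise ratio for V6; repeat for V7 to V15
df1$V6_ <- df1$V6/df2$V6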
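
The row-alignment idea can be extended to all ratio columns at once. The following is only a sketch under stated assumptions: it uses match() instead of sorting, it uses toy data with unique V5 values (unlike the question's df2, which has KIF1B twice), and the names value_cols, common, a and b are introduced here for illustration.

```r
# Toy data with unique IDs in V5 (an assumption of this approach).
df1 <- data.frame(V5 = c("SSU72", "TAF5L", "DPYD"),
                  V6 = c(182, 28, 13),
                  V7 = c(464, 62, 18),
                  stringsAsFactors = FALSE)
df2 <- data.frame(V5 = c("TAF5L", "SSU72", "MARK1"),
                  V6 = c(48, 12, 2),
                  V7 = c(29, 32, 6),
                  stringsAsFactors = FALSE)

value_cols <- c("V6", "V7")

# Row alignment only makes sense with unique IDs; stop early otherwise.
stopifnot(!anyDuplicated(df1$V5), !anyDuplicated(df2$V5))

# IDs present in both data frames, in df1's order.
common <- intersect(df1$V5, df2$V5)

# Pick the rows for those IDs from each data frame, in the same order.
a <- df1[match(common, df1$V5), ]
b <- df2[match(common, df2$V5), ]
rownames(a) <- NULL
rownames(b) <- NULL

# Element-wise division of all value columns in one step.
ratios <- cbind(a["V5"], a[value_cols] / b[value_cols])
print(ratios)
```

If duplicate IDs are expected, as with KIF1B in the question, a merge-based join is the safer choice because it produces one row per matching pair rather than failing.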