Question

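I have a data frame in R with 10,000 columns and about 4,000 rows. The data are IDs; for example, the IDs look like rs100987, rs1803920, and so on. Each rsID has a corresponding iHS score between 0 and 3. I have a separate data frame in which all possible rs numbers are in one column and their corresponding iHS scores are in the next column. I want to replace the rsIDs in the 10,000 x 4,000 data frame with their corresponding iHS scores, giving a 10,000 x 4,000 data frame of scores. How can I do this?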

This is what my file currently looks like:

input ID     match 1    match 2     match 3 ......
rs6708       rs10089   rs100098    rs10567
rs8902       rs18079   rs234058    rs123098
rs9076       rs77890   rs445067    rs105023

This is what my iHS score file looks like (it has a matching score for every ID in the file above):

snpID     iHS
rs6708    1.23
rs105023   0.92
rs234058  2.31
rs77890   0.31

I would like my output to look like this:

match 1   match 2   match 3
0.89      0.34      2.45
1.18      2.31      0.67
0.31      1.54      0.92
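As a rough illustration (not part of the original question), the sample tables above can be rebuilt as R data frames and the lookup applied to the match columns only. The column names input_id and match1 to match3 are hypothetical stand-ins for "input ID" and "match 1" to "match 3", and the score table contains only the four rows shown, so this is a sketch under those assumptions:

```r
# Hypothetical reconstruction of the question's sample data; match1..match3
# stand in for the "match 1".."match 3" columns.
ids <- data.frame(
  input_id = c("rs6708", "rs8902", "rs9076"),
  match1   = c("rs10089", "rs18079", "rs77890"),
  match2   = c("rs100098", "rs234058", "rs445067"),
  match3   = c("rs10567", "rs123098", "rs105023"),
  stringsAsFactors = FALSE
)

# Only the four score rows shown in the question excerpt.
ihs <- data.frame(
  snpID = c("rs6708", "rs105023", "rs234058", "rs77890"),
  iHS   = c(1.23, 0.92, 2.31, 0.31),
  stringsAsFactors = FALSE
)

# Drop the input ID column, then replace every ID with its iHS score.
# Assigning into scores[] keeps the result a data frame with the same names.
scores <- ids[, -1]
scores[] <- lapply(scores, function(x) ihs$iHS[match(x, ihs$snpID)])
print(scores)
```

Because this excerpt of the score file covers only four IDs, most cells come back NA; the three cells that do match (0.31, 2.31 and 0.92) line up with the corresponding values in the desired output above.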

Answer 1

Let's consider a small example:

(dat <- data.frame(id1 = c("rs100987", "rs1803920"), id2=c("rs123", "rs456"), stringsAsFactors=FALSE))
#         id1   id2
# 1  rs100987 rs123
# 2 rs1803920 rs456
(dat2 <- data.frame(id=c("rs123", "rs456", "rs100987", "rs1803920", "rs123456"),
                   score=5:1, stringsAsFactors=FALSE))
#          id score
# 1     rs123     5
# 2     rs456     4
# 3  rs100987     3
# 4 rs1803920     2
# 5  rs123456     1

You can then do this with:

apply(dat, 2, function(x) dat2$score[match(x, dat2$id)])
#      id1 id2
# [1,]   3   5
# [2,]   2   4
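Note that apply() returns a matrix rather than a data frame. One alternative, sketched here as an assumption rather than the answerer's method, calls match() once over the whole table and then rebuilds a data frame with the original column names. For a very wide table this avoids one match() call per column, which may help, though no timing is claimed here:

```r
dat <- data.frame(id1 = c("rs100987", "rs1803920"), id2 = c("rs123", "rs456"),
                  stringsAsFactors = FALSE)
dat2 <- data.frame(id = c("rs123", "rs456", "rs100987", "rs1803920", "rs123456"),
                   score = 5:1, stringsAsFactors = FALSE)

# as.matrix() turns the all-character data frame into a character matrix;
# match() treats it as one long vector in column-major order, so the lookup
# table is matched against in a single call.
idx <- match(as.matrix(dat), dat2$id)

# Refill a matrix of the same shape in the same column-major order, then
# convert it back to a data frame that keeps the original column names.
res <- as.data.frame(matrix(dat2$score[idx], nrow = nrow(dat),
                            dimnames = list(NULL, names(dat))))
print(res)
```

The values are the same as those from the apply() call (3 and 2 in id1, 5 and 4 in id2); only the container type differs.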

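The call to match finds the row of dat2 that corresponds to each ID in your column, and indexing dat2$score with those row numbers returns the matching scores.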
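To make the match() step concrete, here is a small sketch with made-up vectors (ids and scores are illustrative names, not from the answer). It shows that match() returns positions, that IDs missing from the lookup table give NA, and that indexing with those positions carries the NA through:

```r
ids    <- c("rs123", "rs456", "rs100987")
scores <- c(5, 4, 3)

# For each element of the first argument, match() returns the position of
# its first occurrence in the second argument, or NA if it is absent.
pos <- match(c("rs456", "rs999", "rs123"), ids)
print(pos)          # 2 NA 1

# Indexing the score vector by those positions pulls out the scores;
# an NA position yields NA, so unmatched IDs stay visible in the result.
print(scores[pos])  # 4 NA 5
```

If the score table ever contains duplicate IDs, match() uses the first occurrence, so it may be worth checking anyDuplicated(dat2$id) before the lookup.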

Replace strings with matching numbers from a column in R
