Question

head(covreage)

chr     Pos             Val
X       129271111       10
X       129271112       10
X       129271113       10
X       129271114       10
X       129271115       10
X       129271116       11
X       129271117       11
X       129271118       11
X       129271119       11
X       129271120       11
X       129271121       11
X       129271122       11
X       129271123       11
X       129271124       11
X       129271125       11
X       129271126       11
X       129271127       11
X       129271128       11
X       129271129       11
X       129271130       11
X       129271131       11
X       129271132       11
X       129271133       11

head(annotation)

chr Region  start       end         Gene    status
X   Exon    129271053   129271110   AIFM1   NO
X   Exon    129270618   129270706   AIFM1   NO
X   Exon    129270020   129270160   AIFM1   NO
X   Exon    129267288   129267430   AIFM1   NO
X   Exon    129265650   129265774   AIFM1   NO
X   Exon    129263945   129264141   AIFM1   NO
X   Exon    129263532   129263603   AIFM1   NO
3   Exon    15643358    15643401    BTD     NO
3   Exon    15676931    15677195    BTD     NO
3   Exon    15683415    15683564    BTD     NO

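I am trying to create a new column in the first file that holds the gene name from the second file for every position that falls between that file's start and end.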

covreage$Gene <- ifelse(covreage$chr == annotation$chr & covreage$pos >= annotation$start & covreage$pos <= annotation$end,annotation$Gene,"NA")

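The problem is that the second file holds ranges that cover the pos values of file 1, and both chr and position have to match across the two files. chr can take 23 different values, and Pos has similar values across the different chr values. Only chr and position together identify a unique element.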

The code above gives the following warnings:

Warning messages:
1: In is.na(e1) | is.na(e2) :
  longer object length is not a multiple of shorter object length
2: In `==.default`(covreage$chr, annotation$chr) :
  longer object length is not a multiple of shorter object length
3: In covreage$pos >= annotation$start :
  longer object length is not a multiple of shorter object length
4: In covreage$pos <= annotation$end :
  longer object length is not a multiple of shorter object length

Answer 1

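By evaluating something like covreage$pos >= annotation$start, you compare the two data.frames row by row, which is not what you want. You want to compare several rows of the first data.frame with a single row of the second, using a grouping rule that R does not know about.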
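One way to express that grouping rule explicitly is to loop over the rows of annotation and compare the whole covreage data.frame against one range at a time. The following is only a minimal base R sketch under stated assumptions: the toy data is made up for illustration, the position column is assumed to be named pos (as in the question's code), and positions outside every range are left as NA.

```r
# Toy data shaped like the question's two data frames (values are illustrative).
covreage <- data.frame(
  chr = c("X", "X", "X", "3"),
  pos = c(129271053, 129271110, 129271111, 15643360),
  Val = c(10, 10, 10, 12),
  stringsAsFactors = FALSE
)
annotation <- data.frame(
  chr   = c("X", "3"),
  start = c(129271053, 15643358),
  end   = c(129271110, 15643401),
  Gene  = c("AIFM1", "BTD"),
  stringsAsFactors = FALSE
)

# Start with NA everywhere, then fill in one annotation row at a time.
covreage$Gene <- NA_character_
for (i in seq_len(nrow(annotation))) {
  # Each comparison is vector versus scalar, so nothing is recycled unevenly.
  hit <- covreage$chr == annotation$chr[i] &
    covreage$pos >= annotation$start[i] &
    covreage$pos <= annotation$end[i]
  covreage$Gene[hit] <- annotation$Gene[i]
}
print(covreage)
```

With many ranges and many positions this loop may be slow; it is meant to show the comparison that the one-line ifelse() call does not perform, not to be the fastest approach.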

Your original ifelse() call still produces some output because R generally tries to recycle elements as needed:

> 1:6<c(2,6,6)
[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE

> 1:5<c(2,6,6)
[1]  TRUE  TRUE  TRUE FALSE  TRUE
Warning message:
In 1:5 < c(2, 6, 6) :
  longer object length is not a multiple of shorter object length

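In the first case no warning is printed, because the shorter vector is recycled a whole number of times. In the second case that is not possible (as R says, longer object length is not a multiple of shorter object length), so a warning is issued.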

Although recycling amounts to a bug in the context you describe, R allows it because it can be useful in other situations.
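As a small illustration of recycling being useful rather than harmful, here are two common base R idioms; the variable names are made up for the example.

```r
x <- 1:6

# The scalar 2 is recycled to the length of x.
doubled <- x * 2

# The logical index c(TRUE, FALSE) is recycled, keeping every other element.
every_other <- x[c(TRUE, FALSE)]

print(doubled)
print(every_other)
```

In both cases the shorter vector's length divides the longer one's evenly, so R emits no warning.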

Add a column to (annotate) a data frame based on the columns of another data frame in R
