Question

My data look similar to this:

   Sample.Name Marker Height Size
1:    Sample01      A    450  100
2:    Sample01      A    420  120
3:    Sample01      B    700  140
4:    Sample01      C    750  160
5:    Sample01      D    300  180
6:    Sample01      D    340  200

It can be reproduced with the following code:

# Some example data.
library(data.table)
DT <- data.table(Sample.Name=rep("Sample01", 6),
                 Marker=c("A","A","B","C","D","D"),
                 Height=c(450,420,700,750,300,340),
                 Size=seq(from=100, to=200, length.out=6))

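Each marker has one or two rows, each with a height and a size (either of which can be NA). In reality there are further columns with allele values and other information that is not needed for this example. The data are not necessarily sorted by size.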

I want to calculate the ratio (Hb) between the peak heights of each marker (NA if there is only one peak). Hb can be calculated in several ways:

1) the smaller (i.e. lower) peak height divided by the larger (i.e. higher) peak height

2) the peak height of the shorter fragment divided by the peak height of the longer fragment

3) the inverse of 2), which can be solved with the same strategy as 2), so we do not need to consider it here.
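To make the three definitions concrete before bringing data.table into it, here is a minimal plain-R sketch for a single marker. The helper name `hb_ratio` and its `method` argument are hypothetical, not part of the original code. It assumes a marker has at most two peaks and treats anything other than exactly two as "Hb undefined".

```r
# Hypothetical helper: Hb for ONE marker, given its peak heights and fragment sizes.
# method 1: lower peak height / higher peak height
# method 2: height of the shorter fragment / height of the longer fragment
# method 3: the inverse of method 2
hb_ratio <- function(height, size, method = 1) {
  if (length(height) != 2) return(NA_real_)  # a single peak: Hb is undefined
  if (method == 1) return(min(height) / max(height))
  h <- height[order(size)]                   # heights reordered by increasing size
  if (method == 2) h[1] / h[2] else h[2] / h[1]
}

hb_ratio(c(450, 420), c(100, 120), method = 1)  # 420/450
hb_ratio(c(450, 420), c(100, 120), method = 2)  # 450/420
hb_ratio(700, 140, method = 2)                  # NA
```

Ordering by size, rather than relying on row order, is what makes methods 2 and 3 independent of how the data happen to be sorted.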

I am writing a function that can perform all three calculations using data.table. So far I have written code that calculates 1) using a two-step approach:

# Identify the smaller and larger peak height and count number of peaks.
DT2 <- DT[, list(Small=min(Height), Large=max(Height), Peaks=.N),
      by=list(Sample.Name, Marker)]

# Divide only where there are two observed peaks.
DT2[Peaks==2, Hb:=Small/Large, by=list(Sample.Name, Marker)]

This produces the desired output:

>DT2
   Sample.Name Marker Small Large Peaks        Hb
1:    Sample01      A   420   450     2 0.9333333
2:    Sample01      B   700   700     1        NA
3:    Sample01      C   750   750     1        NA
4:    Sample01      D   300   340     2 0.8823529

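However, I am stuck on how to calculate 2). I have to look at the size to determine which of the two height values should be assigned to 'Short' and which to 'Long'. I have consulted the data.table help pages and searched Stack Overflow, but being far from an expert in data.table syntax, I have not been able to find (or recognize) a solution to this particular problem. The desired output for 2) is the same as for 1), except that Hb in the first row will be 450/420 = 1.071429.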

Answer 1

For the second calculation, you can do the following:

# Pick up the Height at the smaller Size and divide it by the Height at the larger Size.
# Note that the NA type has to be specified explicitly as real (NA_real_) here,
# since data.table requires a column's type to be consistent across groups.
DT[, .(Hb = ifelse(.N == 2, Height[Size == min(Size)] / Height[Size == max(Size)], NA_real_)),
   by = .(Sample.Name, Marker)]

#    Sample.Name Marker        Hb
# 1:    Sample01      A 1.0714286
# 2:    Sample01      B        NA
# 3:    Sample01      C        NA
# 4:    Sample01      D 0.8823529
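Since the goal is a single function that handles all three variants, here is one possible sketch building on the same idea. The function name `calc_hb` and its `method` argument are hypothetical. Instead of `Size == min(Size)`, it reorders the heights with `order(Size)`, so it does not depend on row order and always yields one value per marker even if the two sizes were equal; it assumes at most two peaks per marker, and with an NA size `order()` places that peak last.

```r
library(data.table)

DT <- data.table(Sample.Name = rep("Sample01", 6),
                 Marker = c("A", "A", "B", "C", "D", "D"),
                 Height = c(450, 420, 700, 750, 300, 340),
                 Size = seq(from = 100, to = 200, length.out = 6))

# Hypothetical wrapper covering the three Hb definitions from the question.
calc_hb <- function(dt, method = 1) {
  dt[, {
    if (.N != 2) {
      Hb <- NA_real_                          # only one peak: Hb is undefined
    } else if (method == 1) {
      Hb <- min(Height) / max(Height)         # lower peak / higher peak
    } else {
      h <- Height[order(Size)]                # heights ordered by fragment size
      Hb <- if (method == 2) h[1] / h[2] else h[2] / h[1]
    }
    list(Hb = Hb)
  }, by = .(Sample.Name, Marker)]
}

calc_hb(DT, method = 2)
```

Using `if`/`else` instead of `ifelse()` also avoids computing the ratio for single-peak markers at all.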

R data.table calculation conditional on multiple rows in other columns
