Frequency of each row in a data.table for a specific column

Asked: 2018-04-09 11:29:03

Tags: r data.table

I have the following data.table.

    library(data.table)

    # dput() output rewritten as a data.table() call; the .internal.selfref pointer is not valid R source
    dat <- data.table(
      kmers = c("TTTTTTTTTTTT", "TCCATTCCATTC", "TTCCATTCCATT",
                "CCATTCCATTCC", "ATTCCATTCCAT", "CATTCCATTCCA", "TTTTATTATTTT",
                "AAAATTATAAAA", "AAGACAATTTCT", "AAAGACAATTTC"),
      counts = c(16361L, 10090L, 9599L, 9021L, 8516L, 8325L, 5739L, 5642L,
                 5378L, 5326L)
    )

This is the table:

           kmers counts
 1: TTTTTTTTTTTT  16361
 2: TCCATTCCATTC  10090
 3: TTCCATTCCATT   9599
 4: CCATTCCATTCC   9021
 5: ATTCCATTCCAT   8516
 6: CATTCCATTCCA   8325
 7: TTTTATTATTTT   5739
 8: AAAATTATAAAA   5642
 9: AAGACAATTTCT   5378
10: AAAGACAATTTC   5326

I want to divide each value in the counts column by the sum of all counts. For a data frame I would do:

    total <- sum(dat$counts)
    freq <- dat$counts / total

How do I do this for a data.table? Every kmer is unique, so I don't expect any duplicate values in the kmers column.

For example, for the first row it would be 16361/sum(dat$counts).

1 Answer:

Answer 0 (score: 0)

Plain base R syntax still works on a data.table:

    dat$countProportion <- dat$counts / sum(dat$counts)
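
Since the question asks specifically about data.table, the same calculation can probably also be written with data.table's own `:=` operator, which adds the column by reference instead of going through `$<-`. The following is only a minimal sketch under the assumption that the data look like the table in the question; the column name `freq` is my own choice and not something from the original post.

```r
library(data.table)

# Same data as in the question, rebuilt with data.table()
dat <- data.table(
  kmers = c("TTTTTTTTTTTT", "TCCATTCCATTC", "TTCCATTCCATT",
            "CCATTCCATTCC", "ATTCCATTCCAT", "CATTCCATTCCA", "TTTTATTATTTT",
            "AAAATTATAAAA", "AAGACAATTTCT", "AAAGACAATTTC"),
  counts = c(16361L, 10090L, 9599L, 9021L, 8516L, 8325L, 5739L, 5642L,
             5378L, 5326L)
)

# Add a relative-frequency column by reference.
# `freq` is a hypothetical column name chosen for this example.
dat[, freq := counts / sum(counts)]

# The frequencies should add up to 1 (within floating-point tolerance)
stopifnot(isTRUE(all.equal(sum(dat$freq), 1)))

print(dat)
```

Inside `[.data.table`, `counts` refers to the column directly, so there is no need to repeat `dat$`. Because every kmer appears exactly once, no grouping with `by` is needed. Base R's `prop.table(counts)` should give the same values as `counts / sum(counts)` if a shorter spelling is preferred.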