data.table operations for subsetting and plotting data

Time: 2013-07-28 23:29:12

Tags: r data.table

Suppose I have the following data.table:

library(data.table)
DT <- data.table("src" = rep(1:3, each = 25), "info" = paste0("some_info_", rep(1:3, each = 25)), "analyte" = rep(letters[1:5], each = 5), "time" = rep(1:5, 5), "mass" = rnorm(75))
setkey(DT, analyte)
DT
    src        info analyte time         mass
 1:   1 some_info_1       a    1  0.155264082
 2:   1 some_info_1       a    2  1.084690475
 3:   1 some_info_1       a    3  0.049223594
 ...
73:   3 some_info_3       e    3 -1.663257082
74:   3 some_info_3       e    4  0.384154026
75:   3 some_info_3       e    5 -1.470772427

I would like to subset this data.table and compute the ratio of two analytes, producing a data.table of the form: src, info, time, ratio

So far I have managed to come up with the following:

DT2 <- merge(DT["b", ][, list(src, info, time, "b_mass" = mass)], DT["e", ][, list(src, info, time, "e_mass" = mass)], by = c("src", "info", "time"))
DT2[, ratio := b_mass / e_mass]
DT2
    src        info time       b_mass      e_mass        ratio 
 1:   1 some_info_1    1  0.048748340  0.55771897  0.087406638
 2:   1 some_info_1    2  0.580088770  0.01919104 30.227067059 
 3:   1 some_info_1    3  3.448766176  1.31206449  2.628503555
 4:   1 some_info_1    4  0.141492439  0.18760652  0.754197859
 5:   1 some_info_1    5 -1.298840808  0.28843784 -4.503018083
 6:   2 some_info_2    1 -0.999177587  0.82712712 -1.208009708
 7:   2 some_info_2    2 -1.371567195 -0.05886506 23.300190728
 8:   2 some_info_2    3 -0.577922327 -0.38846122  1.487722077
 9:   2 some_info_2    4  0.519693729 -0.54861732 -0.947279120
10:   2 some_info_2    5  0.002534584  1.07951446  0.002347892
11:   3 some_info_3    1  0.449420691  0.62283878  0.721568248
12:   3 some_info_3    2 -1.014887775 -0.08623683 11.768611151
13:   3 some_info_3    3 -0.990034334  0.20649892 -4.794380280
14:   3 some_info_3    4  0.935628916 -0.71373423 -1.310892595
15:   3 some_info_3    5 -2.146412498 -0.28746621  7.466660092

Is there a more efficient way to compute the ratio of the two analytes for each source?

Also, when plotting data through data.table, is there a way to use a vector of colours? For example,

plot(0, 0, type = "n", xlim = range(DT2$time), ylim = range(DT2$ratio), xlab = "Time", ylab = "Ratio b/e")
DT2[, lines(time, ratio, col = 1:3), by = src]

Thanks!

JMB

1 Answer:

Answer 0: (score: 2)

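This loops over the analytes and computes all the corresponding ratios, adding one column ratio.<analyte> per analyte: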

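 # DT is keyed by analyte, so DT[force(kk)] selects the rows for analyte kk.
 # The 15 masses of kk are recycled across the 75 rows, which lines up because
 # every analyte block is ordered identically by src and time.
 for (kk in unique(DT[['analyte']])) {
   DT[, (paste('ratio', kk, sep = '.')) := mass / DT[force(kk)][['mass']]]
 }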
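
If only the single b/e ratio from the question is needed, a reshape to wide format may be simpler than the loop. The following is a minimal sketch, not part of the original answer: the names `dt`, `wide` and `ratio_be` are made up for illustration, the seed is added only for reproducibility, and it assumes a data.table version in which `dcast` works directly on a data.table.

```r
library(data.table)

set.seed(1)  # assumption: seed added only so the sketch is reproducible

# Same layout as the question's data, with the recycling written out explicitly
dt <- data.table(src     = rep(1:3, each = 25),
                 info    = paste0("some_info_", rep(1:3, each = 25)),
                 analyte = rep(rep(letters[1:5], each = 5), times = 3),
                 time    = rep(1:5, times = 15),
                 mass    = rnorm(75))

# One column per analyte, one row per (src, info, time) combination
wide <- dcast(dt, src + info + time ~ analyte, value.var = "mass")

# Keep only the requested columns plus the b/e ratio
ratio_be <- wide[, .(src, info, time, ratio = b / e)]
```

Here `ratio_be` has one row per source and time point, in the src, info, time, ratio form asked for. Unlike the loop, it does not rely on the rows of each analyte being in the same order.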

# create an empty plot (on the 'b' rows, ratio.e is the mass of b divided by the mass of e)

DT['b', plot(x=time,y= ratio.e,type='n')]

# add the lines (creating a colour vector `cc` first)
cc <- c('red','green','blue')
DT['b', lines(x = time,y = ratio.e, col = cc[src]), by = src]
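
The reason `cc[src]` gives one colour per line is that, inside `j` with `by = src`, the grouping column `src` is a single value for each group. A small sketch of that behaviour, using a made-up table `dt_demo` rather than the data above:

```r
library(data.table)

# Hypothetical toy table: three sources with two rows each
dt_demo <- data.table(src = rep(1:3, each = 2), value = 1:6)

cc <- c('red', 'green', 'blue')

# Within each group, src has length 1, so cc[src] picks exactly one colour
picked <- dt_demo[, .(colour = cc[src]), by = src]
```

By contrast, `col = 1:3` passes the same three-element vector to every group, and `lines()` draws each group's line using only the first of those colours.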


Alternatively, using ggplot2:

library(ggplot2)
ggplot(DT['b'], aes(x = time,y=ratio.e, colour = factor(src))) + geom_line()
