Complex subsetting of a data.frame

Time: 2014-08-21 20:32:00

Tags: r

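My data frame has close to a million rows. I need an efficient way to subset the data based on multiple criteria. I could do this with a for loop, but I would like to know whether there is a more elegant way.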

Time    Instance    Server  Metric  Value
17/08/2014 04:00:00 PM  ID1 Server888   disk.commandsaveraged.average   0
17/08/2014 04:00:00 PM  ID1 Server999   disk.commandsaveraged.average   0
17/08/2014 04:00:00 PM  ID1 Server777   disk.commandsaveraged.average   0
17/08/2014 04:05:00 PM  ID1 Server888   disk.commandsaveraged.average   0
17/08/2014 04:05:00 PM  ID1 Server999   disk.commandsaveraged.average   0
17/08/2014 04:05:00 PM  ID1 Server777   disk.commandsaveraged.average   0
17/08/2014 04:00:00 PM  ID2 Server888   disk.commandsaveraged.average   0
17/08/2014 04:05:00 PM  ID2 Server888   disk.commandsaveraged.average   0
17/08/2014 04:00:00 PM  ID3 Server999   disk.commandsaveraged.average   0
17/08/2014 04:05:00 PM  ID3 Server999   disk.commandsaveraged.average   0
17/08/2014 04:00:00 PM  ID3 Server777   disk.commandsaveraged.average   0
17/08/2014 04:05:00 PM  ID3 Server777   disk.commandsaveraged.average   0
17/08/2014 04:00:00 PM  ID1 Server888   disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM  ID1 Server999   disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM  ID1 Server777   disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM  ID1 Server888   disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM  ID1 Server999   disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM  ID1 Server777   disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM  ID2 Server888   disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM  ID2 Server888   disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM  ID3 Server999   disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM  ID3 Server999   disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM  ID3 Server777   disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM  ID3 Server777   disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM  ID1 Server888   disk.numberwriteaveraged.average    0
17/08/2014 04:00:00 PM  ID7 Server999   disk.numberwriteaveraged.average    0
17/08/2014 04:00:00 PM  ID1 Server777   disk.numberwriteaveraged.average    0
17/08/2014 04:05:00 PM  ID1 Server888   disk.numberwriteaveraged.average    0
17/08/2014 04:05:00 PM  ID1 Server999   disk.numberwriteaveraged.average    0
17/08/2014 04:05:00 PM  ID7 Server777   disk.numberwriteaveraged.average    0
17/08/2014 04:00:00 PM  ID2 Server888   disk.numberwriteaveraged.average    0
17/08/2014 04:05:00 PM  ID5 Server888   disk.numberwriteaveraged.average    0
17/08/2014 04:00:00 PM  ID3 Server999   disk.numberwriteaveraged.average    0
17/08/2014 04:05:00 PM  ID4 Server999   disk.numberwriteaveraged.average    0
17/08/2014 04:00:00 PM  ID3 Server777   disk.numberwriteaveraged.average    0
17/08/2014 04:05:00 PM  ID3 Server777   disk.numberwriteaveraged.average    0

What I want is a subset where Metric == disk.numberwriteaveraged.average, Server is either Server999 or Server888, and both servers share the same Instance ID.

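Note: I use the term "subset" purely because I don't know any other way to filter data in R; I'm still learning. Speed matters, because I will be generating data sets much larger than the current one.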

1 answer:

Answer 0: (score: 2)

(If I understand your question) data.table is your friend here. Try the following (assuming df is your data set):

library(data.table)
# setDT() converts df to a data.table by reference; rows are then filtered within each Instance
df2 <- setDT(df)[, .SD[Metric == "disk.numberwriteaveraged.average" &
            Server %in% c("Server999", "Server888")], by = Instance]
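
Note that the filter above keeps rows from either server, but it does not by itself check that an instance shows up under both servers. One way to add that check is sketched below. This is only an illustration under assumptions: the toy df is made up to mirror the question's columns, and the names wanted, sub, shared, and result are hypothetical helpers, not part of the original answer.

```r
library(data.table)

# Hypothetical toy data mirroring the question's columns (not the real data set)
df <- data.frame(
  Instance = c("ID1", "ID1", "ID2", "ID3", "ID1"),
  Server   = c("Server888", "Server999", "Server888", "Server999", "Server777"),
  Metric   = "disk.numberwriteaveraged.average",
  Value    = 0,
  stringsAsFactors = FALSE
)

dt <- as.data.table(df)
wanted <- c("Server888", "Server999")

# Step 1: plain vectorised filter on Metric and Server (no grouping needed)
sub <- dt[Metric == "disk.numberwriteaveraged.average" & Server %in% wanted]

# Step 2: find instances that appear under every wanted server
shared <- sub[, .(n_servers = uniqueN(Server)), by = Instance][
  n_servers == length(wanted), Instance]

# Step 3: keep only the rows belonging to those shared instances
result <- sub[Instance %in% shared]
print(result)
```

Filtering first and grouping only to count distinct servers avoids calling .SD[...] once per Instance, which tends to be slow when there are many groups. On the toy data only ID1 is reported by both servers, so result holds its two rows.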