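My data frame has nearly a million rows. I need an efficient way to subset the data based on multiple criteria. I could do this with a for loop, but I'm wondering whether there is a more elegant way.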
Time Instance Server Metric Value
17/08/2014 04:00:00 PM ID1 Server888 disk.commandsaveraged.average 0
17/08/2014 04:00:00 PM ID1 Server999 disk.commandsaveraged.average 0
17/08/2014 04:00:00 PM ID1 Server777 disk.commandsaveraged.average 0
17/08/2014 04:05:00 PM ID1 Server888 disk.commandsaveraged.average 0
17/08/2014 04:05:00 PM ID1 Server999 disk.commandsaveraged.average 0
17/08/2014 04:05:00 PM ID1 Server777 disk.commandsaveraged.average 0
17/08/2014 04:00:00 PM ID2 Server888 disk.commandsaveraged.average 0
17/08/2014 04:05:00 PM ID2 Server888 disk.commandsaveraged.average 0
17/08/2014 04:00:00 PM ID3 Server999 disk.commandsaveraged.average 0
17/08/2014 04:05:00 PM ID3 Server999 disk.commandsaveraged.average 0
17/08/2014 04:00:00 PM ID3 Server777 disk.commandsaveraged.average 0
17/08/2014 04:05:00 PM ID3 Server777 disk.commandsaveraged.average 0
17/08/2014 04:00:00 PM ID1 Server888 disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM ID1 Server999 disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM ID1 Server777 disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM ID1 Server888 disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM ID1 Server999 disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM ID1 Server777 disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM ID2 Server888 disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM ID2 Server888 disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM ID3 Server999 disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM ID3 Server999 disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM ID3 Server777 disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM ID3 Server777 disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM ID1 Server888 disk.numberwriteaveraged.average 0
17/08/2014 04:00:00 PM ID7 Server999 disk.numberwriteaveraged.average 0
17/08/2014 04:00:00 PM ID1 Server777 disk.numberwriteaveraged.average 0
17/08/2014 04:05:00 PM ID1 Server888 disk.numberwriteaveraged.average 0
17/08/2014 04:05:00 PM ID1 Server999 disk.numberwriteaveraged.average 0
17/08/2014 04:05:00 PM ID7 Server777 disk.numberwriteaveraged.average 0
17/08/2014 04:00:00 PM ID2 Server888 disk.numberwriteaveraged.average 0
17/08/2014 04:05:00 PM ID5 Server888 disk.numberwriteaveraged.average 0
17/08/2014 04:00:00 PM ID3 Server999 disk.numberwriteaveraged.average 0
17/08/2014 04:05:00 PM ID4 Server999 disk.numberwriteaveraged.average 0
17/08/2014 04:00:00 PM ID3 Server777 disk.numberwriteaveraged.average 0
17/08/2014 04:05:00 PM ID3 Server777 disk.numberwriteaveraged.average 0
What I want to do is create a subset where Metric == disk.numberwriteaveraged.average, Server is either Server999 or Server888, and both servers share the same Instance ID.
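Note: I use the term "subset" purely because I don't know of any other way to filter data in R; I'm still learning. I am looking for speed, because I will be generating datasets much larger than the current one.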
Answer 0: (score: 2)
(If I understand your question correctly) data.table is your friend here. Try the following (assuming df is your dataset):
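library(data.table)
# setDT() converts df to a data.table by reference (no copy)
# Within each Instance, keep rows for the requested metric on either server
df2 <- setDT(df)[, .SD[Metric == "disk.numberwriteaveraged.average" &
                       (Server == "Server999" | Server == "Server888")], by = Instance]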
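A row filter on Metric and Server does not by itself enforce the requirement that both servers report under the same Instance ID. One way to add that check is sketched below. The small sample table and the names servers, sub, and both are made up for illustration and are not part of the original data.

```r
library(data.table)

# Hypothetical sample mirroring the question's columns (not the real data)
df <- data.table(
  Time     = c("17/08/2014 04:00:00 PM", "17/08/2014 04:05:00 PM",
               "17/08/2014 04:05:00 PM", "17/08/2014 04:00:00 PM",
               "17/08/2014 04:00:00 PM"),
  Instance = c("ID1", "ID1", "ID1", "ID7", "ID2"),
  Server   = c("Server888", "Server888", "Server999", "Server999", "Server888"),
  Metric   = "disk.numberwriteaveraged.average",
  Value    = 0
)

servers <- c("Server888", "Server999")

# Step 1: vectorised row filter on metric and server (no grouping needed)
sub <- df[Metric == "disk.numberwriteaveraged.average" & Server %in% servers]

# Step 2: keep only the instances in which every server of interest appears
both <- sub[, if (uniqueN(Server) == length(servers)) .SD, by = Instance]

print(both)
```

In this sample only ID1 is reported by both servers, so only its rows are kept. Filtering rows first and grouping afterwards keeps the per-group work small, which should help on tables with millions of rows, though it is worth timing on your own data.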