I have data like this:
Hostname Date CPU
Server01 2015-11-02 00:00:53 54
Server01 2015-11-02 00:15:53 54
Server01 2015-11-02 00:30:53 54
Server02 2015-11-02 00:45:53 54
Server02 2015-11-02 01:00:53 54
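The Hostname column contains many different servers. I need to keep only the servers that have more than 2 rows and get the resulting df.
Is there a simple way to subset the df like this?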
Answer 0 (score: 1)
You can do this in base R:
x = df$Hostname
df[is.element(x, names(table(x))[table(x)>2]),]
Data:
df = structure(list(Hostname = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("Server01",
"Server02"), class = "factor"), Date = structure(1:5, .Label = c("2015-11-02 00:00:53",
"2015-11-02 00:15:53", "2015-11-02 00:30:53", "2015-11-02 00:45:53",
"2015-11-02 01:00:53"), class = "factor"), CPU = c(54L, 54L,
54L, 54L, 54L)), .Names = c("Hostname", "Date", "CPU"), class = "data.frame", row.names = c(NA,
-5L))
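To see why the one-liner above works, it can be unpacked into separate steps. This is only a sketch: the sample data is rebuilt with data.frame() for brevity, and the names counts, keep and result are illustrative rather than part of the original answer.

```r
# Sample data from the question (columns as plain character/integer vectors)
df <- data.frame(
  Hostname = c("Server01", "Server01", "Server01", "Server02", "Server02"),
  Date = c("2015-11-02 00:00:53", "2015-11-02 00:15:53", "2015-11-02 00:30:53",
           "2015-11-02 00:45:53", "2015-11-02 01:00:53"),
  CPU = 54L
)

counts <- table(df$Hostname)           # number of rows per hostname
keep <- names(counts)[counts > 2]      # hostnames with more than 2 rows
result <- df[df$Hostname %in% keep, ]  # x %in% y is equivalent to is.element(x, y)
result
```

With the sample data, only the Server01 rows remain, because Server02 has just 2 rows.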
Answer 1 (score: 1)
Another approach, using dplyr:
library(dplyr)
df %>% group_by(Hostname) %>% filter(n() > 2)
Answer 2 (score: 1)
You can also use data.table (to round out the answers with base R, dplyr and data.table):
library(data.table)
setDT(df)[, N := .N, by = Hostname][N > 2, ]
I use N := .N instead of just .N, because otherwise the data would be aggregated to one row per Hostname.
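The difference between the two forms can be seen side by side. The following is a minimal sketch assuming the data.table package is installed; the table dt is a trimmed, illustrative copy of the question's data (the Date column is left out), and agg and kept are hypothetical names.

```r
library(data.table)

# Trimmed copy of the sample data (illustrative; Date column omitted)
dt <- data.table(
  Hostname = c("Server01", "Server01", "Server01", "Server02", "Server02"),
  CPU = 54L
)

# Plain .N aggregates: the result has one row per Hostname
agg <- dt[, .N, by = Hostname]

# N := .N adds the group size as a new column by reference and keeps every row,
# so the rows can be filtered afterwards
dt[, N := .N, by = Hostname]
kept <- dt[N > 2]
kept
```

Note that := modifies dt in place, so the helper column N stays in the table unless it is removed afterwards, for example with dt[, N := NULL].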
Answer 3 (score: 0)
Another base R variant:
df[ave(df$CPU, df$Hostname, FUN=length)>2,]
Data:
df = structure(list(Hostname = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("Server01",
"Server02"), class = "factor"), Date = structure(1:5, .Label = c("2015-11-02 00:00:53",
"2015-11-02 00:15:53", "2015-11-02 00:30:53", "2015-11-02 00:45:53",
"2015-11-02 01:00:53"), class = "factor"), CPU = c(54L, 54L,
54L, 54L, 54L)), .Names = c("Hostname", "Date", "CPU"), class = "data.frame", row.names = c(NA,
-5L))
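The ave() variant works because ave() returns a vector as long as its input, with each element replaced by the result of FUN for its group. A small sketch on a trimmed, illustrative copy of the data (Date column omitted; group_size and result are hypothetical names):

```r
# Trimmed copy of the sample data (illustrative; Date column omitted)
df <- data.frame(
  Hostname = c("Server01", "Server01", "Server01", "Server02", "Server02"),
  CPU = 54L
)

# For every row, the number of rows sharing its Hostname
group_size <- ave(df$CPU, df$Hostname, FUN = length)
group_size  # 3 3 3 2 2

# Keep only rows whose group has more than 2 rows
result <- df[group_size > 2, ]
result
```

The values of CPU are never used here; any column of the right length would do as the first argument, since FUN = length only counts the elements in each group.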