How to subset a data frame by the row count of each unique entry

Date: 2015-12-10 14:41:35

Tags: r

I have data like this:

 Hostname                Date                 CPU       
 Server01  2015-11-02 00:00:53                54 
 Server01  2015-11-02 00:15:53                54
 Server01  2015-11-02 00:30:53                54 
 Server02  2015-11-02 00:45:53                54 
 Server02  2015-11-02 01:00:53                54 

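The Hostname column contains many different servers. I need to keep only the servers that have more than 2 rows and get the resulting df.

Is there a simple way to subset the df like this?

4 answers:

Answer 0 (score: 1)

You can do this in base R: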

x = df$Hostname
df[is.element(x, names(table(x))[table(x)>2]),]

Data:

df = structure(list(Hostname = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("Server01", 
"Server02"), class = "factor"), Date = structure(1:5, .Label = c("2015-11-02 00:00:53", 
"2015-11-02 00:15:53", "2015-11-02 00:30:53", "2015-11-02 00:45:53", 
"2015-11-02 01:00:53"), class = "factor"), CPU = c(54L, 54L, 
54L, 54L, 54L)), .Names = c("Hostname", "Date", "CPU"), class = "data.frame", row.names = c(NA, 
-5L))
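
The one-liner in this answer can be read as three steps. Here is a minimal sketch, assuming a trimmed-down copy of the data (the Date column is dropped, and the names `counts`, `keep`, and `result` are illustrative only):

```r
# Small example with the same hostnames as the question's data
df <- data.frame(
  Hostname = c("Server01", "Server01", "Server01", "Server02", "Server02"),
  CPU = c(54L, 54L, 54L, 54L, 54L),
  stringsAsFactors = FALSE
)

counts <- table(df$Hostname)           # number of rows per hostname
keep <- names(counts)[counts > 2]      # hostnames with more than 2 rows
result <- df[df$Hostname %in% keep, ]  # rows belonging to those hostnames
print(result)
```

Here `%in%` plays the same role as `is.element`, and computing `table()` once avoids calling it twice.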

Answer 1 (score: 1)

Another approach, using dplyr:

library(dplyr)
df %>% group_by(Hostname) %>% filter(n() > 2)
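
The pipeline above returns a tibble that is still grouped by Hostname. If later steps should not operate per group, it may help to drop the grouping. A minimal sketch, assuming a trimmed-down copy of the data (the name `result` is illustrative only):

```r
library(dplyr)

# Small example with the same hostnames as the question's data
df <- data.frame(
  Hostname = c("Server01", "Server01", "Server01", "Server02", "Server02"),
  CPU = c(54L, 54L, 54L, 54L, 54L)
)

result <- df %>%
  group_by(Hostname) %>%  # evaluate n() separately for each hostname
  filter(n() > 2) %>%     # keep groups with more than 2 rows
  ungroup()               # remove the grouping from the result
print(result)
```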

Answer 2 (score: 1)

You can also use data.table (to complete the set of answers in base R, dplyr, and data.table):

library(data.table)

setDT(df)[, N := .N, by = Hostname][N > 2]

I use N := .N rather than just .N, because otherwise the data would be aggregated to one row per Hostname.
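
The difference between the two forms can be seen side by side. This is a minimal sketch, assuming a trimmed-down copy of the data. The names `dt`, `agg`, `no_helper`, and `kept` are illustrative only, and the `.SD` variant is an alternative that is not part of the original answer:

```r
library(data.table)

# Small example with the same hostnames as the question's data
dt <- data.table(
  Hostname = c("Server01", "Server01", "Server01", "Server02", "Server02"),
  CPU = c(54L, 54L, 54L, 54L, 54L)
)

# Plain .N in j aggregates: one row per Hostname holding its count
agg <- dt[, .N, by = Hostname]

# Alternative without a helper column: return the group's rows only if it is large enough
no_helper <- dt[, if (.N > 2) .SD, by = Hostname]

# := adds the count as a new column by reference and keeps every original row
dt[, N := .N, by = Hostname]
kept <- dt[N > 2]
print(kept)
```

Note that `:=` modifies `dt` in place, so the helper column `N` stays in the table afterwards.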

Answer 3 (score: 0)

Another base R variant:

df[ave(df$CPU, df$Hostname, FUN=length)>2,]

Data:

df = structure(list(Hostname = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("Server01", 
"Server02"), class = "factor"), Date = structure(1:5, .Label = c("2015-11-02 00:00:53", 
"2015-11-02 00:15:53", "2015-11-02 00:30:53", "2015-11-02 00:45:53", 
"2015-11-02 01:00:53"), class = "factor"), CPU = c(54L, 54L, 
54L, 54L, 54L)), .Names = c("Hostname", "Date", "CPU"), class = "data.frame", row.names = c(NA, 
-5L))
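
To see why the `ave` variant works, it can help to look at the intermediate vector. This is a minimal sketch, assuming a trimmed-down copy of the data (the names `group_size` and `result` are illustrative only). It passes row indices to `ave` instead of `df$CPU`, which is an assumption-free way to avoid depending on the type of the CPU column:

```r
# Small example with the same hostnames as the question's data
df <- data.frame(
  Hostname = c("Server01", "Server01", "Server01", "Server02", "Server02"),
  CPU = c(54L, 54L, 54L, 54L, 54L),
  stringsAsFactors = FALSE
)

# ave() applies length() within each Hostname group and repeats the
# result for every row of that group, so the output has one value per row
group_size <- ave(seq_len(nrow(df)), df$Hostname, FUN = length)

# Keep the rows whose group has more than 2 members
result <- df[group_size > 2, ]
print(result)
```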