Subsetting a data frame in R for cohort analysis

Asked: 2017-03-29 21:26:01

Tags: r

I have the following data in a data frame df:

persons  year
personA  2015
personB  2016
personC  2015
personB  2015

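How can I use the subset function in R to filter for personB, who appears in both 2015 and 2016? I am using the following code, but it does not work: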

df1 <- subset(df, (year==2015 & year ==2016))

3 answers:

Answer 0: (score: 1)

I use dplyr because it is much easier than base R.

library(dplyr)
df %>% group_by(persons) %>% filter(n() == 2)

This groups the rows by person, then keeps only the groups with two members (two years).
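
The n() == 2 check counts rows, so it relies on each person having at most one row per year and on the data containing only the years of interest. A minimal sketch of a variant that filters the years explicitly and counts distinct years is below; the names years and both_years are illustrative and not part of the original answer.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(persons = c("personA", "personB", "personC", "personB"),
                 year = c(2015, 2016, 2015, 2015))

# Assumed cohort years (illustrative name)
years <- c(2015, 2016)

both_years <- df %>%
  filter(year %in% years) %>%                      # keep only the years of interest
  group_by(persons) %>%
  filter(n_distinct(year) == length(years)) %>%    # person must appear in every year
  ungroup()
```

With the sample data, both_years should contain only the two personB rows.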

Answer 1: (score: 0)

# keep the rows from 2015 or 2016, then only the persons column
df2 <- df[df$year == 2015 | df$year == 2016, ][1]

# count how many times each person appears in the data frame
t <- table(df2)
# 
# personA personB personC 
# 1       2       1 

t[t>1]
# personB 
# 2

Data frame

df <- data.frame("persons" = c("personA","personB","personC","personB"),
 "year" = c(2015,2016,2015,2015))

Edit

Another solution using duplicated:

 duplicated(df$persons)
#[1] FALSE FALSE FALSE  TRUE
 df[duplicated(df$persons),1]
# personB
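
Both snippets above return the person's name rather than the matching rows, and duplicated() flags any repeated person regardless of year. If the rows themselves are wanted via subset(), as the question asks, one possible base R sketch is below. It assumes the cohort means "appears in every listed year"; the names years, in_range, n_years and keep are illustrative.

```r
# Sample data from the question
df <- data.frame(persons = c("personA", "personB", "personC", "personB"),
                 year = c(2015, 2016, 2015, 2015))

# Assumed cohort years (illustrative name)
years <- c(2015, 2016)

# A single row can never satisfy year == 2015 & year == 2016,
# so first keep rows whose year is one of the cohort years
in_range <- subset(df, year %in% years)

# Count the distinct years seen for each person
n_years <- tapply(in_range$year, in_range$persons,
                  function(y) length(unique(y)))

# Persons seen in every cohort year
keep <- names(n_years)[n_years == length(years)]

# Rows belonging to those persons
df1 <- subset(in_range, persons %in% keep)
```

With the sample data, df1 should hold the two personB rows (2016 and 2015).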

Answer 2: (score: 0)

An example using data.table (with unique to handle multiple rows for the same person in the same year):

library(data.table)
dt <- structure(list(persons = c("personA", "personB", "personC", "personB"
), year = c(2015L, 2016L, 2015L, 2015L)), .Names = c("persons", 
"year"), row.names = c(NA, -4L), class = "data.frame")
setDT(dt)
years <- c(2015L, 2016L)
# Filter by years and keep only unique combinations of persons and those
# years. Then set in_all_years to TRUE if a person's number of rows equals
# the number of years.
out <- unique(dt[year %in% years])[, in_all_years := .N == length(years),
  by = persons]

> out
   persons year in_all_years
1: personA 2015        FALSE
2: personB 2016         TRUE
3: personC 2015        FALSE
4: personB 2015         TRUE
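
The table above only flags the rows. To keep just the persons present in every year, the flag can be used as a filter. A short sketch follows; it rebuilds the same data with data.table() for self-containment, and the name cohort is illustrative rather than part of the original answer.

```r
library(data.table)

# Same sample data, built directly as a data.table
dt <- data.table(persons = c("personA", "personB", "personC", "personB"),
                 year = c(2015L, 2016L, 2015L, 2015L))
years <- c(2015L, 2016L)

# Flag persons whose unique (persons, year) rows cover every year
out <- unique(dt[year %in% years])[, in_all_years := .N == length(years),
  by = persons]

# Keep only the flagged rows (illustrative name)
cohort <- out[in_all_years == TRUE]
```

With the sample data, cohort should contain the two personB rows.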