Question

I have the following data in a data frame df:

persons  year
personA  2015
personB  2016
personC  2015
personB  2015

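How can I use the subset function in R to filter for personB, who appears in both 2015 and 2016? I am using the following code, but it does not work: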

df1 <- subset(df, (year==2015 & year ==2016))

Answer 1

I use dplyr, because I find it much easier than base R.

library(dplyr)
df %>% group_by(persons) %>% filter(n() == 2)

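This groups the rows by person and then keeps only the groups that have two members (one for each of the two years). Note that n() counts rows, so this assumes each person has at most one row per year and that df contains only the years of interest.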
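If df may contain other years or repeated person/year rows, the same idea can be made a little more defensive. The following is only a sketch under those assumptions: `wanted_years` and `result` are names introduced here for illustration, and the data frame is the one from the question.

```r
library(dplyr)

# Data from the question
df <- data.frame(
  persons = c("personA", "personB", "personC", "personB"),
  year = c(2015, 2016, 2015, 2015)
)

# Assumed name: the years every selected person must appear in
wanted_years <- c(2015, 2016)

result <- df %>%
  filter(year %in% wanted_years) %>%      # drop rows from other years
  distinct(persons, year) %>%             # collapse repeated person/year rows
  group_by(persons) %>%
  filter(n_distinct(year) == length(wanted_years)) %>%  # present in every wanted year
  ungroup()

result
```

Counting distinct years rather than rows means a person listed twice in 2015 is not mistaken for someone present in both years.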

Answer 2

# keep only the rows for 2015 or 2016, then keep just the persons column
df2 <- df[(df$year == 2015 | df$year == 2016), ][1]

## get each person and the number of his appearances in the data frame
counts <- table(df2)
# 
# personA personB personC 
#       1       2       1 

counts[counts > 1]
# personB 
#       2

Data frame

df <- data.frame("persons" = c("personA","personB","personC","personB"),
 "year" = c(2015,2016,2015,2015))

Edit

Another solution using duplicated:

duplicated(df$persons)
# [1] FALSE FALSE FALSE  TRUE
df[duplicated(df$persons), 1]
# personB
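Both approaches in this answer count how often a person appears, so a person listed twice in the same year would also be flagged. A minimal base R sketch that guards against this, and ends with the subset call the question asked about, might look like the following. The extra personA row and the names `df_dup`, `wanted_years`, `pairs`, `in_both` and `res` are assumptions added purely for illustration.

```r
df <- data.frame(
  persons = c("personA", "personB", "personC", "personB"),
  year = c(2015, 2016, 2015, 2015)
)

# Hypothetical extra row: personA listed a second time in 2015
df_dup <- rbind(df, data.frame(persons = "personA", year = 2015))

# Assumed name: the years every selected person must appear in
wanted_years <- c(2015, 2016)

# Unique person/year pairs within the wanted years
pairs <- unique(df_dup[df_dup$year %in% wanted_years, c("persons", "year")])

# Persons that have one pair for every wanted year
counts <- table(pairs$persons)
in_both <- names(counts)[counts == length(wanted_years)]

# Rows of the selected persons, via subset()
res <- subset(df_dup, persons %in% in_both & year %in% wanted_years)
res
```

With this data, duplicated(df_dup$persons) would flag personA as well, whereas in_both contains only personB.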

Answer 3

An example using data.table (with unique to handle multiple rows for the same person in the same year):

library(data.table)
dt <- structure(list(persons = c("personA", "personB", "personC", "personB"
), year = c(2015L, 2016L, 2015L, 2015L)), .Names = c("persons", 
"year"), row.names = c(NA, -4L), class = "data.frame")
setDT(dt)
years <- c(2015L, 2016L)  # integers, to match the type of the year column
# Filter by years and make sure all rows are unique combinations of persons and
# those years. Then set in_all_years to TRUE if the number of rows is equal to
# the number of years
out <- unique(dt[year %in% years])[, in_all_years := .N == length(years),
  by = persons]

> out
   persons year in_all_years
1: personA 2015        FALSE
2: personB 2016         TRUE
3: personC 2015        FALSE
4: personB 2015         TRUE
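The in_all_years column only marks the rows; to actually extract the cohort, the flag can be used as a row filter. This is a small self-contained sketch of that last step, where `cohort` is a name introduced here and the table is rebuilt with data.table() instead of setDT for brevity.

```r
library(data.table)

dt <- data.table(
  persons = c("personA", "personB", "personC", "personB"),
  year = c(2015L, 2016L, 2015L, 2015L)
)
years <- c(2015L, 2016L)

# Same flagging step as above
out <- unique(dt[year %in% years])[, in_all_years := .N == length(years),
  by = persons]

# Keep only the flagged rows and drop the helper column
cohort <- out[(in_all_years), .(persons, year)]
cohort
```

Wrapping the logical column in parentheses tells data.table to treat it as a row filter rather than as a column name to look up.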

Subsetting a data frame in R for cohort analysis
