I have the following data in the data frame df:
persons year
personA 2015
personB 2016
personC 2015
personB 2015
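How can I use the subset function in R to filter out personB, who appears in both 2015 and 2016? I am using the following code, but it does not work: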
df1 <- subset(df, (year==2015 & year ==2016))
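The condition in that call is evaluated row by row, and no single row can have year equal to both 2015 and 2016, so it is FALSE everywhere and the result is empty. Swapping & for | does not solve it either, because that keeps every 2015 or 2016 row regardless of person. A quick check, assuming the df shown above:

```r
df <- data.frame(persons = c("personA", "personB", "personC", "personB"),
                 year = c(2015, 2016, 2015, 2015))

# Row-wise, no single row can be both 2015 and 2016, so this is always FALSE
none <- subset(df, year == 2015 & year == 2016)
nrow(none)    # 0 rows

# `|` keeps any row from either year, so personA and personC survive too
either <- subset(df, year == 2015 | year == 2016)
nrow(either)  # all 4 rows
```

What is actually needed is a per-person check across rows, which is what the answers below do.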
Answer 0 (score: 1)
I use dplyr, because it is much easier than base R.
library(dplyr)
df %>% group_by(persons) %>% filter(n() == 2)
This groups the rows by person and then keeps only the groups that have two members (two years).
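Note that n() == 2 counts rows, not years, so a person with two rows from the same year, or from years other than 2015 and 2016, would also be kept. A slightly stricter sketch, assuming the same df, first restricts to the target years and then requires every target year to be present using dplyr's n_distinct():

```r
library(dplyr)

df <- data.frame(persons = c("personA", "personB", "personC", "personB"),
                 year = c(2015, 2016, 2015, 2015))
target <- c(2015, 2016)

res <- df %>%
  filter(year %in% target) %>%                    # keep only the years of interest
  group_by(persons) %>%
  filter(n_distinct(year) == length(target)) %>%  # person must appear in every target year
  ungroup()

res
```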
Answer 1 (score: 0)
df2 <- df[(df$year== 2015 | df$year== 2016),][1]
## get each person and the number of their appearances in the data frame
t <- table(df2)
#
# personA personB personC
# 1 2 1
t[t>1]
# personB
# 2
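The table only gives counts. To get the matching rows back, the names can be fed into an ordinary subset. A minimal base R sketch, assuming the same df; unique() is applied first (an addition to the answer above) so that a person listed twice in the same year is not counted as appearing in both years:

```r
df <- data.frame(persons = c("personA", "personB", "personC", "personB"),
                 year = c(2015, 2016, 2015, 2015))
target <- c(2015, 2016)

# One row per person/year combination within the target years
pairs <- unique(df[df$year %in% target, c("persons", "year")])

# Number of distinct target years per person
counts <- table(pairs$persons)
both <- names(counts)[counts == length(target)]

# Rows for the people found in every target year
df1 <- subset(df, persons %in% both & year %in% target)
df1
```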
The data frame:
df <- data.frame("persons" = c("personA","personB","personC","personB"),
"year" = c(2015,2016,2015,2015))
Edit

Using duplicated:
duplicated(df$persons)
#[1] FALSE FALSE FALSE TRUE
df[duplicated(df$persons),1]
# personB
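duplicated() marks only the second and later occurrences, so the line above returns a single personB rather than both of that person's rows, and it ignores year entirely. OR-ing it with duplicated(..., fromLast = TRUE) flags every occurrence of a repeated name. A sketch assuming the same df; like the table approach, it counts rows rather than distinct years:

```r
df <- data.frame(persons = c("personA", "personB", "personC", "personB"),
                 year = c(2015, 2016, 2015, 2015))

# Keep only the years of interest first
sub <- df[df$year %in% c(2015, 2016), ]

# TRUE for every occurrence of a name that appears more than once
dup_any <- duplicated(sub$persons) | duplicated(sub$persons, fromLast = TRUE)

sub[dup_any, ]
```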
Answer 2 (score: 0)
An example using data.table (and unique to handle multiple rows for the same person in the same year):
library(data.table)
dt <- structure(list(persons = c("personA", "personB", "personC", "personB"
), year = c(2015L, 2016L, 2015L, 2015L)), .Names = c("persons",
"year"), row.names = c(NA, -4L), class = "data.frame")
setDT(dt)
years <- c(2015L, 2016L)
# Filter by years and make sure all rows are unique combinations of persons and
# those years. Then set in_all_years to TRUE if the number of rows for a person
# equals the number of years.
out <- unique(dt[year %in% years])[, in_all_years := .N == length(years),
                                   by = persons]
> out
persons year in_all_years
1: personA 2015 FALSE
2: personB 2016 TRUE
3: personC 2015 FALSE
4: personB 2015 TRUE
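out flags personB but still contains every row. To keep only the matching rows, one option (a sketch, not part of the answer above) is to filter in a single grouped step with data.table's uniqueN(), returning .SD only for groups that cover all target years:

```r
library(data.table)

dt <- data.table(persons = c("personA", "personB", "personC", "personB"),
                 year = c(2015L, 2016L, 2015L, 2015L))
years <- c(2015L, 2016L)

# Groups whose if() yields NULL are dropped from the result
res <- dt[year %in% years, if (uniqueN(year) == length(years)) .SD, by = persons]
res
```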