I have a data frame like this:
df <- data.frame(
  names = c(rep("cody", 10), rep("sam", 5)),
  year = c(paste0("year", 2000:2009), paste0("year", 2000:2004))
)
I would like to get a result like this, keeping only the years that are present for every name:
df2 <- data.frame(
  names = c(rep("cody", 5), rep("sam", 5)),
  year = c(paste0("year", 2000:2004), paste0("year", 2000:2004))
)
Any ideas?
Answer 0: (score: 1)
Here is a base R approach using Reduce and intersect.
dat[dat$year %in% Reduce(intersect, split(dat$year, dat$names)), ]
This returns
   names     year
1   cody year2000
2   cody year2001
3   cody year2002
4   cody year2003
5   cody year2004
11   sam year2000
12   sam year2001
13   sam year2002
14   sam year2003
15   sam year2004
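Here, Reduce repeatedly feeds arguments to intersect. The arguments are the years for each name, supplied as a list by split. Each step discards the years that do not match, until you are left with only the years that are available for every name.
Note that the year variable must be a character vector, not a factor.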
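To make the folding step concrete, here is a small sketch that spells out what Reduce does with the list returned by split. The third name "alex" and the object names demo, years_by_name, by_hand and result are invented for illustration and are not part of the question's data.

```r
# Hypothetical data: "alex" is an invented third name, not in the question.
demo <- data.frame(
  names = c(rep("cody", 10), rep("sam", 5), rep("alex", 3)),
  year  = c(paste0("year", 2000:2009), paste0("year", 2000:2004),
            paste0("year", 2002:2004)),
  stringsAsFactors = FALSE
)

# One character vector of years per name.
years_by_name <- split(demo$year, demo$names)

# Reduce folds intersect over the list from left to right ...
common_years <- Reduce(intersect, years_by_name)

# ... which is the same as nesting the calls by hand.
by_hand <- intersect(intersect(years_by_name[[1]], years_by_name[[2]]),
                     years_by_name[[3]])

# Keep the rows whose year is available for every name.
result <- demo[demo$year %in% common_years, ]
```

With this made-up data only year2002 to year2004 survive, because those are the only years that "alex" shares with the other two names.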
As a minor simplification, you can use with to cut down on the dat$ references:
dat[with(dat, year %in% Reduce(intersect, split(year, names))), ]
Data
dat <-
structure(list(names = c("cody", "cody", "cody", "cody", "cody",
"cody", "cody", "cody", "cody", "cody", "sam", "sam", "sam",
"sam", "sam"), year = c("year2000", "year2001", "year2002", "year2003",
"year2004", "year2005", "year2006", "year2007", "year2008", "year2009",
"year2000", "year2001", "year2002", "year2003", "year2004")),
.Names = c("names", "year"), row.names = c(NA, -15L), class = "data.frame")
Answer 1: (score: 0)
You can group by year and then keep only the years that appear twice (or as many times as you have unique names):
library(dplyr)
df %>%
  group_by(year) %>%
  mutate(name_count = n()) %>%
  ungroup() %>%
  filter(name_count == 2) %>%
  select(-name_count)
   names year
   <fct> <fct>
 1 cody  year2000
 2 cody  year2001
 3 cody  year2002
 4 cody  year2003
 5 cody  year2004
 6 sam   year2000
 7 sam   year2001
 8 sam   year2002
 9 sam   year2003
10 sam   year2004
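The hard-coded 2 in the filter matches the two names in this example. As a hedged sketch of how the same counting idea could avoid fixing that number, the snippet below counts the distinct names per year and compares the count with the total number of distinct names. It is written in base R rather than dplyr so that it runs without extra packages, and the object names n_names, names_per_year, complete_years and result are illustrative.

```r
df <- data.frame(
  names = c(rep("cody", 10), rep("sam", 5)),
  year  = c(paste0("year", 2000:2009), paste0("year", 2000:2004)),
  stringsAsFactors = FALSE
)

# Total number of distinct names in the data (2 here).
n_names <- length(unique(df$names))

# For each year, how many distinct names have a row for it.
names_per_year <- tapply(df$names, df$year, function(x) length(unique(x)))

# Years that are covered by every name.
complete_years <- names(names_per_year)[names_per_year == n_names]

# Keep only the rows belonging to those years.
result <- df[df$year %in% complete_years, ]
```

Counting distinct names instead of rows also keeps the filter correct if a name/year pair happens to be repeated.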
Answer 2: (score: 0)
Here is an option that finds all duplicated values in the year column.
df[duplicated(df$year) | duplicated(df$year, fromLast = TRUE), ]
# names year
# 1 cody year2000
# 2 cody year2001
# 3 cody year2002
# 4 cody year2003
# 5 cody year2004
# 11 sam year2000
# 12 sam year2001
# 13 sam year2002
# 14 sam year2003
# 15 sam year2004
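One caveat worth checking: duplicated flags any year that occurs more than once, which coincides with "present for every name" only when there are exactly two names and each name/year pair occurs once. The sketch below uses an invented third name, "alex", to show where the two notions diverge. The object names are illustrative.

```r
# Hypothetical data: "alex" is an invented third name, not in the question.
demo <- data.frame(
  names = c(rep("cody", 10), rep("sam", 5), rep("alex", 3)),
  year  = c(paste0("year", 2000:2009), paste0("year", 2000:2004),
            paste0("year", 2002:2004)),
  stringsAsFactors = FALSE
)

# Years that occur more than once anywhere in the column.
dup_rows  <- duplicated(demo$year) | duplicated(demo$year, fromLast = TRUE)
dup_years <- unique(demo$year[dup_rows])

# Years that occur for every name.
common_years <- Reduce(intersect, split(demo$year, demo$names))

# Years kept by the duplicated approach although "alex" lacks them.
extra_years <- setdiff(dup_years, common_years)
```

Here year2000 and year2001 are kept by the duplicated filter even though "alex" has no rows for them, so with more than two names an intersection or a per-year count is the safer choice.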