My data frame has race and group factors, each with several levels. Here is a minimal example:
id race group
1 1 White 1
2 2 White 1
3 3 White 1
4 4 White 1
5 5 White 1
6 6 White 2
7 7 White 2
8 8 White 2
9 9 White 2
10 10 Black 1
11 11 Black 1
12 12 Black 1
13 13 Black 2
14 14 Black 2
15 15 Black 2
16 16 Black 2
17 17 Hispanic 1
18 18 Hispanic 1
19 19 Hispanic 1
20 20 Hispanic 1
21 21 Hispanic 1
22 22 Hispanic 2
23 23 Hispanic 2
24 24 Hispanic 2
25 25 Hispanic 2
I can subset each race level together with "White" into a single data frame, and then split the data by group, using the function below.
filter.race <- function(x, y) {
  # keep the White rows plus the rows of race y
  f <- subset(x, race == "White" | race == y)
  # one data frame per group
  f <- split(f, f$group)
  f
}
Which returns:
filter.race(df, "Black")
$`1`
id race group
1 1 White 1
2 2 White 1
3 3 White 1
4 4 White 1
5 5 White 1
10 10 Black 1
11 11 Black 1
12 12 Black 1
$`2`
id race group
6 6 White 2
7 7 White 2
8 8 White 2
9 9 White 2
13 13 Black 2
14 14 Black 2
15 15 Black 2
16 16 Black 2
filter.race(df, "Hispanic")
$`1`
id race group
1 1 White 1
2 2 White 1
3 3 White 1
4 4 White 1
5 5 White 1
17 17 Hispanic 1
18 18 Hispanic 1
19 19 Hispanic 1
20 20 Hispanic 1
21 21 Hispanic 1
$`2`
id race group
6 6 White 2
7 7 White 2
8 8 White 2
9 9 White 2
22 22 Hispanic 2
23 23 Hispanic 2
24 24 Hispanic 2
25 25 Hispanic 2
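However, I am trying to find a way to apply this function across all race levels in the data frame, rather than specifying y separately each time.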
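One straightforward option, sketched here under the assumption that "White" is always the reference level, is to loop over the remaining race levels with lapply and name each result after its race. The variable names others and res are illustrative only.

```r
# Sample data (same values as the dput() output in the question)
df <- data.frame(
  id = 1:25,
  race = factor(rep(c("White", "Black", "Hispanic"), times = c(9, 7, 9))),
  group = c(rep(1:2, times = c(5, 4)),   # White
            rep(1:2, times = c(3, 4)),   # Black
            rep(1:2, times = c(5, 4)))   # Hispanic
)

# The question's function: keep White plus one other race, then split by group
filter.race <- function(x, y) {
  f <- subset(x, race == "White" | race == y)
  split(f, f$group)
}

# Every race level except the reference level "White"
others <- setdiff(levels(df$race), "White")

# Call filter.race once per level; setNames labels each element by race
res <- setNames(lapply(others, function(r) filter.race(df, r)), others)

res$Black$`1`     # White + Black rows in group 1
res$Hispanic$`2`  # White + Hispanic rows in group 2
```

The result is a nested list indexed first by race and then by group, so no level has to be typed by hand.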
Sample data:
dput(df)
structure(list(id = 1:25, race = structure(c(3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("Black", "Hispanic", "White"), class = "factor"),
group = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L)), .Names = c("id",
"race", "group"), class = "data.frame", row.names = c(NA, -25L
))
Answer 0 (score: 3)
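Consider by (an object-oriented wrapper for tapply) to subset by race and group initially, and then, in each iteration, rbind the White rows of that group. For the White group itself, run unique to de-duplicate the data.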
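The code block for this answer did not survive, so the following is only a sketch of the approach it describes, not the answerer's original code. It assumes every race/group combination is non-empty, as in the sample data.

```r
# Sample data (same values as the dput() output in the question)
df <- data.frame(
  id = 1:25,
  race = factor(rep(c("White", "Black", "Hispanic"), times = c(9, 7, 9))),
  group = c(rep(1:2, times = c(5, 4)),
            rep(1:2, times = c(3, 4)),
            rep(1:2, times = c(5, 4)))
)

# by() calls FUN once per (race, group) combination
res <- by(df, list(race = df$race, group = df$group), function(d) {
  # White rows belonging to the same group as the current subset
  white <- df[df$race == "White" & df$group == d$group[1], ]
  # For the White subsets, rbind() doubles the rows; unique() drops the duplicates
  unique(rbind(white, d))
})

res[["Black", "1"]]     # White + Black, group 1
res[["Hispanic", "2"]]  # White + Hispanic, group 2
```

The by object behaves like a race-by-group array of data frames, so each subset is looked up with two character indices.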
Answer 1 (score: 1)
A base R solution follows.
I have renamed the function to filter.races, plural, since it handles multiple races.
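filter.races <- function(x){
  # all race levels present, except the reference level "White"
  races <- as.character(unique(x[["race"]]))
  races <- races[races != "White"]
  # one White + race subset per race, each split by group
  res <- lapply(races, function(r){
    s <- subset(x, race %in% c("White", r))
    split(s, s[["group"]])
  })
  # name by race so the flattened elements become "Black.1", "Hispanic.2", ...
  names(res) <- races
  unlist(res, recursive = FALSE)
}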
filter.races(df)
Answer 2 (score: 0)
Here is another way to do this with Map, by splitting the data into the "White" rows and the rows of the other races.
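white_df <- subset(df, race == "White")   # reference rows
rest_df <- subset(df, race != "White")    # all other races
# For each group, bind the White rows to each non-White race;
# drop = TRUE skips the empty White level of the race factor
Map(function(x, y) lapply(split(y, y$race, drop = TRUE), function(p) rbind(x, p)),
    split(white_df, white_df$group), split(rest_df, rest_df$group))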
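Map pairs the two split() lists by position, so the code above assumes that white_df and rest_df contain the same groups in the same order. A sketch that matches groups by name instead follows; the intersect() step is an assumption about how a group with no White rows (or no other rows) should be handled, namely by dropping it, and the helper names are illustrative.

```r
# Sample data (same values as the dput() output in the question)
df <- data.frame(
  id = 1:25,
  race = factor(rep(c("White", "Black", "Hispanic"), times = c(9, 7, 9))),
  group = c(rep(1:2, times = c(5, 4)),
            rep(1:2, times = c(3, 4)),
            rep(1:2, times = c(5, 4)))
)

white_df <- subset(df, race == "White")
rest_df  <- subset(df, race != "White")

white_by_group <- split(white_df, white_df$group)
rest_by_group  <- split(rest_df, rest_df$group)

# Only groups that have both White and non-White rows, matched by name
grps <- intersect(names(white_by_group), names(rest_by_group))

res <- lapply(setNames(grps, grps), function(g) {
  rest_g <- rest_by_group[[g]]
  # drop = TRUE skips race levels (such as White) with no rows in this group
  lapply(split(rest_g, rest_g$race, drop = TRUE),
         function(p) rbind(white_by_group[[g]], p))
})

res[["1"]]$Black  # White + Black rows in group 1
```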
#$`1`
#$`1`$Black
# id race group
#1 1 White 1
#2 2 White 1
#3 3 White 1
#4 4 White 1
#5 5 White 1
#10 10 Black 1
#11 11 Black 1
#12 12 Black 1
#$`1`$Hispanic
# id race group
#1 1 White 1
#2 2 White 1
#3 3 White 1
#4 4 White 1
#5 5 White 1
#17 17 Hispanic 1
#18 18 Hispanic 1
#19 19 Hispanic 1
#20 20 Hispanic 1
#21 21 Hispanic 1
#....