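I have a data frame with columns for site, family, and visit number (visit_no). The sites were visited a different number of times (2 or 3), and the families recorded differ between visits and between sites: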
site family visit_no
1 A1 Scarabaeidae 1
2 A1 Clambidae 1
3 A1 Carabidae 1
4 A1 Carabidae 2
5 A1 Clambidae 2
6 A1 Scarabaeidae 2
7 A1 Leiodidae 3
8 A1 Clambidae 3
9 A1 Carabidae 3
10 A2 Scarabaeidae 1
11 A2 Carabidae 1
12 A2 Staphylinidae 1
13 A2 Curculionidae 2
14 A2 Scarabaeidae 2
15 A2 Staphylinidae 2
16 A3 Staphylinidae 1
17 A3 Carabidae 1
18 A3 Curculionidae 1
19 A3 Leiodidae 2
20 A3 Clambidae 2
21 A3 Carbidae 2
22 A3 Phalacridae 3
23 A3 Carabidae 3
24 A3 Curculionidae 3
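I want to populate a data frame that records, for each visit to each site, whether Scarabaeidae is present (1) or absent (0). If a site has fewer visits than the other sites, I want the column for the missing visit to hold NA. The result would look like this: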
site 1 2 3
1 A1 1 1 0
2 A2 1 1 NA
3 A3 0 0 0
Is there a way to do this with a conditional loop?
Answer 0 (score: 2)
You can do this with pivot_wider alone:
tidyr::pivot_wider(df, names_from = visit_no, values_from = family,
values_fn = function(x) as.integer("Scarabaeidae" %in% x))
# site `1` `2` `3`
# <chr> <int> <int> <int>
#1 A1 1 1 0
#2 A2 1 1 NA
#3 A3 0 0 0
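The column names `1`, `2`, `3` above are non-syntactic, so they need backticks whenever they are used later. The following is a minimal sketch of one way around that, using the `names_prefix` argument of `pivot_wider`. The prefix `visit_` and the object name `wide` are illustrative choices, not part of the original answer, and the data frame is rebuilt inline so the block runs on its own.

```r
library(tidyr)

# Sample data from the question, rebuilt compactly.
df <- data.frame(
  site = rep(c("A1", "A2", "A3"), times = c(9, 6, 9)),
  family = c(
    "Scarabaeidae", "Clambidae", "Carabidae",          # A1, visit 1
    "Carabidae", "Clambidae", "Scarabaeidae",          # A1, visit 2
    "Leiodidae", "Clambidae", "Carabidae",             # A1, visit 3
    "Scarabaeidae", "Carabidae", "Staphylinidae",      # A2, visit 1
    "Curculionidae", "Scarabaeidae", "Staphylinidae",  # A2, visit 2
    "Staphylinidae", "Carabidae", "Curculionidae",     # A3, visit 1
    "Leiodidae", "Clambidae", "Carbidae",              # A3, visit 2
    "Phalacridae", "Carabidae", "Curculionidae"        # A3, visit 3
  ),
  visit_no = c(rep(1:3, each = 3), rep(1:2, each = 3), rep(1:3, each = 3))
)

# values_fn runs once per site/visit cell that has data.
# Cells with no rows (A2, visit 3) stay NA because values_fill is not set.
wide <- pivot_wider(
  df,
  names_from   = visit_no,
  values_from  = family,
  names_prefix = "visit_",  # illustrative prefix, gives visit_1, visit_2, visit_3
  values_fn    = function(x) as.integer("Scarabaeidae" %in% x)
)

wide
```

Setting `values_fill = 0` would turn the NA into 0, which is not what the question asks for, so it is deliberately left out here.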
Data
df <- structure(list(site = c("A1", "A1", "A1", "A1", "A1", "A1", "A1",
"A1", "A1", "A2", "A2", "A2", "A2", "A2", "A2", "A3", "A3", "A3",
"A3", "A3", "A3", "A3", "A3", "A3"), family = c("Scarabaeidae",
"Clambidae", "Carabidae", "Carabidae", "Clambidae", "Scarabaeidae",
"Leiodidae", "Clambidae", "Carabidae", "Scarabaeidae", "Carabidae",
"Staphylinidae", "Curculionidae", "Scarabaeidae", "Staphylinidae",
"Staphylinidae", "Carabidae", "Curculionidae", "Leiodidae", "Clambidae",
"Carbidae", "Phalacridae", "Carabidae", "Curculionidae"), visit_no = c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
1L, 2L, 2L, 2L, 3L, 3L, 3L)), class = "data.frame", row.names = c(NA, -24L))
Answer 1 (score: 1)
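In general, you don't need a loop for this kind of thing. The data are already in long format, so compute the indicator for each site and visit, then pivot the result to wide format: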
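library(magrittr)  # provides the %>% pipe
# dat is the data frame from the question
dat %>%
dplyr::group_by(site, visit_no) %>%
# one row per site and visit: 1 if Scarabaeidae was recorded, otherwise 0
dplyr::summarise(a = dplyr::if_else("Scarabaeidae" %in% family, 1, 0)) %>%
# spread the visits into columns; visits a site never had become NA
tidyr::pivot_wider(names_from="visit_no", values_from="a")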
# A tibble: 3 x 4
# Groups: site [3]
# site `1` `2` `3`
# <chr> <dbl> <dbl> <dbl>
# 1 A1 1 1 0
# 2 A2 1 1 NA
# 3 A3 0 0 0
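For readers who still want the explicit loop the question asks about, here is a minimal base R sketch for comparison. It assumes that a visit took place whenever at least one row exists for that site and visit number, so a visit with no records at all cannot be told apart from a visit that never happened. The names `presence`, `sites`, `visits`, and `recorded` are illustrative, and the data frame is rebuilt inline so the block runs on its own.

```r
# Sample data from the question, rebuilt compactly.
dat <- data.frame(
  site = rep(c("A1", "A2", "A3"), times = c(9, 6, 9)),
  family = c(
    "Scarabaeidae", "Clambidae", "Carabidae",          # A1, visit 1
    "Carabidae", "Clambidae", "Scarabaeidae",          # A1, visit 2
    "Leiodidae", "Clambidae", "Carabidae",             # A1, visit 3
    "Scarabaeidae", "Carabidae", "Staphylinidae",      # A2, visit 1
    "Curculionidae", "Scarabaeidae", "Staphylinidae",  # A2, visit 2
    "Staphylinidae", "Carabidae", "Curculionidae",     # A3, visit 1
    "Leiodidae", "Clambidae", "Carbidae",              # A3, visit 2
    "Phalacridae", "Carabidae", "Curculionidae"        # A3, visit 3
  ),
  visit_no = c(rep(1:3, each = 3), rep(1:2, each = 3), rep(1:3, each = 3))
)

sites  <- unique(dat$site)
visits <- sort(unique(dat$visit_no))

# Start with NA everywhere, so visits that never happened stay NA.
presence <- matrix(
  NA_integer_,
  nrow = length(sites), ncol = length(visits),
  dimnames = list(sites, as.character(visits))
)

for (s in sites) {
  for (v in visits) {
    recorded <- dat$family[dat$site == s & dat$visit_no == v]
    # Only fill the cell when this site actually has records for this visit.
    if (length(recorded) > 0) {
      presence[s, as.character(v)] <- as.integer("Scarabaeidae" %in% recorded)
    }
  }
}

presence
```

The result is an integer matrix with sites as row names. If a data frame is needed, something like `data.frame(site = rownames(presence), presence, check.names = FALSE)` should convert it.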
Answer 2 (score: 0)
You can do this in one line with tapply().
with(dat, tapply(I(family == 'Scarabaeidae'), list(site, visit_no), sum))
# 1 2 3
# A1 1 1 0
# A2 1 1 NA
# A3 0 0 0
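One caveat: `sum` counts matching rows, so the result is only a 0/1 indicator when a family appears at most once per site and visit, as it does in the sample data. The sketch below shows a variant that uses `any` instead and then converts the logical matrix to integer. The extra duplicate row and the names `dup`, `counts`, and `present` are hypothetical, added only to show the difference.

```r
# Sample data from the question, rebuilt compactly.
dat <- data.frame(
  site = rep(c("A1", "A2", "A3"), times = c(9, 6, 9)),
  family = c(
    "Scarabaeidae", "Clambidae", "Carabidae",          # A1, visit 1
    "Carabidae", "Clambidae", "Scarabaeidae",          # A1, visit 2
    "Leiodidae", "Clambidae", "Carabidae",             # A1, visit 3
    "Scarabaeidae", "Carabidae", "Staphylinidae",      # A2, visit 1
    "Curculionidae", "Scarabaeidae", "Staphylinidae",  # A2, visit 2
    "Staphylinidae", "Carabidae", "Curculionidae",     # A3, visit 1
    "Leiodidae", "Clambidae", "Carbidae",              # A3, visit 2
    "Phalacridae", "Carabidae", "Curculionidae"        # A3, visit 3
  ),
  visit_no = c(rep(1:3, each = 3), rep(1:2, each = 3), rep(1:3, each = 3))
)

# Hypothetical: Scarabaeidae recorded a second time at site A1, visit 1.
dup <- rbind(
  dat,
  data.frame(site = "A1", family = "Scarabaeidae", visit_no = 1L)
)

# sum counts the matching rows, so the duplicated cell is no longer 0/1.
counts <- with(dup, tapply(family == "Scarabaeidae", list(site, visit_no), sum))

# any returns TRUE/FALSE per cell; empty cells (A2, visit 3) stay NA.
present <- with(dup, tapply(family == "Scarabaeidae", list(site, visit_no), any))
storage.mode(present) <- "integer"  # TRUE/FALSE -> 1/0, keeps dims and NA

counts
present
```

On the original data without the duplicate row, both versions should give the same matrix.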
Data:
dat <- read.table(header=TRUE, text=' site family visit_no
1 A1 Scarabaeidae 1
2 A1 Clambidae 1
3 A1 Carabidae 1
4 A1 Carabidae 2
5 A1 Clambidae 2
6 A1 Scarabaeidae 2
7 A1 Leiodidae 3
8 A1 Clambidae 3
9 A1 Carabidae 3
10 A2 Scarabaeidae 1
11 A2 Carabidae 1
12 A2 Staphylinidae 1
13 A2 Curculionidae 2
14 A2 Scarabaeidae 2
15 A2 Staphylinidae 2
16 A3 Staphylinidae 1
17 A3 Carabidae 1
18 A3 Curculionidae 1
19 A3 Leiodidae 2
20 A3 Clambidae 2
21 A3 Carbidae 2
22 A3 Phalacridae 3
23 A3 Carabidae 3
24 A3 Curculionidae 3')