I need some help with a technical manipulation in R, please.
My problem: I have presence/absence observation data for a bird in different habitat types. I want to know the observation success ratio in these habitats according to their surface range:
data_observation <- data.frame(
habitat_bush = c(
0, 0, 0, 0, 10,
10, 30, 30, 30, 45,
65, 65, 65, 80, 80,
80, 90, 95, 100
),
obs = c(
"yes", "no", "no", "no", "yes",
"no", "no", "yes", "no", "yes",
"yes", "no", "yes", "no", "yes",
"yes", "yes", "yes", "yes"
)
)
Here there is only data for 'habitat_bush', but I have 10 more habitat types.
With help from a colleague, we wrote this function to plot the observation success ratio across the different surface classes of 'habitat_bush':
library(dplyr)
library(ggplot2)
library(scales)
plot_forest_test <- function(data = NULL, habitat_type = NULL, colour = NULL) {
  # capture the column names passed unquoted by the caller
  x <- enquo(habitat_type)
  fill <- enquo(colour)
  ggdata <- data %>%
    select(x = !!x, fill = !!fill) %>%
    # bin the surface percentage into classes
    mutate(
      group = case_when(
        x == 0 ~ "[0]",
        x > 0.0001 & x < 10.0001 ~ "]0-10]",
        x > 10.0001 & x < 25.0001 ~ "]10-25]",
        x > 25.0001 & x < 50.0001 ~ "]25-50]",
        x > 50.0001 & x < 75.0001 ~ "]50-75]",
        x > 75.0001 ~ "]75-100]"
      )
    ) %>%
    select(-x) %>%
    # count yes/no per class, then turn counts into within-class proportions
    group_by(group, fill) %>%
    count() %>%
    group_by(group) %>%
    group_modify(~ mutate(.data = .x, freq = n / sum(n)))
  ggplot(data = ggdata, mapping = aes(x = group, y = freq, fill = fill)) +
    geom_bar(stat = "identity") +
    scale_fill_brewer(palette = "Greens") +
    scale_y_continuous(labels = scales::percent) +
    theme_minimal() +
    # use the captured column names as axis and legend titles
    labs(x = rlang::as_label(x), fill = rlang::as_label(fill))
}
plot_forest_test(data = data_observation, habitat_type = habitat_bush, colour = obs)
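As a side note, the case_when() binning can also be written with base R cut(). It closes each interval on the right, exactly like the ]a-b] labels. It leaves no gaps between classes and returns a factor whose levels keep the classes in order. This is a minimal sketch that assumes surface values are percentages between 0 and 100. bin_surface() is a hypothetical helper name, not part of the original code:

```r
# Hypothetical helper (not part of the original code): bin a surface
# percentage (assumed to lie in 0-100) into the same classes as case_when().
bin_surface <- function(x) {
  cut(
    x,
    breaks = c(-Inf, 0, 10, 25, 50, 75, 100),
    labels = c("[0]", "]0-10]", "]10-25]", "]25-50]", "]50-75]", "]75-100]"),
    right = TRUE # intervals are (a, b]: 10 falls in "]0-10]", 25 in "]10-25]"
  )
}

habitat_bush <- c(0, 0, 0, 0, 10, 10, 30, 30, 30, 45,
                  65, 65, 65, 80, 80, 80, 90, 95, 100)
obs <- c("yes", "no", "no", "no", "yes", "no", "no", "yes", "no", "yes",
         "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes")

# Success ratio per surface class: row-wise proportions of "no"/"yes",
# i.e. the numbers that the stacked bars display.
ratio <- prop.table(table(class = bin_surface(habitat_bush), obs = obs), margin = 1)
ratio
```

Classes with no observations (here ]10-25]) come out as NaN rows. Values above 100 would become NA.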
It works very well. But whether a bird is observed can depend on the effort the technician puts into looking for it. So I have data like this:
data_observation_2 <- data.frame(
superficie_essence = c(
0, 0, 0, 0, 10,
10, 30, 30, 30, 45,
65, 65, 65, 80, 80,
80, 90, 95, 100
),
obs = c(
"yes", "no", "no", "no", "yes",
"no", "no", "yes", "no", "yes",
"yes", "no", "yes", "no", "yes",
"yes", "yes", "yes", "yes"
),
effort = c(low, low, mid-low, mid-low, low, mid-low, mid-low,
mid-high, mid-high, high, mid-low, mid-low, mid-high, mid-low, mid-high, high, high, mid-high, high)
)
My R skills stop here. I would like the same graph as before, but subdivided by effort type for each habitat surface class, all in one figure (a multipanel plot). In other words, I want 5 sub-graphs of the previous graph, with 1 barplot per effort level. But I have a lot of data, so I would like to put this process into a function like:
plot_forest_test_2(data = data_observation, habitat_type = habitat_bush, effort = Q_effort, colour = obs)
Can you help me, please? Thanks for your help!
Regards
Answer (score: 1)
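Tidy evaluation (quosures) is not my specialty, especially when an argument may be missing, but I gave it a try. I created a new column for the facet variable and then added facet_grid(). You could also use facet_wrap(). Hope this helps.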
plot_forest_test <- function(data = NULL, habitat_type = NULL, colour = NULL, facet = NULL) {
  x <- enquo(habitat_type)
  fill <- enquo(colour)

  # this is new ####################
  facet <- enquo(facet)
  # TRUE when a facet column was supplied (the default NULL means "no facet")
  has_facet <- !rlang::quo_is_null(facet)
  df <-
    data %>%
    mutate(
      x = !!x,
      fill = !!fill,
      facet = ""
    )
  if (has_facet) {
    df <-
      df %>%
      mutate(facet = !!facet)
  }
  ##################################

  ggdata <-
    df %>%
    mutate(
      group = case_when(
        x == 0 ~ "[0]",
        x > 0.0001 & x < 10.0001 ~ "]0-10]",
        x > 10.0001 & x < 25.0001 ~ "]10-25]",
        x > 25.0001 & x < 50.0001 ~ "]25-50]",
        x > 50.0001 & x < 75.0001 ~ "]50-75]",
        x > 75.0001 ~ "]75-100]"
      )
    ) %>%
    select(-x) %>%
    # adding facet here
    group_by(group, fill, facet) %>%
    count() %>%
    group_by(group, facet) %>%
    arrange(desc(fill)) %>%
    mutate(
      freq = n / sum(n),
      # these steps set up the label placement
      running_freq = cumsum(freq),
      prev_freq = lag(running_freq, default = 0),
      label_y = (prev_freq + running_freq) / 2
    ) %>%
    ungroup()

  # create plot w/o facet
  p <-
    ggplot(data = ggdata, mapping = aes(x = group, y = freq, fill = fill)) +
    geom_bar(stat = "identity") +
    geom_text(aes(y = label_y, label = n)) +
    scale_fill_brewer(palette = "Greens") +
    scale_y_continuous(labels = scales::percent) +
    theme(
      panel.background = element_rect(fill = "white"),
      panel.border = element_rect(color = "grey90", fill = NA)
    ) +
    labs(x = rlang::as_label(x), fill = rlang::as_label(fill))

  # add in if facet was mentioned
  if (has_facet) {
    p <-
      p +
      facet_grid(~facet)
  }

  # return final plot
  p
}
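The label-placement steps in the function put each count at the vertical middle of its segment in the stacked bar. Below is a quick base R check of that arithmetic. It uses made-up proportions for a single bar: 0.75 "yes" below 0.25 "no", which is the order produced by arrange(desc(fill)). The expression running - freq / 2 is an equivalent shortcut:

```r
# Made-up proportions for one stacked bar, bottom segment first.
freq <- c(yes = 0.75, no = 0.25)

running <- cumsum(freq)                  # top edge of each segment: 0.75, 1.00
prev <- c(0, head(unname(running), -1))  # bottom edge of each segment: 0, 0.75
label_y <- (prev + running) / 2          # midpoint, as in the function's mutate()

# Equivalent shortcut: top edge minus half the segment height.
label_y_short <- running - freq / 2

label_y
label_y_short
```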
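I edited data_observation_2 because the strings were not in quotes, and some values had spaces around the hyphen while others did not. I removed the spaces from all of them: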
data_observation_2 <- data.frame(
superficie_essence = c(
0, 0, 0, 0, 10,
10, 30, 30, 30, 45,
65, 65, 65, 80, 80,
80, 90, 95, 100
),
obs = c(
"yes", "no", "no", "no", "yes",
"no", "no", "yes", "no", "yes",
"yes", "no", "yes", "no", "yes",
"yes", "yes", "yes", "yes"
),
effort = c(
"low", "low", "mid-low", "mid-low", "low", "mid-low", "mid-low",
"mid-high", "mid-high", "high", "mid-low", "mid-low",
"mid-high", "mid-low", "mid-high", "high", "high", "mid-high", "high"
)
)
The final result. I used fct_relevel() to put the effort levels in order:
plot_forest_test(
data = data_observation,
habitat_type = habitat_bush,
colour = obs
)
library(forcats) # fct_relevel() comes from forcats, not dplyr
data_observation_2 %>%
mutate(effort = fct_relevel(effort, "low", "mid-low", "mid-high", "high")) %>%
plot_forest_test(
habitat_type = superficie_essence,
colour = obs,
facet = effort
)
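If you prefer not to load forcats, base R factor() with an explicit levels vector gives the same ordering of the effort panels. This is a minimal sketch on the corrected effort values. Any value not listed in levels would silently become NA, so it is worth checking for that:

```r
effort <- c("low", "low", "mid-low", "mid-low", "low", "mid-low", "mid-low",
            "mid-high", "mid-high", "high", "mid-low", "mid-low",
            "mid-high", "mid-low", "mid-high", "high", "high", "mid-high", "high")

# Explicit level order: facet_grid() lays out the panels in this order.
effort_f <- factor(effort, levels = c("low", "mid-low", "mid-high", "high"))

# Values not listed in `levels` (e.g. "mid - low" with spaces) would become NA.
stopifnot(!anyNA(effort_f))

levels(effort_f)
table(effort_f)
```

In the pipeline, this would be mutate(effort = factor(effort, levels = c("low", "mid-low", "mid-high", "high"))) in place of the fct_relevel() call.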