I have a data frame that looks like this:
# A tibble: 5 x 5
# Groups: Trial [1]
GID Trial pop `1A-1145442` `1A-1158042`
<chr> <chr> <chr> <int> <int>
GID421213 ES1 ES1-5 12 11
GID419903 ES1 ES1-5 22 12
GID3881 ES1 ES1-5 22 22
GID13646 ES1 ES1-5 12 12
GID418846 ES1 ES1-5 22 11
Here is its dput:
structure(list(GID = c("GID421213", "GID419903", "GID3881", "GID13646",
"GID418846"), Trial = c("ES1", "ES1", "ES1", "ES1", "ES1"), pop = c("ES1-5",
"ES1-5", "ES1-5", "ES1-5", "ES1-5"), `1A-1145442` = c(12L, 22L,
22L, 12L, 22L), `1A-1158042` = c(11L, 12L, 22L, 12L, 11L)), row.names =
c(NA, -5L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), vars =
"Trial", drop = TRUE, indices = list(0:4), group_sizes = 5L,
biggest_group_size = 5L, labels = structure(list(Trial = "ES1"), row.names
= c(NA, -1L), class = "data.frame", vars = "Trial", drop = TRUE))
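I would like to recode the Trial column into a new grouping column, as I did earlier for the pop column with regex, but this time using dplyr. The Trial column contains ES values from 1 to 38, and I want to use the dplyr package to bin them as ES1-3, ES4-6, ES7-9 and so on. I know I can start with df %>% group_by(Trial), but I don't know how to proceed from there.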
Answer 0 (score: 1)
library(dplyr)
df %>%
mutate(pop2 = case_when(
Trial == "ES1" | Trial == "ES2" | Trial == "ES3" ~ "ES1-3",
Trial == "ES4" | Trial == "ES5" | Trial == "ES6" ~ "ES4-6"
))
which returns
# A tibble: 5 x 6
# Groups: Trial [1]
GID Trial pop `1A-1145442` `1A-1158042` pop2
<chr> <chr> <chr> <int> <int> <chr>
1 GID421213 ES1 ES1-5 12 11 ES1-3
2 GID419903 ES1 ES1-5 22 12 ES1-3
3 GID3881 ES1 ES1-5 22 22 ES1-3
4 GID13646 ES1 ES1-5 12 12 ES1-3
5 GID418846 ES1 ES1-5 22 11 ES1-3
Answer 1 (score: 1)
Here is a solution using parse_number from readr.
library(dplyr)
library(readr)
df %>%
  mutate(grp = cut(parse_number(Trial),
                   breaks = seq(1, 38, by = 3),
                   right = FALSE)) %>%
  group_by(grp)
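This extracts the number from Trial, cuts it into intervals to create a grouping variable, and then groups by that variable. right = FALSE means the intervals are closed on the left.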
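To see what these breaks do at the edges of the range, here is a minimal base R sketch. The vector trial_num is a made-up sample of trial numbers, not the asker's data. It suggests that 37 and 38 fall outside every interval, because the last break is 37 and the intervals are open on the right.

```r
# Trial numbers near both ends of the 1..38 range (illustrative values)
trial_num <- c(1, 3, 4, 36, 37, 38)

# Same breaks as above: 1, 4, 7, ..., 37
grp <- cut(trial_num, breaks = seq(1, 38, by = 3), right = FALSE)

# Pair each number with its interval; NA means no interval contains it
data.frame(trial_num, grp)
```

Here 1 and 3 land in [1,4), 36 lands in [34,37), and 37 and 38 come back as NA.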
Edit based on the comments below.
df %>%
  mutate(grp = cut(parse_number(Trial),
                   breaks = c(seq(1, 34, by = 3), 38),
                   right = FALSE,
                   include.lowest = TRUE)) %>%
  group_by(grp)
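If labels in the ES1-3 style are preferred over interval labels such as [1,4), cut also accepts a labels argument. The following is only a sketch in base R: the names breaks, lower, upper and trial_num are introduced here for illustration, and it assumes the final group should run from 34 to 38.

```r
# Breaks 1, 4, ..., 34, 38: the last interval absorbs trials 34 to 38
breaks <- c(seq(1, 34, by = 3), 38)

# Lower and upper trial number of each interval, used to build the labels
lower <- head(breaks, -1)
upper <- c(lower[-1] - 1, 38)
labs  <- sprintf("ES%d-%d", lower, upper)

# Illustrative trial numbers (not the asker's data)
trial_num <- c(1, 3, 4, 34, 38)

grp <- cut(trial_num, breaks = breaks, labels = labs,
           right = FALSE, include.lowest = TRUE)
as.character(grp)
```

Inside the pipeline above, the same labels vector could be passed to cut in the mutate call.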
Answer 2 (score: 1)
Given
(df <- data.frame(Trial = paste0("ES", 1:10)))
# Trial
# 1 ES1
# 2 ES2
# 3 ES3
# 4 ES4
# 5 ES5
# 6 ES6
# 7 ES7
# 8 ES8
# 9 ES9
# 10 ES10
we can use base R:
size <- 3
groups <- (as.numeric(substring(df$Trial, 3)) - 1) %/% size
(df$newCol <- sprintf("ES%d-%d", 1 + groups * size, size * (1 + groups)))
# [1] "ES1-3" "ES1-3" "ES1-3" "ES4-6" "ES4-6" "ES4-6" "ES7-9" "ES7-9"
# [9] "ES7-9" "ES10-12"
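Here as.numeric(substring(df$Trial, 3)) takes the numeric part of df$Trial and converts it to a numeric vector. Subtracting 1 and applying integer division (%/%) by size then gives the group number of each element of df$Trial, starting from 0. Given a group number, we can easily construct the new column with sprintf.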
size is the group size. For example, setting size <- 5 gives the values ES1-5, ES6-10, and so on.
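Since the question asks for dplyr, the same arithmetic can be written inside mutate. This is a sketch rather than part of the original answer. The intermediate columns num and group are names chosen here for illustration.

```r
library(dplyr)

df <- data.frame(Trial = paste0("ES", 1:10))
size <- 3

df <- df %>%
  mutate(
    num    = as.numeric(substring(Trial, 3)),  # numeric part of "ES<n>"
    group  = (num - 1) %/% size,               # zero-based group number
    newCol = sprintf("ES%d-%d", 1 + group * size, size * (1 + group))
  )

df$newCol
```

Every label spans a full block of size trials, so with 38 trials and size <- 3 the last label would read ES37-39 even though only ES37 and ES38 exist.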