Grouping a character vector into new groups with dplyr

Time: 2018-12-17 22:11:00

Tags: r dplyr

I have a data frame that looks like this:

# A tibble: 5 x 5
# Groups:   Trial [1]
GID       Trial pop   `1A-1145442` `1A-1158042`
<chr>     <chr> <chr>        <int>        <int>
GID421213 ES1   ES1-5           12           11
GID419903 ES1   ES1-5           22           12
GID3881   ES1   ES1-5           22           22
GID13646  ES1   ES1-5           12           12
GID418846 ES1   ES1-5           22           11

Here is its dput:

structure(list(GID = c("GID421213", "GID419903", "GID3881", "GID13646", 
"GID418846"), Trial = c("ES1", "ES1", "ES1", "ES1", "ES1"), pop = c("ES1-5", 
"ES1-5", "ES1-5", "ES1-5", "ES1-5"), `1A-1145442` = c(12L, 22L, 
 22L, 12L, 22L), `1A-1158042` = c(11L, 12L, 22L, 12L, 11L)), row.names = 
 c(NA, -5L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), vars = 
 "Trial", drop = TRUE, indices = list(0:4), group_sizes = 5L, 
 biggest_group_size = 5L, labels = structure(list(Trial = "ES1"), row.names 
 = c(NA, -1L), class = "data.frame", vars = "Trial", drop = TRUE))

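I want to regroup the Trial column into a new column, the way I previously built the pop column with regex operations, but now using dplyr. The Trial column contains ES values from 1 to 38, and I want to use the dplyr package to group them as ES1-3, ES4-6, ES7-9, and so on. I know I can start with df %>% group_by(Trial), but I don't know how to proceed from there.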

3 Answers:

Answer 0: (score: 1)

library(dplyr)

df %>% 
  mutate(pop2 = case_when(
    Trial == "ES1" | Trial == "ES2" | Trial == "ES3" ~ "ES1-3",
    Trial == "ES4" | Trial == "ES5" | Trial == "ES6" ~ "ES4-6"
  ))

This returns

# A tibble: 5 x 6
# Groups:   Trial [1]
  GID       Trial pop   `1A-1145442` `1A-1158042` pop2 
  <chr>     <chr> <chr>        <int>        <int> <chr>
1 GID421213 ES1   ES1-5           12           11 ES1-3
2 GID419903 ES1   ES1-5           22           12 ES1-3
3 GID3881   ES1   ES1-5           22           22 ES1-3
4 GID13646  ES1   ES1-5           12           12 ES1-3
5 GID418846 ES1   ES1-5           22           11 ES1-3
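The case_when call above can be written more compactly with %in%. The following is only a sketch on a small made-up data frame (the Trial values here are hypothetical, not the asker's data); it also makes explicit that any Trial not covered by a condition ends up as NA.

```r
library(dplyr)

# Hypothetical sample; "ES9" is deliberately not covered by any condition
df <- data.frame(Trial = c("ES1", "ES2", "ES5", "ES9"),
                 stringsAsFactors = FALSE)

out <- df %>%
  mutate(pop2 = case_when(
    Trial %in% c("ES1", "ES2", "ES3") ~ "ES1-3",
    Trial %in% c("ES4", "ES5", "ES6") ~ "ES4-6",
    TRUE ~ NA_character_   # explicit fallback for unmatched trials
  ))

out$pop2
```

With 38 trials this approach needs one condition per group, so it is best suited to a small number of groups.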

Answer 1: (score: 1)

Here is a solution using parse_number from readr.

library(dplyr)
library(readr)  # provides parse_number()

df %>% 
  mutate(grp = cut(parse_number(Trial), 
                   breaks = seq(1, 38, by = 3), 
                   right = FALSE)) %>% 
  group_by(grp)

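This extracts the number from Trial, cuts it into intervals to create a grouping variable, and then groups by that variable. right = FALSE means the intervals are closed on the left.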
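To see what these intervals look like, here is a small base R sketch. The vector trial_num is a hypothetical stand-in for the result of parse_number(Trial). It shows the left-closed intervals and also that, with these breaks, the values 37 and 38 fall outside the last interval and become NA.

```r
# Hypothetical trial numbers, standing in for parse_number(Trial)
trial_num <- c(1, 3, 4, 36, 37, 38)

breaks <- seq(1, 38, by = 3)   # 1, 4, 7, ..., 34, 37
grp <- cut(trial_num, breaks = breaks, right = FALSE)

# Left-closed intervals: 3 falls in [1,4), while 4 starts [4,7)
data.frame(trial_num, grp)

# 37 and 38 lie outside the last interval [34,37), so they are NA
is.na(grp)
```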


Edit based on the comments below.

df %>% 
  mutate(grp = cut(parse_number(Trial), 
                   breaks = c(seq(1, 34, by = 3), 38), 
                   right = FALSE,
                   include.lowest = TRUE)) %>% 
  group_by(grp)
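If labels such as ES1-3 are preferred over interval notation, cut also accepts a labels argument. The sketch below is one possible way to build those labels from the break points; the helper vectors lower and upper are names introduced here for illustration. Note that with breaks ending in 34 and 38, the last group covers ES34 to ES38.

```r
trial_num <- 1:38                       # stand-in for parse_number(Trial)
breaks <- c(seq(1, 34, by = 3), 38)     # 1, 4, ..., 31, 34, 38

# Lower and upper bound of each interval, used only to build labels
lower <- as.integer(head(breaks, -1))
upper <- as.integer(c(breaks[-c(1, length(breaks))] - 1, max(breaks)))
labels <- sprintf("ES%d-%d", lower, upper)

grp <- cut(trial_num, breaks = breaks, labels = labels,
           right = FALSE, include.lowest = TRUE)

table(grp)   # every group has 3 trials except the last, which has 5
```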

Answer 2: (score: 1)

Given

(df <- data.frame(Trial = paste0("ES", 1:10)))
#    Trial
# 1    ES1
# 2    ES2
# 3    ES3
# 4    ES4
# 5    ES5
# 6    ES6
# 7    ES7
# 8    ES8
# 9    ES9
# 10  ES10

we can use base R and do

size <- 3
groups <- (as.numeric(substring(df$Trial, 3)) - 1) %/% size
(df$newCol <- sprintf("ES%d-%d", 1 + groups * size, size * (1 + groups)))
#  [1] "ES1-3"   "ES1-3"   "ES1-3"   "ES4-6"   "ES4-6"   "ES4-6"   "ES7-9"   "ES7-9"  
#  [9] "ES7-9"   "ES10-12"

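Here as.numeric(substring(df$Trial, 3)) takes the numeric part of df$Trial and converts it to a numeric vector. Subtracting 1 and applying integer division (%/%) by size then gives the group number of each element of df$Trial, starting from 0. Given a group number, we can easily construct the new column with sprintf.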

size is the group size. For example, setting size <- 5 gives the values ES1-5, ES6-10, and so on.
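Since the question asks for dplyr, the same integer-division idea can probably be expressed inside mutate. This is a sketch, not part of the original answer; the intermediate columns num and idx are names chosen here for illustration.

```r
library(dplyr)

df <- data.frame(Trial = paste0("ES", 1:10), stringsAsFactors = FALSE)
size <- 3L   # group size

out <- df %>%
  mutate(
    num    = as.integer(substring(Trial, 3)),   # numeric part of Trial
    idx    = (num - 1L) %/% size,               # group number, from 0
    newCol = sprintf("ES%d-%d", 1L + idx * size, size * (1L + idx))
  ) %>%
  group_by(newCol)

out$newCol
```

The trailing group_by(newCol) leaves the result ready for a later summarise or other grouped operation.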