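R, dplyr: combining information by group and row/column specification

Time: 2018-11-13 00:49:55

Tags: r dplyr

I want to create a new column that combines information from two columns, where one of the values sits in a different row. Here is an example data frame to start from: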

df <- data_frame(person = c(rep("Joe",4),rep("Bob",3)),
               meal = c(seq(1:4),seq(1:3)),
               food = c("Chicken", "Beef", "Soup and meal 2", "Lamb",
                        "Lamb","Salad and meal 1","Beef"),
               dependencies = c(NA,NA,2,3,NA,1,NA),
               solo_meal = c(1,1,0,1,1,0,1))

I would like to create a new column that looks like this:

data_frame(combined_meal = c("Chicken", "Beef", "Soup and Beef", "Lamb",
                              "Lamb","Salad and Lamb","Beef"))

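When a row has a dependency, I want to combine its food with the food of the meal it depends on.

I have a large dataset in which multiple dependencies need to be combined into a single field. I feel there should be a simple way to do this, but I can't seem to come up with one.

Thanks!

Edit: Thank you to everyone who has commented so far. The tidyverse option fits my needs best. I am adding one more case: when looking up a meal, I may need to chain several meals together.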

df <- data_frame(person = c(rep("Joe",4),rep("Bob",3)),
               meal = c(seq(1:4),seq(1:3)),
               food = c("Chicken", "Beef", "Soup and meal 2", "Lamb and meal 3",
                        "Lamb","Salad and meal 1","Beef"),
               dependencies = c(NA,NA,2,3,NA,1,NA),
               solo_meal = c(1,1,0,1,1,0,1))

This gives:

# A tibble: 7 x 5
  person  meal food             dependencies solo_meal
  <chr>  <int> <chr>                   <dbl>     <dbl>
1 Joe        1 Chicken                    NA         1
2 Joe        2 Beef                       NA         1
3 Joe        3 Soup and meal 2             2         0
4 Joe        4 Lamb and meal 3             3         1
5 Bob        1 Lamb                       NA         1
6 Bob        2 Salad and meal 1            1         0
7 Bob        3 Beef                       NA         1

I would like a column of combined meals:

# A tibble: 7 x 1
  combined_meal         
  <chr>                 
1 Chicken               
2 Beef                  
3 Soup and Beef         
4 Lamb and Soup and Beef
5 Lamb                  
6 Salad and Lamb        
7 Beef  

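How can I add meals recursively? Preferably using the tidyverse.

Thanks again!

3 answers:

Answer 0 (score: 1)

Here is a base R solution. (I find base solutions easier to understand.) You can create an index vector of the rows to be modified, then build the new values from those items and the items immediately before them (which, in your example, appear to be the intended matches).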

 idx <- which(grepl("meal", df$food))
 df[ idx, "combined_meal"] <- 
             paste( sub("meal.*$", "", df$food[idx] ), df$food [idx-1] )

 # Then fill in the NAs with the original `food` values
 df$combined_meal[ is.na(df$combined_meal)] <-
          df$food[ is.na(df$combined_meal)]



> df
# A tibble: 7 x 6
  person  meal food             dependencies solo_meal combined_meal  
  <chr>  <int> <chr>                   <dbl>     <dbl> <chr>          
1 Joe        1 Chicken                    NA         1 Chicken        
2 Joe        2 Beef                       NA         1 Beef           
3 Joe        3 Soup and meal 2             2         0 Soup and  Beef 
4 Joe        4 Lamb                       NA         1 Lamb           
5 Bob        1 Lamb                       NA         1 Lamb           
6 Bob        2 Salad and meal 1            1         0 Salad and  Lamb
7 Bob        3 Beef                       NA         1 Beef           
> 

Answer 1 (score: 0)

A solution using the tidyverse. The idea is to self-join df on person, matching dependencies to meal, followed by some further manipulation.

library(tidyverse)

df2 <- df %>%
  left_join(df %>% select(-dependencies, -solo_meal), 
            by = c("person", "dependencies" = "meal")) %>%
  mutate(food.z = str_replace(food.x, "meal [0-9]", "")) %>%
  mutate(combined_meal = ifelse(is.na(food.y), food.z, str_c(food.z, food.y, sep = ""))) %>%
  rename(food = food.x) %>%
  select(names(df), combined_meal)
df2
# # A tibble: 7 x 6
#   person  meal food             dependencies solo_meal combined_meal 
#   <chr>  <int> <chr>                   <dbl>     <dbl> <chr>         
# 1 Joe        1 Chicken                    NA         1 Chicken       
# 2 Joe        2 Beef                       NA         1 Beef          
# 3 Joe        3 Soup and meal 2             2         0 Soup and Beef 
# 4 Joe        4 Lamb                       NA         1 Lamb          
# 5 Bob        1 Lamb                       NA         1 Lamb          
# 6 Bob        2 Salad and meal 1            1         0 Salad and Lamb
# 7 Bob        3 Beef                       NA         1 Beef  
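
The join above resolves one level of dependency, so it does not cover the chained case from the question's edit (meal 4 refers to meal 3, which in turn refers to meal 2). The following is a minimal base R sketch of one way to follow such chains. The helper resolve_meals is a hypothetical name, not part of any package, and the sketch assumes each food string contains at most one "meal N" placeholder and each row has at most one dependency.

```r
# Edited example data from the question, as a plain data.frame
df <- data.frame(
  person = c(rep("Joe", 4), rep("Bob", 3)),
  meal = c(1:4, 1:3),
  food = c("Chicken", "Beef", "Soup and meal 2", "Lamb and meal 3",
           "Lamb", "Salad and meal 1", "Beef"),
  dependencies = c(NA, NA, 2, 3, NA, 1, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: resolve the meals of ONE person.
# meal, food and dependencies are parallel vectors for that person.
resolve_meals <- function(meal, food, dependencies) {
  resolve_one <- function(i, seen = integer(0)) {
    dep <- dependencies[i]
    if (is.na(dep)) return(food[i])          # no dependency: keep food as is
    j <- match(dep, meal)                    # row of the meal being referred to
    # Unknown meal number or circular reference: leave the text untouched
    if (is.na(j) || j %in% c(seen, i)) return(food[i])
    # Replace the "meal N" placeholder with the fully resolved text of meal N
    sub("meal [0-9]+", resolve_one(j, c(seen, i)), food[i])
  }
  vapply(seq_along(food), resolve_one, character(1))
}

# Apply the helper person by person, keeping the original row order
df$combined_meal <- df$food
for (p in unique(df$person)) {
  rows <- df$person == p
  df$combined_meal[rows] <- resolve_meals(df$meal[rows], df$food[rows],
                                          df$dependencies[rows])
}
```

Because the helper takes and returns plain vectors, it should also be usable inside a grouped dplyr pipeline, for example df %>% group_by(person) %>% mutate(combined_meal = resolve_meals(meal, food, dependencies)). The lookup goes through match on the meal number, so it does not rely on the rows being sorted.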

Answer 2 (score: 0)

One-line solution (using dplyr):

df %>% group_by(person) %>% 
mutate(combined_meal=ifelse(!is.na(dependencies), paste0(gsub("(.* and ).*","\\1",food), food[dependencies]),food))

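For each person, we create a column combined_meal. If there is no dependency, it simply repeats whatever is in food; if there is one, paste0 joins everything up to and including the word "and" in food with the entry of the food column at the row number given by dependencies.

(Note that this assumes the number in dependencies equals the row number within that person's data frame, which in turn means the data frame is sorted by meal. If that assumption does not hold, you can add an arrange(meal) line after group_by.)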
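
To make the two pieces of the one-liner concrete, the sketch below runs them on plain vectors for a single person, without any grouping. The vectors bob_meal, bob_food and bob_deps are illustrative names built from Bob's rows in the question; the shuffled row order is an assumption added here to show why the ordering matters, and looking the dependency up with match is a suggested variation rather than part of the original answer.

```r
# Bob's rows from the question, deliberately NOT sorted by meal
bob_meal <- c(3, 1, 2)
bob_food <- c("Beef", "Lamb", "Salad and meal 1")
bob_deps <- c(NA, NA, 1)

# Keep everything up to and including " and ";
# strings that do not contain " and " are returned unchanged
prefix <- gsub("(.* and ).*", "\\1", bob_food)

# Positional lookup, as in the one-liner: food[dependencies] picks ROW 1,
# which is meal 3 here, so the wrong food is appended
by_position <- ifelse(!is.na(bob_deps),
                      paste0(prefix, bob_food[bob_deps]),
                      bob_food)

# Lookup by meal number: match() finds the row whose meal equals the
# dependency, so the result no longer depends on the row order
by_meal <- ifelse(!is.na(bob_deps),
                  paste0(prefix, bob_food[match(bob_deps, bob_meal)]),
                  bob_food)
```

Here by_position ends with "Salad and Beef", while by_meal ends with "Salad and Lamb". Note also that the pattern is greedy: on a string with several " and " separators it keeps everything up to the last one.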
