Question

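I want to create a new column that combines information from two columns, where one of the values sits in a different row. Here is the example data frame I'm starting with: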

library(tibble)

df <- tibble(person = c(rep("Joe", 4), rep("Bob", 3)),
             meal = c(1:4, 1:3),
             food = c("Chicken", "Beef", "Soup and meal 2", "Lamb",
                      "Lamb", "Salad and meal 1", "Beef"),
             dependencies = c(NA, NA, 2, NA, NA, 1, NA),
             solo_meal = c(1, 1, 0, 1, 1, 0, 1))

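I'd like to create a new column that looks like this:

tibble(combined_meal = c("Chicken", "Beef", "Soup and Beef", "Lamb",
                         "Lamb", "Salad and Lamb", "Beef"))

Where a dependency is given, I want to combine the row's "food" with the food of the "meal" it refers to.

I have a large dataset in which multiple dependencies need to be combined into a single field. I feel there should be a simple way to do this, but I can't seem to come up with one.

Thanks!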

Edit: Thanks to everyone who has commented so far. The tidyverse option fits my needs best. One addition: when looking up a meal, I may need to chain several meals together.

df <- tibble(person = c(rep("Joe", 4), rep("Bob", 3)),
             meal = c(1:4, 1:3),
             food = c("Chicken", "Beef", "Soup and meal 2", "Lamb and meal 3",
                      "Lamb", "Salad and meal 1", "Beef"),
             dependencies = c(NA, NA, 2, 3, NA, 1, NA),
             solo_meal = c(1, 1, 0, 1, 1, 0, 1))

which gives:

# A tibble: 7 x 5
  person  meal food             dependencies solo_meal
  <chr>  <int> <chr>                   <dbl>     <dbl>
1 Joe        1 Chicken                    NA         1
2 Joe        2 Beef                       NA         1
3 Joe        3 Soup and meal 2             2         0
4 Joe        4 Lamb and meal 3             3         1
5 Bob        1 Lamb                       NA         1
6 Bob        2 Salad and meal 1            1         0
7 Bob        3 Beef                       NA         1

I'd like a column of combined meals:

# A tibble: 7 x 1
  combined_meal         
  <chr>                 
1 Chicken               
2 Beef                  
3 Soup and Beef         
4 Lamb and Soup and Beef
5 Lamb                  
6 Salad and Lamb        
7 Beef

How can I add the meals recursively? Preferably with the tidyverse.

Thanks again!

Answer 1

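Here is a base R solution. (I find base solutions easier to understand.) You can build an index vector of the rows to be modified, then construct the new values from those items and the items in the rows immediately before them (which, in your example, appear to be the ones being referred to).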

idx <- grep("meal", df$food)
# Strip the "meal N" reference and append the food from the preceding row
df[idx, "combined_meal"] <-
  paste0(sub("meal.*$", "", df$food[idx]), df$food[idx - 1])

# Then fill in the NAs with the original `food` values
df$combined_meal[is.na(df$combined_meal)] <-
  df$food[is.na(df$combined_meal)]

> df
# A tibble: 7 x 6
  person  meal food             dependencies solo_meal combined_meal 
  <chr>  <int> <chr>                   <dbl>     <dbl> <chr>         
1 Joe        1 Chicken                    NA         1 Chicken       
2 Joe        2 Beef                       NA         1 Beef          
3 Joe        3 Soup and meal 2             2         0 Soup and Beef 
4 Joe        4 Lamb                       NA         1 Lamb          
5 Bob        1 Lamb                       NA         1 Lamb          
6 Bob        2 Salad and meal 1            1         0 Salad and Lamb
7 Bob        3 Beef                       NA         1 Beef          
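The lookup by the preceding row works on the sample data because each referenced meal happens to sit directly above the row that refers to it. Below is a minimal base R sketch that instead looks up the meal number stored in dependencies, within the same person. The helper names key, dep_key, dep_row, and has_dep are illustrative and not part of the original answer, and a plain data.frame is assumed so that no packages are needed.

```r
# Sample data from the question, as a plain data.frame (assumption: base R only)
df <- data.frame(
  person = c(rep("Joe", 4), rep("Bob", 3)),
  meal = c(1:4, 1:3),
  food = c("Chicken", "Beef", "Soup and meal 2", "Lamb",
           "Lamb", "Salad and meal 1", "Beef"),
  dependencies = c(NA, NA, 2, NA, NA, 1, NA),
  stringsAsFactors = FALSE
)

# Build a person/meal key for every row and for every dependency,
# then find the row position that each dependency points to.
key     <- paste(df$person, df$meal)
dep_key <- paste(df$person, df$dependencies)
dep_row <- match(dep_key, key)  # NA where a row has no dependency
has_dep <- !is.na(dep_row)

# Start from the original food, then replace the "meal N" text
# with the food found in the referenced row.
df$combined_meal <- df$food
df$combined_meal[has_dep] <- paste0(
  sub("meal.*$", "", df$food[has_dep]),
  df$food[dep_row[has_dep]]
)
```

Because the lookup goes through match() on a person/meal key, it does not depend on the referenced row being adjacent or on the rows being sorted.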

Answer 2

A solution using the tidyverse. The idea is to self-join df on person, matching dependencies against meal, and then finish with a few further operations.

library(tidyverse)

df2 <- df %>%
  # Self-join: attach the food of the meal each row depends on (food.y)
  left_join(df %>% select(-dependencies, -solo_meal),
            by = c("person", "dependencies" = "meal")) %>%
  # Drop the "meal N" placeholder from the original text
  mutate(food.z = str_replace(food.x, "meal [0-9]+", "")) %>%
  mutate(combined_meal = ifelse(is.na(food.y), food.z, str_c(food.z, food.y))) %>%
  rename(food = food.x) %>%
  select(all_of(names(df)), combined_meal)
df2
# # A tibble: 7 x 6
#   person  meal food             dependencies solo_meal combined_meal 
#   <chr>  <int> <chr>                   <dbl>     <dbl> <chr>         
# 1 Joe        1 Chicken                    NA         1 Chicken       
# 2 Joe        2 Beef                       NA         1 Beef          
# 3 Joe        3 Soup and meal 2             2         0 Soup and Beef 
# 4 Joe        4 Lamb                       NA         1 Lamb          
# 5 Bob        1 Lamb                       NA         1 Lamb          
# 6 Bob        2 Salad and meal 1            1         0 Salad and Lamb
# 7 Bob        3 Beef                       NA         1 Beef
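To see what the self-join contributes before the string cleanup, it can help to look at the joined table on its own. This is a small sketch on the sample data; the name joined is illustrative, and only dplyr is loaded rather than the whole tidyverse.

```r
library(dplyr)

df <- tibble(person = c(rep("Joe", 4), rep("Bob", 3)),
             meal = c(1:4, 1:3),
             food = c("Chicken", "Beef", "Soup and meal 2", "Lamb",
                      "Lamb", "Salad and meal 1", "Beef"),
             dependencies = c(NA, NA, 2, NA, NA, 1, NA),
             solo_meal = c(1, 1, 0, 1, 1, 0, 1))

# Join each row to the row of the same person whose meal number
# equals this row's dependencies value.
joined <- df %>%
  left_join(df %>% select(person, meal, food),
            by = c("person", "dependencies" = "meal"))

# food.x is the row's own food; food.y is the food of the referenced meal
# (NA when the row has no dependency).
joined %>% select(person, meal, food.x, food.y)
```

Both tables contain a food column, so dplyr disambiguates them with the default .x and .y suffixes, which is where food.x and food.y in the answer come from.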

Answer 3

A one-line solution (using dplyr):

library(dplyr)
df %>% group_by(person) %>%
  mutate(combined_meal = ifelse(!is.na(dependencies),
    paste0(gsub("(.* and ).*", "\\1", food), food[dependencies]), food))

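For each person, we create a column combined_meal. Where there is no value in dependencies, it repeats whatever is in food; where there is one, paste0 joins everything up to and including the word "and" with the value of food at the row number given by dependencies.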
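The pieces of that expression can be inspected separately. Here is a small base R sketch using Joe's rows from the first data frame; the intermediate names prefix and referenced are illustrative only.

```r
# Joe's rows from the first example
food <- c("Chicken", "Beef", "Soup and meal 2", "Lamb")
dependencies <- c(NA, NA, 2, NA)

# Keep everything up to and including "and "; strings without " and "
# are returned unchanged because the pattern does not match.
prefix <- gsub("(.* and ).*", "\\1", food)

# Positional lookup: indexing with NA yields NA.
referenced <- food[dependencies]

# Use the pasted value only where a dependency exists.
combined <- ifelse(!is.na(dependencies), paste0(prefix, referenced), food)
```

Here prefix[3] is "Soup and " and referenced[3] is "Beef", so combined[3] becomes "Soup and Beef" while the other rows keep their original food.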

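(Note that this assumes the number in dependencies equals the row number within that person's portion of the data frame. This also means the data frame must be sorted by meal. If that assumption does not hold, you can add an arrange(meal) step after group_by.)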
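The one-liner resolves a single level of dependency, so on the edited data from the question Joe's fourth row would come out as "Lamb and Soup and meal 2". Below is a rough sketch of one way to follow chained references. It assumes each food string contains at most one "meal N" reference and that N is a meal number of the same person. resolve_meals is a hypothetical helper written for this illustration, not a dplyr or stringr function.

```r
library(dplyr)
library(stringr)

df <- tibble(person = c(rep("Joe", 4), rep("Bob", 3)),
             meal = c(1:4, 1:3),
             food = c("Chicken", "Beef", "Soup and meal 2", "Lamb and meal 3",
                      "Lamb", "Salad and meal 1", "Beef"),
             dependencies = c(NA, NA, 2, 3, NA, 1, NA),
             solo_meal = c(1, 1, 0, 1, 1, 0, 1))

# Hypothetical helper: repeatedly replace "meal N" with the current text
# of meal N until no references remain.
resolve_meals <- function(food, meal) {
  combined <- food
  # At most length(food) passes, so a circular reference cannot loop forever
  for (pass in seq_along(food)) {
    ref <- as.integer(str_match(combined, "meal ([0-9]+)")[, 2])
    hit <- !is.na(ref)
    if (!any(hit)) break
    # Look the referenced meal up by number, not by row position
    combined[hit] <- str_replace(combined[hit], "meal [0-9]+",
                                 combined[match(ref[hit], meal)])
  }
  combined
}

out <- df %>%
  group_by(person) %>%
  mutate(combined_meal = resolve_meals(food, meal)) %>%
  ungroup()
```

Because the lookup uses match() on the meal number, this version does not rely on the rows being sorted by meal. A reference to a meal number that does not exist for that person would yield NA for that row.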


R, dplyr: combining information by group and row/column specification
