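I want to create a new column that combines information from two columns, where one of the values lives in a different row. Here is the sample data frame I am starting with: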
df <- data_frame(person = c(rep("Joe",4),rep("Bob",3)),
                 meal = c(seq(1:4),seq(1:3)),
                 food = c("Chicken", "Beef", "Soup and meal 2", "Lamb",
                          "Lamb","Salad and meal 1","Beef"),
                 dependencies = c(NA,NA,2,3,NA,1,NA),
                 solo_meal = c(1,1,0,1,1,0,1))
I would like to create a new column that looks like this:
data_frame(combined_meal = c("Chicken", "Beef", "Soup and Beef", "Lamb",
                             "Lamb","Salad and Lamb","Beef"))
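When a dependency is present, I want to combine `food` with the food of the `meal` it refers to.
I have a large data set and need to combine multiple dependencies into a single field. I feel there should be a simple way to do this, but I cannot come up with one.
Thanks!
Edit: Thank you to everyone who has commented so far. The tidyverse option fits my needs best. One addition: when looking up a meal, I may need to chain several meals together.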
df <- data_frame(person = c(rep("Joe",4),rep("Bob",3)),
                 meal = c(seq(1:4),seq(1:3)),
                 food = c("Chicken", "Beef", "Soup and meal 2", "Lamb and meal 3",
                          "Lamb","Salad and meal 1","Beef"),
                 dependencies = c(NA,NA,2,3,NA,1,NA),
                 solo_meal = c(1,1,0,1,1,0,1))
This gives:
# A tibble: 7 x 5
  person  meal food             dependencies solo_meal
  <chr>  <int> <chr>                   <dbl>     <dbl>
1 Joe        1 Chicken                    NA         1
2 Joe        2 Beef                       NA         1
3 Joe        3 Soup and meal 2             2         0
4 Joe        4 Lamb and meal 3             3         1
5 Bob        1 Lamb                       NA         1
6 Bob        2 Salad and meal 1            1         0
7 Bob        3 Beef                       NA         1
I would like a column of combined meals:
# A tibble: 7 x 1
  combined_meal
  <chr>
1 Chicken
2 Beef
3 Soup and Beef
4 Lamb and Soup and Beef
5 Lamb
6 Salad and Lamb
7 Beef
How can I add the meals recursively? Preferably using the tidyverse.
Thanks again!
Answer 0 (score: 1)
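Here is a base R solution. (I find base solutions easier to understand.) You create an index vector of the rows to modify, then build the new values from the items being modified and the items immediately preceding them (which, in your example, appear to be the meals they refer to).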
idx <- which(grepl("meal", df$food))
df[ idx, "combined_meal"] <-
  paste0( sub("meal.*$", "", df$food[idx] ), df$food[idx-1] )
# Then fill in the NAs with the original `food` values
df$combined_meal[ is.na(df$combined_meal)] <-
  df$food[ is.na(df$combined_meal)]
> df
# A tibble: 7 x 6
  person  meal food             dependencies solo_meal combined_meal
  <chr>  <int> <chr>                   <dbl>     <dbl> <chr>
1 Joe        1 Chicken                    NA         1 Chicken
2 Joe        2 Beef                       NA         1 Beef
3 Joe        3 Soup and meal 2             2         0 Soup and Beef
4 Joe        4 Lamb                       NA         1 Lamb
5 Bob        1 Lamb                       NA         1 Lamb
6 Bob        2 Salad and meal 1            1         0 Salad and Lamb
7 Bob        3 Beef                       NA         1 Beef
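The `idx-1` lookup above relies on each referenced meal sitting in the row directly before the dependent one. As a minimal sketch that does not make that assumption, the referenced row can be found by matching on `person` and `meal` through the `dependencies` column. The names `has_dep`, `key`, and `ref` are illustrative, and row 4 uses `dependencies = NA`, as in the printed output above.

```r
# Base R sketch: resolve each dependency by a (person, meal) lookup
# instead of assuming the referenced meal is the preceding row.
df <- data.frame(
  person = c(rep("Joe", 4), rep("Bob", 3)),
  meal = c(1:4, 1:3),
  food = c("Chicken", "Beef", "Soup and meal 2", "Lamb",
           "Lamb", "Salad and meal 1", "Beef"),
  dependencies = c(NA, NA, 2, NA, NA, 1, NA),  # row 4 is NA, as in the printed output
  stringsAsFactors = FALSE
)

has_dep <- !is.na(df$dependencies)

# Row index of the meal each row refers to, matched within the same person.
key <- paste(df$person, df$meal)
ref <- match(paste(df$person, df$dependencies), key)

# Start from the original food, then overwrite only the dependent rows.
df$combined_meal <- df$food
df$combined_meal[has_dep] <- paste0(
  sub("meal [0-9]+$", "", df$food[has_dep]),  # keep the "Soup and " prefix
  df$food[ref[has_dep]]                        # append the referenced food
)

print(df$combined_meal)
```

This handles a single level of dependency only; a reference that points at another dependent meal would be pasted in unresolved.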
Answer 1 (score: 0)
A solution using the tidyverse. The idea is to self-join `df` on `person` and on `dependencies` = `meal`, followed by some further manipulation.
library(tidyverse)
df2 <- df %>%
  left_join(df %>% select(-dependencies, -solo_meal),
            by = c("person", "dependencies" = "meal")) %>%
  mutate(food.z = str_replace(food.x, "meal [0-9]", "")) %>%
  mutate(combined_meal = ifelse(is.na(food.y), food.z, str_c(food.z, food.y, sep = ""))) %>%
  rename(food = food.x) %>%
  select(names(df), combined_meal)
df2
# # A tibble: 7 x 6
#   person  meal food             dependencies solo_meal combined_meal
#   <chr>  <int> <chr>                   <dbl>     <dbl> <chr>
# 1 Joe        1 Chicken                    NA         1 Chicken
# 2 Joe        2 Beef                       NA         1 Beef
# 3 Joe        3 Soup and meal 2             2         0 Soup and Beef
# 4 Joe        4 Lamb                       NA         1 Lamb
# 5 Bob        1 Lamb                       NA         1 Lamb
# 6 Bob        2 Salad and meal 1            1         0 Salad and Lamb
# 7 Bob        3 Beef                       NA         1 Beef
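A single self-join resolves one level of dependency. For the chained case in the question's edit ("Lamb and meal 3", where meal 3 itself refers to meal 2), one possible approach is to substitute references repeatedly until none remain. The sketch below is written in base R so it runs without extra packages. `resolve_meals` is a hypothetical helper, not part of any package, and it assumes references always appear in the text as "meal N".

```r
# Hypothetical helper: expand "meal N" references repeatedly until no entry
# in one person's meals still points at another meal.
resolve_meals <- function(meal, food) {
  combined <- food
  # Each pass substitutes one level; capping the passes at length(meal)
  # guarantees termination even if the references form a cycle.
  for (pass in seq_along(meal)) {
    pending <- grepl("meal [0-9]+", combined)
    if (!any(pending)) break
    for (i in which(pending)) {
      ref_no <- as.integer(sub(".*meal ([0-9]+).*", "\\1", combined[i]))
      ref_food <- food[match(ref_no, meal)]  # original text of the referenced meal
      combined[i] <- sub("meal [0-9]+", ref_food, combined[i])
    }
  }
  combined
}

# Sample data from the question's edit.
df <- data.frame(
  person = c(rep("Joe", 4), rep("Bob", 3)),
  meal = c(1:4, 1:3),
  food = c("Chicken", "Beef", "Soup and meal 2", "Lamb and meal 3",
           "Lamb", "Salad and meal 1", "Beef"),
  stringsAsFactors = FALSE
)

# Apply the helper separately to each person's rows.
df$combined_meal <- NA_character_
for (p in unique(df$person)) {
  rows <- df$person == p
  df$combined_meal[rows] <- resolve_meals(df$meal[rows], df$food[rows])
}

print(df$combined_meal)
```

Inside a dplyr pipeline the same helper could be called per group, for example `df %>% group_by(person) %>% mutate(combined_meal = resolve_meals(meal, food))`. A reference to a meal number that does not exist for that person yields `NA` for that row.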
Answer 2 (score: 0)
A one-line solution (using dplyr):
df %>% group_by(person) %>%
mutate(combined_meal=ifelse(!is.na(dependencies), paste0(gsub("(.* and ).*","\\1",food), food[dependencies]),food))
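For each `person`, we create a column `combined_meal` that repeats whatever is in `food` when there is no `dependencies` value; when there is one, `paste0` joins everything in `food` up to and including the word "and" with the `food` value at the row number given by `dependencies`.
(Note that this assumes the number in `dependencies` equals the row number within that person's data frame, which in turn means the data frame is sorted by `meal`. If that assumption does not hold, you can add an `arrange(meal)` line after `group_by`.)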
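As an alternative to sorting first, the positional lookup `food[dependencies]` can be replaced with a lookup by meal number. The following is a minimal sketch, assuming dplyr is installed. The rows are deliberately shuffled to show that row order no longer matters, `ref_food` is a temporary column introduced only for illustration, and row 4 uses `dependencies = NA`, as in the outputs printed in the other answers.

```r
library(dplyr)

# Same sample data, shuffled so that row position within each person
# no longer equals the meal number.
df <- tibble(
  person = c("Joe", "Joe", "Joe", "Joe", "Bob", "Bob", "Bob"),
  meal = c(3L, 1L, 4L, 2L, 2L, 3L, 1L),
  food = c("Soup and meal 2", "Chicken", "Lamb", "Beef",
           "Salad and meal 1", "Beef", "Lamb"),
  dependencies = c(2, NA, NA, NA, 1, NA, NA)
)

result <- df %>%
  group_by(person) %>%
  mutate(
    # match() finds the row whose meal number equals the dependency,
    # so no prior arrange(meal) is needed.
    ref_food = food[match(dependencies, meal)],
    combined_meal = if_else(
      is.na(dependencies),
      food,
      paste0(sub("meal [0-9]+$", "", food), ref_food)
    )
  ) %>%
  ungroup() %>%
  select(-ref_food) %>%
  arrange(desc(person), meal)  # restore the question's row order for display

print(result$combined_meal)
```

Like the one-liner, this resolves only one level of dependency.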