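This may be something silly on my part, but I am trying to concatenate certain columns depending on whether they contain a value, and then create two new columns called start.week and end.week.
My start.week is Monday, Tuesday and Wednesday. My end.week is Thursday and Friday.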
Name     Monday  Tuesday  Wednesday  Thursday  Friday
John     Red              Pink
Francis  Blue    Gray                Black
Bill     Green            Orange     Purple
Bob      Yellow                      Lilac     Magenta
I can concatenate the Thursday and Friday columns like this:
library(dplyr)  # provides mutate()
start.week = c("Monday", "Tuesday", "Wednesday")
end.week = c("Thursday", "Friday")
options(stringsAsFactors = FALSE)
# Append Friday only when it is non-empty
df = mutate(df, end.week = ifelse(Friday != "", paste0(Thursday, " + ", Friday), Thursday))
However, I can't figure out how to do the same for start.week.
Can anyone give me a hint? I would be forever grateful.
Raw data:
df = structure(list(Name = c("John", "Francis", "Bill", "Bob"), Monday =
c("Red", "Blue", "Green", "Yellow"), Tuesday = c("", "Gray", "", ""),
Wednesday = c("Pink", "", "Orange", ""), Thursday = c("",
"Black", "Purple", "Lilac"), Friday = c("", "", "", "Magenta"
)), class = "data.frame", row.names = c(NA, -4L))
Expected output:
df = structure(list(Name = c("John", "Francis", "Bill", "Bob"), Monday =
c("Red", "Blue", "Green", "Yellow"), Tuesday = c("", "Gray", "", ""),
Wednesday = c("Pink", "", "Orange", ""), Thursday = c("",
"Black", "Purple", "Lilac"), Friday = c("", "", "", "Magenta"
), start.week = c("Red + Pink", "Black", "Green + Orange",
"Yellow"), end.week = c("", "", "Purple", "Lilac + Magenta"
)), class = "data.frame", row.names = c(NA, -4L))
Answer 0 (score: 1)
How about something like this?
library(tidyverse)
df %>%
    gather(key, val, -Name) %>%
    group_by(Name) %>%
    mutate(
        start.week = paste(val[key %in% start.week & val != ""], collapse = " + "),
        end.week = paste(val[key %in% end.week & val != ""], collapse = " + ")) %>%
    spread(key, val)
# # A tibble: 4 x 8
# # Groups:   Name [4]
#   Name    start.week     end.week      Friday Monday Thursday Tuesday Wednesday
#   <chr>   <chr>          <chr>         <chr>  <chr>  <chr>    <chr>   <chr>
# 1 Bill    Green + Orange Purple        ""     Green  Purple   ""      Orange
# 2 Bob     Yellow         Lilac + Mage… Magen… Yellow Lilac    ""      ""
# 3 Francis Blue + Gray    Black         ""     Blue   Black    Gray    ""
# 4 John    Red + Pink     ""            ""     Red    ""       ""      Pink
The idea is to reshape the data from wide to long, add the new columns start.week and end.week, and then reshape the data back to wide.
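The same wide-to-long-to-wide idea can also be sketched with pivot_longer() and pivot_wider(), which tidyr (1.0.0 and later) recommends over gather() and spread(). This is a minimal sketch rather than the answer's own code; the vector names start_days and end_days are my own choice, used so that they cannot be confused with the new columns inside mutate().

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  Name      = c("John", "Francis", "Bill", "Bob"),
  Monday    = c("Red", "Blue", "Green", "Yellow"),
  Tuesday   = c("", "Gray", "", ""),
  Wednesday = c("Pink", "", "Orange", ""),
  Thursday  = c("", "Black", "Purple", "Lilac"),
  Friday    = c("", "", "", "Magenta"),
  stringsAsFactors = FALSE
)

# Hypothetical names for the two groups of day columns
start_days <- c("Monday", "Tuesday", "Wednesday")
end_days   <- c("Thursday", "Friday")

res <- df %>%
  # wide -> long: one row per (Name, day)
  pivot_longer(-Name, names_to = "key", values_to = "val") %>%
  group_by(Name) %>%
  # within each Name, paste the non-empty values of each group of days
  mutate(
    start.week = paste(val[key %in% start_days & val != ""], collapse = " + "),
    end.week   = paste(val[key %in% end_days & val != ""], collapse = " + ")
  ) %>%
  ungroup() %>%
  # long -> wide: restore one column per day
  pivot_wider(names_from = key, values_from = val)

print(res)
```

Because the new columns are constant within each Name, pivot_wider() treats them as identifying columns and returns one row per person.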
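Alternatively, we can use purrr::imap_dfc to generate the new columns in a somewhat more automated way; for this, we need to store the definitions of the new columns in a named list.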
lst <- list(start.week = start.week, end.week = end.week)
df %>%
    gather(key, val, -Name) %>%
    group_by(Name) %>%
    mutate(
        # one-row tibble per group, with one column per entry of lst
        tmp = list(imap_dfc(lst, ~paste(val[key %in% .x & val != ""], collapse = " + ")))) %>%
    unnest(tmp) %>%
    spread(key, val)
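To see what imap_dfc contributes here, it may help to run it on its own for a single person. The sketch below uses hand-written key and val vectors holding Francis's row in long form; these two vectors are illustrative stand-ins for what one group looks like inside mutate(), not part of the original code. It assumes dplyr is installed, since imap_dfc relies on it to bind the columns.

```r
library(purrr)

# Named list: each name becomes a new column, each value lists its source days
lst <- list(
  start.week = c("Monday", "Tuesday", "Wednesday"),
  end.week   = c("Thursday", "Friday")
)

# Francis's row in long form (illustrative stand-in for one group)
key <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
val <- c("Blue", "Gray", "", "Black", "")

# For each list entry, paste the non-empty values whose day is in that entry,
# then bind the results side by side, named after the list entries
out <- imap_dfc(lst, ~ paste(val[key %in% .x & val != ""], collapse = " + "))

print(out)
```

The result is a one-row tibble with columns start.week and end.week, which is why wrapping it in list() and unnesting it adds both columns in a single step.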
Note that I think your expected output is wrong; the start.week entry for Francis should be Blue + Gray, not Black.
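As a cross-check that does not depend on reshaping at all, here is a base R sketch that pastes the non-empty values row by row. The helper name paste_nonempty is hypothetical, and this is an alternative technique rather than the approach used in the answer above.

```r
# Sample data from the question
df <- data.frame(
  Name      = c("John", "Francis", "Bill", "Bob"),
  Monday    = c("Red", "Blue", "Green", "Yellow"),
  Tuesday   = c("", "Gray", "", ""),
  Wednesday = c("Pink", "", "Orange", ""),
  Thursday  = c("", "Black", "Purple", "Lilac"),
  Friday    = c("", "", "", "Magenta"),
  stringsAsFactors = FALSE
)

start.week <- c("Monday", "Tuesday", "Wednesday")
end.week   <- c("Thursday", "Friday")

# Hypothetical helper: for each row, join the non-empty values of `cols`
paste_nonempty <- function(data, cols, sep = " + ") {
  unname(apply(data[cols], 1, function(x) paste(x[x != ""], collapse = sep)))
}

df$start.week <- paste_nonempty(df, start.week)
df$end.week   <- paste_nonempty(df, end.week)

print(df[c("Name", "start.week", "end.week")])
```

For Francis this yields Blue + Gray for start.week and Black for end.week, which agrees with the tidyverse output shown earlier.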