I'll try to simplify my df:
Animal1 Animal2 Animal3
dog cat mouse
dog 0 mouse
0 cat 0
It has only 3 records.
I want to combine all 3 animals into a single field, like this:
Animals
dog + cat + mouse
dog + mouse
cat
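I think paste, or reshaping the data in some way, would be the best approach, but I can't find an exact solution, although I'm sure it's easy. Maybe replacing 0 with NA would be a good first step?
Note that this needs to run on about 10 million rows.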
Answer 0: (score: 1)
You can use nested sub functions to get the desired result:
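df <- data.frame(Animal1 = c("dog", "dog", "0"),
                 Animal2 = c("cat", "0", "cat"),
                 Animal3 = c("mouse", "mouse", "0"))
# Paste the columns, then strip each "0" placeholder together with its separator:
# a "0" in the middle first, then a leading "0", then a trailing "0"
x <- paste(df$Animal1, df$Animal2, df$Animal3, sep = " + ")
df$Animals <- sub(" \\+ 0$", "", sub("^0 \\+ ", "", sub(" \\+ 0 \\+ ", " + ", x)))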
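Because sub replaces only the first match, nested calls need one sub per position where a "0" can occur. As a rough sketch of the same paste-then-clean idea for any number of columns, one could blank out the zeros first and tidy the separators with gsub. The helper name combine_nonzero is hypothetical, and the sketch assumes "0" is the only placeholder and that no value itself contains " + ":

```r
# Hypothetical helper: vectorised over rows, works for any number of columns.
combine_nonzero <- function(data) {
  # Replace the "0" placeholders with empty strings, column by column
  parts <- lapply(data, function(v) {
    v <- as.character(v)
    v[v %in% "0"] <- ""
    v
  })
  # Paste the columns row-wise; empty cells leave doubled or dangling separators
  joined <- do.call(paste, c(unname(parts), sep = " + "))
  # Collapse runs of separators into one, then drop leading/trailing separators
  joined <- gsub("( \\+ ){2,}", " + ", joined)
  gsub("^( \\+ )+|( \\+ )+$", "", joined)
}

df <- data.frame(Animal1 = c("dog", "dog", "0"),
                 Animal2 = c("cat", "0", "cat"),
                 Animal3 = c("mouse", "mouse", "0"))
df$Animals <- combine_nonzero(df[c("Animal1", "Animal2", "Animal3")])
```

Everything here is a vectorised string operation, which may matter at around 10 million rows, though that has not been benchmarked here.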
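Answer 1: (score: 1)
1) Using the DF defined reproducibly in the Note at the end, define a Collapse function that takes a character vector, drops the "0" elements, and collapses the rest into a single string joined by plus signs. Use apply to run it on each row.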
Collapse = function(x) paste(x[x != 0], collapse = "+")
transform(DF, Animals = apply(DF, 1, Collapse))
giving:
Animal1 Animal2 Animal3 Animals
1 dog cat mouse dog+cat+mouse
2 dog 0 mouse dog+mouse
3 0 cat 0 cat
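To make the mechanics concrete, here is a small sketch of what Collapse does to a single row. apply hands each row to the function as a named character vector; the vector row2 below is made up for illustration and mirrors the second row of DF:

```r
Collapse <- function(x) paste(x[x != 0], collapse = "+")

# Illustrative stand-in for the second row of DF as apply would pass it
row2 <- c(Animal1 = "dog", Animal2 = "0", Animal3 = "mouse")

# The number 0 is coerced to the string "0" for the comparison,
# so this flags every element that is not the placeholder
keep <- row2 != 0

# Only the flagged elements are pasted together
result <- Collapse(row2)
```

Here keep is TRUE, FALSE, TRUE and result is "dog+mouse".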
2) Alternatively, if a comma followed by a space is acceptable as the separator, Collapse can be defined like this:
Collapse <- function(x) toString(x[x != 0])
Used with the transform statement from (1), this gives:
Animal1 Animal2 Animal3 Animals
1 dog cat mouse dog, cat, mouse
2 dog 0 mouse dog, mouse
3 0 cat 0 cat
3) Another possibility is to make the Animals column a list of vectors:
DF2 <- DF
DF2$Animals <- lapply(split(DF, 1:nrow(DF)), function(x) x[x != 0])
giving:
> DF2
Animal1 Animal2 Animal3 Animals
1 dog cat mouse dog, cat, mouse
2 dog 0 mouse dog, mouse
3 0 cat 0 cat
> str(DF2)
'data.frame': 3 obs. of 4 variables:
$ Animal1: chr "dog" "dog" "0"
$ Animal2: chr "cat" "0" "cat"
$ Animal3: chr "mouse" "mouse" "0"
$ Animals:List of 3
..$ 1: chr "dog" "cat" "mouse"
..$ 2: chr "dog" "mouse"
..$ 3: chr "cat"
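A list column keeps each row's animals as separate elements, so they can be counted or searched without parsing a string. The following is only a sketch of how such a column might be used; the names n_animals, has_cat and pasted are illustrative, and DF is rebuilt inline so the snippet runs on its own:

```r
DF <- data.frame(Animal1 = c("dog", "dog", "0"),
                 Animal2 = c("cat", "0", "cat"),
                 Animal3 = c("mouse", "mouse", "0"),
                 stringsAsFactors = FALSE)
DF2 <- DF
DF2$Animals <- lapply(split(DF, 1:nrow(DF)), function(x) x[x != 0])

# Number of animals recorded in each row
n_animals <- lengths(DF2$Animals)

# Which rows mention a cat
has_cat <- sapply(DF2$Animals, function(a) "cat" %in% a)

# The list can still be flattened to a string when needed
pasted <- sapply(DF2$Animals, paste, collapse = " + ")
```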
Note
Lines <- "Animal1 Animal2 Animal3
dog cat mouse
dog 0 mouse
0 cat 0"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)
Answer 2: (score: 0)
Another idea:
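library(tidyverse)
df2 %>%
  # Turn the "0" placeholders into NA, column by column
  mutate(across(everything(), ~ na_if(.x, "0"))) %>%
  # Flatten each row into one string, skipping the NA values
  mutate(Animals = pmap_chr(., .f = ~stringi::stri_flatten(
    c(...), collapse = " + ",
    na_empty = TRUE, omit_empty = TRUE)))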
which gives:
# Animal1 Animal2 Animal3 Animals
#1 <NA> <NA> mouse mouse
#2 dog cat mouse dog + cat + mouse
#3 dog <NA> mouse dog + mouse
#4 <NA> cat <NA> cat
#5 <NA> <NA> <NA>
Data
df2 <- data.frame(
Animal1 = c("0", "dog", "dog", "0", "0"),
Animal2 = c("0", "cat", "0", "cat","0"),
Animal3 = c("mouse", "mouse", "mouse", "0","0"),
stringsAsFactors = FALSE)
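Once the zeros are NA, a related option that is not part of the answer above is tidyr::unite with na.rm = TRUE, which joins the columns while skipping missing values. This is a sketch only: it assumes tidyr 1.0.0 or later (for the na.rm argument) and dplyr 1.0.0 or later (for across), and the name out is illustrative:

```r
library(dplyr)
library(tidyr)

df2 <- data.frame(
  Animal1 = c("0", "dog", "dog", "0", "0"),
  Animal2 = c("0", "cat", "0", "cat", "0"),
  Animal3 = c("mouse", "mouse", "mouse", "0", "0"),
  stringsAsFactors = FALSE)

out <- df2 %>%
  # Turn the "0" placeholders into NA so unite can skip them
  mutate(across(everything(), ~ na_if(.x, "0"))) %>%
  # Join the three columns; remove = FALSE keeps the originals
  unite("Animals", Animal1:Animal3, sep = " + ", remove = FALSE, na.rm = TRUE)
```

unite works on whole columns rather than row by row, so it avoids the per-row function calls of pmap_chr.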