The most efficient way to combine string columns and skip specific fields

Time: 2018-09-06 14:43:51

Tags: r string string-concatenation

I'll try to simplify my df:

Animal1  Animal2  Animal3
dog      cat      mouse
dog      0        mouse
0        cat      0

There are only 3 records.

I want to combine all 3 animals into one field, like this:

Animals
dog + cat + mouse
dog + mouse
cat

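I think paste or some variation of it would be best, but I can't find the exact solution. I'm sure it's easy. Maybe replacing 0 with NA would be a good first step?

Note that this needs to run on roughly 10 million rows.

3 Answers:

Answer 0: (score: 1)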

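You can use nested sub and gsub calls to get the desired result:

df <- data.frame(Animal1 = c("dog", "dog", "0"), 
                 Animal2 = c("cat", "0", "cat"), 
                 Animal3 = c("mouse", "mouse", "0"))

# Join the columns, drop every "0" token together with the separator before it,
# then strip a separator left dangling at the start of the string.
joined <- paste(df$Animal1, df$Animal2, df$Animal3, sep = " + ")
df$Animals <- sub("^ \\+ ", "", gsub("(^| \\+ )0(?= \\+ |$)", "", joined, perl = TRUE))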
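When cleaning up the pasted string with regular expressions, it helps to remember that sub replaces only the first match while gsub replaces every match. A small sketch of the difference on a row that contains two "0" placeholders (the variable names here are only for illustration):

```r
# A row with two "0" placeholders, already joined with " + "
x <- "0 + cat + 0"

first_only <- sub("0", "", x, fixed = TRUE)   # replaces only the first "0"
all_hits   <- gsub("0", "", x, fixed = TRUE)  # replaces every "0"

print(first_only)
print(all_hits)
```

Either way the separators are left behind, so the pattern has to account for them as well as for the "0" itself.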

答案 1 :(得分:1)

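1) Using DF, shown reproducibly in the Note at the end, define a Collapse function that takes a character vector, removes the "0" elements and collapses the rest into a plus-separated string. Use apply to run it on each row.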

Collapse = function(x) paste(x[x != 0], collapse = "+")
transform(DF, Animals = apply(DF, 1, Collapse))

giving:

  Animal1 Animal2 Animal3       Animals
1     dog     cat   mouse dog+cat+mouse
2     dog       0   mouse     dog+mouse
3       0     cat       0           cat
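For the roughly 10 million rows mentioned in the question, calling a function once per row with apply may become slow. A minimal sketch of a column-wise alternative follows; combine_nonzero is a hypothetical helper name, not part of the answer, and it assumes "0" is the only placeholder value. It loops over the few columns and does the string work for all rows at once. Whether it is actually faster on real data would need to be benchmarked.

```r
# Sample data from the question ("0" marks an empty slot)
DF <- data.frame(
  Animal1 = c("dog", "dog", "0"),
  Animal2 = c("cat", "0", "cat"),
  Animal3 = c("mouse", "mouse", "0"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: loops over columns (few), vectorized over rows (many)
combine_nonzero <- function(df, sep = " + ", skip = "0") {
  out <- character(nrow(df))
  for (col in df) {
    col <- as.character(col)
    keep <- !is.na(col) & col != skip   # rows where this column holds a real value
    need_sep <- keep & nzchar(out)      # rows that already hold an earlier value
    out[need_sep] <- paste0(out[need_sep], sep)
    out[keep] <- paste0(out[keep], col[keep])
  }
  out
}

DF$Animals <- combine_nonzero(DF)
print(DF)
```

The sep argument can be set to "+" to reproduce the output shown above without spaces.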

2) Alternatively, if a comma followed by a space is acceptable as the separator, you could use this for Collapse:

Collapse <- function(x) toString(x[x != 0])

When used with the transform statement in (1), this gives:

  Animal1 Animal2 Animal3         Animals
1     dog     cat   mouse dog, cat, mouse
2     dog       0   mouse      dog, mouse
3       0     cat       0             cat
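As a quick check of what toString does here, the sketch below compares it with an explicit paste call on a single row. The vector x is made up for illustration.

```r
x <- c("dog", "0", "mouse")
kept <- x[x != 0]   # the comparison coerces 0 to "0", so the "0" string is dropped

a <- toString(kept)                 # joins with ", "
b <- paste(kept, collapse = ", ")   # the same join written out explicitly

print(a)
print(identical(a, b))
```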

3) Another possibility is to make the Animals column a list of vectors:

DF2 <- DF
DF2$Animals <- lapply(split(DF, 1:nrow(DF)), function(x) x[x != 0])

giving:

> DF2
  Animal1 Animal2 Animal3         Animals
1     dog     cat   mouse dog, cat, mouse
2     dog       0   mouse      dog, mouse
3       0     cat       0             cat

> str(DF2)
'data.frame':   3 obs. of  4 variables:
 $ Animal1: chr  "dog" "dog" "0"
 $ Animal2: chr  "cat" "0" "cat"
 $ Animal3: chr  "mouse" "mouse" "0"
 $ Animals:List of 3
  ..$ 1: chr  "dog" "cat" "mouse"
  ..$ 2: chr  "dog" "mouse"
  ..$ 3: chr "cat"
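A list column keeps the individual animals available for later processing. The sketch below rebuilds DF2 from the sample data and shows two things one might do with it: count the animals per row and collapse them into a string afterwards. The names n_animals and collapsed are only illustrative.

```r
# Sample data from the question, kept as character columns
DF <- data.frame(
  Animal1 = c("dog", "dog", "0"),
  Animal2 = c("cat", "0", "cat"),
  Animal3 = c("mouse", "mouse", "0"),
  stringsAsFactors = FALSE
)

DF2 <- DF
DF2$Animals <- lapply(split(DF, 1:nrow(DF)), function(x) x[x != 0])

# Number of non-"0" animals in each row
n_animals <- lengths(DF2$Animals)

# Collapse each list element into one string when a flat column is needed
collapsed <- vapply(DF2$Animals, paste, character(1), collapse = " + ")

print(n_animals)
print(collapsed)
```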

Note

Lines <- "Animal1  Animal2  Animal3
dog      cat      mouse
dog      0        mouse
0        cat      0"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)

Answer 2: (score: 0)

Another idea:

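library(tidyverse)

df2 %>%
  # na_if() no longer accepts a whole data frame as of dplyr 1.1.0, so apply it per column
  mutate(across(everything(), ~ na_if(.x, "0"))) %>%
  mutate(Animals = pmap_chr(., .f = ~stringi::stri_flatten(
    c(...), collapse = " + ", 
    na_empty = TRUE, omit_empty = TRUE)))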

which gives:

#  Animal1 Animal2 Animal3           Animals
#1    <NA>    <NA>   mouse             mouse
#2     dog     cat   mouse dog + cat + mouse
#3     dog    <NA>   mouse       dog + mouse
#4    <NA>     cat    <NA>               cat
#5    <NA>    <NA>    <NA>                  
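The same idea of replacing "0" with NA first can also be sketched in base R, without the tidyverse. This is only an illustration under the assumption that every column in the data frame is an animal column; it still works row by row through apply, so it is not tuned for very large inputs.

```r
df2 <- data.frame(
  Animal1 = c("0", "dog", "dog", "0", "0"),
  Animal2 = c("0", "cat", "0", "cat", "0"),
  Animal3 = c("mouse", "mouse", "mouse", "0", "0"),
  stringsAsFactors = FALSE
)

# Replace every "0" cell with NA using a logical matrix as the index
df2[df2 == "0"] <- NA

# Paste the non-missing values of each row; a row of only NAs yields ""
df2$Animals <- apply(df2, 1, function(x) paste(x[!is.na(x)], collapse = " + "))

print(df2)
```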

Data

df2 <- data.frame(
  Animal1 = c("0", "dog", "dog", "0", "0"), 
  Animal2 = c("0", "cat", "0", "cat","0"), 
  Animal3 = c("mouse", "mouse", "mouse", "0","0"),
  stringsAsFactors = FALSE)