Applying tidyr separate only to specific rows by specifying which rows to exclude

Time: 2018-02-01 10:22:36

Tags: r tidyr

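I want to separate a column while excluding certain rows by a condition. This is a small variation on this question: Applying tidyr separate only to specific rows. However, rather than specifying which rows to separate, I want to specify which rows to exclude from the separation.

For example, say we want to split every row of the "text" column except those containing here_do: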

#creating DF for the example
df <- data.frame(var_a = letters[1:5],
                var_b = c(sample(1:100, 5)),
                text = c("foo_bla", 
                         "here_do",
                         "oh_yes",
                         "ba_a",
                         "lan_d"))

I suspect there is some way to do this with extract, as in the related question, but I can't seem to figure out how to modify the "(here)_(do)" part to make it work:

library(tidyr)
extract(df, text, into = c("first", "sec"), "(here)_(do)", remove = FALSE)
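One possible way to adapt the pattern, offered as a sketch rather than a confirmed answer from the thread: a negative lookahead makes the regex fail on the excluded value, so extract() leaves NA in the new columns for that row. This assumes the regex engine used by extract() supports lookaheads (Perl-style and ICU engines do), and the fixed var_b values are only there to keep the example reproducible.

```r
library(tidyr)

# Same shape as the example data, with fixed values instead of sample()
df <- data.frame(var_a = letters[1:5],
                 var_b = 1:5,
                 text = c("foo_bla", "here_do", "oh_yes", "ba_a", "lan_d"),
                 stringsAsFactors = FALSE)

# (?!here_do$) rejects the excluded value before the two capture groups
# are tried, so only the other rows produce a match.
res <- extract(df, text, into = c("first", "sec"),
               regex = "^(?!here_do$)([^_]+)_(.+)$", remove = FALSE)
res
```

Rows that do not match the pattern keep their original text and get NA in first and sec.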

3 Answers:

Answer 0 (score: 3)

If you don't mind using "data.table" instead, you could try:

library(data.table)
setDT(df)[!text %in% "here_do", c("first", "second") := tstrsplit(text, "_")][]
#    var_a var_b    text first second
# 1:     a    40 foo_bla   foo    bla
# 2:     b     4 here_do    NA     NA
# 3:     c    12  oh_yes    oh    yes
# 4:     d    35    ba_a    ba      a
# 5:     e    11   lan_d   lan      d
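Because the filter uses %in%, the same idea should extend to several excluded values at once. The sketch below assumes a hypothetical exclude vector (the second value is made up for illustration) and passes fixed = TRUE through to strsplit() so the underscore is treated literally.

```r
library(data.table)

dt <- data.table(var_a = letters[1:5],
                 var_b = 1:5,
                 text = c("foo_bla", "here_do", "oh_yes", "ba_a", "lan_d"))

# Hypothetical list of values that should be left unsplit
exclude <- c("here_do", "lan_d")

# Only rows NOT in `exclude` are split; the others get NA in the new columns
dt[!text %in% exclude,
   c("first", "second") := tstrsplit(text, "_", fixed = TRUE)]
dt
```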

Answer 1 (score: 1)

One approach is to separate everything and then "unseparate" the rows you want to exclude.

library('tidyverse')

df <- data.frame(var_a = letters[1:5],
                var_b = c(sample(1:100, 5)),
                text = c("foo_bla", 
                         "here_do",
                         "oh_yes",
                         "ba_a",
                         "lan_d"),
                stringsAsFactors = FALSE)

df %>%
  # split every row first, keeping the original column
  separate(text, c('first_val', 'second_val'), remove = FALSE) %>%
  mutate(
    # restore the full text for the excluded rows
    first_val = ifelse(text == 'here_do', text, first_val),
    # blank out the second piece for the excluded rows, keep it otherwise
    second_val = ifelse(text == 'here_do', NA, second_val))
#>   var_a var_b    text first_val second_val
#> 1     a    45 foo_bla       foo        bla
#> 2     b    43 here_do   here_do       <NA>
#> 3     c    81  oh_yes        oh        yes
#> 4     d    33    ba_a        ba          a
#> 5     e    15   lan_d       lan          d
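A related variant, sketched here as an alternative rather than as part of the answer above: copy the column, set the excluded rows to NA in the copy, and separate the copy. separate() leaves NA inputs as NA in every output column, so nothing needs to be undone afterwards. The to_split column name is hypothetical, and unlike the approach above this leaves first_val as NA for the excluded row instead of the full text.

```r
library(dplyr)
library(tidyr)

df <- data.frame(var_a = letters[1:5],
                 var_b = 1:5,
                 text = c("foo_bla", "here_do", "oh_yes", "ba_a", "lan_d"),
                 stringsAsFactors = FALSE)

res <- df %>%
  # to_split is a temporary copy with the excluded rows masked as NA
  mutate(to_split = ifelse(text == "here_do", NA_character_, text)) %>%
  # separate() drops to_split (remove = TRUE by default) and keeps text
  separate(to_split, into = c("first_val", "second_val"), sep = "_")
res
```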

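Answer 2 (score: 1)

We can filter out the rows you don't want to separate, separate the remaining rows, and then join the result back to the original data frame.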

library(dplyr)
library(tidyr)

df2 <- df %>%
  filter(!(text %in% "here_do")) %>%
  separate(text, into = c("First", "Second"), remove = FALSE) %>%
  right_join(df, by = c("var_a", "var_b", "text"))
df2
#   var_a var_b    text First Second
# 1     a    19 foo_bla   foo    bla
# 2     b    90 here_do  <NA>   <NA>
# 3     c    21  oh_yes    oh    yes
# 4     d     6    ba_a    ba      a
# 5     e    15   lan_d   lan      d
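Joining on var_a, var_b and text relies on those three columns identifying each row uniquely. If duplicates are possible, a sketch along the following lines may be safer: it adds a hypothetical row_id key, joins on that key only, and uses left_join() from the full data so the original row order is kept.

```r
library(dplyr)
library(tidyr)

df <- data.frame(var_a = letters[1:5],
                 var_b = 1:5,
                 text = c("foo_bla", "here_do", "oh_yes", "ba_a", "lan_d"),
                 stringsAsFactors = FALSE)

# row_id is a made-up helper column used only as the join key
df_id <- df %>% mutate(row_id = row_number())

separated <- df_id %>%
  filter(!(text %in% "here_do")) %>%
  separate(text, into = c("First", "Second"), sep = "_", remove = FALSE) %>%
  select(row_id, First, Second)

# Excluded rows have no match in `separated`, so they get NA
res <- df_id %>%
  left_join(separated, by = "row_id") %>%
  select(-row_id)
res
```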

Data

set.seed(244)

df <- data.frame(var_a = letters[1:5],
                 var_b = c(sample(1:100, 5)),
                 text = c("foo_bla", 
                          "here_do",
                          "oh_yes",
                          "ba_a",
                          "lan_d"))