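I want to separate a column, but with a condition that excludes certain rows. This is a small variation on this question: Applying tidyr separate only to specific rows. However, instead of specifying which rows to separate, I want to specify which rows to exclude from the separation.
For example, suppose we want to split every row of the "text" column except those that contain here_do: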
#creating DF for the example
df <- data.frame(var_a = letters[1:5],
var_b = c(sample(1:100, 5)),
text = c("foo_bla",
"here_do",
"oh_yes",
"ba_a",
"lan_d"))
I suspect there is a way to do this with extract, as in the related question, but I can't figure out how to modify the "(here)_(do)" part to make it work:
library(tidyr)
extract(df, text, into = c("first", "sec"), "(here)_(do)", remove = FALSE)
Answer 0 (score: 3)
If you don't mind using "data.table" instead, you could try:
library(data.table)
setDT(df)[!text %in% "here_do", c("first", "second") := tstrsplit(text, "_")][]
#    var_a var_b    text first second
# 1:     a    40 foo_bla   foo    bla
# 2:     b     4 here_do    NA     NA
# 3:     c    12  oh_yes    oh    yes
# 4:     d    35    ba_a    ba      a
# 5:     e    11   lan_d   lan      d
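For readers without data.table, the same idea (build a logical mask of rows to keep, split only those, and leave the rest as NA) can be sketched in base R. This is a minimal sketch rather than part of the answer above: the var_b values are fixed arbitrarily instead of sampled, and it assumes every kept row contains exactly one underscore.

```r
# Example data mirroring the question (var_b values are arbitrary here).
df <- data.frame(var_a = letters[1:5],
                 var_b = c(40, 4, 12, 35, 11),
                 text = c("foo_bla", "here_do", "oh_yes", "ba_a", "lan_d"),
                 stringsAsFactors = FALSE)

# Logical mask: TRUE for rows that should be split, FALSE for excluded rows.
keep <- !df$text %in% "here_do"

# Split only the kept rows on a literal underscore.
parts <- strsplit(df$text[keep], "_", fixed = TRUE)

# Pre-fill with NA so excluded rows stay missing, then assign through the mask.
df$first <- NA_character_
df$second <- NA_character_
df$first[keep] <- vapply(parts, `[`, character(1), 1)
df$second[keep] <- vapply(parts, `[`, character(1), 2)

df
```

To exclude several values, put them all in the vector on the right-hand side of %in%.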
Answer 1 (score: 1)
One approach is to separate everything and then "unseparate" the rows you want to exclude.
library('tidyverse')
df <- data.frame(var_a = letters[1:5],
var_b = c(sample(1:100, 5)),
text = c("foo_bla",
"here_do",
"oh_yes",
"ba_a",
"lan_d"),
stringsAsFactors = FALSE)
df %>%
separate(text, c('first_val', 'second_val'), remove = FALSE) %>%
mutate(
first_val = ifelse(text == 'here_do', text, first_val),
second_val = ifelse(text == 'here_do', NA, second_val))
#>   var_a var_b    text first_val second_val
#> 1     a    45 foo_bla       foo        bla
#> 2     b    43 here_do   here_do       <NA>
#> 3     c    81  oh_yes        oh        yes
#> 4     d    33    ba_a        ba          a
#> 5     e    15   lan_d       lan          d
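Answer 2 (score: 1)
We can filter out the rows you don't want to separate, separate the remaining rows, and then join the result back to the original data frame.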
library(dplyr)
library(tidyr)
df2 <- df %>%
filter(!(text %in% "here_do")) %>%
separate(text, into = c("First", "Second"), remove = FALSE) %>%
right_join(df, by = c("var_a", "var_b", "text"))
df2
#   var_a var_b    text First Second
# 1     a    19 foo_bla   foo    bla
# 2     b    90 here_do  <NA>   <NA>
# 3     c    21  oh_yes    oh    yes
# 4     d     6    ba_a    ba      a
# 5     e    15   lan_d   lan      d
Data
set.seed(244)
df <- data.frame(var_a = letters[1:5],
var_b = c(sample(1:100, 5)),
text = c("foo_bla",
"here_do",
"oh_yes",
"ba_a",
"lan_d"))
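A join-free alternative, closer to the extract attempt in the question, is to put the exclusion into the regular expression itself with a negative lookahead, so that excluded rows simply fail to match and end up as NA. The sketch below uses base R (regexec with perl = TRUE) so that it runs without extra packages. The pattern and the fixed var_b values are illustrative assumptions, not taken from any of the answers.

```r
# Example data mirroring the question (var_b values are arbitrary here).
df <- data.frame(var_a = letters[1:5],
                 var_b = c(19, 90, 21, 6, 15),
                 text = c("foo_bla", "here_do", "oh_yes", "ba_a", "lan_d"),
                 stringsAsFactors = FALSE)

# (?!here_do$) is a negative lookahead: the match fails when the whole
# string is exactly "here_do"; otherwise capture the parts around the "_".
pattern <- "^(?!here_do$)([^_]+)_(.+)$"

# For each row: c(full match, group 1, group 2), or character(0) on no match.
m <- regmatches(df$text, regexec(pattern, df$text, perl = TRUE))

# Indexing past the end of character(0) yields NA, so excluded rows become NA.
df$first <- vapply(m, `[`, character(1), 2)
df$sec <- vapply(m, `[`, character(1), 3)

df
```

The same pattern may also work as the regex argument of tidyr's extract(), which returns NA for rows whose groups do not match. That depends on the regex engine your tidyr version uses and is not tested here.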