Question

I have a df made up of a single column, structured like this:

      V  
 I-232 0 CAT
 G-435 1 DOG
 X-212 AIR

I want to create a new DF like this:

    N   V
    0  CAT
    1  DOG

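So I want to extract only the rows that contain a 0 or 1, together with the text that follows it, and build a new DF with two columns: the first holding that index (0/1) and the second holding the word that follows it.

How can I do this?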

Answer 1

As an alternative, here is a base R version using a regular expression:

x <- c("I-232 0 CAT","G-435 1 DOG","X-212 AIR")
my_regex <- "^.* (1|0) (.*)$"
partial <- regmatches(x, regexec(my_regex, x))

df <- as.data.frame( Reduce( rbind, partial[ sapply(partial, length) > 0 ] )[,2:3],"")

which gives:

> df
  V1  V2
1  0 CAT
2  1 DOG

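The idea is to match and capture the wanted values in a single pass. The regular expression "^.* (1|0) (.*)$" matches from the start of the string up to "a space, followed by 1 or 0, followed by a space", and then anything after that. Along the way it captures the alternative 1 or 0 in the first group () and the remaining text after the space in the second group.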

The regmatches output looks like this:

> regmatches(x,regexec(my_regex,x))
[[1]]
[1] "I-232 0 CAT" "0"           "CAT"        

[[2]]
[1] "G-435 1 DOG" "1"           "DOG"        

[[3]]
character(0)

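So we filter this result with partial[ sapply(partial,length) > 0 ] to drop the empty entries, then use Reduce with rbind to stack the remaining list entries into a matrix. We keep only the two wanted columns (2 and 3, our groups, because regmatches returns the full matched text as the first entry) and convert the result to a data.frame with as.data.frame. The final "" argument to as.data.frame is there to avoid the row names introduced by Reduce.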
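The same filter-and-stack steps can also be written with do.call(rbind, ...), which produces no row names and so does not need the "" argument. This is only a sketch of a variant, not the code from the answer: the names matched and m are made up for illustration, the column names N and V are taken from the question's desired output, and it assumes at least one string matches.

```r
x <- c("I-232 0 CAT", "G-435 1 DOG", "X-212 AIR")
my_regex <- "^.* (1|0) (.*)$"
partial <- regmatches(x, regexec(my_regex, x))

# keep only the entries where the pattern matched
matched <- partial[lengths(partial) > 0]

# stack the matches into a character matrix:
# column 1 = full match, column 2 = group 1, column 3 = group 2
m <- do.call(rbind, matched)

# N and V are the column names used in the question's desired output
df <- data.frame(N = as.integer(m[, 2]), V = m[, 3],
                 stringsAsFactors = FALSE)
print(df)
```

Converting N with as.integer is a choice made here; drop it to keep the column as text, as in the original answer.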

Answer 2

V <- c("aaa 0 cat", "bbb 1 dog ", "ccc 2 air")
df <- data.frame(V)

> df
           V
1  aaa 0 cat
2 bbb 1 dog 
3  ccc 2 air

You can use the dplyr and tidyr packages

library(dplyr)
library(tidyr)

df2 <- separate(df, V, c("txt", "ind", "txt2"), sep = " ")
df3 <- filter(df2, ind %in% 0:1)
df4 <- select(df3, ind, txt2)

> df4
  ind txt2
1   0  cat
2   1  dog

Or using the pipe

df %>% 
  separate(V, c("txt", "ind", "txt2"), sep = " ") %>%
  filter(ind %in% 0:1) %>% 
  select(-txt)
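For readers without dplyr and tidyr installed, the same split, filter and select steps can be sketched in base R. This is an assumed equivalent, not part of the answer: parts and m are illustrative names, and trimws is added here to drop the trailing space in "bbb 1 dog " before splitting.

```r
V <- c("aaa 0 cat", "bbb 1 dog ", "ccc 2 air")

# split each string on single spaces (after trimming the ends)
parts <- strsplit(trimws(V), " ", fixed = TRUE)

# keep the first three pieces of each string, one row per string
m <- t(sapply(parts, function(p) p[1:3]))
colnames(m) <- c("txt", "ind", "txt2")
df2 <- as.data.frame(m, stringsAsFactors = FALSE)

# keep rows whose middle piece is 0 or 1, then drop the txt column
df4 <- df2[df2$ind %in% c("0", "1"), c("ind", "txt2")]
print(df4)
```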

Answer 3

Here is an answer using grepl and strsplit

x <- c("I-232 0 CAT","G-435 1 DOG","X-212 AIR")

# which elements have " 0 " or " 1 "
ind <- grepl("[[:space:]](1|0)[[:space:]]", x)

# split
res <- strsplit(x[ind], "1[[:space:]]|0[[:space:]]")

# take last element
sapply(res, function(x) x[length(x)])
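The code above returns only the trailing words. To get the two-column result the question asks for, the 0/1 flag can be pulled out as well. The following is a sketch under assumptions, not part of the answer: kept, flag, words and out are illustrative names, and N and V are the column names from the question.

```r
x <- c("I-232 0 CAT", "G-435 1 DOG", "X-212 AIR")

# which elements have " 0 " or " 1 "
ind <- grepl("[[:space:]](1|0)[[:space:]]", x)
kept <- x[ind]

# first " 0 " or " 1 " found in each kept string, spaces included
flag <- regmatches(kept, regexpr("[[:space:]](1|0)[[:space:]]", kept))

# split as in the answer and take the last piece of each string
res <- strsplit(kept, "1[[:space:]]|0[[:space:]]")
words <- sapply(res, function(p) p[length(p)])

# strip the surrounding spaces from the flag and assemble the result
out <- data.frame(N = as.integer(trimws(flag)), V = words,
                  stringsAsFactors = FALSE)
print(out)
```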

Extract numbers and strings from specific rows in a data frame
