Extract the characters that differ between two strings, row by row

Time: 2017-10-12 12:23:22

Tags: r dplyr

I have two columns of strings in a data frame, and for each row I want to see the characters that differ.

For example,

Lines <- "
a     b
cat   car
dog   ding
cow   haw"
df <- read.table(text = Lines, header = TRUE, as.is = TRUE)

should return

a     b     diff
cat   car   t
dog   ding  o
cow   haw   co

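I have seen

Extract characters that differ between two strings

and

Split comma-separated column into separate rows

which give some neat solutions, but they either work only on a single row (the first reference) or work row-wise without doing quite what I want (the second reference).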

Ideally, I would like to use something like this:

Reduce(setdiff, strsplit(c(a, b), split = ""))

I tried:

apply(df, function(a,b) Reduce(setdiff, strsplit(c(a, b), split = "")))

but to no avail.

How can I do this?

P.S. I am particularly keen on using dplyr if possible, but only for stylistic reasons.

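4 Answers:

Answer 0: (score: 2)

Assuming df as shown reproducibly in the Note at the end, define a function Diff that accepts two vectors of characters, runs setdiff on them and pastes the result together. Then use mapply to run it over the two columns after splitting each string into single characters.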

Diff <- function(x, y) paste(setdiff(x, y), collapse = "")
transform(df, diff = mapply(Diff, strsplit(a, ""), strsplit(b, "")))

giving:

    a    b diff
1 cat  car    t
2 dog ding    o
3 cow  haw   co
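
For comparison, the Reduce expression from the question can also be applied row by row. The following is only a minimal sketch and not part of the original answer: the names diff_one, diff_mapply and diff_apply are made up for illustration, and it assumes both columns are character vectors.

```r
df <- data.frame(a = c("cat", "dog", "cow"),
                 b = c("car", "ding", "haw"),
                 stringsAsFactors = FALSE)

# Characters of `a` that do not occur in `b`, pasted back into one string.
# diff_one is a hypothetical helper name, not from the original thread.
diff_one <- function(a, b) {
  paste(Reduce(setdiff, strsplit(c(a, b), split = "")), collapse = "")
}

# mapply walks the two columns in parallel, one pair of strings at a time.
diff_mapply <- mapply(diff_one, df$a, df$b, USE.NAMES = FALSE)

# apply needs MARGIN = 1 and a function of a single argument (the whole row),
# which is what the attempt in the question was missing.
diff_apply <- apply(df, 1, function(row) diff_one(row[["a"]], row[["b"]]))
```

For the example data, both diff_mapply and diff_apply hold the values t, o and co.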

Note: the input df used above is:

Lines <- "
a     b
cat   car
dog   ding
cow   haw"
df <- read.table(text = Lines, header = TRUE, as.is = TRUE)

Answer 1: (score: 1)

A solution using tidyverse and stringr.

library(tidyverse)
library(stringr)

dt2 <- dt %>%
  mutate(a_list = str_split(a, pattern = ""), b_list = str_split(b, pattern = "")) %>%
  mutate(diff = map2(a_list, b_list, setdiff)) %>%
  mutate(diff = map_chr(diff, ~paste(., collapse = ""))) %>%
  select_if(~!is.list(.))
dt2
# A tibble: 3 x 3
      a     b  diff
  <chr> <chr> <chr>
1   cat   car     t
2   dog  ding     o
3   cow   haw    co
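
If the intermediate list columns are not needed, the same idea can probably be written in a single mutate call. This is a sketch under the assumption that dplyr, purrr and stringr are installed; the data frame is rebuilt inline so the block runs on its own, and it is not taken from the original answer.

```r
library(dplyr)
library(purrr)
library(stringr)

dt <- data.frame(a = c("cat", "dog", "cow"),
                 b = c("car", "ding", "haw"),
                 stringsAsFactors = FALSE)

# map2_chr walks both lists of characters in parallel and returns one
# string per row, so no helper list columns have to be dropped afterwards.
dt2 <- dt %>%
  mutate(diff = map2_chr(str_split(a, ""), str_split(b, ""),
                         ~ paste(setdiff(.x, .y), collapse = "")))
```

With the example data, dt2$diff contains t, o and co.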

Data

dt <- read.table(text = "a     b
cat   car
dog   ding
cow   haw",
                 header = TRUE, stringsAsFactors = FALSE)

Answer 2: (score: 1)

Using dplyr:

library(dplyr)
ff = data.frame(a = c("dog","chair","love"),b = c("dot","liar","over"),stringsAsFactors = FALSE)
st = ff %>% mutate(diff = sapply(Map(setdiff,strsplit(a,""),strsplit(b,"")),paste,collapse = ""))

> st
      a    b diff
1   dog  dot    g
2 chair liar   ch
3  love over    l
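
One thing worth keeping in mind with every setdiff-based approach in this thread: setdiff treats its arguments as sets, so the result depends on the argument order and ignores both position and repetition. The sketch below illustrates this with a hypothetical helper, char_diff, which is not from the original answers.

```r
# Hypothetical helper mirroring the approach used in the answers:
# characters of x that do not occur anywhere in y.
char_diff <- function(x, y) {
  paste(setdiff(strsplit(x, "")[[1]], strsplit(y, "")[[1]]), collapse = "")
}

char_diff("cat", "car")   # "t"
char_diff("car", "cat")   # "r"  (the order of the arguments matters)
char_diff("aab", "ab")    # ""   (the extra "a" is not reported)
char_diff("book", "bak")  # "o"  (a repeated character is reported once)
```

If position-by-position or count-aware differences are needed, a different comparison would be required.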

Answer 3: (score: 0)

Here is another base R method, using Map.

diffList <- Map(setdiff, strsplit(dat[[1]], ""), strsplit(dat[[2]], ""))
diffList
[[1]]
[1] "t"

[[2]]
[1] "o"

[[3]]
[1] "c" "o"

You can wrap this in sapply to get a character vector to add to the data.frame:

dat$charDiffs <-sapply(diffList, paste, collapse="")

which returns

dat
    a    b charDiffs
1 cat  car         t
2 dog ding         o
3 cow  haw        co

Data (from dput)

dat <- 
structure(list(a = c("cat", "dog", "cow"), b = c("car", "ding", 
"haw")), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")