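I have a data frame as shown below. How can I compare the values in its two columns? For example, row 1 has a common string ("SZY") in both col a and col b, while col a has an extra string ("ABC"). For row 5, the common string is "BNM", and both col a and col b contain extra strings.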
a=c("ABC,SZY","XYZ",NA,NA,"ABC,BNM,JKL","DEF","XCV")
b=c("SZY","XYZ,IOP","QWE",NA,"BNM,JKL,STU","DEF","HJK")
df = data.frame(a,b)
The output should look like this:
output = c("COMMON+column_a","COMMON+column_b","DIFFERENT",NA,"COMMON+column_a+column_b","COMMON","DIFFERENT")
df = cbind(df,output)
Answer 0 (score: 1)
Here is another option in base R:
vapply(strsplit(do.call(paste, df), " |,"), function(x)
toString(unique(x[x != 'NA'])), character(1L))
#[1] "ABC, SZY" "XYZ, IOP" "QWE" "" "ABC, BNM, JKL, STU" "DEF" "XCV, HJK"
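The comparison against the string 'NA' works because `paste` coerces missing values to the literal text "NA" before the split. The following is a minimal sketch of the intermediate steps on a two-row data frame made up for illustration (the names `pasted`, `tokens` and `res` are not from the answer):

```r
# Illustrative two-row data frame in the same shape as the question's df
df <- data.frame(a = c("ABC,SZY", NA), b = c("SZY", NA),
                 stringsAsFactors = FALSE)

# paste() turns NA into the string "NA" and joins the columns row by row
pasted <- do.call(paste, df)
pasted
# [1] "ABC,SZY SZY" "NA NA"

# Split each pasted row on a space or a comma to get all tokens of both columns
tokens <- strsplit(pasted, " |,")

# Drop the literal "NA" tokens, de-duplicate, and collapse back to one string
res <- vapply(tokens, function(x) toString(unique(x[x != "NA"])), character(1L))
res
# [1] "ABC, SZY" ""
```

One consequence of this design is that a genuine value spelled "NA" in the data would be dropped as well, since it cannot be told apart from a coerced missing value.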
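Answer 1 (score: 0)
Using base R `apply`, we can split the strings on commas, drop the NA entries, keep only the `unique` values, and convert them back into a comma-separated string.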
df$output <- apply(df, 1, function(x)
toString(unique(na.omit(unlist(strsplit(x, ","))))))
df
#            a           b             output
#1     ABC,SZY         SZY           ABC, SZY
#2         XYZ     XYZ,IOP           XYZ, IOP
#3        <NA>         QWE                QWE
#4        <NA>        <NA>
#5 ABC,BNM,JKL BNM,JKL,STU ABC, BNM, JKL, STU
#6         DEF         DEF                DEF
#7         XCV         HJK           XCV, HJK
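The code above returns the union of the two columns. If the labels from the question are wanted instead, one possible approach is to compare the split values with `intersect` and `setdiff`. This is only a sketch: `classify_row` is a hypothetical helper, and it assumes that a row with both values missing should give NA while a row with no shared value (including one missing side) should give "DIFFERENT".

```r
a <- c("ABC,SZY", "XYZ", NA, NA, "ABC,BNM,JKL", "DEF", "XCV")
b <- c("SZY", "XYZ,IOP", "QWE", NA, "BNM,JKL,STU", "DEF", "HJK")

# Hypothetical helper: label one pair of comma-separated strings
classify_row <- function(x, y) {
  # Assumption: both sides missing means there is nothing to compare
  if (is.na(x) && is.na(y)) return(NA_character_)

  # Split each side on commas; a missing side contributes no values
  xs <- if (is.na(x)) character(0) else strsplit(x, ",", fixed = TRUE)[[1]]
  ys <- if (is.na(y)) character(0) else strsplit(y, ",", fixed = TRUE)[[1]]

  # No shared value at all
  if (length(intersect(xs, ys)) == 0) return("DIFFERENT")

  # Shared values exist; record which side has extras
  parts <- c("COMMON",
             if (length(setdiff(xs, ys)) > 0) "column_a",
             if (length(setdiff(ys, xs)) > 0) "column_b")
  paste(parts, collapse = "+")
}

# Apply the helper to each pair of values; unname() drops mapply's names
labels <- unname(mapply(classify_row, a, b))
labels
```

With the question's data this should reproduce the `output` vector given in the question.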
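Answer 2 (score: 0)
Here is an option with `cSplit`. After creating a row-names column, we split the dataset columns on the delimiter `,` into 'long' format. Then, grouped by 'rn', we take the `union` of the column elements with `Reduce` and assign the result as the 'output' column of the original dataset.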
library(data.table)
library(splitstackshape)
df$output <- cSplit(setDT(df, keep.rownames = TRUE), c("a", "b"), ",",
"long")[, toString(Reduce(union, lapply(.SD, na.omit))), rn]$V1
df
#   rn           a           b             output
#1:  1     ABC,SZY         SZY           ABC, SZY
#2:  2         XYZ     XYZ,IOP           XYZ, IOP
#3:  3        <NA>         QWE                QWE
#4:  4        <NA>        <NA>
#5:  5 ABC,BNM,JKL BNM,JKL,STU ABC, BNM, JKL, STU
#6:  6         DEF         DEF                DEF
#7:  7         XCV         HJK           XCV, HJK
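Or using `tidyverse`: after creating a row-names column, `gather` the data into 'long' format, split the 'val' column into separate rows at the delimiter `,`, replace the NA elements with an empty string, get the `distinct` rows based on the 'rn' and 'val' columns, paste the strings together grouped by 'rn' (`str_c`), and bind the resulting column to the original dataset.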
library(tidyverse)
rownames_to_column(df, 'rn') %>%
gather(key, val, -rn) %>%
separate_rows(val) %>%
mutate(val = replace_na(val, "")) %>%
distinct(rn, val) %>%
group_by(rn) %>%
summarise(val = str_c(val, collapse=",")) %>%
select(-rn) %>%
bind_cols(df, .)
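Or using base R: we split the columns at the delimiter `,` with `strsplit`, get the `union` of the corresponding `list` elements with `Map`, paste each result into a single string (`toString` in the code below), `unlist` the `list` into a `vector`, and assign it to create the 'output' column.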
df$output <- unlist(do.call(Map, c(f = function(...)
toString(union(...)), unname(lapply(df, strsplit, ",")))))
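To make the element-wise pairing in `Map` easier to follow, here is a minimal sketch on two short vectors invented for illustration (`a_split`, `b_split`, `unions` and `out` are not names from the answer):

```r
# Split two illustrative columns on commas; each result is a list of vectors
a_split <- strsplit(c("ABC,SZY", "XYZ"), ",")
b_split <- strsplit(c("SZY", "XYZ,IOP"), ",")

# Map() calls the function on the first elements of both lists,
# then on the second elements, and so on
unions <- Map(function(x, y) toString(union(x, y)), a_split, b_split)

# Flatten the list of single strings into a character vector
out <- unlist(unions)
out
# [1] "ABC, SZY" "XYZ, IOP"
```

Note that `union` keeps a missing value if one side is NA, so rows containing NA may need `na.omit` applied before `toString` if the text "NA" should not appear in the result.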