Finding common strings in two data frames

Time: 2018-03-16 20:41:27

Tags: r data.table

I have read How to find common rows between two dataframe in R?

I have two data sets:

# dput() output with the non-parseable .internal.selfref pointer removed;
# setDT() below restores the data.table internals
df1 <- structure(list(V1 = structure(c(1L, 3L, 2L, 4L), .Label = c("AMH5", 
"BBHD", "DHE3", "NF1"), class = "factor")), .Names = "V1", class = c("data.table", 
"data.frame"), row.names = c(NA, -4L))

df2 <- structure(list(V1 = structure(c(4L, 2L, 3L, 1L), .Label = c("AMH5 ", 
"BBDQ ", "DHE3", "TBB5 "), class = "factor")), .Names = "V1", class = c("data.table", 
"data.frame"), row.names = c(NA, -4L))

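Unfortunately, when I have several similar strings, not all of them are detected, and I cannot figure out where the problem is. For example, when I do this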

library(data.table)
fintersect(setDT(df1), setDT(df2))

only one row is shown

V1
1: DHE3

1 answer:

Answer 0: (score: 0)

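Your data needs some cleaning. Several values in df2 carry a trailing space ("AMH5 " rather than "AMH5"), so they do not match the corresponding values in df1 exactly, and only DHE3 is reported as common.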
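A quick way to confirm this kind of problem is to print the values with visible delimiters and compare each one with its trimmed form. The sketch below is only an illustration: it retypes the level labels from the question's df2 by hand, and `has_padding` is a hypothetical helper name, not part of any package.

```r
# Level labels as they appear in the question's df2; three carry a trailing space
lv <- c("AMH5 ", "BBDQ ", "DHE3", "TBB5 ")

# Hypothetical helper: TRUE where a value has leading or trailing whitespace
has_padding <- function(x) x != trimws(x)

padded <- has_padding(lv)

# Brackets make the otherwise invisible trailing space easy to spot
print(data.frame(value = sprintf("[%s]", lv), nchar = nchar(lv), padded = padded))
```

On a real table the same check could be run on `levels(df2$V1)` or `as.character(df2$V1)`. Once the padding is confirmed, the columns can be converted and trimmed: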

# convert to character (if needed)
df1 <- df1[, lapply(.SD, as.character)]
df2 <- df2[, lapply(.SD, as.character)]

# trim whitespace
library(stringr)
df1 <- df1[, lapply(.SD, str_trim)]
df2 <- df2[, lapply(.SD, str_trim)]

# get output
fintersect(df1, df2)

     V1
1: DHE3
2: AMH5
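If adding stringr as a dependency is not desirable, base R's `trimws()` can probably do the same job. The following is a minimal sketch under that assumption: it rebuilds the question's values as plain character columns, and `clean_strings` is a hypothetical helper introduced here for illustration, not something from the original answer.

```r
library(data.table)

# Same values as the question's df1 and df2, rebuilt as character columns
df1 <- data.table(V1 = c("AMH5", "DHE3", "BBHD", "NF1"))
df2 <- data.table(V1 = c("TBB5 ", "BBDQ ", "DHE3", "AMH5 "))

# Hypothetical helper: coerce every column to character and trim whitespace
clean_strings <- function(dt) {
  dt[, lapply(.SD, function(col) trimws(as.character(col)))]
}

# Row-wise intersection on the cleaned tables
common <- fintersect(clean_strings(df1), clean_strings(df2))
print(common)

# Plain vector version, if only the shared values of one column are needed
common_values <- intersect(trimws(df1$V1), trimws(df2$V1))
print(common_values)
```

Both results should contain AMH5 and DHE3; the row order returned by `fintersect()` may differ between data.table versions, so it is safer not to rely on it.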