Match two lists in R, one with partial strings and one with full strings, and return the whole string on a match. Return each unique match only once.
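So, let's say I have a CSV file with one long string per row (a long list). I shorten the strings with substr and then remove any duplicates with unique. I then want to compare the list of long strings, df12, with the unique short list, df14: wherever a partial-string search (df14 vs df12) finds a match, return the whole string from df12, with only one match returned per short string.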
This is df12 (the list of long strings):
[1] I like stackoverflow very much today
[2] I like stackoverflow much today
[3] I dont like stackoverflow very much today
[4] I dont like you!
[5] What?
df13 <- substr(df12, start = 1, stop = 13)
This is df13 (the shortened strings, not yet unique):
[1] I like stacko
[2] I like stacko
[3] I dont like s
[4] I dont like y
[5] What?
df14<-unique(df13)
This is df14 (the shortened strings, unique after applying unique):
[1] I like stacko
[2] I dont like s
[3] I dont like y
[4] What?
This is the result I ultimately want:
[1] I like stackoverflow very much today
[2] I dont like stackoverflow very much today
[3] I dont like you!
[4] What?
Answer 0 (score: 3)
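Here is one way to match each short string in df14 against all of its possible matches in df12 and output them. Each short string is kept as the name of its list element, so you can tell which entries of df12 it matched (in the code, df1 and df2 stand in for df12 and df14):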
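# Long strings (df12 in the question)
df1 <- c('I like stackoverflow very much today', 'I like stackoverflow much today',
         'I dont like stackoverflow very much today', 'I dont like you!',
         'What?')
# Unique shortened strings (df14 in the question)
df2 <- c('I like stacko', 'I dont like s', 'I dont like y', 'What?')
# fixed = TRUE matches each short string literally, so the '?' in 'What?'
# is not interpreted as a regular-expression quantifier
sapply(df2, function(x) df1[grepl(x, df1, fixed = TRUE)])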
$`I like stacko`
[1] "I like stackoverflow very much today" "I like stackoverflow much today"
$`I dont like s`
[1] "I dont like stackoverflow very much today"
$`I dont like y`
[1] "I dont like you!"
$`What?`
[1] "What?"
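The list above still contains both matches for `I like stacko`, whereas the question asks for one full string per short string. A minimal sketch of one way to keep only the first match follows. It assumes the short strings are always prefixes of the long ones (which holds here because they come from `substr`), and that the prefix length is 13 characters, inferred from the shortened strings shown in the question. The names `prefix_len`, `first_match` and `first_match_alt` are made up for illustration, and `startsWith` needs R 3.3.0 or later.

```r
# Long strings from the question.
df12 <- c("I like stackoverflow very much today",
          "I like stackoverflow much today",
          "I dont like stackoverflow very much today",
          "I dont like you!",
          "What?")

# Assumption: a prefix length of 13 reproduces the short strings in the question.
prefix_len <- 13
df14 <- unique(substr(df12, 1, prefix_len))

# For each short string, keep only the first long string that starts with it.
first_match <- vapply(
  df14,
  function(p) df12[startsWith(df12, p)][1],
  character(1),
  USE.NAMES = FALSE
)

# Possible shortcut under the same assumption: drop every long string whose
# prefix has already been seen, without building df14 at all.
first_match_alt <- df12[!duplicated(substr(df12, 1, prefix_len))]

print(first_match)
```

For the sample data both approaches return the four strings listed as the desired result in the question. If the short strings could occur anywhere in the long strings rather than only at the start, `grepl(p, df12, fixed = TRUE)` could be swapped in for `startsWith(df12, p)`.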