R: subset cell values from one column based on a match in another column

Time: 2018-03-09 10:55:42

Tags: r

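My data frame has two columns, speciality and keywords. When a search term matches a value in the speciality column, I use the following code to extract the corresponding value from the keywords column: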

speciality <- c("Emergency medicine","Allergology","Anesthesiology","Hematology","Cardiology")
keywords <- c("emergency room OR emergency medicine OR emergency department", 
          "Allergy OR rhinitis OR asthma OR atopic eczema", 
          "Pain OR local anaesthesia OR general anaesthesia OR induced sleep", 
          "Anemia OR bleeding disorders OR hemophilia OR blood cancers", 
          "Heart OR cardiac diseases OR Cardiomyopathy OR Congenital Heart Disease OR Cardiac Arrhythmia")
sample <- data.frame(speciality, keywords)
keyspecial <- "Allergology"
subkeywords <- subset(sample$keywords, sample$speciality==keyspecial)
View(subkeywords)

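So I search for Allergology in the speciality column. When I run the code, I get Allergy OR rhinitis OR asthma OR atopic eczema.

The problem I face is that if I search for allergology instead of Allergology, I get no result. The same happens if I want to search with just emergency instead of Emergency medicine.

Any suggestions?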

3 answers:

Answer 0 (score: 2)

Change this line:

subkeywords <- subset(sample$keywords, sample$speciality==keyspecial)

to this:

subkeywords <- subset(sample$keywords, grepl(keyspecial, sample$speciality, ignore.case=TRUE))

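This works because grepl has an ignore.case argument that can be set to TRUE to ignore case. However, grepl also finds partial matches. So when you search for Allergology, it will also match The Allergology, and so on.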
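A minimal sketch of that partial-match behaviour, using a hypothetical two-row data frame named df rather than the full data from the question. Note that grepl treats the search term as a regular expression, so this assumes the term contains no regex metacharacters.

```r
# Hypothetical data mirroring the question's two columns
df <- data.frame(
  speciality = c("Emergency medicine", "Allergology"),
  keywords   = c("emergency room OR emergency medicine", "Allergy OR rhinitis"),
  stringsAsFactors = FALSE
)

# Case-insensitive substring search: "emergency" is found inside
# "Emergency medicine", so the first row is selected
hits <- grepl("emergency", df$speciality, ignore.case = TRUE)
partial <- df$keywords[hits]
print(partial)
```

This is the behaviour the question asks for when searching with emergency alone, at the cost of also matching any longer value that merely contains the term.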

To match only the complete value, you can use the following instead:

subkeywords <- subset(sample$keywords, tolower(sample$speciality)==tolower(keyspecial))

This way, both strings are converted to lowercase before they are compared.
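A possible middle ground between the two approaches, not part of the original answer, is to require the term to appear as a whole word. The sketch below assumes a hypothetical helper match_word and a small illustrative data frame df; it also assumes the search term contains no regex metacharacters.

```r
# Hypothetical data mirroring the question's two columns
df <- data.frame(
  speciality = c("Emergency medicine", "Allergology"),
  keywords   = c("emergency room OR emergency department", "Allergy OR rhinitis"),
  stringsAsFactors = FALSE
)

# match_word() is an illustrative helper, not from the original answer.
# \\b marks a word boundary, so the term must occur as a whole word.
match_word <- function(term, x) {
  grepl(paste0("\\b", term, "\\b"), x, ignore.case = TRUE, perl = TRUE)
}

# A whole word inside a longer value is matched
whole <- df$keywords[match_word("emergency", df$speciality)]

# A fragment of a word is not matched
frag <- df$keywords[match_word("allerg", df$speciality)]
```

With this variant, emergency finds Emergency medicine, while a fragment such as allerg finds nothing.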

Answer 1 (score: 1)

You can use str_detect and ignore case:

library(tidyverse)
keyspecial <- "allergology"

sample %>% 
  filter(str_detect(speciality, fixed(keyspecial, ignore_case = TRUE)))
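The filter call returns whole rows. If only the keywords are wanted, as in the question, the same str_detect test can index the column directly. This is a sketch that assumes the stringr package is installed and uses a hypothetical data frame df.

```r
library(stringr)  # assumed to be installed

# Hypothetical data mirroring the question's two columns
df <- data.frame(
  speciality = c("Emergency medicine", "Allergology"),
  keywords   = c("emergency room OR emergency department", "Allergy OR rhinitis"),
  stringsAsFactors = FALSE
)

# fixed() treats the term literally; ignore_case = TRUE drops case sensitivity
hits <- str_detect(df$speciality, fixed("emergency", ignore_case = TRUE))

# Keep only the keywords of the matching rows
kw <- df$keywords[hits]
```

Because fixed() matches the term literally, characters such as . or ( in a search term are not interpreted as regex syntax.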

Answer 2 (score: 0)

You can try splitting the strings into words like this:

matchList <- sapply(speciality,function(x) strsplit(tolower(x),split=" ")[[1]])
keyspecial <- "Allergology"
subkeywords <- subset(sample$keywords,sapply(matchList,function(y){any(tolower(keyspecial) %in% y)}))
View(subkeywords)
keyspecial <- "allergology"
subkeywords <- subset(sample$keywords,sapply(matchList,function(y){any(tolower(keyspecial) %in% y)}))
View(subkeywords)