My data frame has two columns, speciality and keywords. I use the following code to extract the value from the keywords column when my search term matches a value in the speciality column:
speciality <- c("Emergency medicine","Allergology","Anesthesiology","Hematology","Cardiology")
keywords <- c("emergency room OR emergency medicine OR emergency department",
"Allergy OR rhinitis OR asthma OR atopic eczema",
"Pain OR local anaesthesia OR general anaesthesia OR induced sleep",
"Anemia OR bleeding disorders OR hemophilia OR blood cancers",
"Heart OR cardiac diseases OR Cardiomyopathy OR Congenital Heart Disease OR Cardiac Arrhythmia")
sample <- data.frame(speciality, keywords)
keyspecial <- "Allergology"
subkeywords <- subset(sample$keywords, sample$speciality==keyspecial)
View(subkeywords)
So I am searching for Allergology in the speciality column. When I run the code, I get:
Allergy OR rhinitis OR asthma OR atopic eczema
The problem I am facing is that I get no result if I search for allergology instead of Allergology, or if I want to search with just emergency instead of Emergency medicine.
Any suggestions?
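The behaviour described in the question can be reproduced in a few lines. This is a minimal sketch on a two-row subset of the question's data; `df` and `lookup_exact` are names introduced here for illustration only.

```r
# Two rows of the question's data, stored under the hypothetical name `df`
df <- data.frame(
  speciality = c("Emergency medicine", "Allergology"),
  keywords   = c("emergency room OR emergency medicine OR emergency department",
                 "Allergy OR rhinitis OR asthma OR atopic eczema"),
  stringsAsFactors = FALSE
)

# `==` compares whole strings exactly, so a different case or a partial
# term never matches
lookup_exact <- function(term) df$keywords[df$speciality == term]

res_exact <- lookup_exact("Allergology")  # one match
res_lower <- lookup_exact("allergology")  # no match: case differs
res_part  <- lookup_exact("emergency")    # no match: only part of the value
```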
Answer 0 (score: 2)
Change this line:
subkeywords <- subset(sample$keywords, sample$speciality==keyspecial)
to this:
subkeywords <- subset(sample$keywords, grepl(keyspecial, sample$speciality, ignore.case=TRUE))
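This works because grepl has an ignore.case argument that can be set to TRUE to ignore case. However, grepl looks for partial matches, so a search for Allergology would also match a value such as The Allergology.
To match only the complete value while still ignoring case, you can use the following instead: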
subkeywords <- subset(sample$keywords, tolower(sample$speciality)==tolower(keyspecial))
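This converts both strings to lowercase before comparing them. Because the comparison is still an exact one, emergency on its own will not match Emergency medicine.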
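A middle ground between the partial match and the exact match is to match the search term as a whole word. The sketch below is one possible way to do that in base R, not part of the original answer: `find_keywords` is a hypothetical helper, and it assumes the search term contains no regular-expression metacharacters (those would need escaping first).

```r
speciality <- c("Emergency medicine", "Allergology", "Anesthesiology")
keywords <- c("emergency room OR emergency medicine OR emergency department",
              "Allergy OR rhinitis OR asthma OR atopic eczema",
              "Pain OR local anaesthesia OR general anaesthesia OR induced sleep")

# Hypothetical helper: match `term` as a whole word, ignoring case.
# \\b marks a word boundary, so "allerg" does not match "Allergology".
find_keywords <- function(term, speciality, keywords) {
  pattern <- paste0("\\b", term, "\\b")
  keywords[grepl(pattern, speciality, ignore.case = TRUE, perl = TRUE)]
}

res_case  <- find_keywords("allergology", speciality, keywords)  # case ignored
res_word  <- find_keywords("emergency", speciality, keywords)    # one word of two
res_part  <- find_keywords("allerg", speciality, keywords)       # fragment: no match
```

With this pattern, emergency finds Emergency medicine, while a fragment such as allerg finds nothing.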
Answer 1 (score: 1)
You can use str_detect and ignore case:
library(tidyverse)
keyspecial <- "allergology"
sample %>%
filter(str_detect(speciality, fixed(keyspecial, ignore_case = TRUE)))
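If loading tidyverse is not an option, a comparable literal, case-insensitive substring match can be sketched in base R. In grepl, fixed = TRUE cannot be combined with ignore.case, so this sketch lowercases both sides instead; `hits` and `res` are names introduced here for illustration.

```r
speciality <- c("Emergency medicine", "Allergology", "Cardiology")
keywords <- c("emergency room OR emergency medicine OR emergency department",
              "Allergy OR rhinitis OR asthma OR atopic eczema",
              "Heart OR cardiac diseases OR Cardiomyopathy")

keyspecial <- "EMERGENCY"

# fixed = TRUE treats the search term as literal text rather than a regex;
# lowercasing both sides makes the comparison case-insensitive
hits <- grepl(tolower(keyspecial), tolower(speciality), fixed = TRUE)
res  <- keywords[hits]
```

As with str_detect, this is a substring match, so a fragment such as cardio would also select the Cardiology row.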
Answer 2 (score: 0)
You can try splitting the strings into words like this:
matchList <- sapply(speciality,function(x) strsplit(tolower(x),split=" ")[[1]])
keyspecial <- "Allergology"
subkeywords <- subset(sample$keywords,sapply(matchList,function(y){any(tolower(keyspecial) %in% y)}))
View(subkeywords)
keyspecial <- "allergology"
subkeywords <- subset(sample$keywords,sapply(matchList,function(y){any(tolower(keyspecial) %in% y)}))
View(subkeywords)
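The two repeated blocks above can be folded into one helper. This is a minimal sketch of the same word-splitting idea; `word_list` and `match_word` are hypothetical names, and it relies on strsplit being vectorised, so it always returns a list with one character vector per speciality.

```r
speciality <- c("Emergency medicine", "Allergology", "Cardiology")
keywords <- c("emergency room OR emergency medicine OR emergency department",
              "Allergy OR rhinitis OR asthma OR atopic eczema",
              "Heart OR cardiac diseases OR Cardiomyopathy")

# One lowercase word vector per speciality
word_list <- strsplit(tolower(speciality), " ", fixed = TRUE)

# Hypothetical helper: return the keywords of every speciality that
# contains `term` as one of its words, ignoring case
match_word <- function(term) {
  hit <- vapply(word_list, function(words) tolower(term) %in% words, logical(1))
  keywords[hit]
}

res_upper <- match_word("Allergology")
res_lower <- match_word("allergology")
res_word  <- match_word("emergency")
```

Both spellings of Allergology return the same row, and emergency matches Emergency medicine because it is one of that speciality's words.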