Following on from a previous question, I'm having trouble getting the right regular-expression syntax to isolate specific words.
Given the data frame:
DL<-c("Dark_ark","Light-Lis","dark7","DK_dark","The_light","Lights","Lig_dark","D_Light")
Col1<-c(1,12,3,6,4,8,2,8)
DF<-data.frame(Col1)
row.names(DF)<-DL
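I want to extract every "dark" and "light" from the row names (ignoring upper vs. lower case) and create a second column that contains only the string "Dark" or "Light", in whichever case it appears in the row name: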
Col2<-c("Dark","Light","dark","dark","light","Light","dark","Light")
DF$Col2<-Col2
          Col1  Col2
Dark_ark     1  Dark
Light-Lis   12 Light
dark7        3  dark
DK_dark      6  dark
The_light    4 light
Lights       8 Light
Lig_dark     2  dark
D_Light      8 Light
I changed the original data slightly to illustrate my current problem. Based on a good answer from Tyler Rinker, I used this:
DF$Col2<-gsub("[^dark|light]", "", row.names(DF), ignore.case = TRUE)
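But gsub gets tripped up by some of the letters the names have in common. Searching the message boards for how to isolate an exact word with a regular expression, it looks like the answer should be to use double backslashes: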
\\<light\\>
or
\\blight\\b
Why doesn't this line
DF$Col2<-gsub("[^\\<dark\\>|\\<light\\>]", "", row.names(DF), ignore.case = TRUE)
pull out the desired column shown above? Instead, I get:
          Col1    Col2
Dark_ark     1 Darkark
Light-Lis   12 LightLi
dark7        3    dark
DK_dark      6  DKdark
The_light    4 Thlight
Lights       8   Light
Lig_dark     2 Ligdark
D_Light      8  DLight
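A likely explanation for this output: `[^...]` is a bracket expression, which matches a single character that is not in the listed set. Words, `|` and repeated letters inside the brackets have no special meaning there; they only add characters to the set. So the pattern removes every character that is not one of d, a, r, k, l, i, g, h, t (in either case, because of `ignore.case = TRUE`), which is exactly why "The_light" keeps "Th" and "DK_dark" keeps "DK". The sketch below compares the question's first pattern with the same set written out as plain letters.

```r
DL <- c("Dark_ark", "Light-Lis", "dark7", "DK_dark",
        "The_light", "Lights", "Lig_dark", "D_Light")

# [^...] matches ONE character that is not in the set. The letters of "dark"
# and "light" plus "|" are just set members, so these two patterns are the
# same set written in a different order.
kept_a <- gsub("[^dark|light]", "", DL, ignore.case = TRUE)
kept_b <- gsub("[^adghiklrt|]", "", DL, ignore.case = TRUE)

identical(kept_a, kept_b)  # TRUE
kept_a
# "Darkark" "LightLi" "dark" "DKdark" "Thlight" "Light" "Ligdark" "DLight"
```

Any letter from the set survives wherever it occurs, so a bracket expression cannot isolate a whole word; that needs a pattern that matches the word itself, as in the answers below.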
Answer 0 (score: 9)
How about this?
unlist(regmatches(rownames(DF), gregexpr("dark|light", rownames(DF), ignore.case=TRUE)))
# [1] "Dark" "Light" "dark" "dark" "light" "Light" "dark" "Light"
Or:
gsub(".*(dark|light).*$", "\\1", row.names(DF), ignore.case = TRUE)
# [1] "Dark" "Light" "dark" "dark" "light" "Light" "dark" "Light"
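The word-boundary patterns from the question (`\\blight\\b`) would not have solved this even outside a bracket expression. A minimal sketch, assuming PCRE semantics (`perl = TRUE`), where letters, digits and the underscore are all word characters: `\\b` only matches between a word character and a non-word character (or the string edge), so there is no boundary between "DK_" and "dark", between "dark" and "7", or between "Light" and "s".

```r
DL <- c("Dark_ark", "Light-Lis", "dark7", "DK_dark",
        "The_light", "Lights", "Lig_dark", "D_Light")

# With perl = TRUE, \w is [A-Za-z0-9_], and \b needs a \w / non-\w transition.
# "_", digits and a trailing "s" are all \w, so most names have no boundary
# around the word; only "-" in "Light-Lis" is a non-word character.
hits <- grepl("\\bdark\\b|\\blight\\b", DL, ignore.case = TRUE, perl = TRUE)
DL[hits]  # "Light-Lis"
```

This is why both approaches above match the plain alternation `dark|light` with no boundary assertions at all.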
Answer 1 (score: 5)
One option is to use the stringr package:
library(stringr)
str_extract(tolower(rownames(DF)),'dark|light')
[1] "dark" "light" "dark" "dark" "light" "light" "dark" "light"
Or, better, using @Arun's suggestion, which keeps the original case:
str_extract(rownames(DF), ignore.case('dark|light'))
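Newer stringr releases deprecate the `ignore.case()` modifier in favour of passing matching options through `regex()`. If the line above warns or fails on your installation, an equivalent sketch (assuming a stringr version that provides `regex(pattern, ignore_case = TRUE)`) is:

```r
library(stringr)

DL <- c("Dark_ark", "Light-Lis", "dark7", "DK_dark",
        "The_light", "Lights", "Lig_dark", "D_Light")

# regex() wraps the pattern with matching options. ignore_case = TRUE makes
# the match case-insensitive, while str_extract() returns the first match
# exactly as it is written in each name.
Col2 <- str_extract(DL, regex("dark|light", ignore_case = TRUE))
Col2
# "Dark" "Light" "dark" "dark" "light" "Light" "dark" "Light"
```

With the data frame from the question, the same call becomes `DF$Col2 <- str_extract(rownames(DF), regex("dark|light", ignore_case = TRUE))`. Names with no match would get `NA`.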