Selecting strings that do not contain a specific pattern (R regex)

Posted: 2018-09-13 17:44:35

Tags: r regex

I have a set of file names, and I want to select the ones that do not contain the terms "dataset" or "eff".

Data

k <- c("Duct1/X SN5 F9MH.csv", "Duct1/X SN5 F9MH_dataset.csv", "Duct1/X SN5 F9MH_eff.csv", 
"Duct2/X F7 X300 E10.csv", "Duct2/X F7 X300 E10_dataset.csv", 
"Duct2/X F7 X300 E10_eff.csv", "Duct3/X600 F8 X600 E10.csv", 
"Duct3/X600 F8 X600 E10_dataset.csv", "Duct3/X600 F8 X600 E10_eff.csv", 
"Duct4/X F7 X600 E10.csv", "Duct4/X F7 X600 E10_dataset.csv", 
"Duct4/X F7 X600 E10_eff.csv")

Code

As I understand it, I can use [^...] to exclude certain characters (represented by ...) from the result.

Trying with N

# Looking for N works 
> grep('.*[N].*', k, value = T)
[1] "Duct1/X SN5 F9MH.csv"         "Duct1/X SN5 F9MH_dataset.csv" "Duct1/X SN5 F9MH_eff.csv"    

# Looking for strings not containing N does not work 
> grep('.*[!N].*', k, value = T)
[1] "Duct1/X SN5 F9MH.csv"         "Duct1/X SN5 F9MH_dataset.csv" "Duct1/X SN5 F9MH_eff.csv"    

# Trying with ^ also does not work 
> grep('.*[^N].*', k, value = T)
 [1] "Duct1/X SN5 F9MH.csv"               "Duct1/X SN5 F9MH_dataset.csv"       "Duct1/X SN5 F9MH_eff.csv"          
 [4] "Duct2/X F7 X300 E10.csv"            "Duct2/X F7 X300 E10_dataset.csv"    "Duct2/X F7 X300 E10_eff.csv"       
 [7] "Duct3/X600 F8 X600 E10.csv"         "Duct3/X600 F8 X600 E10_dataset.csv" "Duct3/X600 F8 X600 E10_eff.csv"    
[10] "Duct4/X F7 X600 E10.csv"            "Duct4/X F7 X600 E10_dataset.csv"    "Duct4/X F7 X600 E10_eff.csv" 
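The outputs above follow from how bracket expressions work: [^N] matches a single character that is not N, so any string containing at least one non-N character matches, and [!N] is simply a class of the two literal characters ! and N. A minimal sketch on a small vector (x is a hypothetical example, not the file list above):

```r
# Hypothetical illustration vector, not the asker's data
x <- c("N", "NN", "aN", "abc")

# [^N] matches ONE character that is not N. A string matches as soon as it
# has any non-N character, even if it also contains N.
negated <- grepl("[^N]", x)
negated
# [1] FALSE FALSE  TRUE  TRUE

# [!N] is a character class holding the literals '!' and 'N';
# the '!' does not negate anything inside brackets.
bang <- grepl("[!N]", x)
bang
# [1]  TRUE  TRUE  TRUE FALSE
```

In other words, a negated character class excludes characters at one position; it does not exclude whole strings that contain the pattern.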

I can get the result I want with grepl and use it to subset the character vector:

> k[!grepl(pattern = 'N', x = k)]
[1] "Duct2/X F7 X300 E10.csv"            "Duct2/X F7 X300 E10_dataset.csv"    "Duct2/X F7 X300 E10_eff.csv"       
[4] "Duct3/X600 F8 X600 E10.csv"         "Duct3/X600 F8 X600 E10_dataset.csv" "Duct3/X600 F8 X600 E10_eff.csv"    
[7] "Duct4/X F7 X600 E10.csv"            "Duct4/X F7 X600 E10_dataset.csv"    "Duct4/X F7 X600 E10_eff.csv" 

For my actual use case (dataset|eff):

> k[!grepl(pattern = 'eff|dataset', x = k)]
[1] "Duct1/X SN5 F9MH.csv"       "Duct2/X F7 X300 E10.csv"    "Duct3/X600 F8 X600 E10.csv"
[4] "Duct4/X F7 X600 E10.csv"   

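However, I am looking for a way to do this with grep(..., value = T), because I do not want to store the character vector (k); it is the output of another function.

1 Answer:

Answer 0: (score: 1)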

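grep has an invert argument; when it is TRUE, grep returns the elements that do not match the pattern:

grep('N', k, value = TRUE, invert = TRUE)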
[1] "Duct2/X F7 X300 E10.csv"           
[2] "Duct2/X F7 X300 E10_dataset.csv"   
[3] "Duct2/X F7 X300 E10_eff.csv"       
[4] "Duct3/X600 F8 X600 E10.csv"        
[5] "Duct3/X600 F8 X600 E10_dataset.csv"
[6] "Duct3/X600 F8 X600 E10_eff.csv"    
[7] "Duct4/X F7 X600 E10.csv"           
[8] "Duct4/X F7 X600 E10_dataset.csv"   
[9] "Duct4/X F7 X600 E10_eff.csv"

So for your use case you can do this:

grep('eff|dataset', k, invert = TRUE, value = TRUE)
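Because grep takes the character vector as an argument, the output of another function can be passed in directly without storing it first. A minimal sketch, assuming a hypothetical list_files() that stands in for whatever function produces the file names:

```r
# list_files() is a hypothetical stand-in for the function that returns the names
list_files <- function() {
  c("Duct1/X SN5 F9MH.csv",
    "Duct1/X SN5 F9MH_dataset.csv",
    "Duct1/X SN5 F9MH_eff.csv")
}

# Filter the function's output directly; no intermediate variable is needed
kept <- grep("eff|dataset", list_files(), invert = TRUE, value = TRUE)
kept
# [1] "Duct1/X SN5 F9MH.csv"

# Sanity check: this gives the same result as the !grepl() subsetting approach
x <- list_files()
same <- identical(kept, x[!grepl("eff|dataset", x)])

# Optional, stricter variant (an assumption about the naming scheme):
# only drop names that END in _eff.csv or _dataset.csv
kept_strict <- grep("_(eff|dataset)\\.csv$", list_files(),
                    invert = TRUE, value = TRUE)
```

The stricter pattern may be worth considering if "eff" or "dataset" could appear elsewhere in a path, for example in a directory name; for the file names shown in the question both patterns select the same files.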