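Pattern matching with ifelse in R

Time: 2014-08-18 21:10:28

Tags: regex r if-statement

I want to populate a new column with one of two values, depending on whether a pattern matches.

Here is my data frame: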

df <- structure(list(loc_01 = c("apis", "indu", "isro", "miss", "non_apis", 
"non_indu", "non_isro", "non_miss", "non_piro", "non_sacn", "non_slbe", 
"non_voya", "piro", "sacn", "slbe", "voya"), loc01_land = c(165730500, 
62101800, 540687600, 161140500, 1694590200, 1459707300, 1025051400, 
1419866100, 2037064500, 2204629200, 1918840500, 886299300, 264726000, 
321003900, 241292700, 530532000)), class = "data.frame", row.names = c(NA, 
-16L), .Names = c("loc_01", "loc01_land"))

It looks like this...

     loc_01 loc01_land
1      apis  165730500
2      indu   62101800
3      isro  540687600
4      miss  161140500
5  non_apis 1694590200
6  non_indu 1459707300
7  non_isro 1025051400
8  non_miss 1419866100
9  non_piro 2037064500
10 non_sacn 2204629200
11 non_slbe 1918840500
12 non_voya  886299300
13     piro  264726000
14     sacn  321003900
15     slbe  241292700
16     voya  530532000

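I would like to add a column to df named 'loc01'. If loc_01 contains non, it should return 'outside'; if it does not contain non, it should return 'inside'. Here is my ifelse statement, but I am missing something, because it only ever returns the false value.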

df$loc01 <- ifelse(df$loc_01=="non",'outside','inside')

The resulting df...

     loc_01 loc01_land  loc01
1      apis  165730500 inside
2      indu   62101800 inside
3      isro  540687600 inside
4      miss  161140500 inside
5  non_apis 1694590200 inside
6  non_indu 1459707300 inside
7  non_isro 1025051400 inside
8  non_miss 1419866100 inside
9  non_piro 2037064500 inside
10 non_sacn 2204629200 inside
11 non_slbe 1918840500 inside
12 non_voya  886299300 inside
13     piro  264726000 inside
14     sacn  321003900 inside
15     slbe  241292700 inside
16     voya  530532000 inside

Thanks, -al

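2 answers:

Answer 0: (score: 21)

To check whether a string contains a certain substring, you can't use ==, because it performs an exact match (i.e. it returns TRUE only if the string is exactly "non"). You can use the grepl function (which belongs to the grep family of functions) to perform pattern matching: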

df$loc01 <- ifelse(grepl("non",df$loc_01),'outside','inside')

Result:

> df
     loc_01 loc01_land   loc01
1      apis  165730500  inside
2      indu   62101800  inside
3      isro  540687600  inside
4      miss  161140500  inside
5  non_apis 1694590200 outside
6  non_indu 1459707300 outside
7  non_isro 1025051400 outside
8  non_miss 1419866100 outside
9  non_piro 2037064500 outside
10 non_sacn 2204629200 outside
11 non_slbe 1918840500 outside
12 non_voya  886299300 outside
13     piro  264726000  inside
14     sacn  321003900  inside
15     slbe  241292700  inside
16     voya  530532000  inside
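
As a small base R sketch of the difference, the snippet below compares the three kinds of test side by side. The vector `x` and the value "canon" are made up for illustration and are not part of the question's data; they are only there to show that an unanchored `grepl("non", ...)` matches the substring anywhere, while an anchored pattern or `startsWith` restricts the match to the prefix.

```r
# Hypothetical sample values; "canon" is an invented edge case
x <- c("apis", "non_apis", "canon")

# == tests whole-string equality, so no element equals "non"
exact <- x == "non"

# grepl() matches the pattern anywhere in the string
anywhere <- grepl("non", x)

# Anchoring with ^ (or using startsWith) matches only the prefix
prefix_regex <- grepl("^non_", x)
prefix_plain <- startsWith(x, "non_")

# Labels built from the prefix test
labels <- ifelse(prefix_plain, "outside", "inside")
print(data.frame(x, exact, anywhere, prefix_regex, prefix_plain, labels))
```

For the data in the question both variants give the same labels, since "non" only ever appears as the `non_` prefix; the anchored form is just a more defensive choice if other names could contain "non" in the middle.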

Answer 1: (score: 0)

You only need one line of code:

library(dplyr)
library(stringr)


df %>% 
  mutate(loc01 = if_else(str_starts(loc_01, "non_"), "outside", "inside"))

To use a more complex regular expression, you can use str_detect instead of str_starts:

df %>% 
  mutate(loc01 = if_else(str_detect(loc_01, "^(non_)"), "outside", "inside"))

Output:

   loc_01   loc01_land loc01  
   <chr>         <dbl> <chr>  
 1 apis      165730500 inside 
 2 indu       62101800 inside 
 3 isro      540687600 inside 
 4 miss      161140500 inside 
 5 non_apis 1694590200 outside
 6 non_indu 1459707300 outside
 7 non_isro 1025051400 outside
 8 non_miss 1419866100 outside
 9 non_piro 2037064500 outside
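
As an illustration of what a "more complex" pattern might look like, the sketch below uses alternation so that two different prefixes both count as outside. The `ext_` prefix and the sample values are hypothetical and do not come from the question. Base R's `grepl` is used here so the snippet runs without extra packages; the same pattern string could equally be passed to `str_detect`.

```r
# Hypothetical sample values; the "ext_" prefix is invented for illustration
loc <- c("apis", "non_apis", "ext_indu", "piro")

# ^ anchors at the start; (non|ext) accepts either prefix before the underscore
pattern <- "^(non|ext)_"

# TRUE where the name starts with one of the two prefixes
is_outside <- grepl(pattern, loc)

# Map the logical vector to the two labels
loc01 <- ifelse(is_outside, "outside", "inside")
print(data.frame(loc, loc01))
```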