I want to fill a new column with one of two values, depending on whether a pattern matches.
Here is my data frame:
df <- structure(list(loc_01 = c("apis", "indu", "isro", "miss", "non_apis",
"non_indu", "non_isro", "non_miss", "non_piro", "non_sacn", "non_slbe",
"non_voya", "piro", "sacn", "slbe", "voya"), loc01_land = c(165730500,
62101800, 540687600, 161140500, 1694590200, 1459707300, 1025051400,
1419866100, 2037064500, 2204629200, 1918840500, 886299300, 264726000,
321003900, 241292700, 530532000)), class = "data.frame", row.names = c(NA,
-16L), .Names = c("loc_01", "loc01_land"))
It looks like this:
loc_01 loc01_land
1 apis 165730500
2 indu 62101800
3 isro 540687600
4 miss 161140500
5 non_apis 1694590200
6 non_indu 1459707300
7 non_isro 1025051400
8 non_miss 1419866100
9 non_piro 2037064500
10 non_sacn 2204629200
11 non_slbe 1918840500
12 non_voya 886299300
13 piro 264726000
14 sacn 321003900
15 slbe 241292700
16 voya 530532000
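I want to add a column to df named 'loc01'. If loc_01 contains non, it should return 'outside'; if it does not contain non, it should return 'inside'. Here is my ifelse statement, but I'm missing something, because it only returns the false value.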
df$loc01 <- ifelse(df$loc_01=="non",'outside','inside')
The resulting df:
loc_01 loc01_land loc01
1 apis 165730500 inside
2 indu 62101800 inside
3 isro 540687600 inside
4 miss 161140500 inside
5 non_apis 1694590200 inside
6 non_indu 1459707300 inside
7 non_isro 1025051400 inside
8 non_miss 1419866100 inside
9 non_piro 2037064500 inside
10 non_sacn 2204629200 inside
11 non_slbe 1918840500 inside
12 non_voya 886299300 inside
13 piro 264726000 inside
14 sacn 321003900 inside
15 slbe 241292700 inside
16 voya 530532000 inside
Thanks -al
Answer 0 (score: 21)
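To check whether a string contains a given substring, you can't use ==, because it performs an exact match (i.e. it returns TRUE only when the string is exactly "non").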
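The exact-match behavior is easy to see on a small vector. This is only a minimal sketch: the vector x is a hypothetical sample, not the real column, and the third element "non" is added purely to show the one case where == would return TRUE.

```r
# Hypothetical sample values; "non" is included only for illustration
x <- c("apis", "non_apis", "non")

# == compares each whole string against "non", element by element
exact <- x == "non"
print(exact)
```

Since no value in loc_01 is exactly "non", the test is FALSE for every row, so ifelse always picks the 'inside' branch.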
You can use the grepl function (part of the grep family of functions) to do pattern matching:
df$loc01 <- ifelse(grepl("non",df$loc_01),'outside','inside')
Result:
> df
loc_01 loc01_land loc01
1 apis 165730500 inside
2 indu 62101800 inside
3 isro 540687600 inside
4 miss 161140500 inside
5 non_apis 1694590200 outside
6 non_indu 1459707300 outside
7 non_isro 1025051400 outside
8 non_miss 1419866100 outside
9 non_piro 2037064500 outside
10 non_sacn 2204629200 outside
11 non_slbe 1918840500 outside
12 non_voya 886299300 outside
13 piro 264726000 inside
14 sacn 321003900 inside
15 slbe 241292700 inside
16 voya 530532000 inside
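One thing to keep in mind is that grepl("non", ...) matches "non" anywhere in the string, not only as a prefix. That is fine for the data shown here, but if only a leading "non_" should count, the pattern can be anchored. The sketch below assumes that is the intent; the value "canon" is invented to show the difference and does not appear in the original data.

```r
# Hypothetical values; "canon" is made up to show a match in the middle of a string
x <- c("apis", "non_apis", "canon")

anywhere <- grepl("non", x)        # TRUE wherever "non" occurs in the string
anchored <- grepl("^non_", x)      # regex anchored to the start of the string
prefix   <- startsWith(x, "non_")  # base R (>= 3.3.0) prefix test, no regex

# Same labelling as above, but using the anchored test
label <- ifelse(anchored, "outside", "inside")
print(data.frame(x, anywhere, anchored, prefix, label))
```

With the anchored pattern, "canon" is labelled 'inside', whereas the unanchored grepl("non", x) would label it 'outside'.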
Answer 1 (score: 0)
You only need one line of code:
library(dplyr)
library(stringr)
df %>%
  mutate(loc01 = if_else(str_starts(loc_01, "non_"), "outside", "inside"))
To use a more complex regular expression, you can use str_detect instead of str_starts:
df %>%
  mutate(loc01 = if_else(str_detect(loc_01, "^(non_)"), "outside", "inside"))
Output:
loc_01 loc01_land loc01
<chr> <dbl> <chr>
1 apis 165730500 inside
2 indu 62101800 inside
3 isro 540687600 inside
4 miss 161140500 inside
5 non_apis 1694590200 outside
6 non_indu 1459707300 outside
7 non_isro 1025051400 outside
8 non_miss 1419866100 outside
9 non_piro 2037064500 outside