Given the data frame
df <- structure(list(Var1 = structure(1:19, .Label = c("S2107810801_BY20",
"S2107810801_BY20_CT", "S2111660501_BY3", "S2111660501_BY3_CT",
"S2111660501_SE26", "S2111660501_SE27", "S2111660501_SE27_CT",
"S2111660501_SE8", "S2111803201_SE12", "S2111831801_SE24", "S2112650301_SE21",
"S2112650301_SE21_CT", "S2112650301_SE25", "S2112650301_SE25_CT",
"S2113810301_BY12", "S2113810301_BY12_CT", "UNKNOWN", "XTYSKPLSKOLA_BY23",
"XTYSKPLSKOLA_BY23_CT"), class = "factor"), Freq = c(341L, 14L,
273L, 14L, 66L, 42L, 7L, 48L, 14L, 183L, 21L, 7L, 238L, 7L, 1202L,
188L, 10L, 35L, 7L), per = c(12.5506072874494, 0.515274199484726,
10.0478468899522, 0.515274199484726, 2.42914979757085, 1.54582259845418,
0.257637099742363, 1.76665439823335, 0.515274199484726, 6.73536989326463,
0.772911299227089, 0.257637099742363, 8.75966139124034, 0.257637099742363,
44.23997055576, 6.9193963930806, 0.368052999631947, 1.28818549871181,
0.257637099742363)), .Names = c("Var1", "Freq", "per"), row.names = c(NA,
-19L), class = "data.frame")
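I want to keep a specific part of the strings in Var1 in a new variable, land. I thought I could use gsub, but I don't know whether it can remove several values at once. I want to remove everything from Var1 except the values in le <- c("SE", "BY"). I used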
library(dplyr)
df %>% mutate(land = gsub("[1-9]", "", Var1))
but, as I said, I don't know how to make gsub remove the other characters and digits as well.
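For illustration, here is what that attempt returns on two sample values (`x` is a hypothetical subset of Var1). The class `[1-9]` removes only the digits 1 to 9, so zeros, letters and underscores all survive:

```r
# Hypothetical subset of Var1 values
x <- c("S2107810801_BY20", "XTYSKPLSKOLA_BY23_CT")

# [1-9] deletes only the digits 1..9; "0", letters and "_" are kept
res <- gsub("[1-9]", "", x)
print(res)
# "S000_BY0"           "XTYSKPLSKOLA_BY_CT"
```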
Answer 0 (score: 2)
This regular expression should work. Note that sub returns the full string unchanged if there is no match.
sub("^.*_(SE|BY).*$", "\\1", df$Var1)
[1] "BY" "BY" "BY" "BY" "SE" "SE" "SE" "SE" "SE" "SE" "SE"
[12] "SE" "SE" "SE" "BY" "BY" "UNKNOWN" "BY" "BY"
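Here \\1 back-references the value captured by the parentheses (). The anchors ^ and $ are used, and the (sometimes risky) .* matches zero or more of any character.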
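A minimal sketch of the capture-group behaviour on a few sample values (`x` is a hypothetical subset of Var1). It also shows one way, assuming NA is preferred over the unchanged string for codes without SE or BY, to turn non-matches into NA with grepl() and ifelse():

```r
# Hypothetical subset of Var1 values
x <- c("S2107810801_BY20", "S2111660501_SE27_CT", "UNKNOWN")
pattern <- "^.*_(SE|BY).*$"

# sub() replaces the whole match with capture group 1 ("\\1");
# a string that does not match, such as "UNKNOWN", is returned as is
land <- sub(pattern, "\\1", x)
print(land)

# Assumption: NA is wanted for non-matching codes
land_na <- ifelse(grepl(pattern, x), land, NA_character_)
print(land_na)
```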
Answer 1 (score: 2)
We can use str_extract
library(dplyr)
library(stringr)
le <- c("SE", "BY")
df %>%
  mutate(land = str_extract(Var1, paste(le, collapse = "|")))
# Var1 Freq per land
#1 S2107810801_BY20 341 12.5506073 BY
#2 S2107810801_BY20_CT 14 0.5152742 BY
#3 S2111660501_BY3 273 10.0478469 BY
#4 S2111660501_BY3_CT 14 0.5152742 BY
#5 S2111660501_SE26 66 2.4291498 SE
#6 S2111660501_SE27 42 1.5458226 SE
#7 S2111660501_SE27_CT 7 0.2576371 SE
#8 S2111660501_SE8 48 1.7666544 SE
#9 S2111803201_SE12 14 0.5152742 SE
#10 S2111831801_SE24 183 6.7353699 SE
#11 S2112650301_SE21 21 0.7729113 SE
#12 S2112650301_SE21_CT 7 0.2576371 SE
#13 S2112650301_SE25 238 8.7596614 SE
#14 S2112650301_SE25_CT 7 0.2576371 SE
#15 S2113810301_BY12 1202 44.2399706 BY
#16 S2113810301_BY12_CT 188 6.9193964 BY
#17 UNKNOWN 10 0.3680530 <NA>
#18 XTYSKPLSKOLA_BY23 35 1.2881855 BY
#19 XTYSKPLSKOLA_BY23_CT 7 0.2576371 BY
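If you would rather avoid the stringr dependency, a base-R sketch of the same extraction uses regexpr() with regmatches(). The helper name extract_first() and the sample `x` are hypothetical. Because the pattern "SE|BY" is unanchored, it would also match those letters inside an ID prefix; the second call assumes the code always follows an underscore and enforces that with a PCRE lookbehind:

```r
le <- c("SE", "BY")
# Hypothetical subset of Var1 values
x <- c("S2107810801_BY20", "S2111660501_SE27_CT", "UNKNOWN")

# Hypothetical helper: base-R stand-in for stringr::str_extract()
extract_first <- function(x, pattern, ...) {
  x <- as.character(x)               # factors such as Var1 become character
  m <- regexpr(pattern, x, ...)      # start of first match, -1 if none
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)   # regmatches() returns only the matches
  out
}

land <- extract_first(x, paste(le, collapse = "|"))
print(land)

# Stricter variant: SE/BY must directly follow an underscore
land_strict <- extract_first(x, paste0("(?<=_)(", paste(le, collapse = "|"), ")"),
                             perl = TRUE)
print(land_strict)
```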