How to map values using regular expressions in R

Time: 2016-12-16 16:24:52

Tags: r regex

I have a data frame named df:

dput(df)
structure(list(Agent = structure(c(3L, 1L, 2L), .Label = c("dm_domain@domain01", 
"ns_name@namesrv200", "prodb101@webserver101"), class = "factor"), 
    Server = structure(c(3L, 1L, 2L), .Label = c("domain01", 
    "namesrv200", "proddb101"), class = "factor")), .Names = c("Agent", 
"Server"), class = "data.frame", row.names = c(NA, -3L))

There is a vector named app that contains these values:

 dput(app)
c("db", "dm", "ns")

I need to add another column named App to df, filled with the value from app that matches each row. This is the final result, df1:

dput(df1)
structure(list(Agent = structure(c(3L, 1L, 2L), .Label = c("dm_domain@domain01", 
"ns_name@namesrv200", "prodb101@webserver101"), class = "factor"), 
    Server = structure(c(3L, 1L, 2L), .Label = c("domain01", 
    "namesrv200", "proddb101"), class = "factor"), App = structure(1:3, .Label = c("db", 
    "dm", "ns"), class = "factor")), .Names = c("Agent", "Server", 
"App"), row.names = c(NA, -3L), class = "data.frame")

How can I do this in R: create a column in df and fill it with the matching values from app?

1 Answer:

Answer 0: (score: 2)

You can do

app <- c("db", "dm", "ns")
names(app) <- c("proddb101", "domain01", "namesrv200")
df$App <- app[as.character(df$Server)]
df
#                   Agent     Server App
# 1 prodb101@webserver101  proddb101  db
# 2    dm_domain@domain01   domain01  dm
# 3    ns_name@namesrv200 namesrv200  ns

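where proddb101 maps to db, and so on. as.character is necessary because df$Server is a factor, and indexing with a factor uses its integer codes rather than its labels.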
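
To see why the conversion matters, here is a minimal sketch using the same lookup vector. The names server, by_code and by_name are made up for illustration and are not part of the question.

```r
app <- c(proddb101 = "db", domain01 = "dm", namesrv200 = "ns")
server <- factor(c("proddb101", "domain01", "namesrv200"))

# Factor levels are sorted alphabetically (domain01, namesrv200, proddb101),
# so the integer codes behind `server` are 3, 1, 2.
by_code <- unname(app[server])                # factor index: positional, via codes
by_name <- unname(app[as.character(server)])  # character index: lookup by name

by_code
by_name
```

Only by_name follows the intended server-to-app mapping; by_code simply picks elements 3, 1 and 2 of app.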

Or, if you want to match more loosely (by substring), you can do

app <- c("db", "dm", "ns")
vgrepl <- Vectorize(grepl, "pattern")
m <- vgrepl(app, df$Agent, fixed = TRUE)
df$App <- colnames(m)[max.col(m, "first")] # assign first match
df
#                   Agent     Server App
# 1 prodb101@webserver101  proddb101  db
# 2    dm_domain@domain01   domain01  dm
# 3    ns_name@namesrv200 namesrv200  ns
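
One caveat with max.col: in a row where nothing matches, every entry is FALSE, the tie is broken towards the first column, and that row would silently get "db". Below is a minimal sketch of a guard, assuming unmatched rows should become NA. The extra agent "web01@host9" and the names agent and res are invented for illustration.

```r
app <- c("db", "dm", "ns")
agent <- c("prodb101@webserver101", "dm_domain@domain01",
           "ns_name@namesrv200", "web01@host9")  # the last one matches nothing

# Logical matrix: one row per agent, one column per pattern in app.
m <- sapply(app, grepl, x = agent, fixed = TRUE)

# Take the first matching column, then blank out rows without any match.
res <- colnames(m)[max.col(m, "first")]
res[rowSums(m) == 0] <- NA
res
```

Whether NA is the right placeholder depends on how unmatched agents should be handled downstream; a label such as "unknown" would work the same way.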