R: combinatorial string replacement

Date: 2014-06-23 12:01:23

Tags: string r gsub

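I'm looking for a gsub-based function that lets me do combinatorial string replacements, so that if I have an arbitrary number of string replacement rules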

replrules=list("<x>"=c(3,5),"<ALK>"=c("hept","oct","non"),"<END>"=c("ane","ene"))

and a target string

string="<x>-methyl<ALK><END>"

it would give me a data frame containing the resulting string names and the substitutions that produced them:

name                x        ALK     END
3-methylheptane     3        hept    ane
5-methylheptane     5        hept    ane
3-methyloctane      3        oct     ane
5-methyloctane      5        ...     ...
3-methylnonane      3
5-methylnonane      5
3-methylheptene     3
5-methylheptene     5
3-methyloctene      3
5-methyloctene      5
3-methylnonene      3
5-methylnonene      5

The target string can have an arbitrary structure; for example, it could also be string="1-<ALK>anol", or a pattern could occur more than once, as in string="<ALK>anedioic acid, di<ALK>yl ester".

What is the most elegant way to do this in R?

2 answers:

Answer 0 (score: 2)

How about

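# one row per combination of replacement values
d <- do.call(expand.grid, replrules)

# the template "<x>-methyl<ALK><END>" is written out by hand here
d$name <- paste0(d$`<x>`, "-", "methyl", d$`<ALK>`, d$`<END>`)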
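The paste0 call above hard-codes one template. A minimal sketch of the kind of gsub-based function the question asks for might look like this. It assumes placeholders are matched literally (fixed = TRUE) and that rules whose placeholder does not occur in the template should be ignored; the function name combo_replace is hypothetical.

```r
# Hypothetical helper: expand all combinations of the rules that occur in
# `string`, then substitute each placeholder literally with gsub().
combo_replace <- function(string, replrules) {
  # keep only the rules whose placeholder appears in the template
  used <- vapply(names(replrules), grepl, logical(1), x = string, fixed = TRUE)
  rules <- replrules[used]

  # one row per combination; the first rule varies fastest
  d <- do.call(expand.grid, c(rules, list(stringsAsFactors = FALSE)))

  name <- rep(string, nrow(d))
  for (p in names(rules)) {
    vals <- as.character(d[[p]])
    # gsub() replaces every occurrence, so repeated placeholders are handled
    name <- vapply(seq_along(name),
                   function(k) gsub(p, vals[k], name[k], fixed = TRUE),
                   character(1))
  }

  # strip the angle brackets for the column names of the result
  names(d) <- gsub("<|>", "", names(d))
  data.frame(name = name, d, stringsAsFactors = FALSE)
}

replrules <- list("<x>" = c(3, 5),
                  "<ALK>" = c("hept", "oct", "non"),
                  "<END>" = c("ane", "ene"))

combo_replace("<x>-methyl<ALK><END>", replrules)
combo_replace("<ALK>anedioic acid, di<ALK>yl ester", replrules)
```

Filtering the rules first keeps a template like "1-<ALK>anol" from producing duplicate rows for rules it never uses.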


Edit

This seems to work (substitute each of these into strsplit):

string = "<x>-methyl<ALK><END>"
string2 = "<x>-ethyl<ALK>acosane"
string3 = "1-<ALK>anol"

using Richard's regular expression:

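# one row per combination; keep the values as character, not factor
d <- do.call(expand.grid, list(replrules, stringsAsFactors=FALSE))
# drop the angle brackets so the column names match the strsplit pieces
names(d) <- gsub("<|>","",names(d))

# split the template into literal text and placeholder names
s <- strsplit(string3, "(<|>)", perl = TRUE)[[1]]

out <- list()

# placeholder names become whole columns of d; literal text is kept as-is
# (indexing by name means a repeated placeholder is only kept once)
for(i in s) {
  out[[i]] <- ifelse (i %in% names(d), d[i], i)
}

# paste0 recycles the literal pieces across all rows
d$name <- do.call(paste0,  unlist(out, recursive=FALSE))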
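The loop above works because splitting on < and > alternates between literal text and placeholder names: for "<x>-methyl<ALK><END>", strsplit returns "", "x", "-methyl", "ALK", "", "END", so the even positions are always placeholders. A minimal sketch that uses that position instead of a name lookup is shown below. It assumes the template contains no literal < or > characters, and fill_template is a hypothetical name.

```r
# Hypothetical helper: build the names for every row of `d`, where `d` has
# one column per placeholder (angle brackets already stripped).
fill_template <- function(string, d) {
  # odd positions are literal text, even positions are placeholder names
  s <- strsplit(string, "<|>", perl = TRUE)[[1]]
  pieces <- lapply(seq_along(s), function(k) {
    if (k %% 2 == 1) return(s[k])
    if (!s[k] %in% names(d)) stop("no rule for placeholder <", s[k], ">")
    d[[s[k]]]
  })
  # paste0 recycles the length-1 literal pieces over all rows
  do.call(paste0, pieces)
}

replrules <- list("<x>" = c(3, 5),
                  "<ALK>" = c("hept", "oct", "non"),
                  "<END>" = c("ane", "ene"))
d <- do.call(expand.grid, c(replrules, list(stringsAsFactors = FALSE)))
names(d) <- gsub("<|>", "", names(d))

d$name <- fill_template("<x>-methyl<ALK><END>oate<ALK>", d)
head(d$name, 2)
```

Because each slot is addressed by position, a placeholder that occurs twice gets two slots, and literal text that happens to equal a column name is never mistaken for a placeholder.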


Edit

This works with repeated items:

d <- do.call(expand.grid, list(replrules, stringsAsFactors=FALSE))
names(d) <- gsub("<|>","",names(d))

string4 = "<x>-methyl<ALK><END>oate<ALK>"

s <- strsplit(string4, "(<|>)", perl = TRUE)[[1]]
out <- list()
# index by position rather than by name, so a placeholder that occurs
# twice (here <ALK>) gets its own slot each time
for(i in seq_along(s)) {
  out[[i]] <- ifelse (s[i] %in% names(d), d[s[i]], s[i])
}
d$name <- do.call(paste0,  unlist(out, recursive=FALSE))

Answer 1 (score: 1)

Well, I'm not sure there even is a "correct" answer to your question, but hopefully this gives you some ideas.

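OK, so in s I just split the string at the important points. Then g takes the first value from each element of replrules. Then I build a data frame as an example: dat is a one-row example of what the result would look like.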

> (s <- strsplit(string, "(?<=l|\\>)", perl = TRUE)[[1]])
# [1] "<x>"     "-methyl" "<ALK>"   "<END>"  
> g <- sapply(replrules, "[", 1)
> dat <- data.frame(name = paste(append(g, s[2], after = 1), collapse = ""))
> dat[2:4] <- g
> names(dat)[2:4] <- sapply(strsplit(names(g), "<|>"), "[", -1)
> dat
#              name x  ALK END
# 1 3-methylheptane 3 hept ane
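The split above uses a lookbehind tuned to this template (it breaks after every "l" or ">"), so its split points depend on where those characters happen to fall and it will not separate the placeholders cleanly in other templates. A minimal sketch of a more general way to locate the placeholders follows. It assumes they are always written as <NAME> with no nested angle brackets; the helper name find_placeholders is hypothetical.

```r
# Hypothetical helper: list the placeholders (e.g. "<ALK>") found in a template.
find_placeholders <- function(string) {
  # "<[^<>]+>" matches one <...> token; gregexpr finds every occurrence
  regmatches(string, gregexpr("<[^<>]+>", string))[[1]]
}

replrules <- list("<x>" = c(3, 5),
                  "<ALK>" = c("hept", "oct", "non"),
                  "<END>" = c("ane", "ene"))

tpl <- "<ALK>anedioic acid, di<ALK>yl ester"
ph <- find_placeholders(tpl)          # "<ALK>" "<ALK>"
rules <- replrules[unique(ph)]        # only the rules this template needs
```

The resulting subset of replrules can then be passed to expand.grid as in the other answer, so rules the template does not use do not multiply the number of rows.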