Replacing letters with their corresponding letters

Asked: 2012-06-23 18:27:54

Tags: regex r

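I have a small problem but can't find the right search terms for it. I have letters from "A" to "N" and want to replace every letter greater than "G" with one of "A" to "G", according to its position in the alphabet (H becomes A, I becomes B, and so on). Doing this with gsub seems cumbersome. Or is there a regular expression that does it more cleverly?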

k <- rep(LETTERS[1:14],2)
gsub(pattern = "H", replacement = "A", x = k)
gsub(pattern = "I", replacement = "B", x = k)
gsub(pattern = "J", replacement = "C", x = k)
gsub(pattern = "K", replacement = "D", x = k)
# etc.

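Isn't there some way to convert the characters to integers, do the calculation on the integer values, and then convert back? Or is there an inverse of LETTERS? as.numeric() and as.integer() both return NA.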
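
For reference, base R does have a character-to-integer round trip, though not via as.integer(). The following is a minimal sketch, assuming plain ASCII uppercase input; it uses utf8ToInt() and intToUtf8() to get and restore code points, and the variable names are chosen only for illustration.

```r
k <- rep(LETTERS[1:14], 2)

# utf8ToInt() takes a single string, so collapse the vector first
codes  <- utf8ToInt(paste(k, collapse = ""))   # "A" is 65, "N" is 78
offset <- utf8ToInt("A")

# Shift to 0-based, wrap modulo 7, shift back to code points
wrapped <- (codes - offset) %% 7L + offset

# multiple = TRUE returns one string per code point
result <- intToUtf8(wrapped, multiple = TRUE)
result
```

This only makes sense when every element of k is a single uppercase letter; anything else would be wrapped into the "A" to "G" range without warning.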

5 answers:

Answer 0 (score: 11)

This converts H-N to A-G:

chartr("HIJKLMN", "ABCDEFG", k)
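
chartr() maps characters position by position: the i-th character of the first argument is replaced by the i-th character of the second, and everything else is left alone. As a small sketch, the two translation strings can also be built from LETTERS rather than typed out (the names old and new are only illustrative):

```r
k <- rep(LETTERS[1:14], 2)

# Build the translation strings instead of typing them by hand
old <- paste(LETTERS[8:14], collapse = "")   # "HIJKLMN"
new <- paste(LETTERS[1:7],  collapse = "")   # "ABCDEFG"

res <- chartr(old, new, k)
res

# It also works inside longer strings; characters not in `old` are unchanged
chartr(old, new, "HELLO")   # "AEEEO"
```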

Answer 1 (score: 4)

Whenever I see a problem like this, my first thought is match:

AG <- LETTERS[1:7]
HN <- LETTERS[8:14]

k <- rep(LETTERS[1:14],2)
n <- AG[match(k, HN)]
ifelse(is.na(n), k, n)
# [1] "A" "B" "C" "D" "E" "F" "G" "A" "B" "C" "D" "E" "F" "G" "A" "B" "C" "D" "E"
#[20] "F" "G" "A" "B" "C" "D" "E" "F" "G"

I would build an inverse LETTERS function the same way:

invLETTERS <- function(x) match(x, LETTERS[1:26])
invLETTERS(k)
# [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14  1  2  3  4  5  6  7  8  9 10 11
#[26] 12 13 14
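
One caveat with this approach: match() returns NA for anything that is not in the table, so lowercase input is not recognised. A short sketch, assuming you would want lowercase letters treated like their uppercase counterparts:

```r
invLETTERS <- function(x) match(x, LETTERS)

invLETTERS(c("A", "n", "N"))            # 1 NA 14
invLETTERS(toupper(c("A", "n", "N")))   # 1 14 14
```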

Answer 2 (score: 4)

Here is a clean and simple solution:

k <- rep(LETTERS[1:14],2)

# (1) Create a lookup vector whose elements can be indexed into  
#     by their names and will return their associated values
subs <- setNames(rep(LETTERS[1:7], 2), LETTERS[1:14])
subs
#   A   B   C   D   E   F   G   H   I   J   K   L   M   N 
# "A" "B" "C" "D" "E" "F" "G" "A" "B" "C" "D" "E" "F" "G" 

# (2) Use it.
unname(subs[k])
#  [1] "A" "B" "C" "D" "E" "F" "G" "A" "B" "C" "D" "E" "F" "G"
# [15] "A" "B" "C" "D" "E" "F" "G" "A" "B" "C" "D" "E" "F" "G"

Answer 3 (score: 3)

I'm sure there is a way to make this more compact, but this is probably what you had in mind with your second, non-regex idea:

k <- factor(k)
k1 <- as.integer(k) %% 7
k1[k1 == 0] <- 7
LETTERS[k1]
#  [1] "A" "B" "C" "D" "E" "F" "G" "A" "B" "C" "D" "E" "F" "G" "A" "B" "C" "D" "E" "F" "G" "A"
# [23] "B" "C" "D" "E" "F" "G"

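There is probably a clever way to sidestep the 0-index problem, but I'm not feeling very clever right now.

Edit

The comments made good suggestions. First, to deal with the 0 that the modulo operation produces, shift to 0-based before taking the modulus and back to 1-based afterwards: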

k1 <- ((as.integer(k)-1) %%7) + 1

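And combined with match, which works on the original character vector so the factor conversion is no longer needed, it becomes a one-liner: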

k1 <- LETTERS[((match(k, LETTERS)-1) %% 7) + 1]
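
The same one-liner can be wrapped in a small function so the cycle length is a parameter. This is only a sketch; wrap_letters and its argument n are hypothetical names, not something from the comments.

```r
# wrap_letters is a hypothetical helper: it maps each uppercase letter onto
# the first n letters of the alphabet, cycling every n positions.
wrap_letters <- function(x, n = 7) {
  idx <- match(x, LETTERS)        # alphabet position; NA if not an uppercase letter
  LETTERS[(idx - 1) %% n + 1]     # 0-based modulo, then back to 1-based
}

k <- rep(LETTERS[1:14], 2)
wrap_letters(k)

wrap_letters(c("A", "Z"), n = 13)   # "A" "M"
```

Inputs that are not in LETTERS come back as NA rather than raising an error, which may or may not be what you want.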

Answer 4 (score: 2)

If your problem is limited to A-N:

set.seed(1)
k = sample(LETTERS[1:14], 42, replace=TRUE)
temp = match(k, LETTERS)
# > table(k)
# k
# A B C D E F G I J K L M N 
# 2 2 5 2 1 6 3 3 5 4 3 3 3 
k[which(temp > 7)] = LETTERS[temp[temp > 7] -7]
# > table(k)
# k
# A  B  C  D  E  F  G 
# 2  5 10  6  4  9  6