Replace non-ASCII characters with a defined list of strings, without using a loop in R

Time: 2012-05-22 15:00:48

Tags: r replace special-characters non-ascii-characters

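I want to replace non-ASCII characters with their ASCII equivalents (only Spanish ones for now). If I have "á", I want to replace it with "a", and so on.

I wrote the function below, which works fine, but I would rather not use a loop (including internal loops such as sapply).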

latin2ascii <- function(x) {
  if (!is.character(x)) stop("input must be a character object")
  require(stringr)
  # Accented characters and their ASCII replacements, matched by position
  mapL <- c("á","é","í","ó","ú","Á","É","Í","Ó","Ú","ñ","Ñ","ü","Ü")
  mapA <- c("a","e","i","o","u","A","E","I","O","U","n","N","u","U")
  # Replace one character pair per iteration
  for (y in seq_along(mapL)) {
    x <- str_replace_all(x, mapL[y], mapA[y])
  }
  x
}

Is there an elegant way to solve this? Any help, suggestion, or modification is appreciated.

2 answers:

Answer 0: (score: 7)

gsubfn(), from the package of the same name, is well suited to this kind of thing:

library(gsubfn)

# Create a named list, in which:
#   - the names are the strings to be looked up
#   - the values are the replacement strings
mapL <- c("á","é","í","ó","ú","Á","É","Í","Ó","Ú","ñ","Ñ","ü","Ü")
mapA <- c("a","e","i","o","u","A","E","I","O","U","n","N","u","U")

# ll <- setNames(as.list(mapA), mapL) # An alternative to the 2 lines below
ll <- as.list(mapA)
names(ll) <- mapL


# Try it out
string <- "ÍÓáÚ"
gsubfn("[áéíóúÁÉÍÓÚñÑüÜ]", ll, string)
# [1] "IOaU"

Edit:

G. Grothendieck points out that base R also has a function for this:

A <- paste(mapA, collapse="")
L <- paste(mapL, collapse="")
chartr(L, A, "ÍÓáÚ")
# [1] "IOaU"
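To connect this back to the question, here is a minimal sketch of the original latin2ascii() rewritten around chartr(). It assumes a session in which the accented characters are read correctly (for example a UTF-8 locale), and it only covers one-character-to-one-character replacements, since chartr() maps characters by position. The two map strings are simply the question's vectors written as single strings.

```r
# Sketch only: same interface as the question's function, without the loop.
latin2ascii <- function(x) {
  if (!is.character(x)) stop("input must be a character object")
  mapL <- "áéíóúÁÉÍÓÚñÑüÜ"   # characters to look up
  mapA <- "aeiouAEIOUnNuU"   # replacements, in the same order
  # chartr() is vectorised over x, so no explicit loop is needed
  chartr(mapL, mapA, x)
}

latin2ascii(c("ÍÓáÚ", "niño", "pingüino"))
# [1] "IOaU"     "nino"     "pinguino"
```

Characters that do not appear in mapL are passed through unchanged.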

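Answer 1: (score: 2)

I like Josh's version, but I thought I would add another "vectorized" solution. It returns a vector of unaccented strings, and it relies only on base functions.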

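x <- c('íÁuÚ','uíÚÁ')

mapL <- c("á","é","í","ó","ú","Á","É","Í","Ó","Ú","ñ","Ñ","ü","Ü")
mapA <- c("a","e","i","o","u","A","E","I","O","U","n","N","u","U")
# Split each string into single characters
split <- strsplit(x, split='')
# Position of each character in mapL (NA if it is not an accented character)
m <- lapply(split, match, mapL)
# Swap in the ASCII replacement where there is a match, then glue each string back together
mapply(function(split, m) paste(ifelse(is.na(m), split, mapA[m]), collapse=''), split, m)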
# "iAuU" "uiUA"
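For reuse, the same split/match/paste idea can be wrapped in a function. The sketch below is one possible arrangement, not the answerer's code: the name strip_accents and its mapL/mapA arguments are hypothetical, and vapply() is still an internal loop over the input strings, as lapply() and mapply() are above.

```r
# Hypothetical wrapper around the split/match/paste approach (base R only).
strip_accents <- function(x,
                          mapL = c("á","é","í","ó","ú","Á","É","Í","Ó","Ú","ñ","Ñ","ü","Ü"),
                          mapA = c("a","e","i","o","u","A","E","I","O","U","n","N","u","U")) {
  stopifnot(is.character(x), length(mapL) == length(mapA))
  chars <- strsplit(x, split = "")          # one character vector per input string
  vapply(chars, function(ch) {
    idx <- match(ch, mapL)                  # NA where the character is not in mapL
    paste(ifelse(is.na(idx), ch, mapA[idx]), collapse = "")
  }, character(1))
}

strip_accents(c("íÁuÚ", "uíÚÁ"))
# [1] "iAuU" "uiUA"
```

Because each replacement is looked up in a vector rather than mapped position by position, a replacement may be longer than one character; for example, strip_accents("Straße", mapL = "ß", mapA = "ss") gives "Strasse", which chartr() cannot express.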