Character replacement with gsub not working inside a function

Time: 2015-09-26 21:19:37

Tags: r gsub

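I am trying to replace some unexpected characters in a data frame in R. According to "Replace multiple arguments with gsub", the gsub function should work fine for this case, so I tried it that way.

The values in the first column of my data frame look like this: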

La Flèche Wallonne
Liège - Bastogne - Liège
Tour de Romandie
Giro d´Italia
Critérium du Dauphiné

The code is implemented as follows:

callChangeCharacters <- function(results){
for(i in 1:nrow(results)){
    race <- results[i,1]
    race <- gsub("é","e",race)
    race <- gsub("â","a",race)
    race <- gsub("ó","o",race)
    race <- gsub("ž","z",race)
    race <- gsub("ú","u",race)
    race <- gsub("ø","o",race)
    race <- gsub("Å›","s",race)
    race <- gsub("Å‚","l",race)
    race <- gsub("ä‚","a",race)
    race <- gsub("è","e",race)
    race <- gsub("Ã","a",race)
    race <- gsub("Å","s",race)
    race <- gsub("Ä","c",race)
    race <- gsub("´","'",race)
    results[i,1] <- race
}
return(results)
}

If I run the code from the body of the for loop directly, I get the expected result:

La Fleche Wallonne
Liege - Bastogne - Liege
Tour de Romandie
Giro d'Italia
Criterium du Dauphine

However, if I call the function, the result is different and the unwanted characters are not corrected:

> correctedDF <- callChangeCharacters(results)
> correctedDF
                                        V1
La Flèche Wallonne
Liège - Bastogne - Liège
Tour de Romandie
Giro d´Italia
Critérium du Dauphiné

The dput output of my results is shown below (this version of results is longer, but the problem is the same):

> dput(results)
structure(list(V1 = c("Santos Tour Down Under", "Paris - Nice", 
"Tirreno-Adriatico", "Milano-Sanremo", "Volta Ciclista a Catalunya", 
"E3 Prijs Vlaanderen - Harelbeke", "Gent - Wevelgem", "Ronde van Vlaanderen / Tour des Flandres", 
"Vuelta Ciclista al Pais Vasco", "Paris - Roubaix", "Amstel Gold Race", 
"La Flèche Wallonne", "Liège - Bastogne - Liège", "Tour de Romandie", 
"Giro d´Italia", "Critérium du Dauphiné", "Tour de Suisse", 
"Tour de France", "Tour de Pologne", NA, "Clasica Ciclista San Sebastian", 
"Eneco Tour", "Vuelta a España", "Vattenfall Cyclassics", "GP Ouest France - Plouay", 
"Grand Prix Cycliste de Québec", "Grand Prix Cycliste de Montréal", 
"Il Lombardia", "Tour of Beijing")), .Names = "V1", row.names = c(1L, 
1686L, 4601L, 6743L, 6943L, 9274L, 9473L, 9673L, 9880L, 11581L, 
11779L, 11978L, 12168L, 12367L, 14264L, 21957L, 24734L, 27727L, 
35542L, 37354L, 37470L, 37627L, 39885L, 47277L, 47441L, 47624L, 
47788L, 47952L, 48147L), class = "data.frame")

Any idea why it does not work inside the function?

Thanks in advance.

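2 Answers:

Answer 0: (score: 2)

I ran into a similar problem because I loaded my code with the source function without specifying that the encoding argument should be "utf-8":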

source("./code.R")

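When I inspected the function that had been read in, I realized that source had altered some of the special characters, so the function did not work as expected. The solution was to set the encoding argument to "utf-8":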

source("./code.R", encoding="utf-8")
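If changing how the file is sourced is not practical, one possible workaround is to write the patterns as \uXXXX escapes, so the script itself contains only ASCII and no longer depends on how source decodes it. This is a minimal sketch covering only a few of the characters from the question; the name cleanup_escaped is hypothetical and not part of the original code.

```r
# Hypothetical helper: the patterns are Unicode escapes, so this file is
# pure ASCII and reads the same whatever encoding source() assumes.
cleanup_escaped <- function(race) {
    race <- gsub("\u00e9", "e", race, fixed = TRUE)  # e with acute accent
    race <- gsub("\u00e8", "e", race, fixed = TRUE)  # e with grave accent
    race <- gsub("\u00f1", "n", race, fixed = TRUE)  # n with tilde
    race <- gsub("\u00b4", "'", race, fixed = TRUE)  # acute accent used as apostrophe
    race
}

cleanup_escaped(c("Li\u00e8ge - Bastogne - Li\u00e8ge", "Giro d\u00b4Italia"))
```

Using fixed = TRUE treats each pattern as a literal string rather than a regular expression, which is all that is needed here.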

Answer 1: (score: 0)

Your code works. In addition, you should also replace ñ (see "Vuelta a España").

The gsub function is vectorized, so you do not need a loop at all.

cleanup <- function(race) {
    race <- gsub("é","e",race)
    race <- gsub("â","a",race)
    race <- gsub("ó","o",race)
    race <- gsub("ž","z",race)
    race <- gsub("ú","u",race)
    race <- gsub("ø","o",race)
    race <- gsub("Å›","s",race)
    race <- gsub("Å‚","l",race)
    race <- gsub("ä‚","a",race)
    race <- gsub("è","e",race)
    race <- gsub("Ã","a",race)
    race <- gsub("Å","s",race)
    race <- gsub("Ä","c",race)
    race <- gsub("´","'",race)
    return(race)
}

results$V1 <- cleanup(results$V1)
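The long run of gsub calls could also be driven from a pair of lookup vectors, which keeps the mapping in one place. The following is only a sketch: the names patterns, replacements and cleanup_table are hypothetical, and only a few of the characters are listed, written as Unicode escapes.

```r
# Hypothetical lookup vectors: patterns[k] is replaced by replacements[k].
patterns     <- c("\u00e9", "\u00e8", "\u00f1", "\u00b4")
replacements <- c("e", "e", "n", "'")

cleanup_table <- function(race) {
    # The loop runs over the patterns, not over the rows: each gsub call
    # still processes the whole character vector at once.
    for (k in seq_along(patterns)) {
        race <- gsub(patterns[k], replacements[k], race, fixed = TRUE)
    }
    race
}

cleanup_table(c("Crit\u00e9rium du Dauphin\u00e9", "Vuelta a Espa\u00f1a", NA))
```

Missing values pass through unchanged, so the NA entry in results needs no special handling.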

If you only have one column, why use a data.frame? It would be more convenient to keep a plain vector race.

If you really want a function that operates directly on results, you still do not need a loop.

callChangeCharacters <- function(results) {
    results[,1] <- cleanup(results[,1])
    return(results)
}
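For replacements that map exactly one character to one character, base R's chartr may be a more compact option than chained gsub calls. This is a sketch under the assumption of a UTF-8 session; from, to and cleanup_chartr are hypothetical names, and the multi-character patterns from the question (such as "Å›") cannot be handled this way.

```r
# chartr() translates characters one-to-one, so both strings must contain
# the same number of characters (eight each here).
from <- "\u00e9\u00e8\u00e2\u00f3\u00fa\u00f8\u00f1\u00b4"
to   <- "eeaouon'"

# Hypothetical wrapper, vectorized over race like gsub.
cleanup_chartr <- function(race) chartr(from, to, race)

cleanup_chartr(c("Grand Prix Cycliste de Qu\u00e9bec", "Giro d\u00b4Italia"))
```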