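I am trying to replace some unwanted characters in a data frame in R. According to "Replace multiple arguments with gsub", the gsub function should work for this case, so I tried it this way.
The values in the first column of my data frame are as follows: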
La Flèche Wallonne
Liège - Bastogne - Liège
Tour de Romandie
Giro d´Italia
Critérium du Dauphiné
The code is implemented as follows:
callChangeCharacters <- function(results){
  for(i in 1:nrow(results)){
    race <- results[i,1]
    race <- gsub("é","e",race)
    race <- gsub("â","a",race)
    race <- gsub("ó","o",race)
    race <- gsub("ž","z",race)
    race <- gsub("ú","u",race)
    race <- gsub("ø","o",race)
    race <- gsub("Å›","s",race)
    race <- gsub("Å‚","l",race)
    race <- gsub("ä‚","a",race)
    race <- gsub("è","e",race)
    race <- gsub("Ã","a",race)
    race <- gsub("Å","s",race)
    race <- gsub("Ä","c",race)
    race <- gsub("´","'",race)
    results[i,1] <- race
  }
  return(results)
}
If I run the code inside the for loop on its own, I get the expected result:
La Fleche Wallonne
Liege - Bastogne - Liege
Tour de Romandie
Giro d'Italia
Criterium du Dauphine
However, if I call the function, the result is different and the unwanted characters are not corrected:
> correctedDF <- callChangeCharacters(results)
> correctedDF
V1
La Flèche Wallonne
Liège - Bastogne - Liège
Tour de Romandie
Giro d´Italia
Critérium du Dauphiné
The dput output of my data is as follows (this version of results is longer, but the problem is the same):
> dput(results)
structure(list(V1 = c("Santos Tour Down Under", "Paris - Nice",
"Tirreno-Adriatico", "Milano-Sanremo", "Volta Ciclista a Catalunya",
"E3 Prijs Vlaanderen - Harelbeke", "Gent - Wevelgem", "Ronde van Vlaanderen / Tour des Flandres",
"Vuelta Ciclista al Pais Vasco", "Paris - Roubaix", "Amstel Gold Race",
"La Flèche Wallonne", "Liège - Bastogne - Liège", "Tour de Romandie",
"Giro d´Italia", "Critérium du Dauphiné", "Tour de Suisse",
"Tour de France", "Tour de Pologne", NA, "Clasica Ciclista San Sebastian",
"Eneco Tour", "Vuelta a España", "Vattenfall Cyclassics", "GP Ouest France - Plouay",
"Grand Prix Cycliste de Québec", "Grand Prix Cycliste de Montréal",
"Il Lombardia", "Tour of Beijing")), .Names = "V1", row.names = c(1L,
1686L, 4601L, 6743L, 6943L, 9274L, 9473L, 9673L, 9880L, 11581L,
11779L, 11978L, 12168L, 12367L, 14264L, 21957L, 24734L, 27727L,
35542L, 37354L, 37470L, 37627L, 39885L, 47277L, 47441L, 47624L,
47788L, 47952L, 48147L), class = "data.frame")
Any idea why it does not work inside the function?
Thanks in advance.
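Answer 0 (score: 2)
I ran into a similar problem because I was loading my code with the source function without specifying the encoding argument, which should have been "utf-8".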
source("./code.R")
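When I inspected the function after it had been read in, I realized that source had altered some of the special characters, so the function did not work as expected. The solution was to set the encoding argument to "utf-8".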
source("./code.R", encoding="utf-8")
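If setting the encoding on every source call is inconvenient, one possible way to sidestep the question is to write the accented characters as Unicode escapes, so that the script file contains only ASCII and reads the same under any encoding. The following is only a sketch under that assumption: cleanup_escaped is a hypothetical name, and only three of the substitutions from the question are shown.

```r
# Sketch: the patterns are written as \u escapes, so the file is plain ASCII
# and does not depend on the encoding used by source().
# cleanup_escaped is a hypothetical name, not part of the original code.
cleanup_escaped <- function(race) {
  race <- gsub("\u00e9", "e", race, fixed = TRUE)  # e with acute accent
  race <- gsub("\u00e8", "e", race, fixed = TRUE)  # e with grave accent
  race <- gsub("\u00b4", "'", race, fixed = TRUE)  # acute accent used as apostrophe
  race
}

# Sample values from the question, also written with escapes.
races <- c("La Fl\u00e8che Wallonne", "Giro d\u00b4Italia", "Crit\u00e9rium du Dauphin\u00e9")
print(cleanup_escaped(races))
```

The fixed = TRUE argument makes gsub treat each pattern as a literal string rather than a regular expression, which is all these replacements need.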
Answer 1 (score: 0)
Your code works. In addition, you should also replace ñ (see "Vuelta a España").
The gsub function is vectorized, so you do not need a loop at all.
cleanup <- function(race) {
  race <- gsub("é","e",race)
  race <- gsub("â","a",race)
  race <- gsub("ó","o",race)
  race <- gsub("ž","z",race)
  race <- gsub("ú","u",race)
  race <- gsub("ø","o",race)
  race <- gsub("Å›","s",race)
  race <- gsub("Å‚","l",race)
  race <- gsub("ä‚","a",race)
  race <- gsub("è","e",race)
  race <- gsub("Ã","a",race)
  race <- gsub("Å","s",race)
  race <- gsub("Ä","c",race)
  race <- gsub("´","'",race)
  return(race)
}
results$V1 <- cleanup(results$V1)
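If you only have one column, why use a data.frame at all? Just keep a vector, race.
If you really want a function that operates directly on results, you still do not need a loop.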
callChangeCharacters <- function(results) {
  results[,1] <- cleanup(results[,1])
  return(results)
}
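The two suggestions above (rely on vectorization, and also handle ñ) could be combined into a table-driven variant. This is a sketch rather than a drop-in replacement: the names from, to and cleanup_table are hypothetical, the table is deliberately incomplete, and the accented characters are written as \u escapes so the example does not depend on the file's encoding.

```r
# Hypothetical lookup table: from[k] is replaced by to[k].
from <- c("\u00e9", "\u00e8", "\u00f1", "\u00b4")  # e-acute, e-grave, n-tilde, acute accent
to   <- c("e",      "e",      "n",      "'")

cleanup_table <- function(race) {
  # The loop runs over the few patterns, not over the rows;
  # each gsub call processes the whole vector at once.
  for (k in seq_along(from)) {
    race <- gsub(from[k], to[k], race, fixed = TRUE)
  }
  race
}

# NA entries (there is one in the dput output) pass through unchanged.
races <- c("Vuelta a Espa\u00f1a", NA, "Grand Prix Cycliste de Qu\u00e9bec")
print(cleanup_table(races))
```

Adding a further substitution then only means extending from and to in parallel.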