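I am trying to replace the Unicode escape "<U+00F3>" in a data frame column using sapply, but nothing happens. The column I want to change is of type chr.
This is the call: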
tableExcel$Team <- sapply(tableExcel$Team, gsub, pattern = "<U+00F3>", replacement= "o")
Edit:
Thanks to Cath's answer below, I added \\ before the +:
tableExcel$Team <- sapply(tableExcel$Team, gsub, pattern = "<U\\+00F3>", replacement= "o")
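But this did not work either.
I also tried to provide a sample of my dataset, but the problem is that the code works on the sample and not on my real data: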
tableExcel <- data.frame("Team" = c("A", "B", "C", "Reducci<U+00F3>n"), "Point" = c(2, 30, 40, 30))
tableExcel$Team <- as.character(tableExcel$Team)
For more context, this is how I import my Excel file:
tableExcel <- as.data.frame(read_excel("Dataset LOS.xls", sheet = "Liga Squads"))
The structure of my data:
structure(list(Team = c("CHURN", "CHURN", "RESIDENCIAL NPTB", "RESIDENCIAL NPTB", "AUDIENCIAS TV", "AUDIENCIAS TV"), Points = c("P. Asig", "P. entr", "P. Asig", "P. entr", "P. Asig", "P. entr"), `2019-S01` = c(0, 0, 50, 0, NA, NA), `2019-S02` = c(0, 0, 10, 10, NA, NA), `2019-S03` = c(93, 88, 46, 19, NA, NA), `2019-S04` = c(56, 48, 0, 0, 13, 13), `2019-S05` = c(NA, NA, 80.5, 49.5, 42, 28.5), `2019-S06` = c(NA, NA, 66, 48, 55, 39.5), `2019-S07` = c(131, 112, 103, 63, 40.5, 38)), row.names = c(1L, 2L, 4L, 5L, 7L, 8L), class = "data.frame")
Answer 0 (score: 2)
I cannot reproduce the problem with gsub. It works as expected:
tableExcel$Team <- gsub("<U\\+00F3>", "o", tableExcel$Team)
#### OUTPUT ####
              Team  Points 2019-S01 2019-S02 2019-S03 2019-S04 2019-S05 2019-S06 2019-S07
1 Reducci<U+00F1>n P. Asig        0        0       93       56       NA       NA    131.0
2            CHURN P. entr        0        0       88       48       NA       NA    112.0
4 Reducci<U+00F2>n P. Asig       50       10       46        0     80.5     66.0    103.0
5 RESIDENCIAL NPTB P. entr        0       10       19        0     49.5     48.0     63.0
7    AUDIENCIAS TV P. Asig       NA       NA       NA       13     42.0     55.0     40.5
8             <NA> P. entr       NA       NA       NA       13     28.5     39.5     38.0
9        Reduccion P. entr       NA       NA       NA       NA       NA       NA       NA
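As a small illustration of why the backslashes matter: in a regular expression, + is a quantifier, so the unescaped pattern "<U+00F3>" asks for "<", one or more "U", then "00F3>", and never matches the literal plus sign. gsub is also vectorized, so wrapping it in sapply is not required. The vector x below is made-up sample data, not the original column.

```r
# Made-up sample values (an assumption for illustration only).
x <- c("Reducci<U+00F3>n", "CHURN", NA)

# Unescaped: "U+" means "one or more U", so the literal "+" is never matched
# and the input comes back unchanged.
unescaped <- gsub("<U+00F3>", "o", x)

# Escaped: "\\+" matches a literal plus sign.
escaped <- gsub("<U\\+00F3>", "o", x)

# fixed = TRUE treats the pattern as plain text, so no escaping is needed.
fixed <- gsub("<U+00F3>", "o", x, fixed = TRUE)

print(unescaped)
print(escaped)
print(fixed)
```

Either the escaped pattern or fixed = TRUE should do here; fixed = TRUE is probably the easier one to read when no regex features are needed.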
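However, regex replacement is probably not the most efficient way to convert Unicode characters, because it needs a separate gsub call for every character. Instead, you might want to try stri_unescape_unicode() from stringi: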
# install.packages("stringi") # Use if not yet installed.
library(stringi)
tableExcel$Team <- stri_unescape_unicode(gsub("<U\\+(.*)>", "\\\\u\\1", tableExcel$Team))
#### OUTPUT ####
              Team  Points 2019-S01 2019-S02 2019-S03 2019-S04 2019-S05 2019-S06 2019-S07
1        Reducciñn P. Asig        0        0       93       56       NA       NA    131.0
2            CHURN P. entr        0        0       88       48       NA       NA    112.0
4        Reducciòn P. Asig       50       10       46        0     80.5     66.0    103.0
5 RESIDENCIAL NPTB P. entr        0       10       19        0     49.5     48.0     63.0
7    AUDIENCIAS TV P. Asig       NA       NA       NA       13     42.0     55.0     40.5
8             <NA> P. entr       NA       NA       NA       13     28.5     39.5     38.0
9        Reducción P. entr       NA       NA       NA       NA       NA       NA       NA
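First, gsub converts the <U+0000> format to \\u0000, and then stri_unescape_unicode() unescapes it. As you can see, this handles multiple Unicode characters in one pass, which makes things a lot simpler.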
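One caveat with the pattern "<U\\+(.*)>": the .* is greedy, so a string containing two escapes would be matched from the first "<" to the last ">". The following is a base-R sketch of the same idea that avoids this. It assumes every escape has exactly four hexadecimal digits, and the helper name unescape_angle is made up for illustration (it is not part of stringi or base R).

```r
# Hypothetical helper: turn "<U+XXXX>" tokens (exactly four hex digits assumed)
# into the characters they stand for, using base R only.
unescape_angle <- function(x) {
  # Collect every distinct token that occurs anywhere in the vector.
  tokens <- unique(unlist(regmatches(x, gregexpr("<U\\+[0-9A-Fa-f]{4}>", x))))
  for (tok in tokens) {
    # Characters 4 to 7 of "<U+00F3>" are the hex digits "00F3".
    code_point <- strtoi(substr(tok, 4, 7), base = 16L)
    # Replace the token literally with the corresponding character.
    x <- gsub(tok, intToUtf8(code_point), x, fixed = TRUE)
  }
  x
}

# Made-up sample values, including one string with two escapes and an NA.
sample_teams <- c("Reducci<U+00F1>n", "a<U+00F1>b<U+00F3>c", "CHURN", NA)
unescaped_teams <- unescape_angle(sample_teams)
print(unescaped_teams)
```

With stringi, tightening the capture group in the same way (for example "<U\\+([0-9A-Fa-f]{4})>") should have a similar effect, under the same four-digit assumption.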
Data used in this answer:
tableExcel <- structure(list(Team = c("Reducci<U+00F1>n", "CHURN", "Reducci<U+00F2>n",
"RESIDENCIAL NPTB", "AUDIENCIAS TV", NA, "Reducci<U+00F3>n"),
Points = c("P. Asig", "P. entr", "P. Asig", "P. entr", "P. Asig",
"P. entr", "P. entr"), `2019-S01` = c(0, 0, 50, 0, NA, NA,
NA), `2019-S02` = c(0, 0, 10, 10, NA, NA, NA), `2019-S03` = c(93,
88, 46, 19, NA, NA, NA), `2019-S04` = c(56, 48, 0, 0, 13,
13, NA), `2019-S05` = c(NA, NA, 80.5, 49.5, 42, 28.5, NA),
`2019-S06` = c(NA, NA, 66, 48, 55, 39.5, NA), `2019-S07` = c(131,
112, 103, 63, 40.5, 38, NA)), row.names = c(1L, 2L, 4L, 5L,
7L, 8L, 9L), class = "data.frame")
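A possible explanation for why the replacement works on the sample data but not on the imported file, offered as an assumption rather than a diagnosis: on some setups (typically Windows with a non-UTF-8 locale) R prints a real "ó" as <U+00F3> when the console cannot display it. In that case the column holds the character itself, the text "<U+00F3>" is not in the data, and no pattern built on that text can match. The sketch below uses a made-up vector standing in for the imported column.

```r
# Made-up stand-in for the imported column; "\u00f3" is a real o-acute character.
imported <- c("Reducci\u00f3n", "CHURN", NA)

# The literal text "<U+00F3>" is absent, so a gsub on that text finds nothing.
has_escape_text <- grepl("<U+00F3>", imported, fixed = TRUE)

# Nine characters, not the sixteen that the literal "Reducci<U+00F3>n" would have.
n_chars <- nchar(imported[1])

# Match the character itself, written as a \u escape so the script stays ASCII.
cleaned <- gsub("\u00f3", "o", imported, fixed = TRUE)

print(has_escape_text)
print(n_chars)
print(cleaned)
```

Checking nchar() on one affected value is a quick way to tell the two cases apart before choosing a replacement strategy.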