Replacing a Unicode escape in a data frame column with a value

Date: 2019-06-27 06:31:50

Tags: r unicode shiny

I am trying to replace the Unicode escape "U+00F3" in a data frame using sapply, but nothing happens. The column that contains the escape is of type chr.

This is the call:

tableExcel$Team <- sapply(tableExcel$Team, gsub, pattern = "<U+00F3>", replacement= "o")

Edit:

Thanks to Cath's answer below, I added \\ before the +:

tableExcel$Team <- sapply(tableExcel$Team, gsub, pattern = "<U\\+00F3>", replacement= "o")

But that did not work either.

I also tried to provide a sample of my dataset, but the problem is that the replacement works on the sample and not on my real data:

tableExcel <- data.frame("Team" = c("A", "B", "C", "Reducci<U+00F3>n"), "Point" = c(2, 30, 40, 30))
tableExcel$Team <- as.character(tableExcel$Team)   

To give more information, this is how I import my Excel file:

tableExcel <- as.data.frame(read_excel("Dataset LOS.xls", sheet = "Liga Squads"))

The structure of my data:

structure(list(Team = c("CHURN", "CHURN", "RESIDENCIAL NPTB", "RESIDENCIAL NPTB", "AUDIENCIAS TV", "AUDIENCIAS TV"), Points = c("P. Asig", "P. entr", "P. Asig", "P. entr", "P. Asig", "P. entr"), `2019-S01` = c(0, 0, 50, 0, NA, NA), `2019-S02` = c(0, 0, 10, 10, NA, NA), `2019-S03` = c(93, 88, 46, 19, NA, NA), `2019-S04` = c(56, 48, 0, 0, 13, 13), `2019-S05` = c(NA, NA, 80.5, 49.5, 42, 28.5), `2019-S06` = c(NA, NA, 66, 48, 55, 39.5), `2019-S07` = c(131, 112, 103, 63, 40.5, 38)), row.names = c(1L, 2L, 4L, 5L, 7L, 8L), class = "data.frame")

1 answer:

Answer 0: (score: 2)

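I cannot reproduce the problem with gsub. With the + escaped, it works as expected on the sample data given at the end of this answer: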

tableExcel$Team <- gsub("<U\\+00F3>", "o", tableExcel$Team)

#### OUTPUT ####

              Team  Points 2019-S01 2019-S02 2019-S03 2019-S04 2019-S05 2019-S06 2019-S07
1 Reducci<U+00F1>n P. Asig        0        0       93       56       NA       NA    131.0
2            CHURN P. entr        0        0       88       48       NA       NA    112.0
4 Reducci<U+00F2>n P. Asig       50       10       46        0     80.5     66.0    103.0
5 RESIDENCIAL NPTB P. entr        0       10       19        0     49.5     48.0     63.0
7    AUDIENCIAS TV P. Asig       NA       NA       NA       13     42.0     55.0     40.5
8             <NA> P. entr       NA       NA       NA       13     28.5     39.5     38.0
9        Reduccion P. entr       NA       NA       NA       NA       NA       NA       NA
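
For reference, the reason the + has to be escaped is that in a regular expression `U+` means "one or more U", so the unescaped pattern never matches a literal plus sign. The sketch below uses a small made-up vector (not the asker's data) to show the difference, along with `fixed = TRUE` as an alternative to escaping. It also relies on gsub being vectorised, so wrapping it in sapply is not needed.

```r
# Hypothetical sample values; only the first one contains the escape text.
x <- c("Reducci<U+00F3>n", "CHURN", NA)

# Unescaped pattern: "U+" is a quantifier, so the literal "+" is never matched
# and the input comes back unchanged.
unescaped <- gsub("<U+00F3>", "o", x)

# fixed = TRUE treats the pattern as a plain string, so no escaping is needed.
literal <- gsub("<U+00F3>", "o", x, fixed = TRUE)

print(unescaped)
print(literal)
```

Both calls leave `NA` values as `NA`.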

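However, replacing characters one at a time with gsub may not be the most efficient way to convert Unicode escapes, because every distinct character needs its own gsub call. Instead, you might want to try stri_unescape_unicode() from stringi: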

# install.packages("stringi") # Use if not yet installed.
library(stringi)

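# Match exactly four hex digits per escape so that several escapes
# in the same string are converted separately.
tableExcel$Team <- stri_unescape_unicode(gsub("<U\\+([0-9A-Fa-f]{4})>", "\\\\u\\1", tableExcel$Team))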

#### OUTPUT ####

              Team  Points 2019-S01 2019-S02 2019-S03 2019-S04 2019-S05 2019-S06 2019-S07
1        Reducciñn P. Asig        0        0       93       56       NA       NA    131.0
2            CHURN P. entr        0        0       88       48       NA       NA    112.0
4        Reducciòn P. Asig       50       10       46        0     80.5     66.0    103.0
5 RESIDENCIAL NPTB P. entr        0       10       19        0     49.5     48.0     63.0
7    AUDIENCIAS TV P. Asig       NA       NA       NA       13     42.0     55.0     40.5
8             <NA> P. entr       NA       NA       NA       13     28.5     39.5     38.0
9        Reducción P. entr       NA       NA       NA       NA       NA       NA       NA

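This first uses gsub to convert the <U+0000> format into the \u0000 escape format, and then unescapes it. As you can see, it handles several different Unicode characters in a single pass, which makes things much simpler.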
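
The question asked for a plain "o" rather than "ó". One way to get there after unescaping, sketched here on a small made-up vector, is to transliterate with stringi's stri_trans_general() and the "Latin-ASCII" transform. The vector `x` and the variable names are illustrative only.

```r
library(stringi)

# Hypothetical sample values containing two different escapes.
x <- c("Reducci<U+00F3>n", "Espa<U+00F1>a", "CHURN", NA)

# Step 1: rewrite each <U+XXXX> as \uXXXX, then decode it to the real character.
decoded <- stri_unescape_unicode(
  gsub("<U\\+([0-9A-Fa-f]{4})>", "\\\\u\\1", x)
)

# Step 2: strip accents by transliterating Latin characters to ASCII.
ascii <- stri_trans_general(decoded, "Latin-ASCII")

print(ascii)
```

This avoids maintaining a separate replacement for every accented character, at the cost of also changing characters such as ñ that you may prefer to keep.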

Data:

tableExcel <- structure(list(Team = c("Reducci<U+00F1>n", "CHURN", "Reducci<U+00F2>n", 
"RESIDENCIAL NPTB", "AUDIENCIAS TV", NA, "Reducci<U+00F3>n"), 
    Points = c("P. Asig", "P. entr", "P. Asig", "P. entr", "P. Asig", 
    "P. entr", "P. entr"), `2019-S01` = c(0, 0, 50, 0, NA, NA, 
    NA), `2019-S02` = c(0, 0, 10, 10, NA, NA, NA), `2019-S03` = c(93, 
    88, 46, 19, NA, NA, NA), `2019-S04` = c(56, 48, 0, 0, 13, 
    13, NA), `2019-S05` = c(NA, NA, 80.5, 49.5, 42, 28.5, NA), 
    `2019-S06` = c(NA, NA, 66, 48, 55, 39.5, NA), `2019-S07` = c(131, 
    112, 103, 63, 40.5, 38, NA)), row.names = c(1L, 2L, 4L, 5L, 
7L, 8L, 9L), class = "data.frame")
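
As for why the same call has no effect on the imported Excel data, one possibility (an assumption, since the original file is not available) is that the column holds the real character ó and `<U+00F3>` is only how the console displays it in a non-UTF-8 locale. In that case there is no literal `<U+00F3>` text to match. A minimal check, using two made-up strings:

```r
# Hypothetical values: the first stores the escape as literal text,
# the second stores the actual character U+00F3.
x_literal <- "Reducci<U+00F3>n"
x_real <- "Reducci\u00f3n"

# Only the literal version contains the text "<U+00F3>".
has_text_literal <- grepl("<U+00F3>", x_literal, fixed = TRUE)  # TRUE
has_text_real <- grepl("<U+00F3>", x_real, fixed = TRUE)        # FALSE

# The character counts differ as well.
n_literal <- nchar(x_literal)  # 16
n_real <- nchar(x_real)        # 9

# For the real character, match the character itself.
fixed_real <- gsub("\u00f3", "o", x_real, fixed = TRUE)  # "Reduccion"
```

If `grepl()` returns `FALSE` on the real column, the pattern should target the character itself rather than its printed representation.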