Question

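I have a data frame that was discretized using RWeka. RWeka's discretization creates bins whose labels are wrapped in single quotes. They don't cause any problems, but variables with an 'All' category look ugly when plotted.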

Here is the discretized data frame:

structure(list(outlook = structure(c(1L, 1L, 2L, 3L, 3L, 3L, 
2L, 1L, 1L, 3L, 1L, 2L, 2L, 3L), .Label = c("sunny", "overcast", 
"rainy"), class = "factor"), temperature = structure(c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "'All'", class = "factor"), 
humidity = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L), .Label = "'All'", class = "factor"), 
windy = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, 
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE), play = structure(c(2L, 
2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("yes", 
"no"), class = "factor")), .Names = c("outlook", "temperature", 
"humidity", "windy", "play"), row.names = c(NA, -14L), class = "data.frame")

How can I remove the single quotes from the data and recreate the factors?

Answer 1

This should do it:

# Strip the single quotes from the two discretized columns
df$temperature <- gsub("\\'", "", df$temperature)
df$humidity <- gsub("\\'", "", df$humidity)
> df
    outlook temperature humidity windy play
1     sunny         All      All FALSE   no
2     sunny         All      All  TRUE   no
3  overcast         All      All FALSE  yes
4     rainy         All      All FALSE  yes
5     rainy         All      All FALSE  yes
6     rainy         All      All  TRUE   no
7  overcast         All      All  TRUE  yes
8     sunny         All      All FALSE   no
9     sunny         All      All FALSE  yes
10    rainy         All      All FALSE  yes
11    sunny         All      All  TRUE  yes
12 overcast         All      All  TRUE  yes
13 overcast         All      All FALSE  yes
14    rainy         All      All  TRUE   no
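
One caveat worth checking: gsub() returns a character vector, so after the assignments above the two columns are no longer factors, even though the printed data frame looks the same. Since the question asks for the factors to be recreated, a minimal sketch is to wrap the result in factor(). The one-column `df` and the name `cleaned` below are made-up stand-ins for illustration, not part of the original data.

```r
# Hypothetical stand-in for one discretized column from the question
df <- data.frame(temperature = factor(rep("'All'", 3)))

# gsub() coerces the factor to character before substituting
cleaned <- gsub("'", "", df$temperature, fixed = TRUE)
class(cleaned)           # "character"

# Rebuild the factor from the cleaned strings
df$temperature <- factor(cleaned)
class(df$temperature)    # "factor"
levels(df$temperature)   # "All"
```

Note that factor() derives the levels from the cleaned strings in sorted order, so any custom level order the original factor had is lost unless you pass `levels =` explicitly.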

If you need to do the same thing across multiple columns, this may be more efficient:

# Apply the same substitution to columns 2 and 3 (temperature, humidity)
df[, 2:3] <- apply(df[, 2:3], 2, function(x) {
    gsub("\\'", "", x)
})
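
A possible alternative, sketched here under the assumption that the columns should remain factors: edit the level labels directly instead of the values. This leaves the factor class, the level order, and the underlying integer codes untouched. The small `df` is a made-up stand-in that mirrors the question's data, and `strip_quotes` is a hypothetical helper name.

```r
# Hypothetical stand-in mirroring the structure of the question's data
df <- data.frame(
  outlook     = factor(c("sunny", "overcast", "rainy"),
                       levels = c("sunny", "overcast", "rainy")),
  temperature = factor(rep("'All'", 3)),
  humidity    = factor(rep("'All'", 3)),
  windy       = c(FALSE, TRUE, FALSE)
)

# Hypothetical helper: remove single quotes from a factor's level labels only
strip_quotes <- function(f) {
  levels(f) <- gsub("'", "", levels(f), fixed = TRUE)
  f
}

# Apply the helper to every factor column; other columns are left alone
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], strip_quotes)

str(df)
```

Selecting columns with is.factor() avoids hard-coding positions such as 2:3, which may be useful if the number of discretized columns changes.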
