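I have a data frame that was discretized using RWeka. RWeka's discretization creates bins whose labels are wrapped in single quotes. They don't cause any problems, but variables with an 'All' category look ugly when plotted.
Here is the discretized data frame (dput output):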
structure(list(outlook = structure(c(1L, 1L, 2L, 3L, 3L, 3L,
2L, 1L, 1L, 3L, 1L, 2L, 2L, 3L), .Label = c("sunny", "overcast",
"rainy"), class = "factor"), temperature = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "'All'", class = "factor"),
humidity = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = "'All'", class = "factor"),
windy = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE), play = structure(c(2L,
2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("yes",
"no"), class = "factor")), .Names = c("outlook", "temperature",
"humidity", "windy", "play"), row.names = c(NA, -14L), class = "data.frame")
How can I remove the single quotes from the data and recreate the factors?
Answer 0 (score: 3)
This should do it:
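# gsub() returns a character vector, so wrap it in factor() to rebuild the factor
df$temperature <- factor(gsub("'", "", df$temperature, fixed = TRUE))
df$humidity <- factor(gsub("'", "", df$humidity, fixed = TRUE))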
> df
    outlook temperature humidity windy play
1     sunny         All      All FALSE   no
2     sunny         All      All  TRUE   no
3  overcast         All      All FALSE  yes
4     rainy         All      All FALSE  yes
5     rainy         All      All FALSE  yes
6     rainy         All      All  TRUE   no
7  overcast         All      All  TRUE  yes
8     sunny         All      All FALSE   no
9     sunny         All      All FALSE  yes
10    rainy         All      All FALSE  yes
11    sunny         All      All  TRUE  yes
12 overcast         All      All  TRUE  yes
13 overcast         All      All FALSE  yes
14    rainy         All      All  TRUE   no
If you need to do the same thing on several columns, this may be more efficient:
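# lapply() keeps each column separate; apply() would coerce them to a character matrix
df[2:3] <- lapply(df[2:3], function(x) {
  factor(gsub("'", "", x, fixed = TRUE))
})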
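As a possible alternative, and not part of the original answer, the quotes can be stripped from the factor levels directly, so the columns never stop being factors and nothing has to be rebuilt. The sketch below uses a small two-row stand-in for the data frame in the question (the stand-in data and the helper name is_fac are only for illustration) and touches every factor column, leaving the logical column alone.

```r
# Two-row stand-in for the question's data frame (illustrative data only).
df <- data.frame(
  outlook     = factor(c("sunny", "rainy")),
  temperature = factor(c("'All'", "'All'")),
  humidity    = factor(c("'All'", "'All'")),
  windy       = c(FALSE, TRUE)
)

# Flag the factor columns; is_fac is a hypothetical helper name.
is_fac <- vapply(df, is.factor, logical(1))

# Rewrite only the level labels; the underlying integer codes are untouched.
df[is_fac] <- lapply(df[is_fac], function(f) {
  levels(f) <- gsub("'", "", levels(f), fixed = TRUE)
  f
})

str(df)
```

Because only the labels change, this should also work when a column has several bins rather than a single 'All' level, and non-factor columns such as windy are skipped.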