Here is my df:
structure(list(Time = structure(c(3L, 4L, 5L, 6L, 1L, 2L), .Label = c("1/20/15 10:26 AM",
"1/20/15 11:26 AM", "1/20/15 6:26 AM", "1/20/15 7:26 AM", "1/20/15 8:26 AM",
"1/20/15 9:26 AM"), class = "factor"), Server1 = structure(c(1L,
4L, 5L, 2L, 3L, 6L), .Label = c("1.08", "12.08", "15", "4", "7.92",
"No data"), class = "factor"), Server2 = structure(c(1L, 2L,
4L, 4L, 3L, 4L), .Label = c("1.67", "4.33", "7.75", "No data"
), class = "factor"), Server3 = structure(c(1L, 2L, 3L, 5L, 4L,
6L), .Label = c("0.83", "2.33", "3.58", "3.92", "4", "No data"
), class = "factor")), .Names = c("Time", "Server1", "Server2",
"Server3"), row.names = c(NA, -6L), class = "data.frame")
I need to convert all cells to numeric. When I run
data$Server1<-as.numeric(data$Server1)
I get this error:
Error in `$<-.data.frame`(`*tmp*`, "Server", value = numeric(0)) :
replacement has 0 rows, data has 6
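I would also like to convert the columns to numeric without referring to data$Server1 or data$Server2 individually, since I may have a few hundred columns.
Is there a better way to convert all columns to numeric and replace the non-numeric cells with NA?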
Answer 0: (score: 5)
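You can use lapply() to apply a function over the columns of interest. I assume you want to keep the Time column intact, so we can use [-1] indexing to leave that column out.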
## change all 'No data' elements to NA
is.na(df) <- df == "No data"
## for columns 2:4, drop extra factor levels and convert to numeric
df[-1] <- lapply(droplevels(df)[-1], function(x) as.numeric(levels(x))[x])
which gives
df
              Time Server1 Server2 Server3
1  1/20/15 6:26 AM    1.08    1.67    0.83
2  1/20/15 7:26 AM    4.00    4.33    2.33
3  1/20/15 8:26 AM    7.92      NA    3.58
4  1/20/15 9:26 AM   12.08      NA    4.00
5 1/20/15 10:26 AM   15.00    7.75    3.92
6 1/20/15 11:26 AM      NA      NA      NA
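To see why the as.numeric(levels(x))[x] idiom is used rather than plain as.numeric(x), here is a minimal sketch on a single factor. The vector x and its values are illustrative, chosen to resemble the Server2 column; they are not taken from the real data.

```r
# Illustrative factor column, similar to Server2 in the question
x <- factor(c("1.67", "4.33", "No data", "7.75"))

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(x)

# Mark "No data" as NA, drop the now-unused level, then map levels to numbers
is.na(x) <- x == "No data"
x <- droplevels(x)
values <- as.numeric(levels(x))[x]

print(codes)
print(values)
```

The first result holds only level positions, which is why converting factor columns directly tends to produce misleading numbers.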
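Alternatively, you can avoid the problem altogether when reading the data into R: pass the na.strings argument in the read call, and there is no need to fix the columns afterwards.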
read.table(file, na.strings = "No data")
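A minimal sketch of the na.strings approach, assuming the data comes from a comma-separated file. The csv_text string is a hypothetical stand-in for that file. A separator is assumed here because, with read.table's default whitespace splitting, an unquoted value such as "No data" would be read as two fields.

```r
# Hypothetical CSV contents standing in for the real file
csv_text <- "Time,Server1,Server2
1/20/15 6:26 AM,1.08,1.67
1/20/15 7:26 AM,4,No data
1/20/15 8:26 AM,No data,No data"

# na.strings turns every "No data" cell into NA while parsing,
# so the Server columns come back as numeric directly
df2 <- read.csv(text = csv_text, na.strings = "No data")

str(df2)
```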
Answer 1: (score: 3)
Using dplyr:
library(dplyr)
df %>% mutate_each(funs(as.numeric(levels(.))[.]), -Time)
You get:
#              Time Server1 Server2 Server3
#1  1/20/15 6:26 AM    1.08    1.67    0.83
#2  1/20/15 7:26 AM    4.00    4.33    2.33
#3  1/20/15 8:26 AM    7.92      NA    3.58
#4  1/20/15 9:26 AM   12.08      NA    4.00
#5 1/20/15 10:26 AM   15.00    7.75    3.92
#6 1/20/15 11:26 AM      NA      NA      NA
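Note that mutate_each() and funs() were deprecated in later dplyr releases. With dplyr 1.0.0 or newer, roughly the same conversion can be sketched with across(). This is an alternative formulation, not the code from the answer above; the data frame df4 is illustrative, and the conversion goes through as.character() so that it works for both factor and character columns.

```r
library(dplyr)

# Illustrative data with factor columns, as in the question
df4 <- data.frame(
  Time    = c("1/20/15 6:26 AM", "1/20/15 7:26 AM", "1/20/15 8:26 AM"),
  Server1 = c("1.08", "4", "7.92"),
  Server2 = c("1.67", "No data", "No data"),
  stringsAsFactors = TRUE
)

# Convert every column except Time; non-numeric text such as "No data"
# becomes NA, and the coercion warning is silenced deliberately
out <- df4 %>%
  mutate(across(-Time, ~ suppressWarnings(as.numeric(as.character(.x)))))

str(out)
```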
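Answer 2: (score: 1)
## replace every "No data" cell with NA
data <- replace(data, data == "No data", NA)
## keep Time as-is; convert the remaining columns to double via character
cbind(data[1], apply(data[-1], 2, function(x) as.double(as.character(x))))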
              Time Server1 Server2 Server3
1  1/20/15 6:26 AM    1.08    1.67    0.83
2  1/20/15 7:26 AM    4.00    4.33    2.33
3  1/20/15 8:26 AM    7.92      NA    3.58
4  1/20/15 9:26 AM   12.08      NA    4.00
5 1/20/15 10:26 AM   15.00    7.75    3.92
6 1/20/15 11:26 AM      NA      NA      NA
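Answer 3: (score: 1)
My choice would be
df[, 2:ncol(df)] <- apply(df[, 2:ncol(df)], 2, as.numeric)
since this seems the most straightforward. There is no need to change "No data" to NA beforehand, because that happens automatically, and you get a warning message telling you so.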
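This works even on factor columns because apply() first coerces the data frame to a matrix, which turns the factors into text. A minimal sketch on illustrative data (df3 and its values are made up for the example) shows that intermediate step:

```r
# Illustrative data with factor columns, as in the question
df3 <- data.frame(
  Time    = c("6:26 AM", "7:26 AM", "8:26 AM"),
  Server1 = c("1.08", "4", "No data"),
  Server2 = c("1.67", "No data", "No data"),
  stringsAsFactors = TRUE
)

# as.matrix() (which apply() calls internally) turns factor columns into text
m <- as.matrix(df3[-1])

# as.numeric() on text yields NA for "No data"; the warning is silenced here
df3[-1] <- suppressWarnings(apply(df3[-1], 2, as.numeric))

str(df3)
```

Leaving out suppressWarnings() keeps the "NAs introduced by coercion" warning visible, which can be a useful check that only the expected cells were turned into NA.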