Error converting a factor to numeric in R

Time: 2014-08-26 16:33:35

Tags: r

Update

Actually, my problem is with sum(data[,"employee_count"], na.rm = T)

I have the following raw data:

employee_count
1-49
0
150-249
1-49
1000+

The code I wrote is as follows:

data$employee_count <- as.character(data$employee_count)
data[data$employee_count=="1-49","employee_count"]<-1
data[data$employee_count=="50-149","employee_count"]<-2
data[data$employee_count=="150-249","employee_count"]<-3
data[data$employee_count=="250-499","employee_count"]<-4
data[data$employee_count=="500-749","employee_count"]<-5
data[data$employee_count=="750-999","employee_count"]<-6
data[data$employee_count=="1000+","employee_count"]<-7

The data then changes as follows:

employee_count
"1"
"0"
"3"
"1"
"7"

Then I tried to convert it to numeric:

data$employee_count<-as.numeric(as.character(data$employee_count))

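After running this code the data shows as 1 0 3 1 7, but when I try sum(data$employee_count), the output is NA. I guess something is wrong.

The desired result is for this column to really be numeric, so that it can be used in any kind of calculation.

For example, if I write data[1,"employee_count"]+data[2,"employee_count"]

the expected result would be 1 + 0 = 1

If I write sum(data$employee_count)

the result should be 1 + 0 + 3 + 1 + 7 = 12

If I write data[3,"employee_count"]*data[4,"employee_count"]

the result should be 3 * 1 = 3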

1 answer:

Answer 0: (score: 2)

sum(as.numeric(factor(data[,1], levels=unique(data[,1]))))
#[1] 6

If you look at the order,

 as.numeric(factor(data[,1], levels=unique(data[,1])))
 #[1] 1 2 3

is not the same as
 as.numeric(factor(data[,1]))
 #[1] 1 3 2
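
The difference comes from how factor() chooses its levels: without a levels argument it uses the sorted unique values, and these labels sort as text, not by the size of the range. A minimal sketch of that behaviour follows; the vector x and the names f_default and f_ordered are illustrative and not part of the original answer.

```r
# Illustrative vector with the same labels as the answer's small data set
x <- c("1-49", "50-149", "150-249")

# Default: levels are the sorted unique values (text sorting),
# so "150-249" is placed before "50-149"
f_default <- factor(x)
levels(f_default)

# Explicit levels: the integer codes follow the order supplied here
f_ordered <- factor(x, levels = c("1-49", "50-149", "150-249"))
levels(f_ordered)
as.integer(f_ordered)
# [1] 1 2 3
```

Supplying the levels explicitly, as the answer does further down, avoids relying on either sort order or order of first appearance.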

Data

data <- structure(list(employee_count = c("1-49", "50-149", "150-249"
 )), .Names = "employee_count", class = "data.frame", row.names = c(NA, 
-3L))

Update

 data <- structure(list(employee_count = c("1-49", "0", "150-249", "250-499", 
 "1-49", "500-749", "500-749", "750-999", "50-149", "1000+", "150-249"
 )), .Names = "employee_count", row.names = c(NA, -11L), class = "data.frame")


 data1 <- data

 data[,1] <- as.numeric(factor(data[,1], 
          levels=c('0', '1-49', '50-149', '150-249', '250-499', '500-749', '750-999', '1000+')))-1


 data[,1]
 #[1] 1 0 3 4 1 5 5 6 2 7 3

 data1[,1]
 #[1] "1-49"    "0"       "150-249" "250-499" "1-49"    "500-749" "500-749"
 #[8] "750-999" "50-149"  "1000+"   "150-249"
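
As an alternative sketch of the same mapping (not the answer's method), match() can look up each label's position in an ordered vector of bins. The names bins, x and codes are assumptions made for this illustration, as is the "unknown" label, which is included only to show what happens to a value that is missing from the lookup.

```r
# Ordered bin labels; the position in this vector defines the numeric code
bins <- c("0", "1-49", "50-149", "150-249", "250-499",
          "500-749", "750-999", "1000+")

# Hypothetical input, including one label that is not in `bins`
x <- c("1-49", "0", "150-249", "1000+", "unknown")

# match() returns the 1-based position of each value in `bins`,
# or NA when the value is absent; subtracting 1 makes "0" map to 0
codes <- match(x, bins) - 1L
codes
# [1]  1  0  3  7 NA
```

Like the factor() version, this yields NA for any label that is not listed, so a stray value in the column surfaces as NA instead of being silently miscoded.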

  sum(data[,1])
  #[1] 37
 data[3,"employee_count"]*data[4,"employee_count"]
 #[1] 12  # different value because I used different data
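
On the NA reported in the question: sum() returns NA whenever any element is NA, and as.numeric() produces NA for any string it cannot parse. The question does not show the full column, so the following is only a sketch under the assumption that some entry (an empty string here, chosen for illustration) was never mapped to a code. The names x and num are hypothetical.

```r
# Hypothetical column in which one entry ("") was never mapped to a code
x <- c("1", "0", "3", "", "7")

num <- as.numeric(x)    # "" cannot be parsed as a number, so it becomes NA
sum(num)                # NA, because one element is NA
sum(num, na.rm = TRUE)  # 11, the NA is dropped before summing

# Locate the offending positions and inspect the original values
which(is.na(num))       # 4
x[is.na(num)]
```

Checking unique() of the original column before recoding is a quick way to spot labels that the mapping does not cover.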