Update
sum(data[,"employee_count"], na.rm = T)
I have the following raw data:
employee_count
1-49
0
150-249
1-49
1000+
The code I wrote is as follows:
data$employee_count<- as.character.factor (data$employee_count)
data[data$employee_count=="1-49","employee_count"]<-1
data[data$employee_count=="50-149","employee_count"]<-2
data[data$employee_count=="150-249","employee_count"]<-3
data[data$employee_count=="250-499","employee_count"]<-4
data[data$employee_count=="500-749","employee_count"]<-5
data[data$employee_count=="750-999","employee_count"]<-6
data[data$employee_count=="1000+","employee_count"]<-7
The data then changes to:
employee_count
"1"
"0"
"3"
"1"
"7"
Then I tried to convert it to numeric:
data$employee_count<-as.numeric(as.character(data$employee_count))
After this code the data becomes 1 0 3 1 7, but when I run sum(data$employee_count), the output is NA. I think something is wrong.
The desired result is to actually convert this column to numeric, so that it can take part in any kind of calculation.
For example, if I write data[1,"employee_count"]+data[2,"employee_count"], the expected result is 1 + 0 = 1.
If I write sum(data$employee_count), the result should be 1 + 0 + 3 + 1 + 7 = 12.
If I write data[3,"employee_count"]*data[4,"employee_count"], the result should be 3 * 1 = 3.
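One plausible reason sum() returns NA here, assuming the real column contains a missing value or a label that the recoding does not cover, is that as.numeric() turns any string it cannot parse into NA, and a single NA propagates through sum(). A minimal sketch with a hypothetical "unknown" label (not in the original data):

```r
# Hypothetical column: one label ("unknown") was not covered by the recoding.
x <- c("1", "0", "3", "unknown", "7")

# as.numeric() converts any string it cannot parse into NA (with a warning).
num <- suppressWarnings(as.numeric(x))

which(is.na(num))        # position of the entry that failed to convert
sum(num)                 # NA, because one NA propagates through sum()
sum(num, na.rm = TRUE)   # 11, the NA entry is dropped before summing
```

Checking which(is.na(...)) on the real column shows which rows need an extra recoding rule, which is usually safer than silencing them with na.rm = TRUE.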
Answer 0 (score: 2)
sum(as.numeric(factor(data[,1], levels=unique(data[,1]))))
#[1] 6
Note that the order of the codes from
as.numeric(factor(data[,1], levels=unique(data[,1])))
#[1] 1 2 3
is not the same as the order from
as.numeric(factor(data[,1]))
#[1] 1 3 2
data <- structure(list(employee_count = c("1-49", "50-149", "150-249"
)), .Names = "employee_count", class = "data.frame", row.names = c(NA,
-3L))
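The difference in the comparison above comes from how factor() picks its levels: by default it sorts the unique strings as text, so the integer codes follow text order, not the numeric order of the ranges. A small sketch of this, where the variable names labels, default_codes and explicit_codes are illustrative only:

```r
# Sample labels taken from the three-row data set above.
labels <- c("1-49", "50-149", "150-249")

# Default: levels are the sorted unique strings, so text order decides the codes.
default_codes <- as.integer(factor(labels))

# Explicit: levels in order of first appearance, so codes follow the data order.
explicit_codes <- as.integer(factor(labels, levels = unique(labels)))

levels(factor(labels))  # inspect the sorted level order (locale dependent)
explicit_codes          # 1 2 3
```

Because the sorted order depends on string collation, spelling out the levels explicitly is the more predictable choice.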
data <- structure(list(employee_count = c("1-49", "0", "150-249", "250-499",
"1-49", "500-749", "500-749", "750-999", "50-149", "1000+", "150-249"
)), .Names = "employee_count", row.names = c(NA, -11L), class = "data.frame")
data1 <- data
data[,1] <- as.numeric(factor(data[,1],
levels=c('0', '1-49', '50-149', '150-249', '250-499', '500-749', '750-999', '1000+')))-1
data[,1]
#[1] 1 0 3 4 1 5 5 6 2 7 3
data1[,1]
#[1] "1-49" "0" "150-249" "250-499" "1-49" "500-749" "500-749"
#[8] "750-999" "50-149" "1000+" "150-249"
sum(data[,1])
#[1] 37
data[3,"employee_count"]*data[4,"employee_count"]
#[1] 12 # differs from the question's expected value because I used different data
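As an alternative to the factor approach, the same recoding could be sketched with a named lookup vector. This is not the answer's method; the name codes is hypothetical, and the data frame is rebuilt from the five rows shown in the question:

```r
# Lookup table from range label to code, using the coding from the question.
codes <- c("0" = 0, "1-49" = 1, "50-149" = 2, "150-249" = 3,
           "250-499" = 4, "500-749" = 5, "750-999" = 6, "1000+" = 7)

# The five rows shown in the question, stored as plain character strings.
data <- data.frame(employee_count = c("1-49", "0", "150-249", "1-49", "1000+"),
                   stringsAsFactors = FALSE)

# Index the lookup vector by label; unname() drops the label names.
# Labels missing from the lookup would become NA.
data$employee_count <- unname(codes[data$employee_count])

sum(data$employee_count)                                # 12
data[3, "employee_count"] * data[4, "employee_count"]   # 3
```

The column ends up as a plain numeric vector, so it can be used in arithmetic directly.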