Bin values into factors

Time: 2016-07-15 14:12:15

Tags: r

In my data frame I have a column containing a series of location values:

DistData$LOC = c(0,0,0,25,25,35,35,45,45,55,55,55,65,65,65,75,75,75,100,100,100,"REF1","REF1","REF2","REf3","REF3","REF3")

I want to bin these into a factor: values less than 50 are "NEAR", values of 50 or more are "FAR", and all REF* values are "REF".

I tried converting with as.numeric() and then using cut():

Zone = as.numeric(DistData$LOC)
Zone = cut(Zone, c(0,50,200,NA),labels=c("NEAR","FAR","REF"))

I also tried a series of ifelse statements:

Zone = ifelse(Zone<50,"NEAR", ifelse(Zone>=50,"FAR", ifelse(is.na(Zone)<-"REF")))

I also played around with gsub(), lapply(), and replace(), without success.

Any help is greatly appreciated.

3 answers:

Answer 0: (score: 4)

Here is a longer approach that produces the desired result:

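# convert to character vector
temp <- as.character(DistData$LOC)
# replace any refs (matched case-insensitively) with the sentinel "999"
temp[grep("REF", toupper(temp))] <- "999"
# use cut to get the desired categories; right = FALSE makes the bins
# [0,50), [50,999), [999,1000], so a value of exactly 50 counts as "far"
cut(as.numeric(temp), breaks = c(0, 50, 999, 1000), labels = c("near", "far", "ref"),
    right = FALSE, include.lowest = TRUE)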

This gives:

[1] near near near near near near near near near far  far  far  far  far  far  far  far  far 
[19] far  far  far  ref  ref  ref  ref  ref  ref 
Levels: near far ref
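
As a self-contained illustration of the sentinel idea, here is a minimal sketch. It assumes LOC is stored as a character column. The data frame is rebuilt from the question's values, and the `Zone` column, the `is_ref` and `loc_num` names, and the choice of `right = FALSE` (so that exactly 50 would count as FAR) are illustrative assumptions rather than part of the answer.

```r
# Rebuild the example column as character (assumed storage type)
DistData <- data.frame(
  LOC = c("0", "0", "0", "25", "25", "35", "35", "45", "45",
          "55", "55", "55", "65", "65", "65", "75", "75", "75",
          "100", "100", "100",
          "REF1", "REF1", "REF2", "REf3", "REF3", "REF3"),
  stringsAsFactors = FALSE
)

# Flag reference sites case-insensitively, so "REf3" is caught too
is_ref <- grepl("REF", DistData$LOC, ignore.case = TRUE)

# Swap in a numeric sentinel for the reference sites, then convert
loc_num <- as.numeric(ifelse(is_ref, "999", DistData$LOC))

# Left-closed bins: [0,50) NEAR, [50,999) FAR, [999,1000] REF
DistData$Zone <- cut(loc_num, breaks = c(0, 50, 999, 1000),
                     labels = c("NEAR", "FAR", "REF"),
                     right = FALSE, include.lowest = TRUE)

# Count the rows that fall in each zone
table(DistData$Zone)
```

The sentinel only works if it lies outside the range of real distances, so 999 would need to change if genuine values could reach it.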

Answer 1: (score: 0)

I like @lmo's solution, but here is another one that uses the bin_data() function from mltools.

library(mltools)

# Build vector of values.  I assume class(DistData$LOC) == "character" in your data.frame
temp <- c(
  "0", "0", "0", "25", "25", "35", "35", "45", "45", "55", "55", 
  "55", "65", "65", "65", "75", "75", "75", "100", "100", "100", 
  "REF1", "REF1", "REF2", "REf3", "REF3", "REF3"
)

# Convert to numeric. REFs (and other non-numeric values) will coerce to NA, with a warning
temp_numeric <- as.numeric(temp)

# Bin the values using "[left-closed, right-open)" bins
binned_vals <- bin_data(temp_numeric, bins=c(-Inf, 50, Inf), boundaryType = "lcro)")

# Insert a level for REF > all other levels
levels(binned_vals) <- c(levels(binned_vals), "REF")

# Locate REFs and convert them from NA to REF
binned_vals[grep("REF", toupper(temp))] <- "REF"

binned_vals
[1] [-Inf, 50) [-Inf, 50) [-Inf, 50) [-Inf, 50) [-Inf, 50) [-Inf, 50) [-Inf, 50) [-Inf, 50) [-Inf, 50) [50, Inf)  [50, Inf)  [50, Inf) 
[13] [50, Inf)  [50, Inf)  [50, Inf)  [50, Inf)  [50, Inf)  [50, Inf)  [50, Inf)  [50, Inf)  [50, Inf)  REF        REF        REF       
[25] REF        REF        REF       
Levels: [-Inf, 50) < [50, Inf) < REF
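
If mltools is not available, the same steps can probably be sketched in base R. This is an assumption-laden sketch rather than a drop-in replacement for bin_data(): it uses cut() with `right = FALSE` for the left-closed, right-open bins, a shortened illustrative input vector, and a hypothetical `zone` variable name.

```r
# Small illustrative vector (not the full data from the question)
temp <- c("0", "45", "50", "100", "REF1", "REf3")

# Non-numeric entries become NA; the coercion warning is expected here
temp_numeric <- suppressWarnings(as.numeric(temp))

# right = FALSE gives left-closed, right-open bins: [-Inf, 50) and [50, Inf)
zone <- cut(temp_numeric, breaks = c(-Inf, 50, Inf),
            labels = c("NEAR", "FAR"), right = FALSE)

# Add a REF level, then fill it in wherever the original text matched "REF"
levels(zone) <- c(levels(zone), "REF")
zone[grepl("REF", temp, ignore.case = TRUE)] <- "REF"

as.character(zone)
```

Unlike the bin_data() result, this factor is unordered and carries the NEAR/FAR labels directly instead of interval labels.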

Answer 2: (score: 0)

cut creates a factor variable, but its levels do not include NA.

We can use addNA to fix this, and then use factor to assign the correct labels, taking care not to exclude the NA level we just created:

factor(
  addNA(cut(as.numeric(LOC),c(0,50,200),right = FALSE)),
  labels =  c("NEAR","FAR","REF"),exclude=NULL)
#  [1] NEAR NEAR NEAR NEAR NEAR NEAR NEAR NEAR NEAR FAR  FAR  FAR  FAR  FAR  FAR  FAR  FAR  FAR  FAR  FAR  FAR  REF  REF  REF  REF  REF  REF 
# Levels: NEAR FAR REF
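
To make the role of addNA more concrete, the following minimal sketch inspects the levels at each step. The three-element vector `x` and the intermediate names `binned`, `with_na`, and `zone` are illustrative assumptions, with NA standing in for what as.numeric() returns for a value such as "REF1".

```r
# NA stands in for a REF value after as.numeric()
x <- c(10, 60, NA)

# cut() produces two interval levels; NA is a missing value, not a level
binned <- cut(x, breaks = c(0, 50, 200), right = FALSE)
levels(binned)

# addNA() appends NA as a real third level
with_na <- addNA(binned)
levels(with_na)

# exclude = NULL keeps the NA level so it can receive the "REF" label
zone <- factor(with_na, labels = c("NEAR", "FAR", "REF"), exclude = NULL)
as.character(zone)
```

Without `exclude = NULL`, factor() would drop the NA level again and the three labels would no longer match the number of levels.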