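In my data frame, I have a column containing a range of location values:
DistData$LOC = c(0,0,0,25,25,35,35,45,45,55,55,55,65,65,65,75,75,75,100,100,100,"REF1","REF1","REF2","REf3","REF3","REF3")
I want to split these into a factor where values less than 50 are "NEAR", values of 50 or greater are "FAR", and all REF* values are "REF".
I tried converting with as.numeric() and then using cut():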
Zone = as.numeric(DistData$LOC)
Zone = cut(Zone, c(0,50,200,NA),labels=c("NEAR","FAR","REF"))
I also tried a series of ifelse statements:
Zone = ifelse(Zone<50,"NEAR", ifelse(Zone>=50,"FAR", ifelse(is.na(Zone)<-"REF")))
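I also played around with gsub(), lapply(), and replace(), but without success.
Any help is much appreciated.
Answer 0 (score: 4)
Here is a longer approach that produces the desired result: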
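# convert to a character vector (also guards against LOC being a factor)
temp <- as.character(DistData$LOC)
# replace any refs (matched case-insensitively) with the placeholder "999"
temp[grep("REF", toupper(temp))] <- "999"
# use cut to get the desired categories; note that with the default right = TRUE
# the first bin is [0, 50], so a value of exactly 50 would be labelled "near"
cut(as.numeric(temp), breaks = c(0, 50, 100, 1000), labels = c("near", "far", "ref"),
    include.lowest = TRUE)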
This results in:
[1] near near near near near near near near near far far far far far far far far far
[19] far far far ref ref ref ref ref ref
Levels: near far ref
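The question asks for a value of exactly 50 to count as "FAR". A minimal sketch of a variant that handles that boundary, assuming left-closed bins via right = FALSE; the vector loc and the names is_ref, num, and zone are hypothetical stand-ins, not part of the original data frame:

```r
# Hypothetical stand-in for DistData$LOC, with an extra boundary value of 50
loc <- c("0", "45", "50", "55", "100", "REF1", "REf3")

# Flag the reference rows (case-insensitive), then convert the rest to numbers
is_ref <- grepl("^REF", loc, ignore.case = TRUE)
num <- suppressWarnings(as.numeric(loc))  # REF entries become NA

# right = FALSE gives the bins [0, 50) and [50, Inf), so 50 itself lands in "FAR"
zone <- cut(num, breaks = c(0, 50, Inf), labels = c("NEAR", "FAR"), right = FALSE)

# Add the third level and assign it to the reference rows
levels(zone) <- c(levels(zone), "REF")
zone[is_ref] <- "REF"
zone
```

This avoids the "999" placeholder, so it does not depend on the real data staying below some upper limit.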
Answer 1 (score: 0)
I like @lmo's solution, but here is another that uses the bin_data() function from mltools.
library(mltools)
# Build vector of values. I assume class(DistData$LOC) == "character" in your data.frame
temp <- c(
"0", "0", "0", "25", "25", "35", "35", "45", "45", "55", "55",
"55", "65", "65", "65", "75", "75", "75", "100", "100", "100",
"REF1", "REF1", "REF2", "REf3", "REF3", "REF3"
)
# Convert to numeric. REFs (and other non-numeric values) will coerce to NA, with a warning
temp_numeric <- as.numeric(temp)
# Bin the values using "[left-closed, right-open)" bins
binned_vals <- bin_data(temp_numeric, bins=c(-Inf, 50, Inf), boundaryType = "lcro)")
# Insert a level for REF > all other levels
levels(binned_vals) <- c(levels(binned_vals), "REF")
# Locate REFs and convert them from NA to REF
binned_vals[grep("REF", toupper(temp))] <- "REF"
binned_vals
[1] [-Inf, 50) [-Inf, 50) [-Inf, 50) [-Inf, 50) [-Inf, 50) [-Inf, 50) [-Inf, 50) [-Inf, 50) [-Inf, 50) [50, Inf) [50, Inf) [50, Inf)
[13] [50, Inf) [50, Inf) [50, Inf) [50, Inf) [50, Inf) [50, Inf) [50, Inf) [50, Inf) [50, Inf) REF REF REF
[25] REF REF REF
Levels: [-Inf, 50) < [50, Inf) < REF
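The result above keeps the interval labels rather than the NEAR/FAR names from the question. A small sketch of relabelling them with base R's levels<-; to keep it runnable without mltools, binned_vals here is a hand-built stand-in (an assumption) with the same levels as the output shown:

```r
# Stand-in for the bin_data() result: an ordered factor with the same levels
binned_vals <- factor(
  c("[-Inf, 50)", "[50, Inf)", "REF"),
  levels = c("[-Inf, 50)", "[50, Inf)", "REF"),
  ordered = TRUE
)

# Replace the interval labels with the names from the question, in level order
levels(binned_vals) <- c("NEAR", "FAR", "REF")
binned_vals
```

The new names are matched to the existing levels by position, so the ordering NEAR < FAR < REF is kept.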
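Answer 2 (score: 0)
cut creates a factor variable, but its levels do not include NA. We can fix this with addNA and then use factor to assign the correct labels, taking care not to exclude the NA level we just created: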
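# as.numeric() turns the REF entries into NA (with a coercion warning);
# as.character() guards against LOC being stored as a factor
factor(
  addNA(cut(as.numeric(as.character(DistData$LOC)), c(0, 50, 200), right = FALSE)),
  labels = c("NEAR", "FAR", "REF"), exclude = NULL)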
# [1] NEAR NEAR NEAR NEAR NEAR NEAR NEAR NEAR NEAR FAR FAR FAR FAR FAR FAR FAR FAR FAR FAR FAR FAR REF REF REF REF REF REF
# Levels: NEAR FAR REF
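To see what addNA contributes, here is a small sketch on a three-element toy vector (the values 10, 60, and NA and the names x, x_na, and res are made up for illustration):

```r
# cut() leaves the missing value as NA and does not create a level for it
x <- cut(c(10, 60, NA), breaks = c(0, 50, 200), right = FALSE)
levels(x)

# addNA() appends NA as an extra, last level
x_na <- addNA(x)
levels(x_na)

# With exclude = NULL, factor() keeps that NA level so it can receive a label
res <- factor(x_na, labels = c("NEAR", "FAR", "REF"), exclude = NULL)
res
```

Without exclude = NULL, factor() would drop the NA level again, and the three labels would no longer match the two remaining levels.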