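I want to bin a numeric vector into k equal-width bins and then associate each element with the midpoint of its bin.
ggplot2::cut_interval can generate equal-width bins, and Hmisc::cut2 can provide midpoints, but I can't see a way to get both at once.
Minimal reproducible example: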
v <- c(1, 2, 7, 9)
# cut_interval gives equal-width bins, but no midpoints.
ggplot2::cut_interval(v, 2)
# [1] [1,5] [1,5] (5,9] (5,9]
# Levels: [1,5] (5,9]
# cut2 doesn't give equal-width bins.
Hmisc::cut2(v, g=2)
# [1] [1,7) [1,7) [7,9] [7,9]
# Levels: [1,7) [7,9]
# But it can return a center for each bin (with levels.mean, the mean of the values in the bin).
Hmisc::cut2(v, g=2, levels.mean=TRUE)
# [1] 1.5 1.5 8.0 8.0
# Levels: 1.5 8.0
# Which can be extracted as a numeric.
as.numeric(as.character(Hmisc::cut2(v, g=2, levels.mean=TRUE)))
# [1] 1.5 1.5 8.0 8.0
Answer 0 (score: 0)
You can extract the lower and upper bounds from each cut_interval bin label:
EqualWidthBinMidpoint <- function(x, k) {
  # Returns midpoints of equal-width bins.
  #
  # Args:
  #   x: Vector to bin.
  #   k: Number of bins.
  #
  # Returns:
  #   Numeric vector with midpoint of each element of x's bin.
  # Labels look like "[1,5]" or "(5,9]". The bounds in the labels are rounded
  # (see dig.lab in ?cut), so the midpoints can be approximate.
  ci <- as.character(ggplot2::cut_interval(x, k))
  # Strip the opening and closing bracket, leaving "lower,upper".
  bounds <- strsplit(substr(ci, 2, nchar(ci) - 1), ",", fixed = TRUE)
  lb <- vapply(bounds, function(b) as.numeric(b[1]), numeric(1))
  ub <- vapply(bounds, function(b) as.numeric(b[2]), numeric(1))
  return((lb + ub) / 2)
}
EqualWidthBinMidpoint(v, 2)
# [1] 3 3 7 7
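If parsing the labels feels fragile, the midpoints can also be computed from the break points directly. The following is only a sketch in base R, not part of ggplot2 or Hmisc: the function name EqualWidthBinMidpointNumeric is made up for illustration, and it assumes x contains at least two distinct finite values and that the bins should span range(x), closed on the right with the lowest value included (as in the [1,5] (5,9] labels above).

```r
# Hypothetical helper (name chosen for illustration only).
# Assumes x has at least two distinct finite values; NAs stay NA.
EqualWidthBinMidpointNumeric <- function(x, k) {
  rng <- range(x, na.rm = TRUE)
  # k + 1 equally spaced break points from min(x) to max(x).
  breaks <- seq(rng[1], rng[2], length.out = k + 1)
  # Integer bin index per element; intervals are (a, b], lowest value included.
  idx <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  # Midpoint of each bin, computed from the unrounded break points.
  mids <- (head(breaks, -1) + tail(breaks, -1)) / 2
  mids[idx]
}

EqualWidthBinMidpointNumeric(c(1, 2, 7, 9), 2)
# [1] 3 3 7 7
```

Because the midpoints come from the numeric breaks rather than from the printed labels, they are not affected by label rounding.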