Getting the midpoints of equal-width bins

Time: 2016-08-15 22:37:50

Tags: r

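I want to bin a numeric vector into k equal-width bins and then associate each element with the midpoint of its bin.

ggplot2::cut_interval produces equal-width bins, and Hmisc::cut2 can provide midpoints, but I can't see a way to get both at once.

Minimal reproducible example: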

v <- c(1, 2, 7, 9)

# cut_interval gives equal-width bins, but no midpoints.
ggplot2::cut_interval(v, 2)
# [1] [1,5] [1,5] (5,9] (5,9]
# Levels: [1,5] (5,9]

# cut2 doesn't give equal-width bins.
Hmisc::cut2(v, g=2)
# [1] [1,7) [1,7) [7,9] [7,9]
# Levels: [1,7) [7,9]

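# But it can return a per-bin value (with levels.mean=TRUE this is the mean of
# the observations in each bin, which need not be the interval midpoint).
Hmisc::cut2(v, g=2, levels.mean=TRUE)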
# [1] 1.5 1.5 8.0 8.0
# Levels: 1.5 8.0

# Which can be extracted as a numeric.
as.numeric(as.character(Hmisc::cut2(v, g=2, levels.mean=TRUE)))
# [1] 1.5 1.5 8.0 8.0

1 answer:

Answer 0 (score: 0)

You can extract the lower and upper bounds from each cut_interval bin label and average them:

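EqualWidthBinMidpoint <- function(x, k) {
  # Returns midpoints of equal-width bins.
  #
  # Args:
  #   x: Vector to bin.
  #   k: Number of bins.
  #
  # Returns:
  #   Numeric vector with midpoint of each element of x's bin.
  #
  # Note: the bounds are parsed from the bin labels, so they carry the
  # label rounding (base::cut's dig.lab, 3 significant digits by default).
  ci <- as.character(ggplot2::cut_interval(x, k))
  # Drop the enclosing bracket or parenthesis, e.g. "(5,9]" -> "5,9".
  ci2 <- substr(ci, 2, nchar(ci) - 1)
  # Split each label into its lower and upper bound.
  bounds <- strsplit(ci2, ",", fixed = TRUE)
  lb <- as.numeric(vapply(bounds, `[`, character(1), 1))
  ub <- as.numeric(vapply(bounds, `[`, character(1), 2))
  (lb + ub) / 2
}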

EqualWidthBinMidpoint(v, 2)
# [1] 3 3 7 7
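
Because the bounds above are parsed back out of the bin labels, they inherit the label rounding (base `cut` formats breaks to 3 significant digits by default). A minimal sketch of an alternative that computes the breaks numerically in base R, assuming `x` contains at least two distinct finite values; the name `equal_width_midpoints` is hypothetical and is not part of the original answer:

```r
equal_width_midpoints <- function(x, k) {
  # Hypothetical helper: midpoints of k equal-width bins spanning range(x).
  # Assumes x has at least two distinct finite values; otherwise the breaks
  # are not unique and cut() raises an error.
  rng <- range(x, na.rm = TRUE)
  breaks <- seq(rng[1], rng[2], length.out = k + 1)

  # labels = FALSE makes cut() return integer bin codes instead of a factor;
  # include.lowest = TRUE places min(x) in the first bin.
  idx <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)

  # Midpoint of each bin: average of consecutive breaks.
  mids <- (head(breaks, -1) + tail(breaks, -1)) / 2
  mids[idx]
}

v <- c(1, 2, 7, 9)
equal_width_midpoints(v, 2)
# [1] 3 3 7 7
```

This avoids the ggplot2 dependency and the string round trip, at the cost of rebuilding the breaks by hand; NA elements of `x` come back as NA.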