Creating bins dynamically in a data frame using breaks and quantile fails?

Time: 2013-09-27 15:16:51

Tags: r dataframe binning

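EDIT: I made a mistake in the code I shared earlier. I replaced "bins" with "b" but missed one occurrence...

I am now also using the correct data.frame (y instead of the original df.score).

New code: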

# some data
x <- runif(1000)
x2 <- rnorm(1000)
y <- data.frame(x,x2)
# we want to bin the data frame y according to the values in x into b bins
b = 10
bins=10

# we create breaks in several ways
breaks=unique(quantile(x, probs=seq.int(0,1, by=1/b)))
breaks=unique(quantile(y$x, probs=seq.int(0,1, length.out=b+1)))

# now to the question
# this works
y$b <- with(y, cut(x, breaks=unique(quantile(x, probs=seq.int(0,1, length.out=11))), include.lowest=TRUE))
table(y$b)
# this works too
y$b2 <- with(y, cut(x, breaks=unique(quantile(x, probs=seq.int(0,1, length.out=(bins+1)))), include.lowest=TRUE))
table(y$b2)
# this does not work
y$b3 <- with(y, cut(x, breaks=unique(quantile(x, probs=seq.int(0,1, length.out=(b+1)))), include.lowest=TRUE))

Error in seq.int(0, 1, length.out = (b + 1)) : 
  'length.out' must be a non-negative number
In addition: Warning message:
In Ops.factor(b, 1) : + not meaningful for factors

Now, if I split the code into two steps, there is no problem!

brks=unique(quantile(x, probs=seq.int(0,1, length.out=(b + 1))))
y$b3 <- with(y, cut(x, breaks=brks, include.lowest=TRUE))

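I am lost here...

This is part of more dynamic code that gets knitted depending on details of the data set.

So I want to create the bins dynamically and report on them. The code works now, but I do not understand why it works when I use the name "bins" and fails when I use "b"...?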


OLD FROM HERE

I need to add bins to a data frame dynamically so that I can report on them later.

# some data
x <- runif(1000)
x2 <- rnorm(1000)
y <- data.frame(x,x2)
# we want to bin the data frame y according to the values in x into b bins
b = 10

# we create breaks in several ways
breaks=unique(quantile(x, probs=seq.int(0,1, by=1/b)))
breaks=unique(quantile(y$x, probs=seq.int(0,1, length.out=b+1)))

# now to the question
# this works

y$bins <- with(df.score, cut(x, breaks=unique(quantile(Pchurn, probs=seq.int(0,1, length.out=11))), include.lowest=TRUE))
table(y$bins)

So if I try to accomplish the same thing using the bins variable directly, it fails:

# this does not work
y$bins <- with(df.score, cut(x, breaks=unique(quantile(Pchurn, probs=seq.int(0,1, length.out=bins+1))), include.lowest=TRUE))


Error in seq.int(0, 1, length.out = (bins + 1)) : 
  'length.out' must be a non-negative number
In addition: Warning message:
In Ops.factor(bins, 1) : + not meaningful for factors

What am I missing here?

1 Answer:

Answer 0: (score: 2)

I think you want this (substituting b for bins in the length.out calculation of the "# this does not work" snippet):

y$bins <- with(df.score, cut(x, 
                    breaks=unique(quantile(Pchurn, 
                                         probs=seq.int(0,1, length.out=b+1))), 
                    include.lowest=TRUE))

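It is hard to test without the df.score variable and a fuller description of the goal, but at least the code does not throw an error in a workspace with: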

 df.score=data.frame(Pchurn=rnorm(100), x=rnorm(100))
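
A likely explanation for the error itself, offered as a minimal sketch rather than a confirmed diagnosis of the original workspace: with() looks names up in the data frame first and only then in the enclosing environment. Once the data frame has a column with the same name as the bin-count variable (y$b in the new code, a bins column in the old code), that name inside with() resolves to the factor column, and adding 1 to a factor triggers the Ops.factor warning. The set.seed() call and the names n_before, class_inside, class_outside and fails_inside are assumptions added for illustration.

```r
# Sketch: a column named `b` masks the global `b` inside with()
set.seed(1)                       # assumption: seed added only for reproducibility
x <- runif(1000)
y <- data.frame(x, x2 = rnorm(1000))
b <- 10                           # number of bins, defined in the global environment

# Before y$b exists, `b` inside with() falls through to the global value
n_before <- with(y, b + 1)

# Create the factor column, as in the question
y$b <- with(y, cut(x,
                   breaks = unique(quantile(x, probs = seq.int(0, 1, length.out = b + 1))),
                   include.lowest = TRUE))

# Now `b` inside with(y, ...) is the factor column, not the number 10
class_inside  <- with(y, class(b))
class_outside <- class(b)

# The same expression that worked a moment ago now raises an error
fails_inside <- tryCatch(
  suppressWarnings(with(y, { seq.int(0, 1, length.out = b + 1); FALSE })),
  error = function(e) TRUE
)

# Computing the breaks outside with() uses the global `b`, so the clash disappears
brks <- unique(quantile(y$x, probs = seq.int(0, 1, length.out = b + 1)))
y$b3 <- cut(y$x, breaks = brks, include.lowest = TRUE)
```

This would also account for why splitting the code into two steps works: brks is computed in the global environment, where b is still the number 10. Giving the bin count a name that cannot collide with a column is another way around the problem.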