Question

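Within my groups (the "name" variable), I want to cut the values into quartiles and create a quartile label column for the variable "value". Since the group sizes differ, the quartile ranges should differ between groups as well. The code below, however, only cuts the values by the overall quartiles, which results in the same quartile ranges for all groups.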

library(dplyr)
dt <- data.frame(name = c(rep('a', 8), rep('b', 4), rep('c', 5)),
                 value = c(1:8, 1:4, 1:5))
dt
# Cut value at the quartile breaks computed inside mutate()
dt.2 <- dt %>%
  group_by(name) %>%
  mutate(newcol = cut(value,
                      breaks = quantile(value, probs = seq(0, 1, 0.25), na.rm = TRUE),
                      include.lowest = TRUE))
dt.2
str(dt.2)

Data:

   name value
1     a     1
2     a     2
3     a     3
4     a     4
5     a     5
6     a     6
7     a     7
8     a     8
9     b     1
10    b     2
11    b     3
12    b     4
13    c     1
14    c     2
15    c     3
16    c     4
17    c     5

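Output from the code above. Update: the problem is not that newcol is a factor, but that newcol has the same quartile ranges in all of the different groups. For example, group b has values 1-4, yet it is assigned a quartile range of 3-5, because the ranges are derived from min(value) to max(value) regardless of group.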

 name value newcol
   <fctr> <int> <fctr>
1       a     1  [1,2]
2       a     2  [1,2]
3       a     3  (2,3]
4       a     4  (3,5]
5       a     5  (3,5]
6       a     6  (5,8]
7       a     7  (5,8]
8       a     8  (5,8]
9       b     1  [1,2]
10      b     2  [1,2]
11      b     3  (2,3]
12      b     4  (3,5]
13      c     1  [1,2]
14      c     2  [1,2]
15      c     3  (2,3]
16      c     4  (3,5]
17      c     5  (3,5]
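
To see where those shared ranges come from, it can help to compare the breaks computed over the whole column with the breaks computed per group. The following is a minimal base R sketch, not part of the original code; the names overall_breaks and group_breaks are made up for illustration.

```r
# Same example data as in the question
dt <- data.frame(name = c(rep("a", 8), rep("b", 4), rep("c", 5)),
                 value = c(1:8, 1:4, 1:5))
probs <- seq(0, 1, 0.25)

# Breaks computed once over the whole column (what the output above reflects)
overall_breaks <- quantile(dt$value, probs = probs)

# Breaks computed separately for each group (what the question is after)
group_breaks <- tapply(dt$value, dt$name, quantile, probs = probs)

overall_breaks
group_breaks[["b"]]
```

The overall breaks come out as 1, 2, 3, 5, 8, which matches the intervals in the output above, whereas group b on its own gives 1, 1.75, 2.5, 3.25, 4.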

Desired output

   name value newcol/quartile label

1       a     1  1
2       a     2  1
3       a     3  2
4       a     4  2
5       a     5  3
6       a     6  3
7       a     7  4
8       a     8  4
9       b     1  1
10      b     2  2
11      b     3  3
12      b     4  4
13      c     1  1
14      c     2  2
15      c     3  3
16      c     4  4
17      c     5  4

Answer 1

Here is a way to do this following the split-apply-combine framework.

dt <- data.frame(name = c(rep('a', 8), rep('b', 4), rep('c', 5)),
                 value = c(1:8, 1:4, 1:5))
# Split by group, label quartiles within each piece, then recombine
split_dt <- lapply(split(dt, dt$name), transform,
                   quantlabel = as.numeric(
                     cut(value, breaks = quantile(value, probs = seq(0, 1, .25)),
                         include.lowest = TRUE)))
dt <- unsplit(split_dt, dt$name)

   name value quantlabel
1     a     1          1
2     a     2          1
3     a     3          2
4     a     4          2
5     a     5          3
6     a     6          3
7     a     7          4
8     a     8          4
9     b     1          1
10    b     2          2
11    b     3          3
12    b     4          4
13    c     1          1
14    c     2          1
15    c     3          2
16    c     4          3
17    c     5          4

Edit: there is also a data.table way.

Following this post, we can use the data.table package if performance is a concern:

library(data.table)
dt <- data.frame(name = c(rep('a', 8), rep('b', 4), rep('c', 5)),
                 value = c(1:8, 1:4, 1:5))
dt.t <- as.data.table(dt)
# := adds the column by reference; the trailing name is the grouping (by) argument
dt.t[, quantlabels := as.numeric(cut(value, breaks = quantile(value, probs = seq(0, 1, .25)), include.lowest = TRUE)), name]

    name value quantlabels
 1:    a     1           1
 2:    a     2           1
 3:    a     3           2
 4:    a     4           2
 5:    a     5           3
 6:    a     6           3
 7:    a     7           4
 8:    a     8           4
 9:    b     1           1
10:    b     2           2
11:    b     3           3
12:    b     4           4
13:    c     1           1
14:    c     2           1
15:    c     3           2
16:    c     4           3
17:    c     5           4

Edit: and there is a dplyr way.

We can follow @akrun's suggestion and use as.numeric here as well (which is what we did for the other solutions):

library(dplyr)
dt %>%
  group_by(name) %>%
  mutate(quantlabel = as.numeric(cut(value, breaks = quantile(value, probs = seq(0, 1, .25)), include.lowest = TRUE)))

Note that if you want the labels themselves, use as.character instead of as.numeric.
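
For the variant that keeps the interval labels, a minimal base R sketch along the lines of the split-apply-combine solution might look like the following. The helper name label_group and the result name dt_labeled are hypothetical and chosen only for this illustration.

```r
# Same example data as in the question
dt <- data.frame(name = c(rep("a", 8), rep("b", 4), rep("c", 5)),
                 value = c(1:8, 1:4, 1:5))

# Hypothetical helper: label one group's values with its own quartile intervals
label_group <- function(g) {
  breaks <- quantile(g$value, probs = seq(0, 1, 0.25))
  # as.character() keeps the interval text instead of the integer bin code
  g$quantlabel <- as.character(cut(g$value, breaks = breaks, include.lowest = TRUE))
  g
}

# Split by group, label each piece, and restore the original row order
dt_labeled <- unsplit(lapply(split(dt, dt$name), label_group), dt$name)
dt_labeled
```

With this, each group carries intervals built from its own breaks; group c, for example, runs from "[1,2]" to "(4,5]". The column is stored as character rather than factor, which avoids mixing factor levels from different groups.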

Quantiles by group, with differing group sizes
