I have a data frame of many companies (let's say 7) and many periods (let's say 2). I need to create a new column by dividing the companies within each period into a few parts (let's say 3). Since 7 cannot be divided evenly by 3, I want to assign two rows to each of the first groups and put the one extra row in the last group, so the last group has three rows. In the following table, the 'res' column is the expected result:
Company Period res
1 1 11
2 1 11
3 1 12
4 1 12
5 1 13
6 1 13
7 1 13
1 2 21
2 2 21
3 2 22
4 2 22
5 2 23
6 2 23
7 2 23
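Answer 0 (score: 0)
As I understand it, you want to split into equal parts and put whatever is left over (if anything) in the last group. The following function does just that, i.e.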
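f1 <- function(x, parts){
  len1 <- length(x)
  # remainder left after dividing evenly
  i1 <- len1 %% parts
  # base size of every group
  v1 <- rep((len1 - i1)/parts, parts)
  # the last group absorbs the remainder
  v1[length(v1)] <- v1[length(v1)] + i1
  # expand the group indices to one label per element
  v2 <- rep(seq_along(v1), v1)
  return(v2)
}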
#Here are some trials,
f1(seq(7), 3)
#[1] 1 1 2 2 3 3 3
f1(seq(8), 3)
#[1] 1 1 2 2 3 3 3 3
f1(seq(9), 3)
#[1] 1 1 1 2 2 2 3 3 3
f1(seq(10), 3)
#[1] 1 1 1 2 2 2 3 3 3 3
Now you need to apply it within each group using a split-apply approach (using data.table or dplyr will certainly speed this up), i.e.
do.call(rbind,
lapply(split(df, df$Period), function(i) {
i$New_column <- paste0(i$Period, f1(i$Company, 3)); i}))
which gives,
     Company Period New_column
1.1        1      1         11
1.2        2      1         11
1.3        3      1         12
1.4        4      1         12
1.5        5      1         13
1.6        6      1         13
1.7        7      1         13
2.8        1      2         21
2.9        2      2         21
2.10       3      2         22
2.11       4      2         22
2.12       5      2         23
2.13       6      2         23
2.14       7      2         23
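For a reproducible run, here is a minimal sketch that builds the example data and produces the same labels with ave() instead of split/rbind, which leaves the original row names untouched. The construction of df via expand.grid() is an assumption, since the question does not show how the data frame was created.

```r
# Assumed example data: 7 companies in each of 2 periods
df <- expand.grid(Company = 1:7, Period = 1:2)

# f1 as in the answer: equal parts, remainder goes to the last group
f1 <- function(x, parts) {
  len1 <- length(x)
  i1 <- len1 %% parts
  v1 <- rep((len1 - i1) / parts, parts)
  v1[length(v1)] <- v1[length(v1)] + i1
  rep(seq_along(v1), v1)
}

# Group index within each period, then prefix it with the period
grp <- ave(df$Company, df$Period, FUN = function(x) f1(x, 3))
df$New_column <- paste0(df$Period, grp)
df$New_column
```

Because ave() returns a vector aligned with the input rows, no re-binding step is needed afterwards.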
Note: you can easily add a separator in paste0 to distinguish between 1_11 and 11_1
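A small sketch of why the separator matters: once period or group numbers reach two digits, plain concatenation can produce the same label for different combinations. The numbers below are illustrative only.

```r
# Without a separator, period 1 / group 11 and period 11 / group 1 collide
no_sep_a <- paste0(1, 11)
no_sep_b <- paste0(11, 1)
no_sep_a == no_sep_b   # TRUE, both are "111"

# With a separator the two labels stay distinct
sep_a <- paste(1, 11, sep = "_")   # "1_11"
sep_b <- paste(11, 1, sep = "_")   # "11_1"
sep_a == sep_b         # FALSE
```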
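Answer 1 (score: 0)
Create a function of the number of companies (nc) and the number of groups (ng). For all groups except the last (ng - 1 of them), the length of each group is the quotient (nc %/% ng). For the last group, the length is the quotient plus the remainder (nc %% ng).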
f <- function(nc, ng){
qu <- nc %/% ng
rep(1:ng, c(rep(qu, ng - 1), qu + nc %% ng))
}
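As a quick check of the group lengths described above, the following sketch repeats the function and tabulates the sizes it produces. The second call, with 10 items and 4 groups, is an extra illustration that is not in the original answer.

```r
f <- function(nc, ng) {
  qu <- nc %/% ng
  rep(1:ng, c(rep(qu, ng - 1), qu + nc %% ng))
}

f(7, 3)              # 1 1 2 2 3 3 3
tabulate(f(7, 3))    # group sizes: 2 2 3
tabulate(f(10, 4))   # group sizes: 2 2 2 4
```

Since the remainder can be as large as ng - 1, the last group may end up noticeably bigger than the others.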
Do this for each period:
d$res2 <- ave(d$Period, d$Period, FUN = function(x) paste0(x, "_", f(7, 3)))
d
# Company Period res res2
# 1 1 1 11 1_1
# 2 2 1 11 1_1
# 3 3 1 12 1_2
# 4 4 1 12 1_2
# 5 5 1 13 1_3
# 6 6 1 13 1_3
# 7 7 1 13 1_3
# 8 1 2 21 2_1
# 9 2 2 21 2_1
# 10 3 2 22 2_2
# 11 4 2 22 2_2
# 12 5 2 23 2_3
# 13 6 2 23 2_3
# 14 7 2 23 2_3
Here the number of companies is hard-coded (7), but this can of course be calculated from your data.
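One way to avoid the hard-coded 7, sketched under the assumption that each row within a period is one company, is to pass length(x) to f inside the ave() call. The data frame d is rebuilt here with expand.grid() purely for illustration.

```r
# Assumed example data: 7 companies in each of 2 periods
d <- expand.grid(Company = 1:7, Period = 1:2)

f <- function(nc, ng) {
  qu <- nc %/% ng
  rep(1:ng, c(rep(qu, ng - 1), qu + nc %% ng))
}

# length(x) is the number of rows (companies) in the current period
d$res2 <- ave(d$Period, d$Period,
              FUN = function(x) paste0(x, "_", f(length(x), 3)))
d$res2
```

This also copes with periods that contain different numbers of companies.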
If the remainder does not have to be assigned to the last group, you can use cut:
ave(d$Company, d$Period, FUN = function(x) cut(seq_along(x), 3))
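To see where the extra row lands with cut, here is a small sketch on a single period of 7 rows. Wrapping the result in as.integer() is an addition here, used to turn the factor levels into plain group numbers.

```r
# cut() splits the range 1..7 into 3 equal-width intervals
grp <- as.integer(cut(seq_len(7), 3))
grp             # 1 1 1 2 2 3 3
tabulate(grp)   # group sizes: 3 2 2
```

With these inputs the extra row falls in the first group rather than the last, which is why this variant only fits when the position of the remainder is not important.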