Repeat rows according to the counts given in each column

Time: 2018-08-29 07:24:13

Tags: r

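How can I repeat each row in R according to the counts given in the corresponding columns (taking several columns into account)?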

data <- data.frame(
 city=c("A","B","C","D","E","F","G"),
 score=c(83,94,1,21,2,3,0),
 J=c(2,0,1,0,3,0,0),
 K=c(0,2,0,3,0,1,0),
 L=c(1,1,0,4,0,0,0))
data

Original data frame (image not included)

Required data frame (image not included)

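PS: City D is repeated 4 times when all the count columns are taken into account: column K has a count of 1 in 3 of those rows, and column L has a count of 1 in all 4 rows.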
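A minimal base R sketch of the rule described above, assuming the question's column names J, K and L: each row is repeated max(J, K, L) times (at least once, so an all-zero row such as city G is kept), and in the k-th copy a column holds 1 if k does not exceed that column's count, otherwise 0. The names count_cols, n_rows, pos and out are illustrative only.

```r
# Question's data (columns J, K, L hold the counts)
data <- data.frame(
  city  = c("A", "B", "C", "D", "E", "F", "G"),
  score = c(83, 94, 1, 21, 2, 3, 0),
  J = c(2, 0, 1, 0, 3, 0, 0),
  K = c(0, 2, 0, 3, 0, 1, 0),
  L = c(1, 1, 0, 4, 0, 0, 0)
)

count_cols <- c("J", "K", "L")

# Rows per city: the largest count in that row, but at least 1
# so that an all-zero row (city G) is kept
n_rows <- pmax(apply(data[count_cols], 1, max), 1)

# Repeat each input row n_rows times, keeping the id columns
out <- data[rep(seq_len(nrow(data)), n_rows), c("city", "score")]

# Position of each output row within its city's block: 1, 2, ..., n
pos <- sequence(n_rows)

# A column is 1 while the position is within that column's count, else 0
for (col in count_cols) {
  counts <- rep(data[[col]], n_rows)
  out[[col]] <- as.integer(pos <= counts)
}
rownames(out) <- NULL
out
```

Deriving the 0/1 values from a position counter avoids building and padding per-row vectors, so the whole expansion is a single row index plus one comparison per column.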

3 answers:

Answer 0 (score: 4)

Another data.table solution:

library(data.table)
setDT(data)
data[, lapply(.SD, function(x){
    g <- pmax(max(unlist(.SD)), 1)
    rep(1:0, c(x, g - x)) }), by = .(city, score)]
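The key step is rep(1:0, c(x, g - x)), which data.table evaluates for every column inside each (city, score) group. The base R sketch below is an illustration rather than part of the answer: it applies the same expression to city D's counts from the question (J = 0, K = 3, L = 4), with a plain named list standing in for .SD.

```r
# One group's counts (city D from the question); stands in for .SD
counts <- list(J = 0, K = 3, L = 4)

# g: number of rows this group expands to (largest count, at least 1)
g <- pmax(max(unlist(counts)), 1)

# For each column: x ones followed by (g - x) zeros, so every column has length g
expanded <- lapply(counts, function(x) rep(1:0, c(x, g - x)))
str(expanded)
```

Because every column ends up with exactly g elements, data.table can bind them side by side into g rows for the group.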

#     city score number number2 number3
#  1:    A    83      1       0       1
#  2:    A    83      1       0       0
#  3:    B    94      0       1       1
#  4:    B    94      0       1       0
#  5:    C     1      1       0       0
#  6:    D    21      0       1       1
#  7:    D    21      0       1       1
#  8:    D    21      0       1       1
#  9:    D    21      0       0       1
# 10:    E     2      1       0       0
# 11:    E     2      1       0       0
# 12:    E     2      1       0       0
# 13:    F     3      0       1       0
# 14:    G     0      0       0       0

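Rows in which all the counts are zero are handled correctly: they are kept as a single row of zeros (city G above). If you don't want such rows, replace g <- pmax(max(unlist(.SD)), 1) with g <- max(unlist(.SD)):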

data[, lapply(.SD, function(x){
    g <- max(unlist(.SD))
    rep(1:0, c(x, g - x)) }), by = .(city, score)]
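Why pmax() matters can be sketched in base R for city G, where every count is zero (again, a named list stands in for .SD). With g = 0 each column becomes a zero-length vector, so the group contributes no rows; with pmax(..., 1) the group keeps a single row of zeros.

```r
# City G: every count is zero
counts_g <- list(J = 0, K = 0, L = 0)

# Without pmax(): g is 0, so every column is a zero-length vector
g0 <- max(unlist(counts_g))
without_pmax <- lapply(counts_g, function(x) rep(1:0, c(x, g0 - x)))
lengths(without_pmax)

# With pmax(..., 1): g is 1, so each column is a single 0
g1 <- pmax(max(unlist(counts_g)), 1)
with_pmax <- lapply(counts_g, function(x) rep(1:0, c(x, g1 - x)))
with_pmax
```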

Answer 1 (score: 2)

A data.table solution:

Data (make sure the columns are not factors, hence stringsAsFactors = FALSE):

data <- data.frame(
    city=c("A","B","C","D","E","F","G"),
    score=c(83,94,1,21,2,3,0),
    number=c(2,0,1,0,3,0,0),
    number2=c(0,2,0,3,0,1,0),
    number3=c(1,1,0,4,0,0,0), stringsAsFactors = FALSE)

Code (let a function, fun1, do the work for us):

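library(data.table)  # needed for transpose() and for data.table's [ syntax
setDT(data)

# Expand each count u into u ones (a single 0 when u is 0). Transposing with
# fill = 0 pads the shorter vectors with zeros; transposing back restores
# one vector per column, all of the same length.
fun1 <- function(x) {
    transpose(
        transpose(
            lapply(x, function(u) if (u != 0) rep(1, u) else 0), fill = 0
        )
    )
}

# transpose() drops the names, so put the original column names back
data[, setNames(fun1(.SD), names(.SD)), by = c("city", "score")]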
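The double transpose() with fill = 0 is a compact way of padding the shorter columns with zeros. The same padding can be sketched in base R; pad_with_zeros is a hypothetical helper (not part of data.table), shown here for city A's counts from the data above.

```r
# Hypothetical helper: a base R stand-in for the double transpose(fill = 0) in fun1
pad_with_zeros <- function(counts) {
  # Same expansion as fun1: u ones, or a single 0 when u is 0
  parts <- lapply(counts, function(u) if (u != 0) rep(1, u) else 0)
  # Pad every vector with zeros up to the length of the longest one
  n <- max(lengths(parts))
  lapply(parts, function(v) c(v, rep(0, n - length(v))))
}

res <- pad_with_zeros(list(number = 2, number2 = 0, number3 = 1))
res
```

Unlike data.table::transpose(), lapply() keeps the list names, so no setNames() step would be needed with this version.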

Result:

#    city score number number2 number3
# 1:    A    83      1       0       1
# 2:    A    83      1       0       0
# 3:    B    94      0       1       1
# 4:    B    94      0       1       0
# 5:    C     1      1       0       0
# 6:    D    21      0       1       1
# 7:    D    21      0       1       1
# 8:    D    21      0       1       1
# 9:    D    21      0       0       1
#10:    E     2      1       0       0
#11:    E     2      1       0       0
#12:    E     2      1       0       0
#13:    F     3      0       1       0
#14:    G     0      0       0       0

Answer 2 (score: 1)

Note that, given the sample data you provided, your expected output contains some errors (see @markus's comment).

这是一个使用tidyverse的{​​{1}}选项

splitstackshape::cSplit

说明:我们的想法是将每个library(splitstackshape) library(tidyverse) data %>% rowwise() %>% mutate_at(vars(starts_with("number")), funs(toString(rep(1, .)))) %>% group_by(city) %>% cSplit(grep("^number", names(data), value = T), direction = "long") %>% filter_at(vars(starts_with("number")), any_vars(!is.na(.))) %>% replace(., is.na(.), 0) # city score number number2 number3 #1 A 83 1 0 1 #2 A 83 1 0 0 #3 B 94 0 1 1 #4 B 94 0 1 0 #5 C 1 1 0 0 #6 D 21 0 1 1 #7 D 21 0 1 1 #8 D 21 0 1 1 #9 D 21 0 0 1 #10 E 2 1 0 0 #11 E 2 1 0 0 #12 E 2 1 0 0 #13 F 3 0 1 0 条目替换为与其值相对应的number数量的vector,然后我们将其转换为逗号分隔的{ {1}}个向量与1。然后,我们使用character将这些条目分成多行,删除所有toString行,并用splitstackshape::cSplit s替换NA s。
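The toString/cSplit round trip can be approximated for a single row in base R, which may help to see what each step produces. This is only a sketch: it uses strsplit() in place of cSplit(), and the counts are city A's from the data above.

```r
# One row's counts (city A in the answers' data)
counts <- c(number = 2, number2 = 0, number3 = 1)

# Step 1: each count becomes a comma-separated string of 1s ("" for a count of 0)
strings <- vapply(counts, function(n) toString(rep(1, n)), character(1))

# Step 2: split each string back into numbers (roughly what cSplit does per cell)
pieces <- lapply(strings, function(s) {
  if (nzchar(s)) as.numeric(strsplit(s, ", ", fixed = TRUE)[[1]]) else numeric(0)
})

# Step 3: align the columns into rows, padding short columns with NA,
# then replace NA with 0 as the answer's final step does
n <- max(lengths(pieces))
long <- as.data.frame(lapply(pieces, function(p) c(p, rep(NA, n - length(p)))))
long[is.na(long)] <- 0
long
```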