Group by and summarise to create a new column based on a condition

Time: 2019-04-12 13:52:54

Tags: r

Here is the data

DPS Comodity    Std Issue
111 Hard drive  No Post
111 MBD         NoBoot
111 LCD         Flicker
222 MBD         No Post
222 LCD         No Post
333 MBD         No power

I need to get it into the following format

DPS Comodity            Std Issue
111 Hard drive,MBD,LCD  Hard drive-No Post,MBD-NoBoot,LCD-Flicker
222 MBD,LCD             No Post
333 MBD                 No Power

I tried aggregate(`Std Issue` ~ DPS, df, function(x) toString(unique(x))), but the resulting Std Issue column is

No Post,No Boot, Flicker
No Post
No Power

This is not what I need. Any suggestions on how to solve this kind of problem would be very helpful and appreciated.

aggregate(`Std Issue` ~ DPS, df, function(x) toString(unique(x)))

This is the expected result

DPS Comodity            Std Issue
111 Hard drive,MBD,LCD  Hard drive-No Post,MBD-NoBoot,LCD-Flicker
222 MBD,LCD             No Post
333 MBD                 No Power

3 answers:

Answer 0 (score: 1)

You can use the data.table package:

  > library(data.table)
  > setDT(dt)[,Std_Issue:=paste0(Comodity,"-",Std.Issue)]
  > setDT(dt)[, list(Comodity = paste(Comodity, collapse=","),
             `Std Issue` = paste(Std_Issue, collapse=",")), by = DPS]

Output:

DPS           Comodity                                 Std Issue
1: 111 Hard drive,MBD,LCD     Hard drive-No Post,MBD-NoBoot,LCD-Flicker
2: 222            MBD,LCD                   MBD-No Post,LCD-No Post
3: 333                MBD                              MBD-No power

Input data:

dt <- read.table(text="DPS\tComodity\tStd Issue
111\tHard drive\tNo Post
111\tMBD\tNoBoot
111\tLCD\tFlicker
222\tMBD\tNo Post
222\tLCD\tNo Post
333\tMBD\tNo power",header=TRUE,sep="\t")

Edited

You can handle the condition with ifelse:

> library(stringr)
> setDT(dt)[,Std_Issue:=paste0(Comodity,"-",Std.Issue)]
> setDT(dt)[, list(Std_issue = ifelse(length(unlist(unique(lapply(str_split(Std_Issue,"-"),function(x)x[2]))))<3,paste(unique(`Std.Issue`), collapse=","),paste(Std_Issue, collapse=",")),Commodity=paste(Comodity, collapse=",")), by=DPS]

   DPS                            Std_issue                  Commodity
1: 111       Hard drive-No Post,MBD-NoBoot,LCD-Flicker   Hard drive,MBD,LCD
2: 222                              No Post                   MBD,LCD
3: 333                              No power                    MBD
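
The condition above relies on a threshold of 3 distinct issues, which happens to fit the sample data. As a minimal sketch, assuming the intended rule is "show the issue alone when every row in the group shares it, otherwise pair each commodity with its issue", the test can be written explicitly with uniqueN. The table below is retyped by hand from the question, and the names res and Std_Issue are only illustrative.

```r
library(data.table)

# Sample data retyped from the question (an assumption about the exact values)
dt <- data.table(
  DPS       = c(111, 111, 111, 222, 222, 333),
  Comodity  = c("Hard drive", "MBD", "LCD", "MBD", "LCD", "MBD"),
  Std.Issue = c("No Post", "NoBoot", "Flicker", "No Post", "No Post", "No power")
)

res <- dt[, .(
  Comodity  = paste(Comodity, collapse = ","),
  # One shared issue: keep it as is; otherwise build "commodity-issue" pairs
  Std_Issue = if (uniqueN(Std.Issue) == 1L) {
    Std.Issue[1L]
  } else {
    paste(Comodity, Std.Issue, sep = "-", collapse = ",")
  }
), by = DPS]

print(res)
```

With this rule a group that has exactly two distinct issues is still paired up, which the threshold version would not do.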

Answer 1 (score: 0)

We can use dplyr to apply the same function to both columns, i.e.

library(dplyr)
df %>% 
 group_by(DPS) %>% 
 summarise_all(funs(toString(unique(.))))

which gives,

# A tibble: 3 x 3
    DPS Comodity             Std_Issue               
  <int> <chr>                <chr>                   
1   111 Hard_drive, MBD, LCD No_Post, NoBoot, Flicker
2   222 MBD, LCD             No_Post                 
3   333 MBD                  No_power
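
The output above collapses each column separately, so it does not yet produce the commodity-issue pairs asked for in the question. A possible extension, sketched under the assumption that pairs are wanted only when a group has more than one distinct issue, is shown below. The data frame is retyped from the question, and the output column names Commodities and Issues are hypothetical; they are deliberately different from the input names so that summarise() does not pick up an already collapsed column.

```r
library(dplyr)

# Sample data retyped from the question (an assumption about the exact values)
df <- data.frame(
  DPS       = c(111, 111, 111, 222, 222, 333),
  Comodity  = c("Hard drive", "MBD", "LCD", "MBD", "LCD", "MBD"),
  Std_Issue = c("No Post", "NoBoot", "Flicker", "No Post", "No Post", "No power"),
  stringsAsFactors = FALSE
)

out <- df %>%
  group_by(DPS) %>%
  summarise(
    Commodities = paste(Comodity, collapse = ","),
    # One shared issue: keep it as is; otherwise build "commodity-issue" pairs
    Issues = if (n_distinct(Std_Issue) == 1) {
      first(Std_Issue)
    } else {
      paste(Comodity, Std_Issue, sep = "-", collapse = ",")
    }
  )

print(out)
```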

Answer 2 (score: 0)

Finally, I found a solution that works:

test_df <- data.frame(DPS=c(111,111,111,222,222,333),comodity =c("HDD","MBD","LCD","MBD","LCD","MBD"),stdIss=c("No Post","No Boot","Flicker","No Post","No Post","No Power"),stringsAsFactors = FALSE)
# Collapse the commodities and the distinct issues for each DPS
A <- data.frame(tapply(test_df$comodity,test_df$DPS,FUN = function(x){toString(x)}),stringsAsFactors = FALSE)
B <- data.frame(tapply(test_df$stdIss,test_df$DPS,FUN=function(x){toString(unique(x))}),stringsAsFactors = FALSE)
C <- data.frame(A,B)
colnames(C)[1] <- "comodity"
colnames(C)[2] <- "Std Issue"
# toString() joins with ", ", so split on the same separator
C$comodity <- strsplit(C$comodity, split = ", ")

C$`Std Issue` <- strsplit(C$`Std Issue`,split = ", ")
C$new <- NA

for(i in 1:nrow(C)){

   if(length(C$`Std Issue`[[i]])>1){
       D <- list()   # reset for every group so earlier pairs do not leak in
       for(j in 1:length(C$`Std Issue`[[i]]))
     {
       D[j]<- paste(C$comodity[[i]][j],C$`Std Issue`[[i]][j],sep = "-")
     }
       C$new[i]<-paste(D,collapse = ",")

     }
    else 
     { 
       C$new[i] <-paste(C$`Std Issue`[[i]])
     }
}
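
The nested loop can also be expressed with split() and lapply() in base R. This is only a sketch of the same idea, not the answer's original code: the helper summarise_group and the variable result are hypothetical names, and each commodity is paired with the issue from its own row, which differs from the loop when a group contains a repeated issue alongside other issues.

```r
test_df <- data.frame(
  DPS      = c(111, 111, 111, 222, 222, 333),
  comodity = c("HDD", "MBD", "LCD", "MBD", "LCD", "MBD"),
  stdIss   = c("No Post", "No Boot", "Flicker", "No Post", "No Post", "No Power"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one summary row for a single DPS group
summarise_group <- function(g) {
  issues <- unique(g$stdIss)
  data.frame(
    DPS      = g$DPS[1],
    comodity = paste(g$comodity, collapse = ","),
    # One shared issue: keep it as is; otherwise build "commodity-issue" pairs
    new      = if (length(issues) == 1) {
      issues
    } else {
      paste(g$comodity, g$stdIss, sep = "-", collapse = ",")
    },
    stringsAsFactors = FALSE
  )
}

# Split by DPS, summarise each piece, and stack the rows back together
result <- do.call(rbind, lapply(split(test_df, test_df$DPS), summarise_group))
rownames(result) <- NULL
print(result)
```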