r - Based on the next row

Time: 2017-07-11 07:23:41

Tags: r

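I am trying to process badly formatted SAP data.

In this source data, when one of the variables ("Profile" in the example) contains more than one entry, the entries are stacked on top of each other. This creates an otherwise empty observation in the next row that carries only the same "ID".

So: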

ID  Status  Product     Profile   Description
154 NOCO    3000        A1        failure           
215 ATCO    4000                  dfect     
164 NOCO    2000        A1        dfect
164                     A2
875 ATCO    3000                  failure
548 NOCO    2000        A1        dfect         
548                     A2
548                     A3
797 NOCO    3000                  failure       
444 ATCO    4000                  failure       

What I want to do is move these stacked values into additional columns.

ID  Status  Product Profile  Profile2   Profile3    Description
154 NOCO    3000    A1                              failure
215 ATCO    4000                                    dfect
164 NOCO    2000    A1       A2                     dfect
875 ATCO    3000                                    failure
548 NOCO    2000    A1       A2         A3          dfect
797 NOCO    3000                                    failure
444 ATCO    4000                                    failure

How would I do this?

Thanks!

Edit:

Added the dput of the first table above:

structure(list(ID = c(154L, 215L, 164L, 164L, 875L, 548L, 548L, 
548L, 797L, 444L), Status = structure(c(3L, 2L, 3L, 1L, 2L, 3L, 
1L, 1L, 3L, 2L), .Label = c("", "ATCO", "NOCO"), class = "factor"), 
    Product = c(3000L, 4000L, 2000L, NA, 3000L, 2000L, NA, NA, 
    3000L, 4000L), Profile = structure(c(2L, 1L, 2L, 3L, 1L, 
    2L, 3L, 4L, 1L, 1L), .Label = c("", "A1", "A2", "A3"), class = "factor"), 
Description = structure(c(3L, 2L, 2L, 1L, 3L, 2L, 1L, 1L, 
3L, 3L), .Label = c("", "dfect", "failure"), class = "factor")), .Names = c("ID", 
"Status", "Product", "Profile", "Description"), class = "data.frame", row.names = c(NA, 
-10L))

2 answers:

Answer 0: (score: 1)

You can do this with tidyr:

require(tidyr)
df[df==""] <- NA #change your blanks to NAs
df2 <- df %>% fill(-ID) %>% #fill down missing values in every column except ID (this includes Profile)
              spread(key=Profile, value=Profile, sep="", fill="") #convert to wide format

df2
   ID Status Product Description ProfileA1 ProfileA2 ProfileA3
1 154   NOCO    3000     failure        A1                    
2 164   NOCO    2000       dfect        A1        A2          
3 215   ATCO    4000       dfect        A1                    
4 444   ATCO    4000     failure                            A3
5 548   NOCO    2000       dfect        A1        A2        A3
6 797   NOCO    3000     failure                            A3
7 875   ATCO    3000     failure                  A2          
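
Note that `fill(-ID)` also fills `Profile` downward, so the IDs that have no profile in the source (215, 875, 797, 444) inherit one from the row above, as the output above shows. A minimal sketch of one way around this, assuming dplyr and tidyr >= 1.0.0 (for `pivot_wider`) are installed: fill only the other columns within each ID, then number the profiles per ID. The data frame is rebuilt here with plain character columns and the blanks already written as NA so the block runs on its own. The helper column `slot` is a made-up name.

```r
library(dplyr)
library(tidyr)

# Same rows as the question's first table; blanks are written as NA directly.
df <- data.frame(
  ID          = c(154L, 215L, 164L, 164L, 875L, 548L, 548L, 548L, 797L, 444L),
  Status      = c("NOCO", "ATCO", "NOCO", NA, "ATCO", "NOCO", NA, NA, "NOCO", "ATCO"),
  Product     = c(3000L, 4000L, 2000L, NA, 3000L, 2000L, NA, NA, 3000L, 4000L),
  Profile     = c("A1", NA, "A1", "A2", NA, "A1", "A2", "A3", NA, NA),
  Description = c("failure", "dfect", "dfect", NA, "failure", "dfect", NA, NA,
                  "failure", "failure"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(ID) %>%
  # fill the continuation rows, but leave Profile untouched
  fill(Status, Product, Description) %>%
  # hypothetical helper column: Profile1, Profile2, ... within each ID
  mutate(slot = paste0("Profile", row_number())) %>%
  ungroup() %>%
  # one row per ID, one column per slot; missing slots stay NA
  pivot_wider(names_from = slot, values_from = Profile)

print(wide)
```

Empty profile cells come out as NA rather than "". If blanks are preferred, the profile columns can be converted afterwards.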

Answer 1: (score: 0)

A version without any packages. The zoo / tidyr answers are more elegant, though.

data = structure(list(ID = c(154L, 215L, 164L, 164L, 875L, 548L, 548L, 
548L, 797L, 444L), Status = structure(c(3L, 2L, 3L, 1L, 2L, 3L, 
1L, 1L, 3L, 2L), .Label = c("", "ATCO", "NOCO"), class = "factor"), 
    Product = c(3000L, 4000L, 2000L, NA, 3000L, 2000L, NA, NA, 
    3000L, 4000L), Profile = structure(c(2L, 1L, 2L, 3L, 1L, 
    2L, 3L, 4L, 1L, 1L), .Label = c("", "A1", "A2", "A3"), class = "factor"), 
    Description = structure(c(3L, 2L, 2L, 1L, 3L, 2L, 1L, 1L, 
    3L, 3L), .Label = c("", "dfect", "failure"), class = "factor")), .Names = c("ID", 
"Status", "Product", "Profile", "Description"), class = "data.frame", row.names = c(NA, 
-10L))

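# keep one row per ID: drop the continuation rows, which have an empty Status
new.data = data[,c("ID","Status","Product","Description")]
new.data = new.data[new.data$Status!="",]
# add placeholder columns Profile1..Profile3
for(i in 1:3){
   new.data[[paste0("Profile",i)]] = NA
}
# for each ID, write "A<i>" into Profile<i> if that profile occurs for the ID, else ""
for(i in 1:3){
  for(id in new.data$ID){
    new.data[which(new.data$ID==id),paste0("Profile",i)] =
         ifelse(sum(data[which(data$ID==id),"Profile"]==
                paste0("A",i))>0,paste0("A",i),"")
  }
}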

This produces the data.frame new.data:

    ID Status Product Description Profile1 Profile2 Profile3
1  154   NOCO    3000     failure       A1                  
2  215   ATCO    4000       dfect                           
3  164   NOCO    2000       dfect       A1       A2         
5  875   ATCO    3000     failure                           
6  548   NOCO    2000       dfect       A1       A2       A3
9  797   NOCO    3000     failure                           
10 444   ATCO    4000     failure
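
The loop above hard-codes three profiles named A1 to A3. A minimal base R sketch that takes the profile values and their count from the data instead, assuming the continuation rows are exactly those with an empty Status. The data are rebuilt here as plain character columns so the block runs on its own, and the names `main`, `has_profile` and `profiles` are made up for illustration.

```r
# Same rows as the question's first table, with character columns instead of factors.
data <- data.frame(
  ID          = c(154L, 215L, 164L, 164L, 875L, 548L, 548L, 548L, 797L, 444L),
  Status      = c("NOCO", "ATCO", "NOCO", "", "ATCO", "NOCO", "", "", "NOCO", "ATCO"),
  Product     = c(3000L, 4000L, 2000L, NA, 3000L, 2000L, NA, NA, 3000L, 4000L),
  Profile     = c("A1", "", "A1", "A2", "", "A1", "A2", "A3", "", ""),
  Description = c("failure", "dfect", "dfect", "", "failure", "dfect", "", "",
                  "failure", "failure"),
  stringsAsFactors = FALSE
)

# One row per ID: keep only the rows that carry a Status.
main <- data[data$Status != "", c("ID", "Status", "Product", "Description")]

# Collect the non-empty profiles per ID, in their original order.
has_profile <- data$Profile != ""
profiles <- split(data$Profile[has_profile], data$ID[has_profile])

# As many Profile columns as the longest list needs (0 if there are none).
n_max <- max(0L, lengths(profiles))
for (i in seq_len(n_max)) {
  main[[paste0("Profile", i)]] <- vapply(
    as.character(main$ID),
    function(id) {
      p <- profiles[[id]]              # NULL when the ID has no profile
      if (length(p) >= i) p[i] else ""
    },
    character(1),
    USE.NAMES = FALSE
  )
}

print(main)
```

The i-th column here holds the i-th profile found for an ID, whatever its label, rather than testing for a fixed label such as "A2".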