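I am trying to clean up badly formatted SAP data.
In this source data, when one of the variables ("Profile" in the example) has more than one entry, the extra entries are stacked on the rows below. Each extra row is otherwise empty and carries only the same "ID".
So: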
ID   Status  Product  Profile  Description
154  NOCO    3000     A1       failure
215  ATCO    4000              dfect
164  NOCO    2000     A1       dfect
164                   A2
875  ATCO    3000              failure
548  NOCO    2000     A1       dfect
548                   A2
548                   A3
797  NOCO    3000              failure
444  ATCO    4000              failure
What I want to do is take these stacked values and move them into additional columns on the same row.
ID   Status  Product  Profile  Profile2  Profile3  Description
154  NOCO    3000     A1                           failure
215  ATCO    4000                                  dfect
164  NOCO    2000     A1       A2                  dfect
875  ATCO    3000                                  failure
548  NOCO    2000     A1       A2        A3        dfect
797  NOCO    3000                                  failure
444  ATCO    4000                                  failure
How would I do this?
Thanks!
Edit:
Added the dput of the first table above:
structure(list(ID = c(154L, 215L, 164L, 164L, 875L, 548L, 548L,
548L, 797L, 444L), Status = structure(c(3L, 2L, 3L, 1L, 2L, 3L,
1L, 1L, 3L, 2L), .Label = c("", "ATCO", "NOCO"), class = "factor"),
Product = c(3000L, 4000L, 2000L, NA, 3000L, 2000L, NA, NA,
3000L, 4000L), Profile = structure(c(2L, 1L, 2L, 3L, 1L,
2L, 3L, 4L, 1L, 1L), .Label = c("", "A1", "A2", "A3"), class = "factor"),
Description = structure(c(3L, 2L, 2L, 1L, 3L, 2L, 1L, 1L,
3L, 3L), .Label = c("", "dfect", "failure"), class = "factor")), .Names = c("ID",
"Status", "Product", "Profile", "Description"), class = "data.frame", row.names = c(NA,
-10L))
Answer 0: (score: 1)
You can use tidyr:
...
library(tidyr)
# df is the data frame from the dput in the question
df[df==""] <- NA #change your blanks to NAs
df2 <- df %>% fill(-ID) %>% #fill down missing values
  spread(key=Profile, value=Profile, sep="", fill="") #convert to wide format
df2
   ID Status Product Description ProfileA1 ProfileA2 ProfileA3
1 154   NOCO    3000     failure        A1
2 164   NOCO    2000       dfect        A1        A2
3 215   ATCO    4000       dfect        A1
4 444   ATCO    4000     failure                            A3
5 548   NOCO    2000       dfect        A1        A2        A3
6 797   NOCO    3000     failure                            A3
7 875   ATCO    3000     failure                  A2
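Note that in the output above, IDs 215, 875, 797 and 444 show a profile even though they have none in the source, because fill(-ID) also fills the Profile column down. One possible variant that leaves Profile alone and numbers the entries within each ID is sketched below. It assumes dplyr (>= 1.0) and tidyr (>= 1.0) are installed and that continuation rows directly follow their parent row. It rebuilds the question's data with plain character columns, and slot is a helper column name made up for this sketch. pivot_wider is the newer tidyr counterpart of spread.

```r
library(dplyr)
library(tidyr)

# The question's data, rebuilt with character columns instead of factors
df <- data.frame(
  ID = c(154L, 215L, 164L, 164L, 875L, 548L, 548L, 548L, 797L, 444L),
  Status = c("NOCO", "ATCO", "NOCO", "", "ATCO", "NOCO", "", "", "NOCO", "ATCO"),
  Product = c(3000L, 4000L, 2000L, NA, 3000L, 2000L, NA, NA, 3000L, 4000L),
  Profile = c("A1", "", "A1", "A2", "", "A1", "A2", "A3", "", ""),
  Description = c("failure", "dfect", "dfect", "", "failure",
                  "dfect", "", "", "failure", "failure"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  # blanks become NA so that fill() can see them (Profile is left untouched)
  mutate(across(c(Status, Description), ~ na_if(.x, ""))) %>%
  group_by(ID) %>%
  # fill down only inside one ID, and only the non-Profile columns
  fill(Status, Product, Description) %>%
  # hypothetical helper column: position of the row within its ID
  mutate(slot = paste0("Profile", row_number())) %>%
  ungroup() %>%
  # one row per ID, one column per slot; absent slots become ""
  pivot_wider(names_from = slot, values_from = Profile, values_fill = "")

wide
```

Here the columns are named Profile1, Profile2, Profile3 by position, so the result does not depend on the profiles being called A1 to A3.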
Answer 1: (score: 0)
A version without any packages. The zoo / tidyr answers are more elegant, though.
data = structure(list(ID = c(154L, 215L, 164L, 164L, 875L, 548L, 548L,
548L, 797L, 444L), Status = structure(c(3L, 2L, 3L, 1L, 2L, 3L,
1L, 1L, 3L, 2L), .Label = c("", "ATCO", "NOCO"), class = "factor"),
Product = c(3000L, 4000L, 2000L, NA, 3000L, 2000L, NA, NA,
3000L, 4000L), Profile = structure(c(2L, 1L, 2L, 3L, 1L,
2L, 3L, 4L, 1L, 1L), .Label = c("", "A1", "A2", "A3"), class = "factor"),
Description = structure(c(3L, 2L, 2L, 1L, 3L, 2L, 1L, 1L,
3L, 3L), .Label = c("", "dfect", "failure"), class = "factor")), .Names = c("ID",
"Status", "Product", "Profile", "Description"), class = "data.frame", row.names = c(NA,
-10L))
# keep one row per ID: drop the continuation rows, which have an empty Status
new.data = data[,c("ID","Status","Product","Description")]
new.data = new.data[new.data$Status != "",]
# add one empty column per possible profile (at most 3 in this example)
for(i in 1:3){
  new.data[[paste0("Profile",i)]] = NA
}
# set ProfileN to "AN" if that ID has profile "AN" anywhere in the source, else ""
for(i in 1:3){
  for(id in new.data$ID){
    new.data[which(new.data$ID==id),paste0("Profile",i)] =
      ifelse(sum(data[which(data$ID==id),"Profile"]==
        paste0("A",i))>0,paste0("A",i),"")
  }
}
This produces the data.frame new.data:
    ID Status Product Description Profile1 Profile2 Profile3
1  154   NOCO    3000     failure       A1
2  215   ATCO    4000       dfect
3  164   NOCO    2000       dfect       A1       A2
5  875   ATCO    3000     failure
6  548   NOCO    2000       dfect       A1       A2       A3
9  797   NOCO    3000     failure
10 444   ATCO    4000     failure
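The loops above hard-code three profiles named A1 to A3. As a rough sketch under the same assumption (continuation rows are the ones with an empty Status), the profiles could instead be placed by position within each ID, so that the number of Profile columns follows the data. This is still base R only. The names main, profiles and n_max are chosen for this sketch, and the data is rebuilt with character columns rather than factors.

```r
data <- data.frame(
  ID = c(154L, 215L, 164L, 164L, 875L, 548L, 548L, 548L, 797L, 444L),
  Status = c("NOCO", "ATCO", "NOCO", "", "ATCO", "NOCO", "", "", "NOCO", "ATCO"),
  Product = c(3000L, 4000L, 2000L, NA, 3000L, 2000L, NA, NA, 3000L, 4000L),
  Profile = c("A1", "", "A1", "A2", "", "A1", "A2", "A3", "", ""),
  Description = c("failure", "dfect", "dfect", "", "failure",
                  "dfect", "", "", "failure", "failure"),
  stringsAsFactors = FALSE
)

# one row per ID: drop the continuation rows (empty Status)
main <- data[data$Status != "", c("ID", "Status", "Product", "Description")]

# non-empty profiles collected per ID, in source order
has_profile <- data$Profile != ""
profiles <- split(data$Profile[has_profile], data$ID[has_profile])

# the ID with the most profiles decides how many ProfileN columns are needed
# (assumes at least one ID has a profile)
n_max <- max(lengths(profiles))
for (i in seq_len(n_max)) {
  main[[paste0("Profile", i)]] <- vapply(
    as.character(main$ID),
    function(id) {
      p <- profiles[[id]]               # NULL when the ID has no profile
      if (length(p) >= i) p[i] else ""  # i-th profile, or blank
    },
    character(1),
    USE.NAMES = FALSE
  )
}

main
```

For the example data this should give the same seven rows as new.data, but it would also cope with a fourth profile or with profile names other than A1 to A3.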