The data I collected look like this:
A B C D E F G
1 1 0 0 0 0 0 0
1,2 0 1 0 0 0 0 2
1,2,3 0 0 0 0 0 0 0
1,3 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
2,3 4 0 0 0 5 0 0
3 1 3 0 0 0 2 0
4 0 0 0 0 0 0 0
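For each color (A, B, C, D, E, F, G), a sample can belong to one or more categories (1, 2, 3, 4) at the same time. When a sample belongs to several categories, they are separated by commas.
I would like to simplify my data as follows: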
A B C D E F G
1 1 1 0 0 0 0 2
3 4 0 0 0 5 2 0
2 4 1 0 0 5 0 2
4 0 0 0 0 0 0 0
Is there a simple way (a function) to do this?
Reproducible example:
DF <- read.table(text = " Color Cat
A 1
B 1
C 4,2
D 1,3
E 1,2
F 3
G 5
A 2
B 3
C 1,2
D 4,3
E 3
F 1
G 1" , header = TRUE)
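# table() returns a contingency table, not a data frame; convert it so that `$<-` and aggregate() work
DF <- as.data.frame.matrix(table(DF$Cat, DF$Color))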
cats <- strsplit(rownames(DF), ",", fixed = TRUE)
DF <- DF[rep(seq_len(nrow(DF)), sapply(cats, length)),]
DF$cat <- unlist(cats)
DF <- aggregate(. ~ cat, DF, FUN = sum)
Answer 0 (score: 1)
DF <- read.table(text = " A B C D E F G
1 1 0 0 0 0 0 0
1,2 0 1 0 0 0 0 2
1,2,3 0 0 0 0 0 0 0
1,3 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
2,3 4 0 0 0 5 0 0
3 1 3 0 0 0 2 0
4 0 0 0 0 0 0 0", header = TRUE)
# split the comma-separated row names into individual categories
cats <- strsplit(rownames(DF), ",", fixed = TRUE)
# repeat each row once for every category it belongs to
DF <- DF[rep(seq_len(nrow(DF)), sapply(cats, length)),]
# add a column holding the single category of each repeated row
DF$cat <- unlist(cats)
# aggregate per category (the question does not say how; sum is used here, FUN = max is another option)
DF <- aggregate(. ~ cat, DF, FUN = sum)
# cat A B C D E F G
#1 1 1 1 0 0 0 0 2
#2 2 4 1 0 0 5 0 2
#3 3 5 3 0 0 5 2 0
#4 4 0 0 0 0 0 0 0
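Since the answer leaves open whether to use sum or max, the steps above can be wrapped in a small function that takes the aggregation function as an argument. This is only a sketch: the name `collapse_categories` and its `FUN` and `sep` parameters are hypothetical and do not come from the original answer, and it assumes the categories are stored in the row names, as in the answer's data.

```r
# Hypothetical helper (not part of the original answer): expand comma-separated
# row labels and aggregate the rows per individual category.
collapse_categories <- function(df, FUN = sum, sep = ",") {
  cats <- strsplit(rownames(df), sep, fixed = TRUE)
  # repeat each row once for every category named in its label
  expanded <- df[rep(seq_len(nrow(df)), lengths(cats)), , drop = FALSE]
  expanded$cat <- unlist(cats)
  aggregate(. ~ cat, data = expanded, FUN = FUN)
}

DF <- read.table(text = " A B C D E F G
1 1 0 0 0 0 0 0
1,2 0 1 0 0 0 0 2
1,2,3 0 0 0 0 0 0 0
1,3 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
2,3 4 0 0 0 5 0 0
3 1 3 0 0 0 2 0
4 0 0 0 0 0 0 0", header = TRUE)

res_sum <- collapse_categories(DF, FUN = sum)
res_max <- collapse_categories(DF, FUN = max)
```

For this data the two results differ in a single cell: for category 3, column A is 5 with sum (4 from row "2,3" plus 1 from row "3") and 4 with max. Which one is appropriate depends on whether counts from overlapping rows should be added together or not.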