Matrix

Time: 2016-02-25 10:02:59

Tags: r

I have collected the following data:

            A       B          C       D      E        F                 G

  1         1       0          0       0      0        0                 0
  1,2       0       1          0       0      0        0                 2
  1,2,3     0       0          0       0      0        0                 0
  1,3       0       0          0       0      0        0                 0
  2         0       0          0       0      0        0                 0
  2,3       4       0          0       0      5        0                 0
  3         1       3          0       0      0        2                 0
  4         0       0          0       0      0        0                 0

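Each color (A, B, C, D, E, F, G) corresponds, depending on the sample, to one or more categories (1, 2, 3, 4) at the same time. When a row covers several categories, they are separated by commas.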

I would like to simplify my data so that each category appears in a single row, as follows:

            A       B          C       D      E        F                 G

  1         1       1          0       0      0        0                 2
  3         4       0          0       0      5        2                 0
  2         4       1          0       0      5        0                 2
  4         0       0          0       0      0        0                 0

Is there a simple way (a function) to do this?

Reproducible example:

DF <- read.table(text = " Color         Cat

A             1
B             1   
C             4,2
D             1,3
E             1,2
F             3
G             5
A             2
B             3   
C             1,2
D             4,3
E             3
F             1
G             1" , header = TRUE)

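# table() returns a "table" object, not a data frame; convert it first,
# because the $<- assignment and aggregate() below expect a data frame
DF <- as.data.frame.matrix(table(DF$Cat, DF$Color))
# split the comma-separated row names into single categories
cats <- strsplit(rownames(DF), ",", fixed = TRUE)
# repeat each row once per category it belongs to
DF <- DF[rep(seq_len(nrow(DF)), sapply(cats, length)),]
DF$cat <- unlist(cats)
# sum the counts per category
DF <- aggregate(. ~ cat, DF, FUN = sum)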
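
Since the reproducible example starts from long-format data (one Color/Cat pair per row), the comma-separated labels can also be split before tabulating, which avoids going through the wide table at all. The following is only a sketch under that assumption; the names `long`, `expanded` and `counts` are made up for illustration, and the data is retyped from the example above.

```r
# Long-format sample data (Color / Cat pairs), retyped from the example.
long <- data.frame(
  Color = c("A", "B", "C", "D", "E", "F", "G",
            "A", "B", "C", "D", "E", "F", "G"),
  Cat   = c("1", "1", "4,2", "1,3", "1,2", "3", "5",
            "2", "3", "1,2", "4,3", "3", "1", "1"),
  stringsAsFactors = FALSE
)

# Split each comma-separated label into its individual categories.
cats <- strsplit(long$Cat, ",", fixed = TRUE)

# Repeat each Color once per category it belongs to.
expanded <- data.frame(
  cat   = unlist(cats),
  Color = rep(long$Color, lengths(cats))
)

# Cross-tabulate: one row per single category, one column per color.
counts <- table(expanded$cat, expanded$Color)
print(counts)
```

Each cell of `counts` is the number of samples of that color whose label includes the category, so a sample labelled `1,2` is counted once under 1 and once under 2.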

1 Answer:

Answer 0: (score: 1)

DF <- read.table(text = "            A       B          C       D      E        F                 G
                 1         1       0          0       0      0        0                 0
                 1,2       0       1          0       0      0        0                 2
                 1,2,3     0       0          0       0      0        0                 0
                 1,3       0       0          0       0      0        0                 0
                 2         0       0          0       0      0        0                 0
                 2,3       4       0          0       0      5        0                 0
                 3         1       3          0       0      0        2                 0
                 4         0       0          0       0      0        0                 0", header = TRUE)

#split the row names
cats <- strsplit(rownames(DF), ",", fixed = TRUE)
#repeat each row of DF once per category listed in its row name
DF <- DF[rep(seq_len(nrow(DF)), sapply(cats, length)),]
#add column with cats
DF$cat <- unlist(cats)
#aggregate (your question is unclear regarding how)
DF <- aggregate(. ~ cat, DF, FUN = sum) #or FUN = max???
#  cat A B C D E F G
#1   1 1 1 0 0 0 0 2
#2   2 4 1 0 0 5 0 2
#3   3 5 3 0 0 5 2 0
#4   4 0 0 0 0 0 0 0
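
Because the question does not say how overlapping rows should be combined, it may help to run both candidates side by side. This is a sketch that assumes the same wide table as above; `expanded`, `by_sum` and `by_max` are illustrative names, and `lengths()` is used as a shorthand for `sapply(cats, length)`.

```r
# Wide table from the question; the first field of each row becomes the row name.
DF <- read.table(text = "A B C D E F G
1     1 0 0 0 0 0 0
1,2   0 1 0 0 0 0 2
1,2,3 0 0 0 0 0 0 0
1,3   0 0 0 0 0 0 0
2     0 0 0 0 0 0 0
2,3   4 0 0 0 5 0 0
3     1 3 0 0 0 2 0
4     0 0 0 0 0 0 0", header = TRUE)

# Expand the table so that every row belongs to exactly one category.
cats <- strsplit(rownames(DF), ",", fixed = TRUE)
expanded <- DF[rep(seq_len(nrow(DF)), lengths(cats)), ]
expanded$cat <- unlist(cats)

# Candidate 1: add up all rows that mention a category.
by_sum <- aggregate(. ~ cat, expanded, FUN = sum)
# Candidate 2: keep the largest value seen for a category.
by_max <- aggregate(. ~ cat, expanded, FUN = max)

print(by_sum)
print(by_max)
```

With this data the two results differ in a single cell: category 3, column A is 5 with `sum` (4 from row `2,3` plus 1 from row `3`) and 4 with `max`. Neither reproduces the desired row 3 in the question exactly, which lists B as 0.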