R: Pivoting with the 'spread' function

Time: 2015-04-29 19:11:04

Tags: r dataframe pivot melt tidyr

Continuing from my previous post, I now also need to account for an additional column of ID values when pivoting rows into columns.

    NUM <- c(1,2,3,1,2,3,1,2,3,1)
    ID <- c("DJ45","DJ45","DJ45","DJ46","DJ46","DJ46","DJ47","DJ47","DJ47","DJ48")
    Type <- c("A", "F", "C", "B", "D", "A", "E", "C", "F", "D")
    Points <- c(9.2,60.8,22.9,1012.7,18.7,11.1,67.2,63.1,16.7,58.4)

    df1 <- data.frame(ID,NUM,Type,Points)

df1:
    +------+-----+------+--------+
    | ID   | Num | Type | Points |
    +------+-----+------+--------+
    | DJ45 |   1 | A    | 9.2    |
    | DJ45 |   2 | F    | 60.8   |
    | DJ45 |   3 | C    | 22.9   |
    | DJ46 |   1 | B    | 1012.7 |
    | DJ46 |   2 | D    | 18.7   |
    | DJ46 |   3 | A    | 11.1   |
    | DJ47 |   1 | E    | 67.2   |
    | DJ47 |   2 | C    | 63.1   |
    | DJ47 |   3 | F    | 16.7   |
    | DJ48 |   1 | D    | 58.4   |
    +------+-----+------+--------+

My desired output is:

    +------+-----+------+--------+------+------+------+------+
    | ID   | Num |  A   |   B    |  C   |  D   |  E   |  F   |
    +------+-----+------+--------+------+------+------+------+
    | DJ45 |   1 | 9.2  | N/A    | N/A  | N/A  | N/A  | N/A  |
    | DJ45 |   2 | N/A  | N/A    | N/A  | N/A  | N/A  | 60.8 |
    | DJ45 |   3 | N/A  | N/A    | 22.9 | N/A  | N/A  | N/A  |
    | DJ46 |   1 | N/A  | 1012.7 | N/A  | N/A  | N/A  | N/A  |
    | DJ46 |   2 | N/A  | N/A    | N/A  | 18.7 | N/A  | N/A  |
    | DJ46 |   3 | 11.1 | N/A    | N/A  | N/A  | N/A  | N/A  |
    | DJ47 |   1 | N/A  | N/A    | N/A  | N/A  | 67.2 | N/A  |
    | DJ47 |   2 | N/A  | N/A    | 63.1 | N/A  | N/A  | N/A  |
    | DJ47 |   3 | N/A  | N/A    | N/A  | N/A  | N/A  | 16.7 |
    | DJ48 |   1 | N/A  | N/A    | N/A  | 58.4 | N/A  | N/A  |
    +------+-----+------+--------+------+------+------+------+

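I am using the spread function in R, but I get a "Duplicate identifiers" error. I assume this is because I now have two identifier columns (ID & NUM) instead of the single one (NUM) I had before. Please let me know how to do this.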

1 answer:

Answer 0 (score: 4)

Without knowing what you tried, I would suggest:

    library(tidyr)

    # Type supplies the new column names, Points the cell values
    spread(df1, Type, Points)
    #      ID NUM    A      B    C    D    E    F
    # 1  DJ45   1  9.2     NA   NA   NA   NA   NA
    # 2  DJ45   2   NA     NA   NA   NA   NA 60.8
    # 3  DJ45   3   NA     NA 22.9   NA   NA   NA
    # 4  DJ46   1   NA 1012.7   NA   NA   NA   NA
    # 5  DJ46   2   NA     NA   NA 18.7   NA   NA
    # 6  DJ46   3 11.1     NA   NA   NA   NA   NA
    # 7  DJ47   1   NA     NA   NA   NA 67.2   NA
    # 8  DJ47   2   NA     NA 63.1   NA   NA   NA
    # 9  DJ47   3   NA     NA   NA   NA   NA 16.7
    # 10 DJ48   1   NA     NA   NA 58.4   NA   NA

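If you get an error about duplicate identifiers, it is because some combination of "ID" and "NUM" (with the same "Type") occurs more than once in your actual data (in your sample data, none does).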

If that is the case, you need to add another column to make the rows unique.
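Before adding such a column, it can help to see which rows actually collide. The following is a minimal sketch in base R, assuming the identifier columns are ID, NUM, and Type (everything except the value column Points); the names `key_cols`, `keys`, and `dup_rows` are made up for this illustration, and `df2` is the sample data with its first three rows appended again.

```r
# Rebuild the sample data from the question
NUM <- c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1)
ID <- c("DJ45", "DJ45", "DJ45", "DJ46", "DJ46", "DJ46", "DJ47", "DJ47", "DJ47", "DJ48")
Type <- c("A", "F", "C", "B", "D", "A", "E", "C", "F", "D")
Points <- c(9.2, 60.8, 22.9, 1012.7, 18.7, 11.1, 67.2, 63.1, 16.7, 58.4)
df1 <- data.frame(ID, NUM, Type, Points)

# Append copies of the first three rows to create colliding identifiers
df2 <- rbind(df1, df1[1:3, ])

# Assumption: these columns together identify one cell of the wide table
key_cols <- c("ID", "NUM", "Type")
keys <- df2[, key_cols]

# Flag every row whose key appears more than once (first and later occurrences)
is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
dup_rows <- df2[is_dup, ]
dup_rows
```

If `dup_rows` comes back empty, the identifiers are already unique and a plain `spread()` call should work without the extra column.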

Adding that column with dplyr in the chain might look something like:

    df1 %>%
      group_by(ID, NUM) %>%
      mutate(id2 = sequence(n())) %>%
      spread(Type, Points)

Demonstrating the hypothetical error:

    df2 <- rbind(df1, df1[1:3, ]) ## Duplicate the first three rows
    spread(df2, Type, Points)
    # Error: Duplicate identifiers for rows (1, 11), (3, 13), (2, 12)

    library(dplyr)

    df2 %>%
      group_by(ID, NUM) %>%
      mutate(id2 = sequence(n())) %>%
      spread(Type, Points)
    # Source: local data frame [13 x 9]
    # 
    #      ID NUM id2    A      B    C    D    E    F
    # 1  DJ45   1   1  9.2     NA   NA   NA   NA   NA
    # 2  DJ45   1   2  9.2     NA   NA   NA   NA   NA
    # 3  DJ45   2   1   NA     NA   NA   NA   NA 60.8
    # 4  DJ45   2   2   NA     NA   NA   NA   NA 60.8
    # 5  DJ45   3   1   NA     NA 22.9   NA   NA   NA
    # 6  DJ45   3   2   NA     NA 22.9   NA   NA   NA
    # 7  DJ46   1   1   NA 1012.7   NA   NA   NA   NA
    # 8  DJ46   2   1   NA     NA   NA 18.7   NA   NA
    # 9  DJ46   3   1 11.1     NA   NA   NA   NA   NA
    # 10 DJ47   1   1   NA     NA   NA   NA 67.2   NA
    # 11 DJ47   2   1   NA     NA 63.1   NA   NA   NA
    # 12 DJ47   3   1   NA     NA   NA   NA   NA 16.7
    # 13 DJ48   1   1   NA     NA   NA 58.4   NA   NA
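
In newer versions of tidyr (1.0.0 and later), `spread()` is superseded by `pivot_wider()`. This is not what the answer above uses; as a rough sketch, and assuming dplyr and tidyr >= 1.0.0 are installed, the same idea might be written as follows. The result name `wide` is made up for this illustration, and `row_number()` stands in for `sequence(n())`.

```r
library(dplyr)
library(tidyr)  # pivot_wider() needs tidyr >= 1.0.0

# Rebuild the sample data and duplicate the first three rows, as above
NUM <- c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1)
ID <- c("DJ45", "DJ45", "DJ45", "DJ46", "DJ46", "DJ46", "DJ47", "DJ47", "DJ47", "DJ48")
Type <- c("A", "F", "C", "B", "D", "A", "E", "C", "F", "D")
Points <- c(9.2, 60.8, 22.9, 1012.7, 18.7, 11.1, 67.2, 63.1, 16.7, 58.4)
df1 <- data.frame(ID, NUM, Type, Points)
df2 <- rbind(df1, df1[1:3, ])

wide <- df2 %>%
  group_by(ID, NUM) %>%
  mutate(id2 = row_number()) %>%  # running index within each (ID, NUM) pair
  ungroup() %>%
  pivot_wider(names_from = Type, values_from = Points)

wide
```

The new columns may appear in a different order than with `spread()`, since `pivot_wider()` orders them by first appearance in the data unless told otherwise.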