How to use spread() to get the desired output

Asked: 2017-05-16 07:38:06

Tags: r dplyr tidyr tidyverse spread

Suppose I have a data frame DF1 like this:

+------+------+--------+
| ID   | Type | Points |
+------+------+--------+
| DJ45 | A    |   69.2 |
| DJ45 | F    |   60.8 |
| DJ45 | C    |    2.9 |
| DJ46 | B    |   22.7 |
| DJ46 | D    |   18.7 |
| DJ46 | A    |   16.1 |
| DJ47 | E    |   67.2 |
| DJ47 | C    |   63.1 |
| DJ47 | F    |   16.7 |
| DJ48 | D    |    8.4 |
+------+------+--------+

I want a result that gives, for each ID, the two Type values with the highest Points, in the following format:

Output:

+------+-------+-------+
| ID   | Type1 | Type2 |
+------+-------+-------+
| DJ45 | A     | F     |
| DJ46 | B     | D     |
| DJ47 | E     | C     |
| DJ48 | D     | NA    |
+------+-------+-------+

I used:

df1 %>%
  group_by(Id) %>%
  top_n(2,wt=Points) %>%
  mutate(val = paste("Type", row_number())) %>% 
  filter(row_number()<=2) %>% 
  select(-Points) %>% 
  spread(val, Type)

But I get the following result:

Output:

+------+--------+-------+-------+
| ID   | Points | Type1 | Type2 |
+------+--------+-------+-------+
| DJ45 |   69.2 | A     | NA    |
| DJ45 |   60.8 | NA    | F     |
| DJ46 |   22.7 | B     | NA    |
| DJ46 |   18.7 | NA    | D     |
| DJ47 |   67.2 | E     | NA    |
| DJ47 |   63.1 | NA    | C     |
| DJ48 |    8.4 | D     | NA    |
+------+--------+-------+-------+

1 answer:

Answer 0 (score: 2)

df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID Type Points
DJ45 A 69.2
DJ45 F 60.8
DJ45 C 2.9
DJ46 B 22.7
DJ46 D 18.7
DJ46 A 16.1
DJ47 E 67.2
DJ47 C 63.1
DJ47 F 16.7
DJ48 D 8.4
")

library(dplyr)
library(tidyr)

df %>%
  group_by(ID) %>%
  top_n(2, wt = Points) %>%
  arrange(-Points) %>% 
  mutate(Points = paste0('Type', row_number())) %>% 
  spread(Points, Type)
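
  • top_n(2, wt = Points) keeps the top two rows by Points within each ID group
  • arrange(-Points) sorts the rows in descending order of Points
  • mutate(Points = paste0('Type', row_number())) overwrites Points with 'Type' followed by the row number within the ID group (1 to 2)
  • spread(Points, Type) creates one column for each unique value in Points and fills it with the corresponding Type value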
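
In more recent versions of dplyr (1.0.0 and later) and tidyr (1.0.0 and later), top_n() and spread() are superseded by slice_max() and pivot_wider(). The same four steps could be sketched with those functions as follows. This is a minimal sketch rather than part of the original answer; the names top2 and rank are illustrative, and with_ties = FALSE is an assumption about how ties in Points should be handled.

```r
library(dplyr)
library(tidyr)

df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID Type Points
DJ45 A 69.2
DJ45 F 60.8
DJ45 C 2.9
DJ46 B 22.7
DJ46 D 18.7
DJ46 A 16.1
DJ47 E 67.2
DJ47 C 63.1
DJ47 F 16.7
DJ48 D 8.4
")

top2 <- df %>%
  group_by(ID) %>%
  # keep at most the two highest-Points rows per ID (assumption: drop ties)
  slice_max(Points, n = 2, with_ties = FALSE) %>%
  # make the within-group order explicit: highest Points first
  arrange(desc(Points), .by_group = TRUE) %>%
  # label the rows Type1, Type2 within each ID ("rank" is an illustrative name)
  mutate(rank = paste0("Type", row_number())) %>%
  ungroup() %>%
  # drop Points so that ID is the only identifying column when widening
  select(ID, rank, Type) %>%
  pivot_wider(names_from = rank, values_from = Type)

print(top2)
```

Dropping Points before widening matters: any column that is neither the names column nor the values column is treated as an identifier, so leaving Points in place keeps every row distinct and produces the staggered NA pattern shown in the question.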