Suppose I have a data frame DF1 like the following:
+------+------+--------+
| ID   | Type | Points |
+------+------+--------+
| DJ45 | A    | 69.2   |
| DJ45 | F    | 60.8   |
| DJ45 | C    | 2.9    |
| DJ46 | B    | 22.7   |
| DJ46 | D    | 18.7   |
| DJ46 | A    | 16.1   |
| DJ47 | E    | 67.2   |
| DJ47 | C    | 63.1   |
| DJ47 | F    | 16.7   |
| DJ48 | D    | 8.4    |
+------+------+--------+
I would like a result that gives, for each ID, the top 2 Type values (ranked by Points) in the following format:
Output:
+------+-------+-------+
| ID   | Type1 | Type2 |
+------+-------+-------+
| DJ45 | A     | F     |
| DJ46 | B     | D     |
| DJ47 | E     | C     |
| DJ48 | D     | NA    |
+------+-------+-------+
I tried:
df1 %>%
group_by(Id) %>%
top_n(2,wt=Points) %>%
mutate(val = paste("Type", row_number())) %>%
filter(row_number()<=2) %>%
select(-Points) %>%
spread(val, Type)
But I get the following result:
Output:
+------+--------+-------+-------+
| ID   | Points | Type1 | Type2 |
+------+--------+-------+-------+
| DJ45 | 69.2   | A     | NA    |
| DJ45 | 60.8   | NA    | F     |
| DJ46 | 22.7   | B     | NA    |
| DJ46 | 18.7   | NA    | D     |
| DJ47 | 67.2   | E     | NA    |
| DJ47 | 63.1   | NA    | C     |
| DJ48 | 8.4    | D     | NA    |
+------+--------+-------+-------+
Answer 0 (score: 2)
# Recreate the example data
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID Type Points
DJ45 A 69.2
DJ45 F 60.8
DJ45 C 2.9
DJ46 B 22.7
DJ46 D 18.7
DJ46 A 16.1
DJ47 E 67.2
DJ47 C 63.1
DJ47 F 16.7
DJ48 D 8.4
")
library(dplyr)
library(tidyr)
df %>%
  group_by(ID) %>%
  top_n(2, wt = Points) %>%                              # top two rows per ID
  arrange(-Points) %>%                                   # highest Points first
  mutate(Points = paste0('Type', row_number())) %>%      # Points becomes 'Type1'/'Type2'
  spread(Points, Type)                                   # one column per label
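top_n(2, wt = Points): within each ID group, keeps the two rows with the highest Points.
arrange(-Points): sorts the rows by Points in descending order.
mutate(Points = paste0('Type', row_number())): overwrites Points with 'Type' plus the row number within the ID group (1 to 2).
spread(Points, Type): creates one column for each unique value of Points and fills it with the corresponding Type value.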
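
For reference, the same steps can be sketched with the newer dplyr/tidyr verbs, since top_n() and spread() are marked as superseded in current releases. This is only a sketch, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0 are installed. The helper column name `slot` is made up for illustration and is not part of the original answer. Note that top_n() keeps tied rows, so a group can return more than two rows; `with_ties = FALSE` below is one possible way to cap it at two.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(
  ID     = c("DJ45", "DJ45", "DJ45", "DJ46", "DJ46", "DJ46",
             "DJ47", "DJ47", "DJ47", "DJ48"),
  Type   = c("A", "F", "C", "B", "D", "A", "E", "C", "F", "D"),
  Points = c(69.2, 60.8, 2.9, 22.7, 18.7, 16.1, 67.2, 63.1, 16.7, 8.4),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(ID) %>%
  # keep at most two rows per ID, dropping extra rows on ties
  slice_max(Points, n = 2, with_ties = FALSE) %>%
  # make the within-group ordering explicit: highest Points first
  arrange(desc(Points), .by_group = TRUE) %>%
  # `slot` is a hypothetical helper column holding "Type1" / "Type2"
  mutate(slot = paste0("Type", row_number())) %>%
  ungroup() %>%
  # drop Points so that each ID collapses to a single row when widened
  select(-Points) %>%
  pivot_wider(names_from = slot, values_from = Type)

print(result)
```

Dropping Points (or overwriting it, as the answer does) before widening is the important part: if the numeric Points column is still present, every ID/Points pair counts as its own row, which gives the staggered NA pattern shown in the question.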