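I need to select the top two values for each group (each yearmonth value) from the following data frame in R. I have already sorted the data by count and yearmonth. How can I do this with the data below?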
yearmonth name count
1 201310 Dovas 5
2 201310 Indulgd 2
3 201310 Justina 1
4 201310 Jolita 1
5 201311 Shahrukh Sheikh 1
6 201311 Dovas 29
7 201311 Justina 13
8 201311 Lina 8
9 201312 sUPERED 7
10 201312 John Hansen 7
11 201312 Lina D. 6
12 201312 joanna1st 5
Answer 0 (score: 7)
Or, using data.table with the data from @jazzurro's post (mydf), some options are:
library(data.table)
setDT(mydf)[order(yearmonth,-count), .SD[1:2], by=yearmonth]
Or
setDT(mydf)[mydf[order(yearmonth, -count), .I[1:2], by=yearmonth]$V1,]
Or
setorder(setkey(setDT(mydf), yearmonth), yearmonth, -count)[
,.SD[1:2], by=yearmonth]
# yearmonth name count
#1: 201310 Dovas 5
#2: 201310 Indulgd 2
#3: 201311 Dovas 29
#4: 201311 Justina 13
#5: 201312 sUPERED 7
#6: 201312 John Hansen 7
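One caveat with `.SD[1:2]`: when a group has fewer than two rows, indexing past the end pads the result with NA rows. A minimal sketch of a variant that uses `head(.SD, 2)` instead is below. The table `dt` and its single-row group ("201401", "Solo") are hypothetical and exist only to show the difference.

```r
library(data.table)

# Hypothetical data: three rows from the question plus one made-up
# single-row group ("201401") to show the short-group behaviour.
dt <- data.table(
  yearmonth = c("201310", "201310", "201310", "201401"),
  name      = c("Dovas", "Indulgd", "Justina", "Solo"),
  count     = c(5, 2, 1, 3)
)

# head() returns at most 2 rows per group, so the single-row group
# is returned as-is rather than padded with an NA row.
top2 <- dt[order(yearmonth, -count), head(.SD, 2), by = yearmonth]
print(top2)
```

With `.SD[1:2]` in place of `head(.SD, 2)`, the "201401" group would contribute a second row filled with NA.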
Answer 1 (score: 4)
Here is one way:
library(dplyr)
mydf %>%
group_by(yearmonth) %>%
arrange(desc(count)) %>%
slice(1:2)
# yearmonth name count
#1 201310 Dovas 5
#2 201310 Indulgd 2
#3 201311 Dovas 29
#4 201311 Justina 13
#5 201312 sUPERED 7
#6 201312 John Hansen 7
Data
mydf <- data.frame(yearmonth = rep(c("201310", "201311", "201312"), each = 4),
name = c("Dovas", "Indulgd", "Justina", "Jolita", "Shahrukh Sheikh",
"Dovas", "Justina", "Lina", "sUPERED", "John Hansen",
"Lina D.", "joanna1st"),
count = c(5,2,1,1,1,29,13,8,7,7,6,5),
stringsAsFactors = FALSE)
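As an alternative sketch, assuming dplyr 1.0.0 or later is installed, `slice_max()` combines the sort and the slice into one step. This is not part of the original answer; `with_ties = FALSE` is set here so that exactly two rows are returned per group even when counts are tied.

```r
library(dplyr)

# Same sample data as above, repeated so the block runs on its own
mydf <- data.frame(
  yearmonth = rep(c("201310", "201311", "201312"), each = 4),
  name = c("Dovas", "Indulgd", "Justina", "Jolita", "Shahrukh Sheikh",
           "Dovas", "Justina", "Lina", "sUPERED", "John Hansen",
           "Lina D.", "joanna1st"),
  count = c(5, 2, 1, 1, 1, 29, 13, 8, 7, 7, 6, 5),
  stringsAsFactors = FALSE
)

# Keep the two rows with the largest count within each yearmonth
top2 <- mydf %>%
  group_by(yearmonth) %>%
  slice_max(count, n = 2, with_ties = FALSE) %>%
  ungroup()

print(top2)
```

Leaving `with_ties` at its default of `TRUE` can return more than two rows for a group when several rows share the cut-off count.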
Answer 2 (score: 1)
Using base R, you could do something like this:
# sort the data, skip if already done
df <- df[order(df$yearmonth, df$count, decreasing = TRUE),]
Then, to get the first two rows of each group:
df[ave(df$count, df$yearmonth, FUN = seq_along) <= 2, ]
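To make the role of `ave()` more visible, here is a self-contained sketch of the same idea run on the sample data. The intermediate name `rank_in_group` is introduced only for illustration, and the sort here uses ascending yearmonth with descending count, which is a small variation on the `decreasing = TRUE` call above.

```r
# Sample data from the question, repeated so the block runs on its own
mydf <- data.frame(
  yearmonth = rep(c("201310", "201311", "201312"), each = 4),
  name = c("Dovas", "Indulgd", "Justina", "Jolita", "Shahrukh Sheikh",
           "Dovas", "Justina", "Lina", "sUPERED", "John Hansen",
           "Lina D.", "joanna1st"),
  count = c(5, 2, 1, 1, 1, 29, 13, 8, 7, 7, 6, 5),
  stringsAsFactors = FALSE
)

# Sort by yearmonth ascending, then by count descending
sorted <- mydf[order(mydf$yearmonth, -mydf$count), ]

# ave() applies seq_along() within each yearmonth group, giving every row
# its position (1, 2, 3, ...) inside its own group
rank_in_group <- ave(sorted$count, sorted$yearmonth, FUN = seq_along)

# Keep only the rows in positions 1 and 2 of each group
top2 <- sorted[rank_in_group <= 2, ]
print(top2)
```

Because the data is sorted before ranking, positions 1 and 2 in each group correspond to the two largest counts.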