Apologies if this question is a bit long-winded, but I think an example will make it clear. I have the following data frame:
structure(list(teamName = c("Brazil", "Germany", "Spain", "England",
"France", "Spain", "France", "Germany", "Brazil", "England",
"Spain", "France", "Brazil"), wins = c(0, 0, 0, 0, 0, 1, 1, 1,
1, 1, 1, 2, 1), losses = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1), ties = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0)), .Names = c("teamName",
"wins", "losses", "ties"), row.names = c(NA, 13L), class = "data.frame")
teamName wins losses ties
1 Brazil 0 0 0
2 Germany 0 0 0
3 Spain 0 0 0
4 England 0 0 0
5 France 0 0 0
6 Spain 1 0 0
7 France 1 0 0
8 Germany 1 0 0
9 Brazil 1 0 0
10 England 1 0 0
11 Spain 1 0 1
12 France 2 0 1
13 Brazil 1 1 0
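These are soccer nations, and I want to filter this data frame so that it contains only the last row for each team. Although there are 5 teams here, the last 5 rows of the data frame are not necessarily the 5 rows I want. In this case, Brazil has 2 rows (9 and 13) after Germany's last row (8), so the last 5 rows contain Brazil twice and leave Germany out.
For this example, the row indices of the last row for each team are 8, 10, 11, 12 and 13.
Is there a simple way to get these indices without using a for loop?
Thanks!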
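Answer 0 (score: 5)
You can use duplicated() with fromLast = TRUE, which marks every row that has a later occurrence of the same teamName; negating it keeps only the last row per team (the data frame is assumed to be named Soccer here):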
Soccer[!duplicated(Soccer$teamName, fromLast=TRUE),]
teamName wins losses ties
8 Germany 1 0 0
10 England 1 0 0
11 Spain 1 0 1
12 France 2 0 1
13 Brazil 1 1 0
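Since the question asks for the row indices rather than the rows themselves, the same logical vector can be passed to which(). The following is a minimal sketch; it rebuilds the data with data.frame() instead of the structure() call from the question, and the name last_idx is only illustrative.

```r
# Rebuild the example data from the question
Soccer <- data.frame(
  teamName = c("Brazil", "Germany", "Spain", "England", "France", "Spain",
               "France", "Germany", "Brazil", "England", "Spain", "France",
               "Brazil"),
  wins   = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 1),
  losses = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
  ties   = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0),
  stringsAsFactors = FALSE
)

# TRUE for rows whose teamName does not appear again further down
is_last <- !duplicated(Soccer$teamName, fromLast = TRUE)

# Positions of those rows
last_idx <- which(is_last)
last_idx
# [1]  8 10 11 12 13
```

Soccer[last_idx, ] then returns the same rows as the one-liner above.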
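Answer 1 (score: 2)
First, add a column that holds the row number. Then use dplyr::slice with n() to select the last row of each group (the data frame is named df here).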
library(dplyr)
df %>% mutate(row_num = row_number()) %>%
group_by(teamName) %>%
slice(n()) %>% arrange(row_num)
# # A tibble: 5 x 5
# # Groups: teamName [5]
# teamName wins losses ties row_num
# <chr> <dbl> <dbl> <dbl> <int>
# 1 Germany 1.00 0 0 8
# 2 England 1.00 0 0 10
# 3 Spain 1.00 0 1.00 11
# 4 France 2.00 0 1.00 12
# 5 Brazil 1.00 1.00 0 13
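In dplyr 1.0.0 and later, slice_tail(n = 1) expresses the same "last row per group" step more directly. The sketch below assumes such a version is installed; last_rows is an illustrative name, and the row_num column is what carries the original indices.

```r
library(dplyr)

# Rebuild the example data from the question
df <- data.frame(
  teamName = c("Brazil", "Germany", "Spain", "England", "France", "Spain",
               "France", "Germany", "Brazil", "England", "Spain", "France",
               "Brazil"),
  wins   = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 1),
  losses = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
  ties   = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0),
  stringsAsFactors = FALSE
)

last_rows <- df %>%
  mutate(row_num = row_number()) %>%  # remember the original position
  group_by(teamName) %>%
  slice_tail(n = 1) %>%               # last row within each team
  ungroup() %>%
  arrange(row_num)                    # restore the original row order

# The indices the question asks for
last_rows$row_num
# [1]  8 10 11 12 13
```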
Answer 2 (score: 2)
library(dplyr)
df %>%
group_by(teamName) %>%
do(tail(., 1))
teamName wins losses ties
<chr> <dbl> <dbl> <dbl>
1 Brazil 1. 1. 0.
2 England 1. 0. 0.
3 France 2. 0. 1.
4 Germany 1. 0. 0.
5 Spain 1. 0. 1.
Or, using data.table:
library(data.table)
dt <- as.data.table(df)
dt[, tail(.SD, 1), by = teamName]
teamName wins losses ties
1: Brazil 1 1 0
2: Germany 1 0 0
3: Spain 1 0 1
4: England 1 0 0
5: France 2 0 1
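Both variants in this answer return the rows but not their original positions, and they order the result by group (alphabetically for dplyr, by first appearance for data.table). If the indices themselves are needed, one possible sketch uses data.table's .I (row numbers) and .N (group size); idx is an illustrative name, and V1 is the default name data.table gives the unnamed result column.

```r
library(data.table)

# Rebuild the example data from the question
dt <- data.table(
  teamName = c("Brazil", "Germany", "Spain", "England", "France", "Spain",
               "France", "Germany", "Brazil", "England", "Spain", "France",
               "Brazil"),
  wins   = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 1),
  losses = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
  ties   = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0)
)

# .I[.N] is the original row number of the last row in each group
idx <- dt[, .I[.N], by = teamName]$V1

# Groups come back in order of first appearance, so sort to get row order
idx <- sort(idx)
idx
# [1]  8 10 11 12 13

# Subset with the indices to get the rows in their original order
dt[idx]
```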