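Getting the last row index for multiple values at once in an R data frame

Time: 2018-06-12 20:41:37

Tags: r

Apologies if this question is a bit wordy, but I believe an example will make it clear. I have the following data frame: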

structure(list(teamName = c("Brazil", "Germany", "Spain", "England", 
"France", "Spain", "France", "Germany", "Brazil", "England", 
"Spain", "France", "Brazil"), wins = c(0, 0, 0, 0, 0, 1, 1, 1, 
1, 1, 1, 2, 1), losses = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
1), ties = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0)), .Names = c("teamName", 
"wins", "losses", "ties"), row.names = c(NA, 13L), class = "data.frame")

   teamName wins losses ties
1    Brazil    0      0    0
2   Germany    0      0    0
3     Spain    0      0    0
4   England    0      0    0
5    France    0      0    0
6     Spain    1      0    0
7    France    1      0    0
8   Germany    1      0    0
9    Brazil    1      0    0
10  England    1      0    0
11    Spain    1      0    1
12   France    2      0    1
13   Brazil    1      1    0

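These are national soccer teams, and I want to filter this data frame so that it contains only the last row for each team. Although there are 5 teams here, the last 5 rows of the data frame are not necessarily the 5 rows I want. In this case, Brazil appears twice (rows 9 and 13) after Germany's last row (row 8).

For this example, the row indices of the last row for each team are 8, 10, 11, 12, and 13.

Is there a simple way to get these indices without using a for loop?

Thanks!

3 answers:

Answer 0: (score: 5)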

You can do this in base R using duplicated with fromLast=TRUE, which keeps only the last occurrence of each team:
Soccer[!duplicated(Soccer$teamName, fromLast=TRUE),]
   teamName wins losses ties
8   Germany    1      0    0
10  England    1      0    0
11    Spain    1      0    1
12   France    2      0    1
13   Brazil    1      1    0
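
Since the question asks for the row indices rather than the rows themselves, the same logical vector can be passed to which(). This is a minimal sketch, assuming the data frame is named Soccer as in this answer; it is rebuilt here from the question's dput output so the snippet runs on its own.

```r
# Rebuild the example data from the question (named Soccer, as in this answer)
Soccer <- data.frame(
  teamName = c("Brazil", "Germany", "Spain", "England", "France", "Spain",
               "France", "Germany", "Brazil", "England", "Spain", "France",
               "Brazil"),
  wins   = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 1),
  losses = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
  ties   = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0),
  stringsAsFactors = FALSE
)

# duplicated(..., fromLast = TRUE) scans from the bottom, so it is FALSE only
# for the last occurrence of each team; which() converts that to row indices
last_idx <- which(!duplicated(Soccer$teamName, fromLast = TRUE))
last_idx
# [1]  8 10 11 12 13
```

The indices come back in row order, so Soccer[last_idx, ] gives the same five rows as the subsetting call above.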

Answer 1: (score: 2)

First, add a column holding the row number. Then use dplyr::slice with n() to select the last row of each group:

library(dplyr)
df %>% mutate(row_num = row_number()) %>%
  group_by(teamName) %>%
  slice(n()) %>% arrange(row_num)

# # A tibble: 5 x 5
# # Groups: teamName [5]
#    teamName  wins losses  ties row_num
#     <chr>    <dbl>  <dbl> <dbl>   <int>
# 1 Germany   1.00   0     0          8
# 2 England   1.00   0     0         10
# 3 Spain     1.00   0     1.00      11
# 4 France    2.00   0     1.00      12
# 5 Brazil    1.00   1.00  0         13

Answer 2: (score: 2)

library(dplyr)
df %>% 
  group_by(teamName) %>% 
  do(tail(., 1))


  teamName  wins losses  ties
  <chr>    <dbl>  <dbl> <dbl>
1 Brazil      1.     1.    0.
2 England     1.     0.    0.
3 France      2.     0.    1.
4 Germany     1.     0.    0.
5 Spain       1.     0.    1.

Or using data.table:

library(data.table)
dt <- as.data.table(df)
dt[, tail(.SD, 1), teamName]

  teamName wins losses ties
1:   Brazil    1      1    0
2:  Germany    1      0    0
3:    Spain    1      0    1
4:  England    1      0    0
5:   France    2      0    1
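
Both snippets in this answer return the rows but drop the original row numbers. If the index is wanted per team, one base R possibility (an alternative sketch, not part of the answers above) is to take the maximum row position within each team using tapply. The vector teamName below is copied from the question's data; last_by_team is a name chosen here for illustration.

```r
# Team column copied from the question's example data
teamName <- c("Brazil", "Germany", "Spain", "England", "France", "Spain",
              "France", "Germany", "Brazil", "England", "Spain", "France",
              "Brazil")

# For each team, take the largest row position at which it appears;
# the result is named by team and ordered alphabetically
last_by_team <- tapply(seq_along(teamName), teamName, max)
last_by_team

# Sorted, unnamed indices, matching the order asked for in the question
sort(as.vector(last_by_team))
```

The named result makes it easy to look up a single team, for example last_by_team["Germany"].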