How do I assign a specific value to every instance of a factor?

Date: 2020-02-10 22:53:18

Tags: r

Say I have a data frame that looks like this:

 playerID    yearID salary
1 abbotje01   1998 175000
2 abbotje01   1999 255000
3 abbotje01   2000 255000
4 abbotje01   2001 300000
5 abbotku01   1993 109000
6 abbotku01   1994 109000
.
.
.

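How can I get a data frame in which every row for a given playerID is assigned that player's salary from their most recent year, like this: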

 playerID    yearID salary
1 abbotje01   1998 300000
2 abbotje01   1999 300000
3 abbotje01   2000 300000
4 abbotje01   2001 300000
5 abbotku01   1993 109000
6 abbotku01   1994 109000

I want to keep every instance of each playerID and simply reassign the same salary to each of them.

2 answers:

Answer 0: (score: 1)

After grouping by 'playerID', get the index of the maximum 'yearID', use it to extract the corresponding 'salary', and overwrite the 'salary' column with mutate

library(dplyr)
df1 %>%
  group_by(playerID) %>%
  mutate(salary = salary[which.max(yearID)])
# A tibble: 6 x 3
# Groups:   playerID [2]
#  playerID  yearID salary
#  <chr>      <int>  <int>
#1 abbotje01   1998 300000
#2 abbotje01   1999 300000
#3 abbotje01   2000 300000
#4 abbotje01   2001 300000
#5 abbotku01   1993 109000
#6 abbotku01   1994 109000
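
One detail of which.max is worth keeping in mind: it returns the position of the first maximum and skips NA values. A minimal base R sketch, using made-up vectors (years and pay are hypothetical and not taken from the question's data):

```r
# Hypothetical vectors for illustration only
years <- c(1999L, 2001L, 2001L, NA)
pay   <- c(100, 200, 250, 300)

idx <- which.max(years)   # position of the first largest non-NA year
latest_pay <- pay[idx]    # value stored at that position
```

So if a player had several rows for their latest year, the first of those rows would supply the salary; whether that is the desired tie-break depends on the data.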

Or using data.table

library(data.table)
setDT(df1)[, salary := salary[which.max(yearID)], playerID]
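
The same which.max idea can probably also be written in base R without extra packages. The following is only a sketch, not part of the original answer: it rebuilds the six example rows, and the names latest_row and res are made up for illustration.

```r
# Rebuild the six example rows from the question
df1 <- data.frame(
  playerID = c("abbotje01", "abbotje01", "abbotje01",
               "abbotje01", "abbotku01", "abbotku01"),
  yearID = c(1998L, 1999L, 2000L, 2001L, 1993L, 1994L),
  salary = c(175000L, 255000L, 255000L, 300000L, 109000L, 109000L),
  stringsAsFactors = FALSE
)

# For every row, find the row number of that player's latest year.
# ave() recycles the single index returned per group across the group.
latest_row <- ave(seq_len(nrow(df1)), df1$playerID,
                  FUN = function(i) i[which.max(df1$yearID[i])])

# Copy the data frame and overwrite salary with the latest-year value
res <- df1
res$salary <- df1$salary[latest_row]
```

Because it looks up the maximum year directly, this version does not rely on the rows being sorted.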

Data

df1 <- structure(list(playerID = c("abbotje01", "abbotje01", "abbotje01", 
"abbotje01", "abbotku01", "abbotku01"), yearID = c(1998L, 1999L, 
2000L, 2001L, 1993L, 1994L), salary = c(175000L, 255000L, 255000L, 
300000L, 109000L, 109000L)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

Answer 1: (score: 0)

We can order the data frame by yearID and then extract the last salary from each group

This can be done in base R

df <- df[with(df, order(playerID, yearID)), ]
df$final_salary <- with(df, ave(salary, playerID, FUN = function(x) x[length(x)]))
#Also
#df$final_salary <- with(df, ave(salary, playerID, FUN = function(x) tail(x, 1)))

df

#   playerID yearID salary final_salary
#1 abbotje01   1998 175000       300000
#2 abbotje01   1999 255000       300000
#3 abbotje01   2000 255000       300000
#4 abbotje01   2001 300000       300000
#5 abbotku01   1993 109000       109000
#6 abbotku01   1994 109000       109000
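
The ordering step matters because x[length(x)] simply takes whatever row happens to be last within each group. A small sketch of what can go wrong, using a deliberately shuffled copy of the example rows (shuffled, unsorted_last and sorted_last are names made up here):

```r
df <- data.frame(
  playerID = c("abbotje01", "abbotje01", "abbotje01",
               "abbotje01", "abbotku01", "abbotku01"),
  yearID = c(1998, 1999, 2000, 2001, 1993, 1994),
  salary = c(175000, 255000, 255000, 300000, 109000, 109000),
  stringsAsFactors = FALSE
)

# Put the 2001 row first so it is no longer the last row of its group
shuffled <- df[c(4, 1, 3, 2, 6, 5), ]

# Without sorting: the "last" salary is the 1999 one, not the 2001 one
unsorted_last <- ave(shuffled$salary, shuffled$playerID,
                     FUN = function(x) x[length(x)])

# With sorting by player and year: the last row is the latest year
sorted <- shuffled[order(shuffled$playerID, shuffled$yearID), ]
sorted_last <- ave(sorted$salary, sorted$playerID,
                   FUN = function(x) x[length(x)])
```

Here unsorted_last gives 255000 for abbotje01, while sorted_last gives the intended 300000.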

dplyr

library(dplyr)
df %>%
  arrange(playerID, yearID) %>%
  group_by(playerID) %>%
  mutate(final_salary = last(salary))

data.table

library(data.table)

setDT(df)
df[order(yearID), final_salary := last(salary), playerID]