Question

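I have data on thousands of US basketball players spanning many years.

Each basketball player has a unique ID. For each player, it is known which team they played for and in which position in a given year, as in the mock data df below: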

df <- data.frame(id = c(rep(1:4, times=2), 1), 
             year = c(1, 1, 2, 2, 3, 4, 4, 4,5),
             team = c(1,2,3,4, 2,2,4,4,2),
             position = c(1,2,3,4,1,1,4,4,4))
> df
  id year team position
1  1    1    1        1
2  2    1    2        2
3  3    2    3        3
4  4    2    4        4
5  1    3    2        1
6  2    4    2        1
7  3    4    4        4
8  4    4    4        4
9  1    5    2        4

What is an efficient way to transform df into new_df below?

> new_df
  id move time position.1 position.2 year.1 year.2
1  1    0    2          1          1      1      3
2  2    1    3          2          1      1      4
3  3    0    2          3          4      2      4
4  4    1    2          4          4      2      4
5  1    0    2          1          4      3      5

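In new_df, a player's first occurrence is compared with the second occurrence, recording whether the player changed teams and how long it took the player to make the switch.

Note:

In the real data, some basketball players appear more than twice and can play for multiple teams and in multiple positions.
In such a case, a new row is added to new_df that compares each additional occurrence of a player with the previous occurrence.

Edit: For the reasons mentioned in the previous two sentences, I don't think this is a fairly simple reshape exercise. To clarify this, I added one more occurrence of player ID 1 to the mock data.

Any help is greatly appreciated!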

Answer 1

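# Assumes the original 8-row df: every id appears exactly twice,
# with all first occurrences listed before all second occurrences
s <- table(df$id)
df$time <- rep(1:max(s), each = length(s))  # occurrence index: 1,1,1,1,2,2,2,2
df1 <- reshape(df, idvar = "id", direction = "wide")
# move is 1 when the team is unchanged; time is the gap in years
transform(df1, move = +(team.1 == team.2), time = year.2 - year.1)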

 id year.1 team.1 position.1 year.2 team.2 position.2 move time
1  1      1      1          1      3      2          1    0    2
2  2      1      2          2      4      2          1    1    3
3  3      2      3          3      4      4          4    0    2
4  4      2      4          4      4      4          4    1    2
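
The reshape above relies on every player appearing exactly twice, so it does not cover the edited nine-row data, where player 1 appears three times. A minimal base R sketch of one way to compare each appearance with the previous one follows. The helper name `prev_within` is hypothetical, and coding `move` as 1 when the team changed is an assumption about the intended meaning; flip the comparison if the opposite coding is wanted.

```r
# Mock data from the question (nine rows; player 1 appears three times)
df <- data.frame(id = c(rep(1:4, times = 2), 1),
                 year = c(1, 1, 2, 2, 3, 4, 4, 4, 5),
                 team = c(1, 2, 3, 4, 2, 2, 4, 4, 2),
                 position = c(1, 2, 3, 4, 1, 1, 4, 4, 4))

# Sort so that each player's rows are in chronological order
df <- df[order(df$id, df$year), ]

# Hypothetical helper: previous value within the same group, NA for the first row
prev_within <- function(x, g) ave(x, g, FUN = function(v) c(NA, head(v, -1)))

pairs <- data.frame(
  id         = df$id,
  # 1 = team differs from the previous appearance (assumed coding)
  move       = as.integer(df$team != prev_within(df$team, df$id)),
  time       = df$year - prev_within(df$year, df$id),
  position.1 = prev_within(df$position, df$id),
  position.2 = df$position,
  year.1     = prev_within(df$year, df$id),
  year.2     = df$year
)

# Keep only rows that have a previous appearance to compare against
pairs <- pairs[!is.na(pairs$time), ]
rownames(pairs) <- NULL
pairs
```

Because the rows are sorted by id and then year, the result is grouped by player rather than ordered like new_df in the question; reorder with `order()` if that layout matters.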

Answer 2

The code below should get you as far as transposing the data; you still have to create the move and time variables yourself.

df <- data.frame(id = rep(1:4, times=2), 
                 year = c(1, 1, 2, 2, 3, 4, 4, 4),
                 team = c(1, 2, 3, 4, 2, 2, 4, 4),
                 position = c(1, 2, 3, 4, 1, 1, 4, 4))

library(reshape2)
library(data.table)

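setDT(df)  # convert to data.table by reference
df[, rno := rank(year, ties.method = "min"), by = .(id)]  # occurrence number within each player

# create the transposed (wide) dataset; multiple value.var columns need data.table's dcast
Dcast_DT <- dcast(df, id ~ rno, value.var = c("year", "team", "position"))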
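
One possible way to finish this off is sketched below: the same eight-row data is cast to wide form and the two missing columns are added. It assumes data.table's default naming for multiple value.var columns (`year_1`, `team_2`, and so on) and, as an assumption about the intended meaning, codes `move` as 1 when the team changed. The name `wide` is only illustrative.

```r
library(data.table)

# Same eight-row mock data as above, built directly as a data.table
df <- data.table(id = rep(1:4, times = 2),
                 year = c(1, 1, 2, 2, 3, 4, 4, 4),
                 team = c(1, 2, 3, 4, 2, 2, 4, 4),
                 position = c(1, 2, 3, 4, 1, 1, 4, 4))

# Occurrence number within each player, then cast to one row per player
df[, rno := rank(year, ties.method = "min"), by = id]
wide <- dcast(df, id ~ rno, value.var = c("year", "team", "position"))

# Add the two derived columns by reference
wide[, `:=`(move = as.integer(team_1 != team_2),  # 1 = team changed (assumed coding)
            time = year_2 - year_1)]              # years between the two appearances
wide
```

Like the wide layout itself, this only handles two appearances per player; a third appearance would add `year_3`, `team_3` and `position_3` columns that need their own comparison.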

Answer 3

This code, which uses data.table, does the trick:

library(data.table)

# transform to data.table
dtt <- as.data.table(df)
# sort on year
setorder(dtt, year, na.last = TRUE)
# indicate the names of the new columns
new_cols <- c("time", "move", "prev_team", "prev_year", "prev_position")
# set up the new variables
dtt[, (new_cols) := list(year - shift(year), team != shift(team), shift(team), shift(year), shift(position)), by = id]
# select only repeating occurrences
dtt <- dtt[!is.na(dtt$time), ]
# outcome
dtt
   id year team position time  move prev_team prev_year prev_position
1:  1    3    2        1    2  TRUE         1         1             1
2:  2    4    2        1    3 FALSE         2         1             2
3:  3    4    4        4    2  TRUE         3         2             3
4:  4    4    4        4    2 FALSE         4         2             4
5:  1    5    2        4    2 FALSE         2         3             1


Manipulating rows in a data.frame in pairs
