Create a new column from the row below, based on a variable

Time: 2017-03-19 05:55:28

Tags: r

I have positional data for many athletes during a match. Each quarter of the match lasts at most 30 minutes. An example of my data is:

> df
      StartValue Athlete Quarter Position
    1        0.00    Paul      Q1    Bench
    2        5.35    Paul      Q1 Defender
    3       19.26    Paul      Q1    Bench
    4       23.32    Paul      Q1 Defender
    5        0.00    Paul      Q2    Bench
    6        9.08    Paul      Q2 Defender
    7       13.11    Paul      Q2 Defender
    8        0.00    Paul      Q3 Defender
    9        7.36    Paul      Q3 Defender
    10       2.51    Paul      Q3    Bench
    11       6.44    Paul      Q4    Bench
    12      22.47    Paul      Q4    Bench
    13       0.00    Paul      Q4 Defender
    14      24.38    Paul      Q4 Defender
    15      11.36    Paul      Q4 Defender

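I now want to create a new column, df$EndValue, that takes the StartValue from the row below and places it in this new column. When the last entry of a quarter is reached, 30 must be placed in df$EndValue. For example, the first few rows would be: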

        > df
           StartValue Athlete Quarter Position EndValue
        1        0.00    Paul      Q1    Bench 5.35
        2        5.35    Paul      Q1 Defender 19.26
        3       19.26    Paul      Q1    Bench 23.32
        4       23.32    Paul      Q1 Defender 30.00
        5        0.00    Paul      Q2    Bench 9.08

My expected output as a data.frame is:

Output <- data.frame(StartValue=c(0, 5.35, 19.26, 23.32, 
                                 0.00, 9.08, 13.11, 0, 
                                 2.51, 7.36, 0.0, 6.44,
                                 11.36, 22.47, 24.38), 
                     EndValue=c(5.35, 19.26, 23.32, 30,
                                9.08, 13.11, 30, 2.51,
                                7.36, 30, 6.44, 11.36,
                                22.47, 24.38, 30),
                    Athlete = c('Paul', 'Paul', 'Paul', 'Paul',
                                'Paul', 'Paul', 'Paul','Paul',
                                'Paul', 'Paul', 'Paul','Paul',
                                'Paul', 'Paul', 'Paul'),
                    Quarter = c('Q1', 'Q1', 'Q1', 'Q1', 
                                'Q2', 'Q2', 'Q2', 'Q3', 
                                'Q3', 'Q3', 'Q4', 'Q4', 
                                'Q4', 'Q4', 'Q4'),
                    Position = c('Bench','Defender','Bench','Defender',
                                 'Bench','Defender','Defender','Defender',
                                 'Defender','Bench','Bench','Bench', 
                                 'Defender', 'Defender', 'Defender'))

I have data for many athletes over these 30-minute periods, so how can I add this new column quickly?

Thanks.

2 answers:

Answer 0: (score: 2)

setDT converts the data frame to a data table. Group by Quarter, set the last value of each group to 30, and generate the EndValue column.

library('data.table')
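The code for this first version is not shown here. A minimal sketch of what the description implies might look like the following. It assumes the rows are already sorted by StartValue within each quarter, groups by Athlete as well as Quarter so that several athletes can share one table, and uses data.table's shift() to take the next row's value. The seven-row sample is only an illustrative subset of the question's data.

```r
library(data.table)

# Illustrative subset of the question's data (already sorted within each quarter)
df <- data.frame(
  StartValue = c(0.00, 5.35, 19.26, 23.32, 0.00, 9.08, 13.11),
  Athlete    = "Paul",
  Quarter    = c("Q1", "Q1", "Q1", "Q1", "Q2", "Q2", "Q2"),
  Position   = c("Bench", "Defender", "Bench", "Defender",
                 "Bench", "Defender", "Defender")
)

setDT(df)  # convert the data frame to a data table by reference

# Within each athlete and quarter, take the next row's StartValue;
# the last row of a group has no next row, so it is filled with 30
df[, EndValue := shift(StartValue, n = 1, type = "lead", fill = 30),
   by = .(Athlete, Quarter)]

print(df)
```

If the rows are not sorted within a quarter (as in rows 8 to 15 of the question's df), sorting first, for instance with setorder(df, Athlete, Quarter, StartValue), would be needed before the shift.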

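Edit

In your comment, you asked for the end value of each quarter to be a distinct number. First assign StartValue to EndValue, then find the row index of the last value in each quarter. In the next step, update EndValue at those rows with 31 for Q1, 32 for Q2, 33 for Q3 and 34 for Q4.

I created two players, Paul and Bob. Apart from their names, they have identical data.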

# sample data
setDT( df )  # convert data frame to data table by reference
df1 <- copy(df)  # replicate data by copying df
df[, Athlete := 'Bob']  # rename the athlete in df to Bob (df1 keeps Paul)
df <- rbindlist(l = list( df1,  df) )  # combine df1 and df

# sort StartValue by player and quarter
df <- df[order(StartValue), .SD, by = .( Athlete, Quarter ) ]

# assign start to endvalue and with unique number per player per quarter
df[, EndValue := StartValue ]  # Assign StartValue to EndValue

# remove 1st, shift values up and assign NA to last
df[, EndValue := c( EndValue[-1], NA ), by = .(Athlete, Quarter )]  

df[ i = df[, .I[.N], by = .(Quarter, Athlete)][, V1], 
    j = EndValue := rep( c(31,32,33,34), 
                         length( df[, unique(Athlete) ] ) ) ]

df
#    Athlete Quarter StartValue Position EndValue
# 1:    Paul      Q1       0.00    Bench     5.35
# 2:    Paul      Q1       5.35 Defender    19.26
# 3:    Paul      Q1      19.26    Bench    23.32
# 4:    Paul      Q1      23.32 Defender    31.00
# 5:    Paul      Q2       0.00    Bench     9.08
# 6:    Paul      Q2       9.08 Defender    13.11
# 7:    Paul      Q2      13.11 Defender    32.00
# 8:    Paul      Q3       0.00 Defender     2.51
# 9:    Paul      Q3       2.51    Bench     7.36
# 10:    Paul      Q3       7.36 Defender    33.00
# 11:    Paul      Q4       0.00 Defender     6.44
# 12:    Paul      Q4       6.44    Bench    11.36
# 13:    Paul      Q4      11.36 Defender    22.47
# 14:    Paul      Q4      22.47    Bench    24.38
# 15:    Paul      Q4      24.38 Defender    34.00
# 16:     Bob      Q1       0.00    Bench     5.35
# 17:     Bob      Q1       5.35 Defender    19.26
# 18:     Bob      Q1      19.26    Bench    23.32
# 19:     Bob      Q1      23.32 Defender    31.00
# 20:     Bob      Q2       0.00    Bench     9.08
# 21:     Bob      Q2       9.08 Defender    13.11
# 22:     Bob      Q2      13.11 Defender    32.00
# 23:     Bob      Q3       0.00 Defender     2.51
# 24:     Bob      Q3       2.51    Bench     7.36
# 25:     Bob      Q3       7.36 Defender    33.00
# 26:     Bob      Q4       0.00 Defender     6.44
# 27:     Bob      Q4       6.44    Bench    11.36
# 28:     Bob      Q4      11.36 Defender    22.47
# 29:     Bob      Q4      22.47    Bench    24.38
# 30:     Bob      Q4      24.38 Defender    34.00
#     Athlete Quarter StartValue Position EndValue
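
The rep( c(31,32,33,34), ... ) step depends on every athlete having all four quarters, in the same order. As a hedged alternative, the end values could be looked up by quarter name instead. In the sketch below, quarter_end and the small two-quarter sample are hypothetical and exist only for illustration.

```r
library(data.table)

# Hypothetical sample: two athletes, two quarters each
df <- data.table(
  Athlete    = rep(c("Paul", "Bob"), each = 4),
  Quarter    = rep(c("Q1", "Q1", "Q2", "Q2"), times = 2),
  StartValue = rep(c(0.00, 5.35, 0.00, 9.08), times = 2)
)

# Hypothetical end value for each quarter, keyed by quarter name
quarter_end <- c(Q1 = 31, Q2 = 32)

# Sort by athlete, quarter and start time (by reference)
setorder(df, Athlete, Quarter, StartValue)

# Take the next row's StartValue; the last row of each group gets NA for now
df[, EndValue := shift(StartValue, type = "lead"), by = .(Athlete, Quarter)]

# Fill those NA rows by looking up the quarter name
df[is.na(EndValue), EndValue := unname(quarter_end[as.character(Quarter)])]

print(df)
```

Because the lookup is by name, an athlete who misses a quarter would not shift the values assigned to the others. Note that setorder() sorts athletes alphabetically, so Bob comes before Paul here.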

Answer 1: (score: 1)

Here is a solution using dplyr:

library(dplyr)

quarter_lengths <- c(Q1 = 31, Q2 = 32, Q3 = 30, Q4 = 33)

df %>%
  group_by(Athlete, Quarter) %>%
  # as.character() so that a factor Quarter is looked up by name, not by integer code
  mutate(EndValue = c(StartValue[-1], quarter_lengths[as.character(Quarter[1])]))

If it gets more complicated, for example multiple games with different quarter lengths, I would create a new data.frame holding the lengths and inner_join it to the data.
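
The lookup-table idea could be sketched roughly as follows. The Game column, the QuarterLength column and all values here are hypothetical and chosen only to illustrate the join; they are not part of the question's data.

```r
library(dplyr)

# Hypothetical data: two games whose quarters have different lengths
df <- data.frame(
  Game       = c("G1", "G1", "G1", "G2", "G2"),
  Athlete    = "Paul",
  Quarter    = c("Q1", "Q1", "Q2", "Q1", "Q1"),
  StartValue = c(0.00, 5.35, 0.00, 0.00, 12.10),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: one row per game and quarter
quarter_lengths <- data.frame(
  Game          = c("G1", "G1", "G2"),
  Quarter       = c("Q1", "Q2", "Q1"),
  QuarterLength = c(30, 31, 28),
  stringsAsFactors = FALSE
)

result <- df %>%
  inner_join(quarter_lengths, by = c("Game", "Quarter")) %>%
  group_by(Game, Athlete, Quarter) %>%
  arrange(StartValue, .by_group = TRUE) %>%
  # lead() is NA on the last row of a group; fall back to the quarter length there
  mutate(EndValue = coalesce(lead(StartValue), QuarterLength)) %>%
  ungroup() %>%
  select(-QuarterLength)

print(result)
```

An inner_join drops any rows whose game and quarter are missing from the lookup table, so a left_join followed by a check for NA lengths may be safer if the table could be incomplete.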