Flag the last date in R

Time: 2018-01-24 18:35:04

Tags: r dataframe

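I have a dataset of game session data (the id, the number of sessions, the session duration in seconds, and the session date for each id). Here is a sample of mydat: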

mydat=read.csv("C:/Users/Admin/desktop/rty.csv", sep=";",dec=",")

mydat

 structure(list(udid = c(74385162L, 79599601L, 79599601L, 91475825L, 
    91475825L, 91492531L, 92137561L, 96308016L, 96308016L, 96308016L, 
    96308016L, 96308016L, 96495076L, 97135620L, 97135620L, 97135620L, 
    97135620L, 97135620L, 97135620L, 97135620L, 97135620L, 97135620L, 
    97135620L, 97165942L), count = c(1L, 1L, 1L, 1L, 3L, 1L, 1L, 
    2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L), avg_duration = c(39L, 1216L, 568L, 5L, 6L, 79L, 9L, 426L, 
    78L, 884L, 785L, 785L, 22L, 302L, 738L, 280L, 2782L, 5L, 2284L, 
    144L, 234L, 231L, 539L, 450L), date = structure(c(13L, 3L, 3L, 
    1L, 1L, 14L, 2L, 11L, 11L, 11L, 12L, 12L, 9L, 7L, 4L, 4L, 5L, 
    6L, 8L, 8L, 8L, 8L, 8L, 10L), .Label = c("11.10.16", "12.12.16", 
    "15.11.16", "15.12.16", "16.12.16", "17.12.16", "18.10.16", "18.12.16", 
    "21.10.16", "26.10.16", "28.11.16", "29.11.16", "31.10.16", "8.10.16"
    ), class = "factor")), .Names = c("udid", "count", "avg_duration", 
    "date"), class = "data.frame", row.names = c(NA, -24L))

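For each id, I need to mark the observation on the last date the player was seen with 1, and mark every earlier observation of that id with 0. For example, id 96308016 has 5 observations, so we mark the last (fifth) observation with 1 and the previous 4 observations with 0. If an id has only 1 observation, we mark it with 1, as with id 74385162.

To make it clearer, here is my expected output: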

udid    count   avg_duration    date    diff
74385162    1   39              31.10.16    1
79599601    1   1216            15.11.16    0
79599601    1   568             15.11.16    1
91475825    1   5               11.10.16    0
91475825    3   6               11.10.16    1
91492531    1   79              8.10.16     1
92137561    1   9               12.12.16    1
96308016    2   426             28.11.16    0
96308016    2   78              28.11.16    0
96308016    1   884             28.11.16    0
96308016    1   785             29.11.16    0
96308016    1   785             29.11.16    1
96495076    1   22              21.10.16    1
97135620    2   302             18.10.16    0
97135620    1   738             15.12.16    0
97135620    1   280             15.12.16    0
97135620    1   2782            16.12.16    0
97135620    1   5               17.12.16    0
97135620    1   2284            18.12.16    0
97135620    1   144             18.12.16    0
97135620    1   234             18.12.16    0
97135620    1   231             18.12.16    0
97135620    1   539             18.12.16    1
97165942    1   450             26.10.16    1

How can I do this?

3 answers:

Answer 0: (score: 3)

You can do the following:

library(dplyr)
mydat = mydat  %>%
  group_by(udid) %>% 
  mutate(diff=ifelse(row_number()==n(),1,0)) %>% 
  as.data.frame()

Output:

       udid count avg_duration     date diff
1  74385162     1           39 31.10.16    1
2  79599601     1         1216 15.11.16    0
3  79599601     1          568 15.11.16    1
4  91475825     1            5 11.10.16    0
5  91475825     3            6 11.10.16    1
6  91492531     1           79  8.10.16    1
7  92137561     1            9 12.12.16    1
8  96308016     2          426 28.11.16    0
9  96308016     2           78 28.11.16    0
10 96308016     1          884 28.11.16    0
11 96308016     1          785 29.11.16    0
12 96308016     1          785 29.11.16    1
13 96495076     1           22 21.10.16    1
14 97135620     2          302 18.10.16    0
15 97135620     1          738 15.12.16    0
16 97135620     1          280 15.12.16    0
17 97135620     1         2782 16.12.16    0
18 97135620     1            5 17.12.16    0
19 97135620     1         2284 18.12.16    0
20 97135620     1          144 18.12.16    0
21 97135620     1          234 18.12.16    0
22 97135620     1          231 18.12.16    0
23 97135620     1          539 18.12.16    1
24 97165942     1          450 26.10.16    1
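For readers who would rather avoid the dplyr dependency, the same "last row in each group" idea can be sketched in base R with ave(). This is only an illustration: toy is a small made-up data frame, not the asker's data, and it assumes the rows of each udid are already in chronological order.

```r
# Hypothetical sample (not the asker's data); rows are already in order.
toy <- data.frame(
  udid = c(1L, 2L, 2L, 3L, 3L, 3L),
  avg_duration = c(39L, 1216L, 568L, 5L, 6L, 79L)
)

# Within each udid, compare every row's position with the group size;
# only the last row of the group matches.
toy$diff <- ave(
  seq_along(toy$udid), toy$udid,
  FUN = function(i) as.integer(seq_along(i) == length(i))
)

toy$diff
# 1 0 1 0 0 1
```

Like row_number() == n(), this flags by position within the group, so it is only as good as the row order.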

Answer 1: (score: 2)

If it is already sorted by date, then this will work:

mydat$diff = as.integer(!duplicated(mydat$udid, fromLast = TRUE))

head(mydat)
#        udid count avg_duration     date diff
# 1  74385162     1           39 31.10.16    1
# 2  79599601     1         1216 15.11.16    0
# 3  79599601     1          568 15.11.16    1
# 4  91475825     1            5 11.10.16    0
# 5  91475825     3            6 11.10.16    1
# 6  91492531     1           79  8.10.16    1

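If it is not already sorted by date, convert to the Date class, sort, and then do the above. Note that the month is %m; %M means minutes.

mydat$date = as.Date(mydat$date, format = "%d.%m.%y")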
mydat = mydat[order(mydat$udid, mydat$date), ]
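Put together, the whole sequence might look like the sketch below. It runs on a small, deliberately unsorted, made-up data frame (toy is hypothetical) and assumes the dates are text in day.month.two-digit-year form, as in the question.

```r
# Hypothetical unsorted sample using the question's d.m.y text format.
toy <- data.frame(
  udid = c(2L, 1L, 2L, 1L, 2L),
  date = c("16.12.16", "31.10.16", "18.10.16", "8.10.16", "15.12.16"),
  stringsAsFactors = FALSE
)

# %m is the month; %M would be read as minutes.
toy$date <- as.Date(toy$date, format = "%d.%m.%y")

# Sort by id, then chronologically within each id.
toy <- toy[order(toy$udid, toy$date), ]

# After sorting, the last occurrence of each id is its latest date.
toy$diff <- as.integer(!duplicated(toy$udid, fromLast = TRUE))

toy$diff
# 0 1 0 0 1
```

Sorting on the parsed Date rather than on the original text matters here, because strings such as "8.10.16" and "31.10.16" do not sort chronologically.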

Answer 2: (score: 1)

If you don't want to sort by date, the same logic can be implemented as follows:

library(dplyr)

# %m is the month (%M would be minutes)
mydat$date = as.Date(mydat$date, "%d.%m.%y")

mydat %>% 
  group_by(udid) %>%
  mutate(diff = ifelse(date == max(date), 1L, 0L)) # 1 on the id's last date

      udid count avg_duration date        diff
      <int> <int>        <int> <date>     <int>
 1 74385162     1           39 2016-10-31     1
 2 79599601     1         1216 2016-11-15     1
 3 79599601     1          568 2016-11-15     1
 4 91475825     1            5 2016-10-11     1
 5 91475825     3            6 2016-10-11     1
 6 91492531     1           79 2016-10-08     1
 7 92137561     1            9 2016-12-12     1
 8 96308016     2          426 2016-11-28     0
 9 96308016     2           78 2016-11-28     0
10 96308016     1          884 2016-11-28     0
# ... with 14 more rows

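However, your sample data contains duplicate dates within the same id, so the logic above does not give the expected result there: every row that shares the latest date is marked 1. The solution should still work on real data, especially if date is actually a date/time.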
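If duplicate dates do occur, one possible way to break the tie is to flag only the last row among those sharing the latest date. The sketch below is not part of the original answer; toy is a made-up data frame, and it assumes that "last" among tied rows means the one that appears last in the current row order.

```r
library(dplyr)

# Hypothetical sample where id 1 has its latest date twice.
toy <- data.frame(
  udid = c(1L, 1L, 1L, 2L),
  date = as.Date(c("2016-11-28", "2016-11-29", "2016-11-29", "2016-10-26"))
)

toy <- toy %>%
  group_by(udid) %>%
  # which() gives the positions of rows on the latest date;
  # max() keeps only the final such position.
  mutate(diff = as.integer(row_number() == max(which(date == max(date))))) %>%
  ungroup() %>%
  as.data.frame()

toy$diff
# 0 0 1 1
```

This needs no sorting, but when several rows tie on the latest date the result depends on their existing order.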