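Ordering responses by individual date

Asked: 2018-02-16 18:52:14

Tags: r sorting date dplyr

I have time-series data in a data frame, organized by subject ID and response date. Each subject responded over 7 days, but the subjects started and ended on different dates. I need to create a variable for the response day, indexed from each subject's first day. See the example: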

id  response
101 11/2/2017
101 11/2/2017
101 11/3/2017
101 11/3/2017
101 11/3/2017
101 11/3/2017
102 12/14/2017
102 12/15/2017

and I would like to create a column "day":

id  response    day
101 11/2/2017   1
101 11/2/2017   1
101 11/3/2017   2
101 11/3/2017   2
101 11/3/2017   2
101 11/3/2017   2
102 12/14/2017  1
102 12/15/2017  2

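I have been trying to do this in dplyr but cannot find the right code. Thanks!

4 answers:

Answer 0: (score: 4)

One solution is simply to use group_by and take the difference from the first record of each group. Since response is of type Date, we need to add 1 so that responses on the first day evaluate to 1. For example: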

df <- read.table(text = "id  response
101 '11/2/2017'
101 '11/2/2017'
101 '11/3/2017'
101 '11/3/2017'
101 '11/3/2017'
101 '11/3/2017'
102 '12/14/2017'
102 '12/15/2017'", header = T, stringsAsFactors = F)

df$response <- as.Date(df$response, format = "%m/%d/%Y")

library(dplyr)

df %>% group_by(id) %>%
  arrange(id, response) %>%
  mutate(day = response - first(response)+1)

     id response   day   
  <int> <date>     <time>
1   101 2017-11-02 1     
2   101 2017-11-02 1     
3   101 2017-11-03 2     
4   101 2017-11-03 2     
5   101 2017-11-03 2     
6   101 2017-11-03 2     
7   102 2017-12-14 1     
8   102 2017-12-15 2  
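
The day column above prints as <time> because subtracting two Date values yields a difftime. If a plain integer column is preferred, one option is to wrap the difference in as.integer(). This is a minimal sketch rather than part of the original answer; the name df_int is made up for illustration.

```r
library(dplyr)

# Sample data from the question, with response parsed as Date
df <- data.frame(
  id = c(101, 101, 101, 101, 101, 101, 102, 102),
  response = as.Date(
    c("11/2/2017", "11/2/2017", "11/3/2017", "11/3/2017",
      "11/3/2017", "11/3/2017", "12/14/2017", "12/15/2017"),
    format = "%m/%d/%Y"
  )
)

# as.integer() drops the difftime class, so day becomes a plain integer
df_int <- df %>%
  arrange(id, response) %>%
  group_by(id) %>%
  mutate(day = as.integer(response - first(response)) + 1L) %>%
  ungroup()

df_int$day
```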

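Answer 1: (score: 2)

We can use dense_rank from dplyr. The mutate and arrange calls here are only there to make sure the dates are in the correct order. If you are sure the dates are already in the correct order, you can skip them.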

library(dplyr)
library(lubridate)

dat2 <- dat %>%
  mutate(response = mdy(response)) %>%
  arrange(id, response) %>%
  group_by(id) %>%
  mutate(day = dense_rank(response)) %>%
  ungroup()
dat2
# # A tibble: 8 x 3
#      id response     day
#   <int> <date>     <int>
# 1   101 2017-11-02     1
# 2   101 2017-11-02     1
# 3   101 2017-11-03     2
# 4   101 2017-11-03     2
# 5   101 2017-11-03     2
# 6   101 2017-11-03     2
# 7   102 2017-12-14     1
# 8   102 2017-12-15     2

Data

dat <- read.table(text = "id  response
101 '11/2/2017'
                  101 '11/2/2017'
                  101 '11/3/2017'
                  101 '11/3/2017'
                  101 '11/3/2017'
                  101 '11/3/2017'
                  102 '12/14/2017'
                  102 '12/15/2017'",
                  header = TRUE, stringsAsFactors = FALSE)
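
Note that dense_rank() numbers the distinct dates within each id consecutively, whereas the date-difference approach counts calendar days since the first response. The two agree on the sample data but differ when a subject skips a day. A small sketch with made-up data (gap_dat, its id, and its dates are hypothetical):

```r
library(dplyr)

# Hypothetical subject who responds on Nov 2, skips Nov 3, responds on Nov 4
gap_dat <- data.frame(
  id = c(201, 201, 201),
  response = as.Date(c("2017-11-02", "2017-11-04", "2017-11-04"))
)

gap_res <- gap_dat %>%
  group_by(id) %>%
  mutate(
    day_rank = dense_rank(response),                      # consecutive rank of distinct dates
    day_diff = as.integer(response - min(response)) + 1L  # calendar days since first response
  ) %>%
  ungroup()

gap_res
```

Which of the two is appropriate depends on whether a skipped day should still advance the day counter.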

Answer 2: (score: 1)

We can group_by id and then convert response to a factor and from there to numeric.

dat %>% group_by(id) %>% mutate(day = as.numeric(factor(response)))
# A tibble: 8 x 3
# Groups:   id [2]
     id   response   day
  <int>      <chr> <dbl>
1   101  11/2/2017     1
2   101  11/2/2017     1
3   101  11/3/2017     2
4   101  11/3/2017     2
5   101  11/3/2017     2
6   101  11/3/2017     2
7   102 12/14/2017     1
8   102 12/15/2017     2

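If the dates are not in sorted order, you can pass the levels to factor() explicitly, in order of first appearance:

dat %>% group_by(id) %>% mutate(day = as.numeric(factor(response, unique(response))))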
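
One caveat, shown as a sketch with made-up dates: while response is still a character column, factor() sorts its levels as text, so "11/10/2017" comes before "11/2/2017". Parsing the strings as Date before building the factor avoids this. The vector resp below is hypothetical and stands in for one subject's responses.

```r
# Hypothetical responses for one subject, stored as m/d/Y strings
resp <- c("11/2/2017", "11/10/2017", "11/10/2017")

# Levels are sorted as text, so "11/10/2017" sorts before "11/2/2017"
day_chr <- as.numeric(factor(resp))

# Levels are sorted chronologically once the strings are parsed as dates
day_date <- as.numeric(factor(as.Date(resp, format = "%m/%d/%Y")))

day_chr   # 2 1 1
day_date  # 1 2 2
```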

Answer 3: (score: 1)

If the rows are already ordered by id and response, you can use the rleid() function from data.table:

library(data.table)
setDT(DF)[, day := rleid(response), by = id][]

    id   response day
1: 101  11/2/2017   1
2: 101  11/2/2017   1
3: 101  11/3/2017   2
4: 101  11/3/2017   2
5: 101  11/3/2017   2
6: 101  11/3/2017   2
7: 102 12/14/2017   1
8: 102 12/15/2017   2

The correct row order is important for rleid() to return the expected result.

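If the rows are not already sorted, the date strings in the response column first need to be coerced to class Date. The column can then be used to order the rows.

For example, using the unordered dataset DF2:

library(data.table)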
library(lubridate)
set.seed(123L)
DF2 <- setDT(DF)[sample.int(.N)]
DF2
    id   response
1: 101  11/3/2017
2: 101  11/3/2017
3: 102 12/15/2017
4: 101  11/3/2017
5: 101  11/3/2017
6: 101  11/2/2017
7: 101  11/2/2017
8: 102 12/14/2017
DF2[, response := mdy(response)][order(response), day := rleid(response), by = id][]

    id   response day
1: 101 2017-11-03   2
2: 101 2017-11-03   2
3: 102 2017-12-15   2
4: 101 2017-11-03   2
5: 101 2017-11-03   2
6: 101 2017-11-02   1
7: 101 2017-11-02   1
8: 102 2017-12-14   1

The original row order of DF2 has not changed, but the days are numbered as requested. This is hard to see unless the rows are printed in the proper order:

DF2[order(id, response)]

    id   response day
1: 101 2017-11-02   1
2: 101 2017-11-02   1
3: 101 2017-11-03   2
4: 101 2017-11-03   2
5: 101 2017-11-03   2
6: 101 2017-11-03   2
7: 102 2017-12-14   1
8: 102 2017-12-15   2
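
The snippets in this answer operate on a data frame DF that is not defined above. A minimal sketch, assuming DF holds the sample data from the question with response read as character strings (this definition and the name res are assumptions, not the answerer's code):

```r
library(data.table)

# Assumed definition of DF: the question's sample data, dates kept as strings
DF <- data.frame(
  id = c(101L, 101L, 101L, 101L, 101L, 101L, 102L, 102L),
  response = c("11/2/2017", "11/2/2017", "11/3/2017", "11/3/2017",
               "11/3/2017", "11/3/2017", "12/14/2017", "12/15/2017"),
  stringsAsFactors = FALSE
)

# Rows are already ordered by id and response, so rleid() can number the days
res <- as.data.table(DF)[, day := rleid(response), by = id]
res$day
```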