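I have time series data in a data frame, organized by subject ID and response date. Each subject responded over a 7-day period, but subjects start and end on different dates. I need to create a variable for the response day, indexed within each subject and starting at day 1. See the example: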
id response
101 11/2/2017
101 11/2/2017
101 11/3/2017
101 11/3/2017
101 11/3/2017
101 11/3/2017
102 12/14/2017
102 12/15/2017
and I would like to create a column "day":
id response day
101 11/2/2017 1
101 11/2/2017 1
101 11/3/2017 2
101 11/3/2017 2
101 11/3/2017 2
101 11/3/2017 2
102 12/14/2017 1
102 12/15/2017 2
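I have been trying to do this in dplyr but cannot find the right code. Thanks!
Answer 0 (score: 4)
One solution is simply to use group_by and take the difference from the first record of each group. Since response is of type Date, we need to add 1 so that responses on a subject's first day evaluate to 1. For example: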
df <- read.table(text = "id response
101 '11/2/2017'
101 '11/2/2017'
101 '11/3/2017'
101 '11/3/2017'
101 '11/3/2017'
101 '11/3/2017'
102 '12/14/2017'
102 '12/15/2017'", header = TRUE, stringsAsFactors = FALSE)
df$response <- as.Date(df$response, format = "%m/%d/%Y")
library(dplyr)
df %>% group_by(id) %>%
  arrange(id, response) %>%
  mutate(day = response - first(response) + 1)
id response day
<int> <date> <time>
1 101 2017-11-02 1
2 101 2017-11-02 1
3 101 2017-11-03 2
4 101 2017-11-03 2
5 101 2017-11-03 2
6 101 2017-11-03 2
7 102 2017-12-14 1
8 102 2017-12-15 2
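The day column above prints as <time> because subtracting two Date values yields a difftime. A minimal sketch, assuming a plain integer column is preferred: wrap the difference in as.integer(). The data frame df_small and its values are made up for illustration and are not from the question.

```r
library(dplyr)

# Hypothetical small data set (assumption, not from the question)
df_small <- data.frame(
  id = c(101, 101, 102),
  response = as.Date(c("2017-11-02", "2017-11-03", "2017-12-14"))
)

res <- df_small %>%
  group_by(id) %>%
  arrange(id, response) %>%
  # Date - Date is a difftime in days; as.integer() drops the class
  mutate(day = as.integer(response - first(response)) + 1L) %>%
  ungroup()

res$day
# 1 2 1
```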
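Answer 1 (score: 2)
We can use dense_rank from dplyr. The mutate and arrange calls here are only there to make sure the dates are in the right order. If you are sure the dates are already in the right order, you can omit them.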
library(dplyr)
library(lubridate)
dat2 <- dat %>%
  mutate(response = mdy(response)) %>%
  arrange(id, response) %>%
  group_by(id) %>%
  mutate(day = dense_rank(response)) %>%
  ungroup()
dat2
# # A tibble: 8 x 3
# id response day
# <int> <date> <int>
# 1 101 2017-11-02 1
# 2 101 2017-11-02 1
# 3 101 2017-11-03 2
# 4 101 2017-11-03 2
# 5 101 2017-11-03 2
# 6 101 2017-11-03 2
# 7 102 2017-12-14 1
# 8 102 2017-12-15 2
Data
dat <- read.table(text = "id response
101 '11/2/2017'
101 '11/2/2017'
101 '11/3/2017'
101 '11/3/2017'
101 '11/3/2017'
101 '11/3/2017'
102 '12/14/2017'
102 '12/15/2017'",
header = TRUE, stringsAsFactors = FALSE)
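The date-difference approach and the dense_rank approach agree on the example data, but they can differ when a subject skips a day. A small sketch of that difference, assuming a hypothetical subject with no response on 11/3; the names gap, day_offset and day_rank are invented for illustration.

```r
library(dplyr)

# Hypothetical subject who skipped 11/3 (assumption, not from the question)
gap <- data.frame(
  id = c(101, 101, 101),
  response = as.Date(c("2017-11-02", "2017-11-04", "2017-11-04"))
)

cmp <- gap %>%
  group_by(id) %>%
  mutate(
    # calendar days since the subject's first response, plus 1
    day_offset = as.integer(response - min(response)) + 1L,
    # consecutive index over the distinct response dates
    day_rank = dense_rank(response)
  ) %>%
  ungroup()

cmp$day_offset
# 1 3 3
cmp$day_rank
# 1 2 2
```

Which of the two is wanted depends on whether "day" should mean the calendar day of the study or the n-th day on which the subject responded.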
Answer 2 (score: 1)
We can group_by id and then convert response to a factor and from there to numeric.
dat %>% group_by(id) %>% mutate(day = as.numeric(factor(response)))
# A tibble: 8 x 3
# Groups: id [2]
id response day
<int> <chr> <dbl>
1 101 11/2/2017 1
2 101 11/2/2017 1
3 101 11/3/2017 2
4 101 11/3/2017 2
5 101 11/3/2017 2
6 101 11/3/2017 2
7 102 12/14/2017 1
8 102 12/15/2017 2
If the dates are not sorted, you can pass the levels to the factor explicitly:
dat%>%group_by(id)%>%mutate(day=as.numeric(factor(response,unique(response))))
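One caveat with the factor approach when response is still a character column: by default, factor levels are sorted as text, not as dates. A minimal base R sketch with made-up dates (not from the question) shows the effect and one possible workaround, converting to Date first.

```r
# Hypothetical dates stored as strings (assumption, not from the question)
x <- c("11/2/2017", "11/10/2017")

# Character levels are sorted as text, so "11/10/2017" sorts before "11/2/2017"
as_text <- as.numeric(factor(x))
as_text
# 2 1

# Converting to Date first gives chronologically ordered levels
as_date <- as.numeric(factor(as.Date(x, format = "%m/%d/%Y")))
as_date
# 1 2
```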
Answer 3 (score: 1)
If the rows are already ordered by id and response, the rleid() function from the data.table package can be used:
library(data.table)
setDT(DF)[, day := rleid(response), by = id][]
id response day
1: 101 11/2/2017 1
2: 101 11/2/2017 1
3: 101 11/3/2017 2
4: 101 11/3/2017 2
5: 101 11/3/2017 2
6: 101 11/3/2017 2
7: 102 12/14/2017 1
8: 102 12/15/2017 2
Correct row order is important for rleid() to return the expected result.
If the rows are not already ordered, the strings in the response column need to be coerced to class Date first. The column can then be used to order the rows.
For example, using an unordered data set DF2:
library(data.table)
library(lubridate)
set.seed(123L)
DF2 <- setDT(DF)[sample.int(.N)]
DF2
id response
1: 101 11/3/2017
2: 101 11/3/2017
3: 102 12/15/2017
4: 101 11/3/2017
5: 101 11/3/2017
6: 101 11/2/2017
7: 101 11/2/2017
8: 102 12/14/2017
DF2[, response := mdy(response)][order(response), day := rleid(response), by = id][]
id response day
1: 101 2017-11-03 2
2: 101 2017-11-03 2
3: 102 2017-12-15 2
4: 101 2017-11-03 2
5: 101 2017-11-03 2
6: 101 2017-11-02 1
7: 101 2017-11-02 1
8: 102 2017-12-14 1
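The original row order of DF2 has not been changed, but the days are numbered as requested. This is hard to see unless the rows are printed in the proper order: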
DF2[order(id, response)]
id response day
1: 101 2017-11-02 1
2: 101 2017-11-02 1
3: 101 2017-11-03 2
4: 101 2017-11-03 2
5: 101 2017-11-03 2
6: 101 2017-11-03 2
7: 102 2017-12-14 1
8: 102 2017-12-15 2