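I have a data frame of website accesses. Each person can access the site several times per day, and every access has one of several possible actions and an action description.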
People | Date | Time | Action | Descr |
| | | | |
j | 01/01/2010 | 10:13 | X | A |
j | 01/01/2010 | 10:15 | Y | B |
j | 02/01/2010 | 14:15 | Z | C |
j | 03/01/2010 | 11:45 | X | D |
j | 03/01/2010 | 13:56 | X | E |
j | 03/01/2010 | 18:43 | Z | F |
j | 03/01/2010 | 18:44 | X | A |
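For anyone who wants to reproduce the problem, the table above could be rebuilt roughly as follows. This is only a sketch: the object name `df` is an assumption, and the dates are kept as character strings because the question does not say whether they are day/month or month/day.

```r
# Sample data copied from the table above; `df` is an assumed object name.
df <- data.frame(
  People = "j",
  Date   = c("01/01/2010", "01/01/2010", "02/01/2010", "03/01/2010",
             "03/01/2010", "03/01/2010", "03/01/2010"),
  Time   = c("10:13", "10:15", "14:15", "11:45", "13:56", "18:43", "18:44"),
  Action = c("X", "Y", "Z", "X", "X", "Z", "X"),
  Descr  = c("A", "B", "C", "D", "E", "F", "A"),
  stringsAsFactors = FALSE  # keep text columns as character, not factor
)

# Quick check: number of accesses per day
table(df$Date)
```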
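After collapsing the data frame into a balanced daily panel, I need to create these variables:
- The first variable (FirstX) must equal the description (Descr) of the first Action = X of that day, if there is one, and zero otherwise
- The second variable must equal the description of the second Action = X of that day, and zero otherwise
- and so on
Once the data is converted into a balanced daily panel (which I know how to do), the final result should look like this: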
People | Date | Accesses | First X | Second X | Third X | Fourth X |
| | | | | | |
j | 01/01/2010 | 2 | A | 0 | 0 | 0 |
j | 02/01/2010 | 1 | 0 | 0 | 0 | 0 |
j | 03/01/2010 | 4 | D | E | A | 0 |
Answer 0 (score: 0)
You can do this with the dplyr package:
library(dplyr)
# One row per person and day; "0" is used when there is no k-th X action
df %>%
  group_by(People, Date) %>%
  summarise(Accesses = n(),
            FirstX  = ifelse(sum(Action == "X") >= 1, Descr[Action == "X"][1], "0"),
            SecondX = ifelse(sum(Action == "X") >= 2, Descr[Action == "X"][2], "0"),
            ThirdX  = ifelse(sum(Action == "X") >= 3, Descr[Action == "X"][3], "0"),
            FourthX = ifelse(sum(Action == "X") >= 4, Descr[Action == "X"][4], "0"))
This returns:
People Date Accesses FirstX SecondX ThirdX FourthX
<chr> <chr> <int> <chr> <chr> <chr> <chr>
1 j 01/01/2010 2 A 0 0 0
2 j 02/01/2010 1 0 0 0 0
3 j 03/01/2010 4 D E A 0
Note that a single vector cannot hold both the number 0 and character values, so I used the character "0" in the FirstX, SecondX, ... columns.
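The coercion behind that note can be checked directly in base R. The second half of the sketch is an assumption rather than part of the answer: it shows that indexing past the end of a character vector already yields a typed missing value, which might serve as a placeholder instead of "0".

```r
# Combining a number with a string coerces the whole vector to character.
mixed <- c(0, "A")
class(mixed)   # "character"
mixed          # "0" "A"

# Possible alternative placeholder (an assumption, not from the answer):
# indexing past the end of a character vector gives NA_character_.
x_descr  <- c("A")        # hypothetical: descriptions of the X actions on one day
second_x <- x_descr[2]    # there is no second X on that day
is.na(second_x)           # TRUE
```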
Answer 1 (score: 0)
I found a solution myself. I am posting it here in case it is useful to someone.
# Create a temporary variable to count over (just the numbers from 1 to N,
# where N is the number of rows)
subset$temp_var1 <- seq_len(nrow(subset))
# Generate a counter that starts at one and restarts every time "date" or
# "people" changes
subset$count <- ave(subset$temp_var1, subset$date, subset$people,
                    FUN = seq_along)
# Keep only the needed columns (this drops the variable "Action")
subset <- subset(subset, select = c("people", "date", "descr", "count"))
# Reshape from long to wide: one descr.<count> column per counter value
subset <- reshape(subset, idvar = c("people", "date"),
                  timevar = "count", direction = "wide")
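To see what this approach produces, here is a minimal base R sketch run on the sample rows from the question. The object names `visits`, `x_only` and `wide` are made up for illustration, and the filtering step assumes that only the Action = X rows should be counted, which the answer does not state explicitly.

```r
# Sample rows from the question, with lower-case column names as in this answer.
visits <- data.frame(
  people = "j",
  date   = c("01/01/2010", "01/01/2010", "02/01/2010", "03/01/2010",
             "03/01/2010", "03/01/2010", "03/01/2010"),
  action = c("X", "Y", "Z", "X", "X", "Z", "X"),
  descr  = c("A", "B", "C", "D", "E", "F", "A"),
  stringsAsFactors = FALSE
)

# Assumption: keep only the X actions before counting.
x_only <- visits[visits$action == "X", c("people", "date", "descr")]

# Running counter that restarts for every people/date combination.
x_only$count <- ave(seq_len(nrow(x_only)), x_only$people, x_only$date,
                    FUN = seq_along)

# One row per people/date, one descr.<k> column per counter value.
wide <- reshape(x_only, idvar = c("people", "date"),
                timevar = "count", direction = "wide")
wide
```

With this sketch, days that have no X action at all (02/01/2010 in the sample) do not appear in `wide`, and missing positions are NA rather than 0, so the result would still need to be merged back onto the full daily panel.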